Compare commits: `f4dba386c9...cli-imp` — 167 commits.

The rendered commit table (Author / SHA1 / Date) did not survive extraction; only the abbreviated SHAs are recoverable, listed here oldest-column-first in the order shown:

```
c8d609ab78 35c828ba73 ecbfef537a 47eecce8e9 b29c382583 e26816333f f772de8aef dd4d867c6e
ffd074499a 125938fba6 cd25cf61ca d9c9f6e541 acc5e12e3d 039ab1c2a3 d63d6f0b9c 3a1307dcdc
6ea3108a20 81647545e7 39a832688d 06229ce98b 8d18552298 f3788eb687 e9af529f6e 70271c14d6
d9f99ef21d f5967a8e52 2c9de1a6c3 1161edb212 5ea976583e dcfd449b72 6b75697638 dc49f5209e
7d40a81512 4185abe05d d54f669c5e 45126f04a6 dfa44e5bcd 53ef21d653 41504b4941 d36850f181
5ce18e0ebc b168a58134 b704e33188 6e82f723c3 940a96375a 7dd86d5433 429c6f07d2 754efa4369
c54a969269 95b7183add 435a208c93 cc11d3e5a0 5786d7f4b6 d3306114eb e6b880cbcb 121a634653
f267578aab 859923f86b d701b1f977 736d9c9a80 8dc479e515 3e7fa607d3 b5f78e31a8 cf6d27435a
4ce0130620 a573d695d5 a855759bf8 f3f3560e0d 2bfa4f1f8c 8cf14fb69b c2036c64e9 39cb0cb087
1c45725cba 405e5370dc 32783080f1 f1cb45a168 69df8a5603 b005edb7f2 03d9f8cce5 7eadae75f0
9b23d91378 a324fa26e1 e8845380e9 3e9cf2358e 16beb35a69 3767c33c28 d1b2b5fa7d a7d5d1c99f
233eb546af ddcfff1026 001e4f37b4 873d2c0ab8 42b8238329 5d1586b88e c2f34d3a4f 3bb24dc6cb
42a4bca6df c730b0ec54 ab43bbd2db 784fe79b80 db750e4fc5 72f1cafdcf 9c04b7fb1b dd2869fd98
65583ed5d6 976ad92ef0 a76dc8089e 26cf13248d a2e26454dc f748570d4d 0b6b168043 1d003aeac2
925ec9f574 1fdc6d03cc 266ed78e73 a65ea2f56f 38da7ca47b 86a51cddef f6d19a9467 362503d3bf
329c8f4539 ee5c5f9645 f5b4a765b7 4ee99c1677 c35f485e0e a92e176bb6 deafa88af5 880ad1d3fa
4c0123426a bb75a9d228 2bcd8db0e9 a50fc78823 ff94f24702 5c521491b7 0236ef2776 12811683ca
724be4d265 c34ed3007e e73d2907dc 9d4755521f 92ff255909 ce5cd9c95d 549a0646d7 a417640faa
f560e6bc00 aebbe6b795 7d07f95d4c 2a52594a60 51c370fac2 7b7d781a19 03ea51513d 667f70e177
585b746461 730ddef339 5508d8464a 41d20f1374 9b63671df9 d235f2b4dd daf5a73019 559f0702ad
d5bdb24b0f 723703bed9 20edff4ab1 d31d5292f2 6e22f120d0 4270603da4 aca4773327
```
**232** `.beads/.br_history/issues.20260212_161438.jsonl` (new file)

File diff suppressed because one or more lines are too long. The one visible hunk (`@@ -1 +1 @@`) shows the identifier `bd-lcb` on the old side replaced by `bd-xsgw` on the new side.
**59** `.claude/agents/test-runner.md` (new file, `@@ -0,0 +1,59 @@`)

````markdown
---
name: test-runner
description: "Use this agent when unit tests need to be run and results analyzed. This includes after writing or modifying code, before committing changes, or when explicitly asked to verify test status.\\n\\nExamples:\\n\\n- User: \"Please refactor the parse_session function to handle edge cases\"\\n  Assistant: \"Here is the refactored function with edge case handling: ...\"\\n  [code changes applied]\\n  Since a significant piece of code was modified, use the Task tool to launch the test-runner agent to verify nothing is broken.\\n  Assistant: \"Now let me run the test suite to make sure everything still passes.\"\\n\\n- User: \"Do all tests pass?\"\\n  Assistant: \"Let me use the Task tool to launch the test-runner agent to check the current test status.\"\\n\\n- User: \"I just finished implementing the search feature\"\\n  Assistant: \"Let me use the Task tool to launch the test-runner agent to validate the implementation.\"\\n\\n- After any logical chunk of code is written or modified, proactively use the Task tool to launch the test-runner agent to run the tests before reporting completion to the user."
tools: Bash
model: haiku
color: orange
---

You are an expert test execution and analysis engineer. Your sole responsibility is to run the project's unit test suite, interpret the results with precision, and deliver a clear, actionable summary.

## Execution Protocol

1. **Discover the test framework**: Examine the project structure to determine how tests are run:
   - Look for `Cargo.toml` (Rust: `cargo test`)
   - If unclear, check README or CLAUDE.md for test instructions

2. **Run the tests**: Execute the appropriate test command. Capture full output including stdout and stderr. Do NOT run tests interactively or with watch mode. Use flags that produce verbose or detailed output when available (e.g., `cargo test -- --nocapture`, `jest --verbose`).

3. **Analyze results**: Parse the test output carefully and categorize:
   - Total tests run
   - Tests passed
   - Tests failed (with details)
   - Tests skipped/ignored
   - Compilation errors (if tests couldn't even run)

4. **Report findings**:

**If ALL tests pass:**
Provide a concise success summary:
- Total test count and pass count
- Execution time if available
- Note any skipped/ignored tests and why (if apparent)
- A clear statement: "All tests passed."

**If ANY tests fail:**
Provide a detailed failure report:
- List each failing test by its full name/path
- Include the assertion error or panic message for each failure
- Include relevant expected vs actual values
- Note the file and line number where the failure occurred (if available)
- Group failures by module/file if there are many
- Suggest likely root causes when the error messages make it apparent
- Note if failures appear related (e.g., same underlying issue)

**If tests cannot run (compilation/setup error):**
- Report the exact error preventing test execution
- Identify the file and line causing the issue
- Distinguish between test code errors and source code errors

## Rules

- NEVER modify any source code or test code. You are read-only except for running the test command.
- NEVER skip running tests and guess at results. Always execute the actual test command.
- NEVER run the full application or any destructive commands. Only run test commands.
- If the test suite is extremely large, run it fully anyway. Do not truncate or sample.
- If multiple test targets exist (unit, integration, e2e), run unit tests only unless instructed otherwise.
- Report raw numbers. Do not round or approximate test counts.
- If tests produce warnings (not failures), mention them briefly but clearly separate them from failures.
- Keep the summary structured and scannable. Use bullet points and clear headers.
````
**17** `.claude/hooks/on-file-write.sh` (new executable file, `@@ -0,0 +1,17 @@`)

```bash
#!/bin/bash
# Ultimate Bug Scanner - Claude Code Hook
# Runs on every file save for UBS-supported languages (JS/TS, Python, C/C++, Rust, Go, Java, Ruby)
# Claude Code hooks receive context as JSON on stdin.

INPUT=$(cat)
FILE_PATH=$(echo "$INPUT" | jq -r '.tool_input.file_path // empty')
CWD=$(echo "$INPUT" | jq -r '.cwd // empty')

if [[ "$FILE_PATH" =~ \.(js|jsx|ts|tsx|mjs|cjs|py|pyw|pyi|c|cc|cpp|cxx|h|hh|hpp|hxx|rs|go|java|rb)$ ]]; then
  echo "🔬 Running bug scanner..."
  if ! command -v ubs >/dev/null 2>&1; then
    echo "⚠️ 'ubs' not found in PATH; install it before using this hook." >&2
    exit 0
  fi
  ubs "$FILE_PATH" --ci 2>&1 | head -50
fi
```
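The hook's gatekeeper is the extension regex. A minimal sketch exercising just that filter with hypothetical file paths (the JSON/stdin plumbing and the `ubs` invocation are omitted):

```shell
#!/bin/bash
# Exercise the hook's extension filter in isolation; the paths below are
# made up for illustration, not taken from the repository.
RESULT=""
for FILE_PATH in src/main.rs notes/todo.txt app/index.tsx; do
  if [[ "$FILE_PATH" =~ \.(js|jsx|ts|tsx|mjs|cjs|py|pyw|pyi|c|cc|cpp|cxx|h|hh|hpp|hxx|rs|go|java|rb)$ ]]; then
    echo "scan: $FILE_PATH"   # would trigger the ubs run
    RESULT+="scan "
  else
    echo "skip: $FILE_PATH"   # non-code file, hook exits quietly
    RESULT+="skip "
  fi
done
# → scan: src/main.rs / skip: notes/todo.txt / scan: app/index.tsx
```

Note the filter matches only the final extension, so `.txt`, `.md`, and similar files fall through without invoking the scanner.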
**106** `.claude/skills/release/SKILL.md` (new file, `@@ -0,0 +1,106 @@`)

````markdown
---
name: release
description: Bump version, tag, and prepare for next development cycle
version: 1.0.0
author: Taylor Eernisse
category: automation
tags: ["release", "versioning", "semver", "git"]
---

# Release

Automate SemVer version bumps for the `lore` CLI.

## Invocation

```
/release <type>
```

Where `<type>` is one of:

- **major** — breaking changes (0.5.0 -> 1.0.0)
- **minor** — new features (0.5.0 -> 0.6.0)
- **patch** / **hotfix** — bug fixes (0.5.0 -> 0.5.1)

If no type is provided, ask the user.

## Procedure

Follow these steps exactly. Do NOT skip any step.

### 1. Determine bump type

Parse the argument. Accept these aliases:

- `major`, `breaking` -> MAJOR
- `minor`, `feature`, `feat` -> MINOR
- `patch`, `hotfix`, `fix` -> PATCH

If the argument doesn't match, ask the user to clarify.

### 2. Read current version

Read `Cargo.toml` and extract the `version = "X.Y.Z"` line. Parse into major, minor, patch integers.

### 3. Compute new version

- MAJOR: `(major+1).0.0`
- MINOR: `major.(minor+1).0`
- PATCH: `major.minor.(patch+1)`

### 4. Check preconditions

Run `git status` and `git log --oneline -5`. Show the user:

- Current version: X.Y.Z
- New version: A.B.C
- Bump type: major/minor/patch
- Working tree status (clean or dirty)
- Last 5 commits (so they can confirm scope)

If the working tree is dirty, warn: "You have uncommitted changes. They will NOT be included in the release tag. Continue?"

Ask the user to confirm before proceeding.

### 5. Update Cargo.toml

Edit the `version = "..."` line in Cargo.toml to the new version.

### 6. Update Cargo.lock

Run `cargo check` to update Cargo.lock with the new version. This also verifies the project compiles.

### 7. Commit the version bump

```bash
git add Cargo.toml Cargo.lock
git commit -m "release: v{NEW_VERSION}"
```

### 8. Tag the release

```bash
git tag v{NEW_VERSION}
```

### 9. Report

Print a summary:

```
Release v{NEW_VERSION} created.

Previous: v{OLD_VERSION}
Bump: {type}
Tag: v{NEW_VERSION}
Commit: {short hash}

To push: git push && git push --tags
```

Do NOT push automatically. The user decides when to push.

## Examples

```
/release minor  -> 0.5.0 -> 0.6.0
/release hotfix -> 0.5.0 -> 0.5.1
/release patch  -> 0.5.0 -> 0.5.1
/release major  -> 0.5.0 -> 1.0.0
```
````
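The alias mapping in step 1 and the bump arithmetic in step 3 can be sketched in plain bash. This is an illustration only — the skill reads the real version from `Cargo.toml`, while the values below are hard-coded stand-ins:

```shell
#!/bin/bash
# Stand-in inputs; a real run would parse these from Cargo.toml and the
# /release argument.
VERSION="0.5.0"
TYPE="minor"

# Split X.Y.Z into integer components.
IFS=. read -r MAJOR MINOR PATCH <<< "$VERSION"

# Apply the alias table and bump rules from steps 1 and 3.
case "$TYPE" in
  major|breaking)     NEW="$((MAJOR + 1)).0.0" ;;
  minor|feature|feat) NEW="$MAJOR.$((MINOR + 1)).0" ;;
  patch|hotfix|fix)   NEW="$MAJOR.$MINOR.$((PATCH + 1))" ;;
  *) echo "unknown bump type: $TYPE" >&2; exit 1 ;;
esac

echo "$VERSION -> $NEW"   # → 0.5.0 -> 0.6.0
```

Note that MINOR and PATCH bumps zero out the components to their right, which is what makes `0.5.0 -> 0.6.0` (not `0.6.1`) correct under SemVer.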
**50** `.cline/rules` (new file, `@@ -0,0 +1,50 @@`)

````markdown
## UBS Quick Reference for AI Agents

UBS stands for "Ultimate Bug Scanner": **The AI Coding Agent's Secret Weapon: Flagging Likely Bugs for Fixing Early On**

**Install:** `curl -sSL https://raw.githubusercontent.com/Dicklesworthstone/ultimate_bug_scanner/master/install.sh | bash`

**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.

**Commands:**
```bash
ubs file.ts file2.py                  # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached)  # Staged files — before commit
ubs --only=js,python src/             # Language filter (3-5x faster)
ubs --ci --fail-on-warning .          # CI mode — before PR
ubs --help                            # Full command reference
ubs sessions --entries 1              # Tail the latest install session log
ubs .                                 # Whole project (ignores things like .venv and node_modules automatically)
```

**Output Format:**
```
⚠️ Category (N errors)
file.ts:42:5 – Issue description
💡 Suggested fix
Exit code: 1
```
Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail

**Fix Workflow:**
1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit

**Speed Critical:** Scope to changed files. `ubs src/file.ts` (< 1s) vs `ubs .` (30s). Never full scan for small edits.

**Bug Severity:**
- **Critical** (always fix): Null safety, XSS/injection, async/await, memory leaks
- **Important** (production): Type narrowing, division-by-zero, resource leaks
- **Contextual** (judgment): TODO/FIXME, console logs

**Anti-Patterns:**
- ❌ Ignore findings → ✅ Investigate each
- ❌ Full scan per edit → ✅ Scope to file
- ❌ Fix symptom (`if (x) { x.y }`) → ✅ Root cause (`x?.y`)
````
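The "Parse: `file:line:col` → location" convention splits cleanly on colons. A minimal sketch with a made-up finding line (the finding text and path are hypothetical, not real UBS output):

```shell
#!/bin/bash
# Hypothetical finding line in UBS's file:line:col output format.
FINDING="src/app.ts:42:5 – Possible null dereference"

# Split on the first two colons: file, line, and the remainder.
IFS=: read -r FILE LINE REST <<< "$FINDING"
# The column number is everything in the remainder up to the first space.
COL="${REST%% *}"

echo "file=$FILE line=$LINE col=$COL"   # → file=src/app.ts line=42 col=5
```

Splitting only the first two colons keeps the parse robust even when the issue description itself contains colons.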
**50** `.codex/rules/ubs.md` (new file, `@@ -0,0 +1,50 @@`) — identical content to `.cline/rules` above (the UBS Quick Reference for AI Agents).
**16** `.continue/config.json` (new file, `@@ -0,0 +1,16 @@`)

```json
{
  "customCommands": [
    {
      "name": "scan-bugs",
      "description": "Run Ultimate Bug Scanner on current project",
      "prompt": "Run 'ubs --fail-on-warning .' and fix any critical issues found before proceeding"
    }
  ],
  "slashCommands": [
    {
      "name": "quality",
      "description": "Check code quality with UBS",
      "run": "ubs ."
    }
  ]
}
```
**50** `.cursor/rules` (new file, `@@ -0,0 +1,50 @@`) — identical content to `.cline/rules` above (the UBS Quick Reference for AI Agents).
**50** `.gemini/rules` (new file, `@@ -0,0 +1,50 @@`) — identical content to `.cline/rules` above (the UBS Quick Reference for AI Agents).
**50** `.opencode/rules` (new file, `@@ -0,0 +1,50 @@`) — identical content to `.cline/rules` above (the UBS Quick Reference for AI Agents).
790
AGENTS.md
790
AGENTS.md
@@ -1,6 +1,615 @@
|
|||||||
# AGENTS.md
|
# AGENTS.md
|
||||||
|
|
||||||
This file provides guidance to Claude Code (claude.ai/code) when working with code in this repository.
|
## RULE 0 - THE FUNDAMENTAL OVERRIDE PEROGATIVE
|
||||||
|
|
||||||
|
If I tell you to do something, even if it goes against what follows below, YOU MUST LISTEN TO ME. I AM IN CHARGE, NOT YOU.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## RULE NUMBER 1: NO FILE DELETION
|
||||||
|
|
||||||
|
**YOU ARE NEVER ALLOWED TO DELETE A FILE WITHOUT EXPRESS PERMISSION.** Even a new file that you yourself created, such as a test code file. You have a horrible track record of deleting critically important files or otherwise throwing away tons of expensive work. As a result, you have permanently lost any and all rights to determine that a file or folder should be deleted.
|
||||||
|
|
||||||
|
**YOU MUST ALWAYS ASK AND RECEIVE CLEAR, WRITTEN PERMISSION BEFORE EVER DELETING A FILE OR FOLDER OF ANY KIND.**
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Version Control: jj-First (CRITICAL)
|
||||||
|
|
||||||
|
**ALWAYS prefer jj (Jujutsu) over git for VCS mutations** (commit, describe, rebase, push, bookmark, undo). This is a colocated repo with both `.jj/` and `.git/`. Only fall back to raw `git` for things jj cannot do (hooks, LFS, submodules, `gh` CLI interop).
|
||||||
|
|
||||||
|
**Exception — read-only inspection:** Use `git status`, `git diff`, `git log` instead of their jj equivalents. In a colocated repo these see accurate data, and unlike jj, they don't create operations that cause divergences when multiple agents run concurrently. See "Parallel Agent VCS Protocol" below.
|
||||||
|
|
||||||
|
See `~/.claude/rules/jj-vcs/` for the full command reference, translation table, revsets, patterns, and recovery recipes.
|
||||||
|
|
||||||
|
### Parallel Agent VCS Protocol (CRITICAL)
|
||||||
|
|
||||||
|
Multiple agents often run concurrently in separate terminal panes, sharing the same repo directory. This requires care because jj's auto-snapshot creates operations on EVERY command — even read-only ones like `jj status`. Concurrent jj commands fork from the same parent operation and create **divergent changes**.
|
||||||
|
|
||||||
|
**The rule: use git for reads, jj for writes.**
|
||||||
|
|
||||||
|
In a colocated repo, git reads see accurate data because jj keeps `.git/` in sync.
|
||||||
|
|
||||||
|
| Operation | Use | Why |
|
||||||
|
|-----------|-----|-----|
|
||||||
|
| Check status | `git status` | No jj operation created |
|
||||||
|
| View diff | `git diff` | No jj operation created |
|
||||||
|
| Browse history | `git log` | No jj operation created |
|
||||||
|
| Commit work | `jj commit -m "msg"` | jj mutation (better UX) |
|
||||||
|
| Update description | `jj describe -m "msg"` | jj mutation |
|
||||||
|
| Rebase | `jj rebase -d trunk()` | jj mutation |
|
||||||
|
| Push | `jj git push -b <name>` | jj mutation |
|
||||||
|
| Manage bookmarks | `jj bookmark set ...` | jj mutation |
|
||||||
|
| Undo a mistake | `jj undo` | jj mutation |
|
||||||
|
|
||||||
|
**NEVER run `jj status`, `jj diff`, `jj log`, or `jj show` when other agents may be active** — these trigger snapshots that cause divergences.
|
||||||
|
|
||||||
|
**If using Claude Code's built-in agent teams:** Only the team lead runs ANY VCS commands (git or jj). Workers only edit files via Edit/Write tools and do NOT run "Landing the Plane".
|
||||||
|
|
||||||
|
**Resolving divergences if they occur:**
|
||||||
|
|
||||||
|
```bash
|
||||||
|
jj log -r 'divergent()' # Find divergent changes
|
||||||
|
jj abandon <unwanted-commit-id> # Keep the version you want
|
||||||
|
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Irreversible Git & Filesystem Actions — DO NOT EVER BREAK GLASS
|
||||||
|
|
||||||
|
> **Note:** Treat destructive commands as break-glass. If there's any doubt, stop and ask.
|
||||||
|
|
||||||
|
1. **Absolutely forbidden commands:** `git reset --hard`, `git clean -fd`, `rm -rf`, or any command that can delete or overwrite code/data must never be run unless the user explicitly provides the exact command and states, in the same message, that they understand and want the irreversible consequences.
|
||||||
|
2. **No guessing:** If there is any uncertainty about what a command might delete or overwrite, stop immediately and ask the user for specific approval. "I think it's safe" is never acceptable.
|
||||||
|
3. **Safer alternatives first:** When cleanup or rollbacks are needed, request permission to use non-destructive options (`git status`, `git diff`, `git stash`, copying to backups) before ever considering a destructive command.
|
||||||
|
4. **Mandatory explicit plan:** Even after explicit user authorization, restate the command verbatim, list exactly what will be affected, and wait for a confirmation that your understanding is correct. Only then may you execute it—if anything remains ambiguous, refuse and escalate.
|
||||||
|
5. **Document the confirmation:** When running any approved destructive command, record (in the session notes / final response) the exact user text that authorized it, the command actually run, and the execution time. If that record is absent, the operation did not happen.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Toolchain: Rust & Cargo
|
||||||
|
|
||||||
|
We only use **Cargo** in this project, NEVER any other package manager.
|
||||||
|
|
||||||
|
- **Edition/toolchain:** Follow `rust-toolchain.toml` (if present). Do not assume stable vs nightly.
|
||||||
|
- **Dependencies:** Explicit versions for stability; keep the set minimal.
|
||||||
|
- **Configuration:** Cargo.toml only
|
||||||
|
- **Unsafe code:** Forbidden (`#![forbid(unsafe_code)]`)
|
||||||
|
|
||||||
|
When writing Rust code, reference RUST_CLI_TOOLS_BEST_PRACTICES.md
|
||||||
|
|
||||||
|
### Release Profile
|
||||||
|
|
||||||
|
Use the release profile defined in `Cargo.toml`. If you need to change it, justify the
|
||||||
|
performance/size tradeoff and how it impacts determinism and cancellation behavior.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Code Editing Discipline

### No Script-Based Changes

**NEVER** run a script that processes/changes code files in this repo. Brittle regex-based transformations create far more problems than they solve.

- **Always make code changes manually**, even when there are many instances
- For many simple changes: use parallel subagents
- For subtle/complex changes: do them methodically yourself

### No File Proliferation

If you want to change something or add a feature, **revise existing code files in place**.

**NEVER** create variations like:

- `mainV2.rs`
- `main_improved.rs`
- `main_enhanced.rs`

New files are reserved for **genuinely new functionality** that makes zero sense to include in any existing file. The bar for creating new files is **incredibly high**.

---

## Backwards Compatibility

We do not care about backwards compatibility—we're in early development with no users. We want to do things the **RIGHT** way with **NO TECH DEBT**.

- Never create "compatibility shims"
- Never create wrapper functions for deprecated APIs
- Just fix the code directly

---
## Compiler Checks (CRITICAL)

**After any substantive code changes, you MUST verify no errors were introduced:**

```bash
# Check for compiler errors and warnings
cargo check --all-targets

# Check for clippy lints (pedantic + nursery are enabled)
cargo clippy --all-targets -- -D warnings

# Verify formatting
cargo fmt --check
```

If you see errors, **carefully understand and resolve each issue**. Read sufficient context to fix them the RIGHT way.
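The three checks can be chained so the first failure stops the run and names itself. A minimal sketch: the `run_gate` helper and its `name::command` entry format are illustrative, and the demo uses `true` stand-ins so it runs anywhere.

```shell
# Run named checks in order; stop at the first failure and report which one failed.
run_gate() {
  for entry in "$@"; do
    name="${entry%%::*}"
    cmd="${entry#*::}"
    echo "==> $name"
    if ! eval "$cmd"; then
      echo "FAILED: $name" >&2
      return 1
    fi
  done
  echo "all checks passed"
}

# Stand-ins; the real gate would pass the three cargo commands above.
run_gate "check::true" "clippy::true" "fmt::true"
```

In the real gate the entries would be `check::cargo check --all-targets`, `clippy::cargo clippy --all-targets -- -D warnings`, and `fmt::cargo fmt --check`.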
---
## Testing

### Unit & Property Tests

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture
```

When adding or changing primitives, add tests that assert the core invariants:

- no task leaks
- no obligation leaks
- losers are drained after races
- region close implies quiescence

Prefer deterministic lab-runtime tests for concurrency-sensitive behavior.

---
## MCP Agent Mail — Multi-Agent Coordination

A mail-like layer that lets coding agents coordinate asynchronously via MCP tools and resources. Provides identities, inbox/outbox, searchable threads, and advisory file reservations with human-auditable artifacts in Git.

### Why It's Useful

- **Prevents conflicts:** Explicit file reservations (leases) for files/globs
- **Token-efficient:** Messages stored in per-project archive, not in context
- **Quick reads:** `resource://inbox/...`, `resource://thread/...`

### Same Repository Workflow

1. **Register identity:**
   ```
   ensure_project(project_key=<abs-path>)
   register_agent(project_key, program, model)
   ```

2. **Reserve files before editing:**
   ```
   file_reservation_paths(project_key, agent_name, ["src/**"], ttl_seconds=3600, exclusive=true)
   ```

3. **Communicate with threads:**
   ```
   send_message(..., thread_id="FEAT-123")
   fetch_inbox(project_key, agent_name)
   acknowledge_message(project_key, agent_name, message_id)
   ```

4. **Quick reads:**
   ```
   resource://inbox/{Agent}?project=<abs-path>&limit=20
   resource://thread/{id}?project=<abs-path>&include_bodies=true
   ```

### Macros vs Granular Tools

- **Prefer macros for speed:** `macro_start_session`, `macro_prepare_thread`, `macro_file_reservation_cycle`, `macro_contact_handshake`
- **Use granular tools for control:** `register_agent`, `file_reservation_paths`, `send_message`, `fetch_inbox`, `acknowledge_message`

### Common Pitfalls

- `"from_agent not registered"`: Always `register_agent` in the correct `project_key` first
- `"FILE_RESERVATION_CONFLICT"`: Adjust patterns, wait for expiry, or use non-exclusive reservation
- **Auth errors:** If JWT+JWKS enabled, include bearer token with matching `kid`
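A conflict-handling pattern for the reservation pitfall, as a hedged sketch: back off and retry a few times before giving up. The command passed in stands in for the real `file_reservation_paths` call; the demo uses `true` so the sketch runs.

```shell
# Retry a reservation attempt with exponential backoff; give up after 3 tries.
retry_reservation() {
  delay=1
  for attempt in 1 2 3; do
    if "$@"; then
      echo "reserved"
      return 0
    fi
    echo "FILE_RESERVATION_CONFLICT: retry in ${delay}s (attempt $attempt)" >&2
    sleep "$delay"
    delay=$((delay * 2))
  done
  return 1
}

retry_reservation true   # stand-in for the real reservation call
```

Bounding the retries matters: a reservation held by a stalled agent will not clear until its TTL expires, so escalate rather than spin.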
---
## Beads (br) — Dependency-Aware Issue Tracking

Beads provides a lightweight, dependency-aware issue database and CLI (`br` / beads_rust) for selecting "ready work," setting priorities, and tracking status. It complements MCP Agent Mail's messaging and file reservations.

**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.

### Conventions

- **Single source of truth:** Beads for task status/priority/dependencies; Agent Mail for conversation and audit
- **Shared identifiers:** Use the Beads issue ID (e.g., `br-123`) as the Mail `thread_id` and prefix subjects with `[br-123]`
- **Reservations:** When starting a task, call `file_reservation_paths()` with the issue ID in `reason`

### Typical Agent Flow

1. **Pick ready work (Beads):**
   ```bash
   br ready --json   # Choose highest priority, no blockers
   ```

2. **Reserve edit surface (Mail):**
   ```
   file_reservation_paths(project_key, agent_name, ["src/**"], ttl_seconds=3600, exclusive=true, reason="br-123")
   ```

3. **Announce start (Mail):**
   ```
   send_message(..., thread_id="br-123", subject="[br-123] Start: <title>", ack_required=true)
   ```

4. **Work and update:** Reply in-thread with progress

5. **Complete and release:**
   ```bash
   br close br-123 --reason "Completed"
   ```
   ```
   release_file_reservations(project_key, agent_name, paths=["src/**"])
   ```
   Final Mail reply: `[br-123] Completed` with summary

### Mapping Cheat Sheet

| Concept | Value |
|---------|-------|
| Mail `thread_id` | `br-###` |
| Mail subject | `[br-###] ...` |
| File reservation `reason` | `br-###` |
| Commit messages | Include `br-###` for traceability |

---
## bv — Graph-Aware Triage Engine

bv is a graph-aware triage engine for Beads projects (`.beads/beads.jsonl`). It computes PageRank, betweenness, critical path, cycles, HITS, eigenvector, and k-core metrics deterministically.

**Scope boundary:** bv handles *what to work on* (triage, priority, planning). For agent-to-agent coordination (messaging, work claiming, file reservations), use MCP Agent Mail.

**CRITICAL: Use ONLY `--robot-*` flags. Bare `bv` launches an interactive TUI that blocks your session.**

### The Workflow: Start With Triage

**`bv --robot-triage` is your single entry point.** It returns:

- `quick_ref`: at-a-glance counts + top 3 picks
- `recommendations`: ranked actionable items with scores, reasons, unblock info
- `quick_wins`: low-effort high-impact items
- `blockers_to_clear`: items that unblock the most downstream work
- `project_health`: status/type/priority distributions, graph metrics
- `commands`: copy-paste shell commands for next steps

```bash
bv --robot-triage   # THE MEGA-COMMAND: start here
bv --robot-next     # Minimal: just the single top pick + claim command
```

### Command Reference

**Planning:**

| Command | Returns |
|---------|---------|
| `--robot-plan` | Parallel execution tracks with `unblocks` lists |
| `--robot-priority` | Priority misalignment detection with confidence |

**Graph Analysis:**

| Command | Returns |
|---------|---------|
| `--robot-insights` | Full metrics: PageRank, betweenness, HITS, eigenvector, critical path, cycles, k-core, articulation points, slack |
| `--robot-label-health` | Per-label health: `health_level`, `velocity_score`, `staleness`, `blocked_count` |
| `--robot-label-flow` | Cross-label dependency: `flow_matrix`, `dependencies`, `bottleneck_labels` |
| `--robot-label-attention [--attention-limit=N]` | Attention-ranked labels |

**History & Change Tracking:**

| Command | Returns |
|---------|---------|
| `--robot-history` | Bead-to-commit correlations |
| `--robot-diff --diff-since <ref>` | Changes since ref: new/closed/modified issues, cycles |

**Other:**

| Command | Returns |
|---------|---------|
| `--robot-burndown <sprint>` | Sprint burndown, scope changes, at-risk items |
| `--robot-forecast <id\|all>` | ETA predictions with dependency-aware scheduling |
| `--robot-alerts` | Stale issues, blocking cascades, priority mismatches |
| `--robot-suggest` | Hygiene: duplicates, missing deps, label suggestions |
| `--robot-graph [--graph-format=json\|dot\|mermaid]` | Dependency graph export |
| `--export-graph <file.html>` | Interactive HTML visualization |

### Scoping & Filtering

```bash
bv --robot-plan --label backend             # Scope to label's subgraph
bv --robot-insights --as-of HEAD~30         # Historical point-in-time
bv --recipe actionable --robot-plan         # Pre-filter: ready to work
bv --recipe high-impact --robot-triage      # Pre-filter: top PageRank
bv --robot-triage --robot-triage-by-track   # Group by parallel work streams
bv --robot-triage --robot-triage-by-label   # Group by domain
```

### Understanding Robot Output

**All robot JSON includes:**

- `data_hash` — Fingerprint of source beads.jsonl
- `status` — Per-metric state: `computed|approx|timeout|skipped` + elapsed ms
- `as_of` / `as_of_commit` — Present when using `--as-of`

**Two-phase analysis:**

- **Phase 1 (instant):** degree, topo sort, density
- **Phase 2 (async, 500ms timeout):** PageRank, betweenness, HITS, eigenvector, cycles
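Because phase-2 metrics can time out, gate on each metric's `status` before trusting its numbers. A hedged sketch: the status JSON below is a stand-in for real `bv --robot-insights | jq '.status'` output.

```shell
# Only trust a phase-2 metric when its status is "computed".
# The JSON is a stand-in for `bv --robot-insights | jq '.status'`.
status='{"pagerank":"computed","betweenness":"timeout"}'

for metric in pagerank betweenness; do
  case "$status" in
    *"\"$metric\":\"computed\""*) echo "$metric: usable" ;;
    *)                            echo "$metric: not computed; ignore or re-run" ;;
  esac
done
```

A real agent would use jq instead of shell pattern matching; the point is the gate, not the parser.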
### jq Quick Reference

```bash
bv --robot-triage | jq '.quick_ref'                  # At-a-glance summary
bv --robot-triage | jq '.recommendations[0]'         # Top recommendation
bv --robot-plan | jq '.plan.summary.highest_impact'  # Best unblock target
bv --robot-insights | jq '.status'                   # Check metric readiness
bv --robot-insights | jq '.Cycles'                   # Circular deps (must fix!)
```

---
## UBS — Ultimate Bug Scanner

**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.

### Commands

```bash
ubs file.rs file2.rs           # Specific files (< 1s) — USE THIS
ubs $(jj diff --name-only)     # Changed files — before commit
ubs --only=rust,toml src/      # Language filter (3-5x faster)
ubs --ci --fail-on-warning .   # CI mode — before PR
ubs .                          # Whole project (ignores target/, Cargo.lock)
```
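One caveat with `ubs $(jj diff --name-only)`: when the diff is empty, the argument list is empty too. A hedged guard sketch; `run_scoped_scan` is illustrative, and the `echo` stands in for a real `ubs "$@"` invocation.

```shell
# Refuse to scan when the changed-file list is empty.
run_scoped_scan() {
  if [ "$#" -eq 0 ]; then
    echo "no changed files; skipping scan"
    return 0
  fi
  echo "scanning: $*"   # real version: ubs "$@"
}

run_scoped_scan             # nothing changed
run_scoped_scan a.rs b.rs   # scoped scan
```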
### Output Format

```
⚠️ Category (N errors)
file.rs:42:5 – Issue description
💡 Suggested fix
Exit code: 1
```

Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail

### Fix Workflow

1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit
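Steps 1–6 amount to a rerun-until-clean loop. A hedged sketch with a bounded retry so it never spins forever; the command passed in stands in for `ubs <file>`, demonstrated with `true` so the sketch runs.

```shell
# Re-run a scan command until it exits 0, with a cap on attempts.
scan_until_clean() {
  attempts=0
  until "$@"; do
    attempts=$((attempts + 1))
    if [ "$attempts" -ge 3 ]; then
      echo "still failing after $attempts attempts; escalate" >&2
      return 1
    fi
    echo "scan failed; fix findings, then re-run (attempt $attempts)"
  done
  echo "clean"
}

scan_until_clean true   # stand-in for: scan_until_clean ubs src/main.rs
```

In practice the fixing happens between iterations; the loop just encodes "do not commit until exit 0".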
### Bug Severity

- **Critical (always fix):** Memory safety, use-after-free, data races, SQL injection
- **Important (production):** Unwrap panics, resource leaks, overflow checks
- **Contextual (judgment):** TODO/FIXME, println! debugging

---
## ast-grep vs ripgrep

**Use `ast-grep` when structure matters.** It parses code and matches AST nodes, ignoring comments/strings, and can **safely rewrite** code.

- Refactors/codemods: rename APIs, change import forms
- Policy checks: enforce patterns across a repo
- Editor/automation: LSP mode, `--json` output

**Use `ripgrep` when text is enough.** Fastest way to grep literals/regex.

- Recon: find strings, TODOs, log lines, config values
- Pre-filter: narrow candidate files before ast-grep

### Rule of Thumb

- Need correctness or **applying changes** → `ast-grep`
- Need raw speed or **hunting text** → `rg`
- Often combine: `rg` to shortlist files, then `ast-grep` to match/modify

### Rust Examples

```bash
# Find structured code (ignores comments)
ast-grep run -l Rust -p 'fn $NAME($$$ARGS) -> $RET { $$$BODY }'

# Find all unwrap() calls
ast-grep run -l Rust -p '$EXPR.unwrap()'

# Quick textual hunt
rg -n 'println!' -t rust

# Combine speed + precision
rg -l -t rust 'unwrap\(' | xargs ast-grep run -l Rust -p '$X.unwrap()' --json
```

---
## Morph Warp Grep — AI-Powered Code Search

**Use `mcp__morph-mcp__warp_grep` for exploratory "how does X work?" questions.** An AI agent expands your query, greps the codebase, reads relevant files, and returns precise line ranges with full context.

**Use `ripgrep` for targeted searches.** When you know exactly what you're looking for.

**Use `ast-grep` for structural patterns.** When you need AST precision for matching/rewriting.

### When to Use What

| Scenario | Tool | Why |
|----------|------|-----|
| "How is pattern matching implemented?" | `warp_grep` | Exploratory; don't know where to start |
| "Where is the quick reject filter?" | `warp_grep` | Need to understand architecture |
| "Find all uses of `Regex::new`" | `ripgrep` | Targeted literal search |
| "Find files with `println!`" | `ripgrep` | Simple pattern |
| "Replace all `unwrap()` with `expect()`" | `ast-grep` | Structural refactor |

### warp_grep Usage

```
mcp__morph-mcp__warp_grep(
  repoPath: "/path/to/dcg",
  query: "How does the safe pattern whitelist work?"
)
```

Returns structured results with file paths, line ranges, and extracted code snippets.

### Anti-Patterns

- **Don't** use `warp_grep` to find a specific function name → use `ripgrep`
- **Don't** use `ripgrep` to understand "how does X work" → wastes time with manual reads
- **Don't** use `ripgrep` for codemods → risks collateral edits

<!-- bv-agent-instructions-v1 -->

---
## Beads Workflow Integration

This project uses [beads_viewer](https://github.com/Dicklesworthstone/beads_viewer) for issue tracking. Issues are stored in `.beads/` and tracked in version control.

**Note:** `br` is non-invasive—it never executes VCS commands directly. You must commit manually after `br sync --flush-only`.

### Essential Commands

```bash
# View issues (launches TUI - avoid in automated sessions)
bv

# CLI commands for agents (use these instead)
br ready                # Show issues ready to work (no blockers)
br list --status=open   # All open issues
br show <id>            # Full issue details with dependencies
br create --title="..." --type=task --priority=2
br update <id> --status=in_progress
br close <id> --reason="Completed"
br close <id1> <id2>    # Close multiple issues at once
br sync --flush-only    # Export to JSONL (then: jj commit -m "Update beads")
```

### Workflow Pattern

1. **Start**: Run `br ready` to find actionable work
2. **Claim**: Use `br update <id> --status=in_progress`
3. **Work**: Implement the task
4. **Complete**: Use `br close <id>`
5. **Sync**: Run `br sync --flush-only`, then `git add .beads/ && git commit -m "Update beads"`

### Key Concepts

- **Dependencies**: Issues can block other issues. `br ready` shows only unblocked work.
- **Priority**: P0=critical, P1=high, P2=medium, P3=low, P4=backlog (use numbers, not words)
- **Types**: task, bug, feature, epic, question, docs
- **Blocking**: `br dep add <issue> <depends-on>` to add dependencies

### Session Protocol

**Before ending any session, run this checklist (solo/lead only — workers skip VCS):**

```bash
jj status                      # Check what changed
br sync --flush-only           # Export beads to JSONL
jj commit -m "..."             # Commit code and beads (jj auto-tracks all changes)
jj bookmark set <name> -r @-   # Point bookmark at committed work
jj git push -b <name>          # Push to remote
```

### Best Practices

- Check `br ready` at session start to find available work
- Update status as you work (in_progress → closed)
- Create new issues with `br create` when you discover tasks
- Use descriptive titles and set appropriate priority/type
- Always run `br sync --flush-only` then commit before ending session (jj auto-tracks .beads/)

<!-- end-bv-agent-instructions -->
## Landing the Plane (Session Completion)

**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until push succeeds.

**WHO RUNS THIS:** Solo agents run it themselves. In multi-agent sessions, ONLY the team lead runs this. Workers skip VCS entirely.

**MANDATORY WORKFLOW:**

1. **File issues for remaining work** - Create issues for anything that needs follow-up
2. **Run quality gates** (if code changed) - Tests, linters, builds
3. **Update issue status** - Close finished work, update in-progress items
4. **PUSH TO REMOTE** - This is MANDATORY:
   ```bash
   jj git fetch                   # Get latest remote state
   jj rebase -d trunk()           # Rebase onto latest trunk if needed
   br sync --flush-only           # Export beads to JSONL
   jj commit -m "Update beads"    # Commit (jj auto-tracks .beads/ changes)
   jj bookmark set <name> -r @-   # Point bookmark at committed work
   jj git push -b <name>          # Push to remote
   jj log -r '<name>'             # Verify bookmark position
   ```
5. **Clean up** - Abandon empty orphan changes if any (`jj abandon <rev>`)
6. **Verify** - All changes committed AND pushed
7. **Hand off** - Provide context for next session

**CRITICAL RULES:**

- Work is NOT complete until `jj git push` succeeds
- NEVER stop before pushing - that leaves work stranded locally
- NEVER say "ready to push when you are" - YOU must push
- If push fails, resolve and retry until it succeeds

---
## cass — Cross-Agent Session Search

`cass` indexes prior agent conversations (Claude Code, Codex, Cursor, Gemini, ChatGPT, etc.) so we can reuse solved problems.

**Rules:** Never run bare `cass` (TUI). Always use `--robot` or `--json`.

### Examples

```bash
cass health
cass search "async runtime" --robot --limit 5
cass view /path/to/session.jsonl -n 42 --json
cass expand /path/to/session.jsonl -n 42 -C 3 --json
cass capabilities --json
cass robot-docs guide
```

### Tips

- Use `--fields minimal` for lean output
- Filter by agent with `--agent`
- Use `--days N` to limit to recent history

stdout is data-only, stderr is diagnostics; exit code 0 means success.
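That stdout/stderr contract means data can be captured cleanly while diagnostics are logged separately. A hedged sketch where `fake_cass` stands in for a real `cass search --robot` invocation:

```shell
# Capture data (stdout) and diagnostics (stderr) on separate channels.
fake_cass() {
  echo '{"hits":3}'                # data → stdout
  echo 'scanned 42 sessions' >&2   # diagnostics → stderr
}

diag_file=$(mktemp)
data=$(fake_cass 2>"$diag_file")
echo "data: $data"
echo "diagnostics: $(cat "$diag_file")"
rm -f "$diag_file"
```

Because only stdout carries data, the captured value can be piped to a JSON parser without filtering out progress chatter first.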
Treat cass as a way to avoid re-solving problems other agents already handled.

---
## Note for Codex/GPT-5.2

You constantly bother me and stop working with concerned questions that look similar to this:

```
Unexpected changes (need guidance)

- Working tree still shows edits I did not make in Cargo.toml, Cargo.lock, src/runtime.rs, src/scope.rs. Please advise whether to keep/commit/revert these before any further work. I did not touch them.

Next steps (pick one)

1. Decide how to handle the unrelated modified files above so we can resume cleanly.
```

NEVER EVER DO THAT AGAIN. The answer is literally ALWAYS the same: those are changes created by the potentially dozens of other agents working on the project at the same time. This is not only a common occurrence, it happens multiple times PER MINUTE. The way to deal with it is simple: you NEVER, under ANY CIRCUMSTANCE, stash, revert, overwrite, or otherwise disturb in ANY way the work of other agents. Just treat those changes identically to changes that you yourself made. Just fool yourself into thinking YOU made the changes and simply don't recall it for some reason.

---

## Note on Built-in TODO Functionality

Also, if I ask you to explicitly use your built-in TODO functionality, don't complain and say you need to use beads. You can use built-in TODOs if I tell you specifically to do so. Always comply with such orders.
## TDD Requirements
## Gitlore Robot Mode

The `lore` CLI has a robot mode optimized for AI agent consumption with compact JSON output, structured errors with machine-actionable recovery steps, meaningful exit codes, response timing metadata, field selection for token efficiency, and TTY auto-detection.

### Activation

```bash
# Explicit flag
lore --robot issues -n 10

# JSON shorthand (-J)
lore -J issues -n 10

# Auto-detection (when stdout is not a TTY)
lore issues | jq .

# Environment variable
LORE_ROBOT=1 lore issues
```

### Robot Mode Commands

```bash
# List issues/MRs with JSON output
lore --robot issues -n 10
lore --robot mrs -s opened

# Filter issues by work item status (case-insensitive)
lore --robot issues --status "In progress"

# List with field selection (reduces token usage ~60%)
lore --robot issues --fields minimal
lore --robot mrs --fields iid,title,state,draft

# Show detailed entity info
lore --robot issues 123
lore --robot mrs 456 -p group/repo

# Count entities
lore --robot count issues
lore --robot count discussions --for mr

# Search indexed documents
lore --robot search "authentication bug"

# Check sync status
lore --robot status

# Run full sync pipeline
lore --robot sync

# Run sync without resource events
lore --robot sync --no-events

# Run ingestion only
lore --robot ingest issues

# Check environment health
lore --robot doctor

# Document and index statistics
lore --robot stats

# Quick health pre-flight check (exit 0 = healthy, 19 = unhealthy)
lore --robot health

# Generate searchable documents from ingested data
lore --robot generate-docs

# Generate vector embeddings via Ollama
lore --robot embed

# Agent self-discovery manifest (all commands, flags, exit codes, response schemas)
lore robot-docs

# Version information
lore --robot version
```

### Response Format

All commands return compact JSON with a uniform envelope and timing metadata:

```json
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```

Errors return structured JSON to stderr with machine-actionable recovery steps:

```json
{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'","actions":["lore init"]}}
```

The `actions` array contains executable shell commands for automated recovery. It is omitted when empty.
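An agent can execute those recovery commands directly. A hedged, POSIX-only sketch: a real agent would extract the array with jq, the error JSON is a stand-in, and the `echo` action substitutes for a real `lore init`.

```shell
# Extract and run each command in an error's `actions` array.
err='{"error":{"code":"CONFIG_NOT_FOUND","message":"no config","suggestion":"Run lore init","actions":["echo recovered"]}}'

# Crude extraction (assumes simple, comma/quote-free commands); jq is sturdier.
printf '%s\n' "$err" \
  | sed -n 's/.*"actions":\[\([^]]*\)\].*/\1/p' \
  | tr -d '"' \
  | while IFS= read -r action; do
      eval "$action"
    done
```

Only run recovery actions automatically when they come from a trusted local tool; `eval` on arbitrary strings is otherwise a hazard.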
### Field Selection

The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response:

```bash
lore -J issues --fields minimal                     # Preset: iid, title, state, updated_at_iso
lore -J mrs --fields iid,title,state,draft,labels   # Custom field list
```

### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Internal error / not implemented |
| 2 | Usage error (invalid flags or arguments) |
| 3 | Config invalid |
| 4 | Token not set |
| 5 | GitLab auth failed |
| 11 | Migration failed |
| 12 | I/O error |
| 13 | Transform error |
| 14 | Ollama unavailable |
| 15 | Ollama model not found |
| 16 | Embedding failed |
| 17 | Not found (entity does not exist) |
| 18 | Ambiguous match (use `-p` to specify project) |
| 19 | Health check failed |
| 20 | Config not found |
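Branching on these codes is a one-`case` affair. A minimal sketch: `lore_stub` stands in for a real `lore --robot health` call and pretends the health check failed with code 19.

```shell
# Dispatch on lore's documented exit codes.
lore_stub() { return 19; }   # stand-in: health check failed

if lore_stub; then rc=0; else rc=$?; fi
case "$rc" in
  0)  echo "healthy: proceed with queries" ;;
  4)  echo "token not set: configure GITLAB_TOKEN" ;;
  19) echo "unhealthy: skip queries" ;;
  *)  echo "unexpected failure ($rc)" ;;
esac
```

Capturing `$?` via the `if` keeps the snippet safe under `set -e`, which would otherwise abort on the non-zero return.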
### Configuration Precedence

1. CLI flags (highest priority)
2. Environment variables (`LORE_ROBOT`, `GITLAB_TOKEN`, `LORE_CONFIG_PATH`)
3. Config file (`~/.config/lore/config.json`)
4. Built-in defaults (lowest priority)
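The precedence rule reduces to "first non-empty value wins, scanning from flag down to default". A hedged sketch: the `resolve_setting` helper and its argument order are illustrative, not lore internals.

```shell
# Resolve a setting: flag > environment > config file > built-in default.
resolve_setting() {
  for v in "$@"; do
    if [ -n "$v" ]; then
      echo "$v"
      return 0
    fi
  done
}

resolve_setting "" "1" "0" "false"   # no flag given; env "1" beats config "0"
```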
|
||||||
|
|
||||||
### Best Practices
|
### Best Practices
|
||||||
|
|
||||||
- Use `lore --robot` for all agent interactions
|
- Use `lore --robot` or `lore -J` for all agent interactions
|
||||||
- Check exit codes for error handling
|
- Check exit codes for error handling
|
||||||
- Parse JSON errors from stderr
|
- Parse JSON errors from stderr; use `actions` array for automated recovery
|
||||||
- Use `--limit` to control response size
|
- Use `--fields minimal` to reduce token usage (~60% fewer tokens)
|
||||||
|
- Use `-n` / `--limit` to control response size
|
||||||
|
- Use `-q` / `--quiet` to suppress progress bars and non-essential output
|
||||||
|
- Use `--color never` in non-TTY automation for ANSI-free output
|
||||||
|
- Use `-v` / `-vv` / `-vvv` for increasing verbosity (debug/trace logging)
|
||||||
|
- Use `--log-format json` for machine-readable log output to stderr
|
||||||
- TTY detection handles piped commands automatically
|
- TTY detection handles piped commands automatically
|
||||||
|
- Use `lore --robot health` as a fast pre-flight check before queries
|
||||||
|
- Use `lore robot-docs` for response schema discovery
|
||||||
|
- The `-p` flag supports fuzzy project matching (suffix and substring)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Read/Write Split: lore vs glab
|
||||||
|
|
||||||
|
| Operation | Tool | Why |
|
||||||
|
|-----------|------|-----|
|
||||||
|
| List issues/MRs | lore | Richer: includes status, discussions, closing MRs |
|
||||||
|
| View issue/MR detail | lore | Pre-joined discussions, work-item status |
|
||||||
|
| Search across entities | lore | FTS5 + vector hybrid search |
|
||||||
|
| Expert/workload analysis | lore | who command — no glab equivalent |
|
||||||
|
| Timeline reconstruction | lore | Chronological narrative — no glab equivalent |
|
||||||
|
| Create/update/close | glab | Write operations |
|
||||||
|
| Approve/merge MR | glab | Write operations |
|
||||||
|
| CI/CD pipelines | glab | Not in lore scope |
|
||||||
|
|
||||||
|
````markdown
## UBS Quick Reference for AI Agents

UBS stands for "Ultimate Bug Scanner": **The AI Coding Agent's Secret Weapon: Flagging Likely Bugs for Fixing Early On**

**Install:** `curl -sSL https://raw.githubusercontent.com/Dicklesworthstone/ultimate_bug_scanner/master/install.sh | bash`

**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.

**Commands:**

```bash
ubs file.ts file2.py                 # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached) # Staged files — before commit
ubs --only=js,python src/            # Language filter (3-5x faster)
ubs --ci --fail-on-warning .         # CI mode — before PR
ubs --help                           # Full command reference
ubs sessions --entries 1             # Tail the latest install session log
ubs .                                # Whole project (ignores things like .venv and node_modules automatically)
```

**Output Format:**

```
⚠️ Category (N errors)
file.ts:42:5 – Issue description
💡 Suggested fix
Exit code: 1
```

Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail

**Fix Workflow:**

1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit

**Speed Critical:** Scope to changed files. `ubs src/file.ts` (< 1s) vs `ubs .` (30s). Never full scan for small edits.

**Bug Severity:**

- **Critical** (always fix): Null safety, XSS/injection, async/await, memory leaks
- **Important** (production): Type narrowing, division-by-zero, resource leaks
- **Contextual** (judgment): TODO/FIXME, console logs
|
|
||||||
|
**Anti-Patterns:**
|
||||||
|
- ❌ Ignore findings → ✅ Investigate each
|
||||||
|
- ❌ Full scan per edit → ✅ Scope to file
|
||||||
|
- ❌ Fix symptom (`if (x) { x.y }`) → ✅ Root cause (`x?.y`)
|
||||||
|
````
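
The `file:line:col` locations in the quoted UBS output above can be split with plain shell. A minimal sketch (`parse_loc` is a hypothetical helper, not part of ubs):

```shell
# Hypothetical helper: splits a `file:line:col` location from a UBS finding
# so an agent can jump to the reported position.
parse_loc() {
  IFS=: read -r file line col <<EOF
$1
EOF
  printf '%s %s %s\n' "$file" "$line" "$col"
}

parse_loc "file.ts:42:5"   # → file.ts 42 5
```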

---

**AGENTS.md.backup** (new file, 742 lines):
# AGENTS.md

## RULE 0 - THE FUNDAMENTAL OVERRIDE PREROGATIVE

If I tell you to do something, even if it goes against what follows below, YOU MUST LISTEN TO ME. I AM IN CHARGE, NOT YOU.

---

## RULE NUMBER 1: NO FILE DELETION

**YOU ARE NEVER ALLOWED TO DELETE A FILE WITHOUT EXPRESS PERMISSION.** Even a new file that you yourself created, such as a test code file. You have a horrible track record of deleting critically important files or otherwise throwing away tons of expensive work. As a result, you have permanently lost any and all rights to determine that a file or folder should be deleted.

**YOU MUST ALWAYS ASK AND RECEIVE CLEAR, WRITTEN PERMISSION BEFORE EVER DELETING A FILE OR FOLDER OF ANY KIND.**

---

## Irreversible Git & Filesystem Actions — DO NOT EVER BREAK GLASS

> **Note:** Treat destructive commands as break-glass. If there's any doubt, stop and ask.

1. **Absolutely forbidden commands:** `git reset --hard`, `git clean -fd`, `rm -rf`, or any command that can delete or overwrite code/data must never be run unless the user explicitly provides the exact command and states, in the same message, that they understand and want the irreversible consequences.
2. **No guessing:** If there is any uncertainty about what a command might delete or overwrite, stop immediately and ask the user for specific approval. "I think it's safe" is never acceptable.
3. **Safer alternatives first:** When cleanup or rollbacks are needed, request permission to use non-destructive options (`git status`, `git diff`, `git stash`, copying to backups) before ever considering a destructive command.
4. **Mandatory explicit plan:** Even after explicit user authorization, restate the command verbatim, list exactly what will be affected, and wait for a confirmation that your understanding is correct. Only then may you execute it—if anything remains ambiguous, refuse and escalate.
5. **Document the confirmation:** When running any approved destructive command, record (in the session notes / final response) the exact user text that authorized it, the command actually run, and the execution time. If that record is absent, the operation did not happen.

---

## Toolchain: Rust & Cargo

We only use **Cargo** in this project, NEVER any other package manager.

- **Edition/toolchain:** Follow `rust-toolchain.toml` (if present). Do not assume stable vs nightly.
- **Dependencies:** Explicit versions for stability; keep the set minimal.
- **Configuration:** `Cargo.toml` only
- **Unsafe code:** Forbidden (`#![forbid(unsafe_code)]`)

When writing Rust code, reference RUST_CLI_TOOLS_BEST_PRACTICES.md.

### Release Profile

Use the release profile defined in `Cargo.toml`. If you need to change it, justify the performance/size tradeoff and how it impacts determinism and cancellation behavior.

---

## Code Editing Discipline

### No Script-Based Changes

**NEVER** run a script that processes/changes code files in this repo. Brittle regex-based transformations create far more problems than they solve.

- **Always make code changes manually**, even when there are many instances
- For many simple changes: use parallel subagents
- For subtle/complex changes: do them methodically yourself

### No File Proliferation

If you want to change something or add a feature, **revise existing code files in place**.

**NEVER** create variations like:

- `mainV2.rs`
- `main_improved.rs`
- `main_enhanced.rs`

New files are reserved for **genuinely new functionality** that makes zero sense to include in any existing file. The bar for creating new files is **incredibly high**.

---

## Backwards Compatibility

We do not care about backwards compatibility—we're in early development with no users. We want to do things the **RIGHT** way with **NO TECH DEBT**.

- Never create "compatibility shims"
- Never create wrapper functions for deprecated APIs
- Just fix the code directly

---

## Compiler Checks (CRITICAL)

**After any substantive code changes, you MUST verify no errors were introduced:**

```bash
# Check for compiler errors and warnings
cargo check --all-targets

# Check for clippy lints (pedantic + nursery are enabled)
cargo clippy --all-targets -- -D warnings

# Verify formatting
cargo fmt --check
```

If you see errors, **carefully understand and resolve each issue**. Read sufficient context to fix them the RIGHT way.

---

## Testing

### Unit & Property Tests

```bash
# Run all tests
cargo test

# Run with output
cargo test -- --nocapture
```

When adding or changing primitives, add tests that assert the core invariants:

- no task leaks
- no obligation leaks
- losers are drained after races
- region close implies quiescence

Prefer deterministic lab-runtime tests for concurrency-sensitive behavior.

---

## MCP Agent Mail — Multi-Agent Coordination

A mail-like layer that lets coding agents coordinate asynchronously via MCP tools and resources. Provides identities, inbox/outbox, searchable threads, and advisory file reservations with human-auditable artifacts in Git.

### Why It's Useful

- **Prevents conflicts:** Explicit file reservations (leases) for files/globs
- **Token-efficient:** Messages stored in per-project archive, not in context
- **Quick reads:** `resource://inbox/...`, `resource://thread/...`

### Same Repository Workflow

1. **Register identity:**
   ```
   ensure_project(project_key=<abs-path>)
   register_agent(project_key, program, model)
   ```

2. **Reserve files before editing:**
   ```
   file_reservation_paths(project_key, agent_name, ["src/**"], ttl_seconds=3600, exclusive=true)
   ```

3. **Communicate with threads:**
   ```
   send_message(..., thread_id="FEAT-123")
   fetch_inbox(project_key, agent_name)
   acknowledge_message(project_key, agent_name, message_id)
   ```

4. **Quick reads:**
   ```
   resource://inbox/{Agent}?project=<abs-path>&limit=20
   resource://thread/{id}?project=<abs-path>&include_bodies=true
   ```

### Macros vs Granular Tools

- **Prefer macros for speed:** `macro_start_session`, `macro_prepare_thread`, `macro_file_reservation_cycle`, `macro_contact_handshake`
- **Use granular tools for control:** `register_agent`, `file_reservation_paths`, `send_message`, `fetch_inbox`, `acknowledge_message`

### Common Pitfalls

- `"from_agent not registered"`: Always `register_agent` in the correct `project_key` first
- `"FILE_RESERVATION_CONFLICT"`: Adjust patterns, wait for expiry, or use non-exclusive reservation
- **Auth errors:** If JWT+JWKS enabled, include bearer token with matching `kid`

---

## Beads (br) — Dependency-Aware Issue Tracking

Beads provides a lightweight, dependency-aware issue database and CLI (`br` / beads_rust) for selecting "ready work," setting priorities, and tracking status. It complements MCP Agent Mail's messaging and file reservations.

**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.

### Conventions

- **Single source of truth:** Beads for task status/priority/dependencies; Agent Mail for conversation and audit
- **Shared identifiers:** Use Beads issue ID (e.g., `br-123`) as Mail `thread_id` and prefix subjects with `[br-123]`
- **Reservations:** When starting a task, call `file_reservation_paths()` with the issue ID in `reason`

### Typical Agent Flow

1. **Pick ready work (Beads):**
   ```bash
   br ready --json   # Choose highest priority, no blockers
   ```

2. **Reserve edit surface (Mail):**
   ```
   file_reservation_paths(project_key, agent_name, ["src/**"], ttl_seconds=3600, exclusive=true, reason="br-123")
   ```

3. **Announce start (Mail):**
   ```
   send_message(..., thread_id="br-123", subject="[br-123] Start: <title>", ack_required=true)
   ```

4. **Work and update:** Reply in-thread with progress

5. **Complete and release:**
   ```bash
   br close br-123 --reason "Completed"
   ```
   ```
   release_file_reservations(project_key, agent_name, paths=["src/**"])
   ```
   Final Mail reply: `[br-123] Completed` with summary

### Mapping Cheat Sheet

| Concept | Value |
|---------|-------|
| Mail `thread_id` | `br-###` |
| Mail subject | `[br-###] ...` |
| File reservation `reason` | `br-###` |
| Commit messages | Include `br-###` for traceability |
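
A tiny shell helper can keep these shared identifiers consistent. This is a hypothetical sketch (`bead_subject` and `bead_commit` are not part of br or Agent Mail; they only format the strings the cheat sheet prescribes):

```shell
# Hypothetical helpers, not part of br or Agent Mail: they only format the
# shared identifiers from the mapping cheat sheet.
bead_subject() { printf '[%s] %s\n' "$1" "$2"; }   # Mail subject line
bead_commit()  { printf '%s (%s)\n' "$2" "$1"; }   # commit message with trace ID

bead_subject br-123 "Start: fix parser"   # → [br-123] Start: fix parser
bead_commit  br-123 "Fix parser"          # → Fix parser (br-123)
```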

---

## bv — Graph-Aware Triage Engine

bv is a graph-aware triage engine for Beads projects (`.beads/beads.jsonl`). It computes PageRank, betweenness, critical path, cycles, HITS, eigenvector, and k-core metrics deterministically.

**Scope boundary:** bv handles *what to work on* (triage, priority, planning). For agent-to-agent coordination (messaging, work claiming, file reservations), use MCP Agent Mail.

**CRITICAL: Use ONLY `--robot-*` flags. Bare `bv` launches an interactive TUI that blocks your session.**

### The Workflow: Start With Triage

**`bv --robot-triage` is your single entry point.** It returns:

- `quick_ref`: at-a-glance counts + top 3 picks
- `recommendations`: ranked actionable items with scores, reasons, unblock info
- `quick_wins`: low-effort high-impact items
- `blockers_to_clear`: items that unblock the most downstream work
- `project_health`: status/type/priority distributions, graph metrics
- `commands`: copy-paste shell commands for next steps

```bash
bv --robot-triage   # THE MEGA-COMMAND: start here
bv --robot-next     # Minimal: just the single top pick + claim command
```

### Command Reference

**Planning:**

| Command | Returns |
|---------|---------|
| `--robot-plan` | Parallel execution tracks with `unblocks` lists |
| `--robot-priority` | Priority misalignment detection with confidence |

**Graph Analysis:**

| Command | Returns |
|---------|---------|
| `--robot-insights` | Full metrics: PageRank, betweenness, HITS, eigenvector, critical path, cycles, k-core, articulation points, slack |
| `--robot-label-health` | Per-label health: `health_level`, `velocity_score`, `staleness`, `blocked_count` |
| `--robot-label-flow` | Cross-label dependency: `flow_matrix`, `dependencies`, `bottleneck_labels` |
| `--robot-label-attention [--attention-limit=N]` | Attention-ranked labels |

**History & Change Tracking:**

| Command | Returns |
|---------|---------|
| `--robot-history` | Bead-to-commit correlations |
| `--robot-diff --diff-since <ref>` | Changes since ref: new/closed/modified issues, cycles |

**Other:**

| Command | Returns |
|---------|---------|
| `--robot-burndown <sprint>` | Sprint burndown, scope changes, at-risk items |
| `--robot-forecast <id\|all>` | ETA predictions with dependency-aware scheduling |
| `--robot-alerts` | Stale issues, blocking cascades, priority mismatches |
| `--robot-suggest` | Hygiene: duplicates, missing deps, label suggestions |
| `--robot-graph [--graph-format=json\|dot\|mermaid]` | Dependency graph export |
| `--export-graph <file.html>` | Interactive HTML visualization |

### Scoping & Filtering

```bash
bv --robot-plan --label backend             # Scope to label's subgraph
bv --robot-insights --as-of HEAD~30         # Historical point-in-time
bv --recipe actionable --robot-plan         # Pre-filter: ready to work
bv --recipe high-impact --robot-triage      # Pre-filter: top PageRank
bv --robot-triage --robot-triage-by-track   # Group by parallel work streams
bv --robot-triage --robot-triage-by-label   # Group by domain
```

### Understanding Robot Output

**All robot JSON includes:**

- `data_hash` — Fingerprint of source beads.jsonl
- `status` — Per-metric state: `computed|approx|timeout|skipped` + elapsed ms
- `as_of` / `as_of_commit` — Present when using `--as-of`

**Two-phase analysis:**

- **Phase 1 (instant):** degree, topo sort, density
- **Phase 2 (async, 500ms timeout):** PageRank, betweenness, HITS, eigenvector, cycles

### jq Quick Reference

```bash
bv --robot-triage | jq '.quick_ref'                   # At-a-glance summary
bv --robot-triage | jq '.recommendations[0]'          # Top recommendation
bv --robot-plan | jq '.plan.summary.highest_impact'   # Best unblock target
bv --robot-insights | jq '.status'                    # Check metric readiness
bv --robot-insights | jq '.Cycles'                    # Circular deps (must fix!)
```

---

## UBS — Ultimate Bug Scanner

**Golden Rule:** `ubs <changed-files>` before every commit. Exit 0 = safe. Exit >0 = fix & re-run.

### Commands

```bash
ubs file.rs file2.rs                  # Specific files (< 1s) — USE THIS
ubs $(git diff --name-only --cached)  # Staged files — before commit
ubs --only=rust,toml src/             # Language filter (3-5x faster)
ubs --ci --fail-on-warning .          # CI mode — before PR
ubs .                                 # Whole project (ignores target/, Cargo.lock)
```
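
The staged-files invocation above can be wrapped as a pre-commit guard. A hedged sketch (`scan_staged` is a hypothetical name; the scanner is injectable so the pattern can be exercised without ubs installed):

```shell
# Hypothetical pre-commit guard. The scanner defaults to ubs; an empty staged
# set counts as a pass. Word-splitting on $files is intentional, matching the
# `ubs $(git diff --name-only --cached)` idiom above.
scan_staged() {
  scanner=${1:-ubs}
  files=$(git diff --name-only --cached 2>/dev/null || true)
  [ -z "$files" ] && return 0
  "$scanner" $files
}

# Usage: scan_staged && git commit -m "..."
```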

### Output Format

```
⚠️ Category (N errors)
file.rs:42:5 – Issue description
💡 Suggested fix
Exit code: 1
```

Parse: `file:line:col` → location | 💡 → how to fix | Exit 0/1 → pass/fail

### Fix Workflow

1. Read finding → category + fix suggestion
2. Navigate `file:line:col` → view context
3. Verify real issue (not false positive)
4. Fix root cause (not symptom)
5. Re-run `ubs <file>` → exit 0
6. Commit

### Bug Severity

- **Critical (always fix):** Memory safety, use-after-free, data races, SQL injection
- **Important (production):** Unwrap panics, resource leaks, overflow checks
- **Contextual (judgment):** TODO/FIXME, println! debugging

---

## ast-grep vs ripgrep

**Use `ast-grep` when structure matters.** It parses code and matches AST nodes, ignoring comments/strings, and can **safely rewrite** code.

- Refactors/codemods: rename APIs, change import forms
- Policy checks: enforce patterns across a repo
- Editor/automation: LSP mode, `--json` output

**Use `ripgrep` when text is enough.** Fastest way to grep literals/regex.

- Recon: find strings, TODOs, log lines, config values
- Pre-filter: narrow candidate files before ast-grep

### Rule of Thumb

- Need correctness or **applying changes** → `ast-grep`
- Need raw speed or **hunting text** → `rg`
- Often combine: `rg` to shortlist files, then `ast-grep` to match/modify

### Rust Examples

```bash
# Find structured code (ignores comments)
ast-grep run -l Rust -p 'fn $NAME($$$ARGS) -> $RET { $$$BODY }'

# Find all unwrap() calls
ast-grep run -l Rust -p '$EXPR.unwrap()'

# Quick textual hunt
rg -n 'println!' -t rust

# Combine speed + precision
rg -l -t rust 'unwrap\(' | xargs ast-grep run -l Rust -p '$X.unwrap()' --json
```

---

## Morph Warp Grep — AI-Powered Code Search

**Use `mcp__morph-mcp__warp_grep` for exploratory "how does X work?" questions.** An AI agent expands your query, greps the codebase, reads relevant files, and returns precise line ranges with full context.

**Use `ripgrep` for targeted searches.** When you know exactly what you're looking for.

**Use `ast-grep` for structural patterns.** When you need AST precision for matching/rewriting.

### When to Use What

| Scenario | Tool | Why |
|----------|------|-----|
| "How is pattern matching implemented?" | `warp_grep` | Exploratory; don't know where to start |
| "Where is the quick reject filter?" | `warp_grep` | Need to understand architecture |
| "Find all uses of `Regex::new`" | `ripgrep` | Targeted literal search |
| "Find files with `println!`" | `ripgrep` | Simple pattern |
| "Replace all `unwrap()` with `expect()`" | `ast-grep` | Structural refactor |

### warp_grep Usage

```
mcp__morph-mcp__warp_grep(
  repoPath: "/path/to/dcg",
  query: "How does the safe pattern whitelist work?"
)
```

Returns structured results with file paths, line ranges, and extracted code snippets.

### Anti-Patterns

- **Don't** use `warp_grep` to find a specific function name → use `ripgrep`
- **Don't** use `ripgrep` to understand "how does X work" → wastes time with manual reads
- **Don't** use `ripgrep` for codemods → risks collateral edits

<!-- bv-agent-instructions-v1 -->

---

## Beads Workflow Integration

This project uses [beads_viewer](https://github.com/Dicklesworthstone/beads_viewer) for issue tracking. Issues are stored in `.beads/` and tracked in git.

**Note:** `br` is non-invasive—it never executes git commands directly. You must run git commands manually after `br sync --flush-only`.

### Essential Commands

```bash
# View issues (launches TUI - avoid in automated sessions)
bv

# CLI commands for agents (use these instead)
br ready                  # Show issues ready to work (no blockers)
br list --status=open     # All open issues
br show <id>              # Full issue details with dependencies
br create --title="..." --type=task --priority=2
br update <id> --status=in_progress
br close <id> --reason="Completed"
br close <id1> <id2>      # Close multiple issues at once
br sync --flush-only      # Export to JSONL (then manually: git add .beads/ && git commit)
```

### Workflow Pattern

1. **Start**: Run `br ready` to find actionable work
2. **Claim**: Use `br update <id> --status=in_progress`
3. **Work**: Implement the task
4. **Complete**: Use `br close <id>`
5. **Sync**: Run `br sync --flush-only`, then `git add .beads/ && git commit -m "Update beads"`

### Key Concepts

- **Dependencies**: Issues can block other issues. `br ready` shows only unblocked work.
- **Priority**: P0=critical, P1=high, P2=medium, P3=low, P4=backlog (use numbers, not words)
- **Types**: task, bug, feature, epic, question, docs
- **Blocking**: `br dep add <issue> <depends-on>` to add dependencies

### Session Protocol

**Before ending any session, run this checklist:**

```bash
git status              # Check what changed
git add <files>         # Stage code changes
br sync --flush-only    # Export beads to JSONL
git add .beads/         # Stage beads changes
git commit -m "..."     # Commit code and beads
git push                # Push to remote
```

### Best Practices

- Check `br ready` at session start to find available work
- Update status as you work (in_progress → closed)
- Create new issues with `br create` when you discover tasks
- Use descriptive titles and set appropriate priority/type
- Always run `br sync --flush-only` then commit `.beads/` before ending session

<!-- end-bv-agent-instructions -->

## Landing the Plane (Session Completion)

**When ending a work session**, you MUST complete ALL steps below. Work is NOT complete until `git push` succeeds.

**MANDATORY WORKFLOW:**

1. **File issues for remaining work** - Create issues for anything that needs follow-up
2. **Run quality gates** (if code changed) - Tests, linters, builds
3. **Update issue status** - Close finished work, update in-progress items
4. **PUSH TO REMOTE** - This is MANDATORY:
   ```bash
   git pull --rebase
   br sync --flush-only
   git add .beads/
   git commit -m "Update beads"
   git push
   git status   # MUST show "up to date with origin"
   ```
5. **Clean up** - Clear stashes, prune remote branches
6. **Verify** - All changes committed AND pushed
7. **Hand off** - Provide context for next session

**CRITICAL RULES:**

- Work is NOT complete until `git push` succeeds
- NEVER stop before pushing - that leaves work stranded locally
- NEVER say "ready to push when you are" - YOU must push
- If push fails, resolve and retry until it succeeds
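
The "retry until it succeeds" rule can be bounded so a permanently broken remote doesn't loop forever. A sketch (the helper name and the 5-attempt cap are illustrative, not prescribed by this document):

```shell
# Illustrative bounded retry. A real session should also inspect *why* the
# push failed (e.g. rebase on a rejected non-fast-forward) instead of
# blindly retrying.
retry() {
  n=0
  until "$@"; do
    n=$((n + 1))
    [ "$n" -ge 5 ] && return 1   # give up after 5 attempts
    sleep 1
  done
}

# Usage: retry git push
```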

---

## cass — Cross-Agent Session Search

`cass` indexes prior agent conversations (Claude Code, Codex, Cursor, Gemini, ChatGPT, etc.) so we can reuse solved problems.

**Rules:** Never run bare `cass` (TUI). Always use `--robot` or `--json`.

### Examples

```bash
cass health
cass search "async runtime" --robot --limit 5
cass view /path/to/session.jsonl -n 42 --json
cass expand /path/to/session.jsonl -n 42 -C 3 --json
cass capabilities --json
cass robot-docs guide
```

### Tips

- Use `--fields minimal` for lean output
- Filter by agent with `--agent`
- Use `--days N` to limit to recent history

stdout is data-only, stderr is diagnostics; exit code 0 means success.

Treat cass as a way to avoid re-solving problems other agents already handled.

---

## Note for Codex/GPT-5.2

You constantly bother me and stop working with concerned questions that look similar to this:

```
Unexpected changes (need guidance)

- Working tree still shows edits I did not make in Cargo.toml, Cargo.lock, src/runtime.rs, src/scope.rs. Please advise whether to keep/commit/revert these before any further work. I did not touch them.

Next steps (pick one)

1. Decide how to handle the unrelated modified files above so we can resume cleanly.
```

NEVER EVER DO THAT AGAIN. The answer is literally ALWAYS the same: those are changes created by the potentially dozens of other agents working on the project at the same time. This is not only a common occurrence, it happens multiple times PER MINUTE. The way to deal with it is simple: you NEVER, under ANY CIRCUMSTANCE, stash, revert, overwrite, or otherwise disturb in ANY way the work of other agents. Just treat those changes identically to changes that you yourself made. Just fool yourself into thinking YOU made the changes and simply don't recall it for some reason.

---

## Note on Built-in TODO Functionality

If I explicitly ask you to use your built-in TODO functionality, don't complain and say you need to use beads. You can use built-in TODOs if I tell you specifically to do so. Always comply with such orders.

## TDD Requirements

Test-first development is mandatory:

1. **RED** - Write failing test first
2. **GREEN** - Minimal implementation to pass
3. **REFACTOR** - Clean up while green

## Key Patterns

Find the simplest solution that meets all acceptance criteria.

Use third-party libraries whenever there's a well-maintained, active, and widely adopted solution (for example, date-fns for TS date math).

Build extensible pieces of logic that can easily be integrated with other pieces.

DRY principles should be loosely held.

Architecture MUST be clear and well thought-out. Ask the user for clarification whenever ambiguity is discovered around architecture, or you think a better approach than planned exists.

---

## Third-Party Library Usage

If you aren't 100% sure how to use a third-party library, **SEARCH ONLINE** to find the latest documentation and mid-2025 best practices.

---

## Gitlore Robot Mode

The `lore` CLI has a robot mode optimized for AI agent consumption with compact JSON output, structured errors with machine-actionable recovery steps, meaningful exit codes, response timing metadata, field selection for token efficiency, and TTY auto-detection.

### Activation

```bash
# Explicit flag
lore --robot issues -n 10

# JSON shorthand (-J)
lore -J issues -n 10

# Auto-detection (when stdout is not a TTY)
lore issues | jq .

# Environment variable
LORE_ROBOT=1 lore issues
```
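
The auto-detection case above keys off whether stdout is a TTY; the same check is available in shell, which is useful when scripting around tools with this behavior:

```shell
# The check lore's auto-detection performs, in shell terms: stdout attached
# to a TTY means a human is watching; a pipe or file means automation.
mode() { if [ -t 1 ]; then echo human; else echo robot; fi; }

mode | cat   # stdout is a pipe here, so this prints: robot
```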

### Robot Mode Commands

```bash
# List issues/MRs with JSON output
lore --robot issues -n 10
lore --robot mrs -s opened

# List with field selection (reduces token usage ~60%)
lore --robot issues --fields minimal
lore --robot mrs --fields iid,title,state,draft

# Show detailed entity info
lore --robot issues 123
lore --robot mrs 456 -p group/repo

# Count entities
lore --robot count issues
lore --robot count discussions --for mr

# Search indexed documents
lore --robot search "authentication bug"

# Check sync status
lore --robot status

# Run full sync pipeline
lore --robot sync

# Run sync without resource events
lore --robot sync --no-events

# Run ingestion only
lore --robot ingest issues

# Check environment health
lore --robot doctor

# Document and index statistics
lore --robot stats

# Quick health pre-flight check (exit 0 = healthy, 19 = unhealthy)
lore --robot health

# Generate searchable documents from ingested data
lore --robot generate-docs

# Generate vector embeddings via Ollama
lore --robot embed

# Agent self-discovery manifest (all commands, flags, exit codes, response schemas)
lore robot-docs

# Version information
lore --robot version
```
|
||||||
|
|
||||||
|
### Response Format

All commands return compact JSON with a uniform envelope and timing metadata:

```json
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```

Errors return structured JSON to stderr with machine-actionable recovery steps:

```json
{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'","actions":["lore init"]}}
```

The `actions` array contains executable shell commands for automated recovery. It is omitted when empty.
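An agent-side consumer of these two envelopes might look like the following Python sketch. The envelope shapes are taken directly from the examples above; the `parse_envelope` helper name and the exit code 20 (config not found) wiring are illustrative, not part of the CLI itself:

```python
import json

def parse_envelope(returncode: int, stdout: str, stderr: str):
    """Split a robot-mode result into (data, error) per the envelopes above."""
    if returncode == 0:
        return json.loads(stdout)["data"], None
    return None, json.loads(stderr)["error"]

# Success envelope (shape documented above):
data, err = parse_envelope(
    0, '{"ok":true,"data":{"iid":7},"meta":{"elapsed_ms":42}}', "")

# Error envelope; `actions` holds executable recovery commands:
error_json = ('{"error":{"code":"CONFIG_NOT_FOUND","message":"...",'
              '"suggestion":"Run \'lore init\'","actions":["lore init"]}}')
_, err2 = parse_envelope(20, "", error_json)
recovery = err2.get("actions", [])  # commands an agent can run to recover
```

An agent would typically feed each entry of `recovery` back into its shell tool before retrying the original command.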

### Field Selection

The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response:

```bash
lore -J issues --fields minimal                    # Preset: iid, title, state, updated_at_iso
lore -J mrs --fields iid,title,state,draft,labels  # Custom field list
```
### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Internal error / not implemented |
| 2 | Usage error (invalid flags or arguments) |
| 3 | Config invalid |
| 4 | Token not set |
| 5 | GitLab auth failed |
| 6 | Resource not found |
| 7 | Rate limited |
| 8 | Network error |
| 9 | Database locked |
| 10 | Database error |
| 11 | Migration failed |
| 12 | I/O error |
| 13 | Transform error |
| 14 | Ollama unavailable |
| 15 | Ollama model not found |
| 16 | Embedding failed |
| 17 | Not found (entity does not exist) |
| 18 | Ambiguous match (use `-p` to specify project) |
| 19 | Health check failed |
| 20 | Config not found |
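One way an agent might act on this table is to bucket codes into retry/repair/abort classes. The grouping below is an illustrative assumption layered on top of the table, not part of the CLI contract:

```python
# Illustrative classification of the exit codes above; the groupings are an
# assumption, not something the CLI guarantees.
TRANSIENT = {7, 8, 9, 14}   # rate limit, network, db lock, Ollama down: back off, retry
ACTIONABLE = {3, 4, 5, 20}  # config/token problems: run the suggested `actions`
USAGE = {2, 18}             # fix the invocation (e.g. add -p for ambiguous matches)

def next_step(code: int) -> str:
    """Map an exit code to a coarse agent decision."""
    if code == 0:
        return "ok"
    if code in TRANSIENT:
        return "retry"
    if code in ACTIONABLE:
        return "run-recovery-actions"
    if code in USAGE:
        return "fix-invocation"
    return "abort"
```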

### Configuration Precedence

1. CLI flags (highest priority)
2. Environment variables (`LORE_ROBOT`, `GITLAB_TOKEN`, `LORE_CONFIG_PATH`)
3. Config file (`~/.config/lore/config.json`)
4. Built-in defaults (lowest priority)
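The precedence above amounts to a first-non-None lookup across the four sources. A minimal sketch (the example values are hypothetical):

```python
def resolve(flag_value, env_value, file_value, default):
    """Return the first value that is set, in the precedence order listed above."""
    for value in (flag_value, env_value, file_value):
        if value is not None:
            return value
    return default

# A CLI flag beats env vars and config-file values:
winner = resolve(25, 50, 100, 10)
# With no flag or env var set, the config file wins over the built-in default:
fallback = resolve(None, None, 100, 10)
```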

### Best Practices

- Use `lore --robot` or `lore -J` for all agent interactions
- Check exit codes for error handling
- Parse JSON errors from stderr; use the `actions` array for automated recovery
- Use `--fields minimal` to reduce token usage (~60% fewer tokens)
- Use `-n` / `--limit` to control response size
- Use `-q` / `--quiet` to suppress progress bars and non-essential output
- Use `--color never` in non-TTY automation for ANSI-free output
- Use `-v` / `-vv` / `-vvv` for increasing verbosity (debug/trace logging)
- Use `--log-format json` for machine-readable log output to stderr
- TTY detection handles piped commands automatically
- Use `lore --robot health` as a fast pre-flight check before queries
- Use `lore robot-docs` for response schema discovery
- The `-p` flag supports fuzzy project matching (suffix and substring)

---

**Cargo.lock** (generated, 220 changed lines)
```diff
@@ -76,12 +76,6 @@ dependencies = [
  "windows-sys 0.61.2",
 ]
 
-[[package]]
-name = "arrayvec"
-version = "0.7.6"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "7c02d123df017efcdfbd739ef81735b36c5ba83ec3c59c80a9d7ecc718f92e50"
-
 [[package]]
 name = "assert-json-diff"
 version = "2.0.2"
@@ -211,6 +205,15 @@ dependencies = [
  "strsim",
 ]
 
+[[package]]
+name = "clap_complete"
+version = "4.5.65"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "430b4dc2b5e3861848de79627b2bedc9f3342c7da5173a14eaa5d0f8dc18ae5d"
+dependencies = [
+ "clap",
+]
+
 [[package]]
 name = "clap_derive"
 version = "4.5.49"
@@ -293,6 +296,21 @@ dependencies = [
  "cfg-if",
 ]
 
+[[package]]
+name = "crossbeam-channel"
+version = "0.5.15"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "82b8f8f868b36967f9606790d1903570de9ceaf870a7bf9fbbd3016d636a2cb2"
+dependencies = [
+ "crossbeam-utils",
+]
+
+[[package]]
+name = "crossbeam-utils"
+version = "0.8.21"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "d0a5c400df2834b80a4c3327b3aad3a4c4cd4de0629063962b03235697506a28"
+
 [[package]]
 name = "crossterm"
 version = "0.29.0"
@@ -344,6 +362,15 @@ version = "0.1.4"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "092966b41edc516079bdf31ec78a2e0588d1d0c08f78b91d8307215928642b2b"
 
+[[package]]
+name = "deranged"
+version = "0.5.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ececcb659e7ba858fb4f10388c250a7252eb0a27373f1a72b8748afdd248e587"
+dependencies = [
+ "powerfmt",
+]
+
 [[package]]
 name = "dialoguer"
 version = "0.12.0"
@@ -951,7 +978,6 @@ dependencies = [
  "portable-atomic",
  "unicode-width",
  "unit-prefix",
- "vt100",
  "web-time",
 ]
@@ -1080,30 +1106,36 @@ checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
 
 [[package]]
 name = "lore"
-version = "0.1.0"
+version = "0.6.2"
 dependencies = [
  "async-stream",
  "chrono",
  "clap",
+ "clap_complete",
  "comfy-table",
  "console",
  "dialoguer",
  "dirs",
  "flate2",
  "futures",
+ "httpdate",
  "indicatif",
+ "libc",
  "open",
+ "rand",
+ "regex",
  "reqwest",
  "rusqlite",
  "serde",
  "serde_json",
  "sha2",
  "sqlite-vec",
+ "strsim",
  "tempfile",
  "thiserror",
  "tokio",
  "tracing",
- "tracing-indicatif",
+ "tracing-appender",
  "tracing-subscriber",
  "url",
  "urlencoding",
@@ -1179,6 +1211,12 @@ dependencies = [
  "windows-sys 0.61.2",
 ]
 
+[[package]]
+name = "num-conv"
+version = "0.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "cf97ec579c3c42f953ef76dbf8d55ac91fb219dde70e49aa4a6b7d74e9919050"
+
 [[package]]
 name = "num-traits"
 version = "0.2.19"
@@ -1339,6 +1377,21 @@ dependencies = [
  "zerovec",
 ]
 
+[[package]]
+name = "powerfmt"
+version = "0.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "439ee305def115ba05938db6eb1644ff94165c5ab5e9420d1c1bcedbba909391"
+
+[[package]]
+name = "ppv-lite86"
+version = "0.2.21"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "85eae3c4ed2f50dcfe72643da4befc30deadb458a9b590d720cde2f2b1e97da9"
+dependencies = [
+ "zerocopy",
+]
+
 [[package]]
 name = "proc-macro2"
 version = "1.0.106"
@@ -1363,6 +1416,36 @@ version = "5.3.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "69cdb34c158ceb288df11e18b4bd39de994f6657d83847bdffdbd7f346754b0f"
 
+[[package]]
+name = "rand"
+version = "0.8.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "34af8d1a0e25924bc5b7c43c079c942339d8f0a8b57c39049bef581b46327404"
+dependencies = [
+ "libc",
+ "rand_chacha",
+ "rand_core",
+]
+
+[[package]]
+name = "rand_chacha"
+version = "0.3.1"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "e6c10a63a0fa32252be49d21e7709d4d4baf8d231c2dbce1eaa8141b9b127d88"
+dependencies = [
+ "ppv-lite86",
+ "rand_core",
+]
+
+[[package]]
+name = "rand_core"
+version = "0.6.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "ec0be4795e2f6a28069bec0b5ff3e2ac9bafc99e6a9a7dc3547996c5c816922c"
+dependencies = [
+ "getrandom 0.2.17",
+]
+
 [[package]]
 name = "redox_syscall"
 version = "0.5.18"
@@ -1674,6 +1757,15 @@ version = "1.3.0"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0fda2ff0d084019ba4d7c6f371c95d8fd75ce3524c3cb8fb653a3023f6323e64"
 
+[[package]]
+name = "signal-hook-registry"
+version = "1.4.5"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9203b8055f63a2a00e2f593bb0510367fe707d7ff1e5c872de2f537b339e5410"
+dependencies = [
+ "libc",
+]
+
 [[package]]
 name = "simd-adler32"
 version = "0.3.8"
@@ -1835,6 +1927,37 @@ dependencies = [
  "cfg-if",
 ]
 
+[[package]]
+name = "time"
+version = "0.3.46"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "9da98b7d9b7dad93488a84b8248efc35352b0b2657397d4167e7ad67e5d535e5"
+dependencies = [
+ "deranged",
+ "itoa",
+ "num-conv",
+ "powerfmt",
+ "serde_core",
+ "time-core",
+ "time-macros",
+]
+
+[[package]]
+name = "time-core"
+version = "0.1.8"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7694e1cfe791f8d31026952abf09c69ca6f6fa4e1a1229e18988f06a04a12dca"
+
+[[package]]
+name = "time-macros"
+version = "0.2.26"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "78cc610bac2dcee56805c99642447d4c5dbde4d01f752ffea0199aee1f601dc4"
+dependencies = [
+ "num-conv",
+ "time-core",
+]
+
 [[package]]
 name = "tinystr"
 version = "0.8.2"
@@ -1855,6 +1978,7 @@ dependencies = [
  "libc",
  "mio",
  "pin-project-lite",
+ "signal-hook-registry",
  "socket2",
  "tokio-macros",
  "windows-sys 0.61.2",
@@ -1960,6 +2084,18 @@ dependencies = [
  "tracing-core",
 ]
 
+[[package]]
+name = "tracing-appender"
+version = "0.2.4"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "786d480bce6247ab75f005b14ae1624ad978d3029d9113f0a22fa1ac773faeaf"
+dependencies = [
+ "crossbeam-channel",
+ "thiserror",
+ "time",
+ "tracing-subscriber",
+]
+
 [[package]]
 name = "tracing-attributes"
 version = "0.1.31"
@@ -1981,18 +2117,6 @@ dependencies = [
  "valuable",
 ]
 
-[[package]]
-name = "tracing-indicatif"
-version = "0.3.14"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "e1ef6990e0438749f0080573248e96631171a0b5ddfddde119aa5ba8c3a9c47e"
-dependencies = [
- "indicatif",
- "tracing",
- "tracing-core",
- "tracing-subscriber",
-]
-
 [[package]]
 name = "tracing-log"
 version = "0.2.0"
@@ -2004,6 +2128,16 @@ dependencies = [
  "tracing-core",
 ]
 
+[[package]]
+name = "tracing-serde"
+version = "0.2.0"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "704b1aeb7be0d0a84fc9828cae51dab5970fee5088f83d1dd7ee6f6246fc6ff1"
+dependencies = [
+ "serde",
+ "tracing-core",
+]
+
 [[package]]
 name = "tracing-subscriber"
 version = "0.3.22"
@@ -2014,12 +2148,15 @@ dependencies = [
  "nu-ansi-term",
  "once_cell",
  "regex-automata",
+ "serde",
+ "serde_json",
  "sharded-slab",
  "smallvec",
  "thread_local",
  "tracing",
  "tracing-core",
  "tracing-log",
+ "tracing-serde",
 ]
@@ -2123,27 +2260,6 @@ version = "0.9.5"
 source = "registry+https://github.com/rust-lang/crates.io-index"
 checksum = "0b928f33d975fc6ad9f86c8f283853ad26bdd5b10b7f1542aa2fa15e2289105a"
 
-[[package]]
-name = "vt100"
-version = "0.16.2"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "054ff75fb8fa83e609e685106df4faeffdf3a735d3c74ebce97ec557d5d36fd9"
-dependencies = [
- "itoa",
- "unicode-width",
- "vte",
-]
-
-[[package]]
-name = "vte"
-version = "0.15.0"
-source = "registry+https://github.com/rust-lang/crates.io-index"
-checksum = "a5924018406ce0063cd67f8e008104968b74b563ee1b85dde3ed1f7cb87d3dbd"
-dependencies = [
- "arrayvec",
- "memchr",
-]
-
 [[package]]
 name = "want"
 version = "0.3.1"
@@ -2553,6 +2669,26 @@ dependencies = [
  "synstructure",
 ]
 
+[[package]]
+name = "zerocopy"
+version = "0.8.36"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "dafd85c832c1b68bbb4ec0c72c7f6f4fc5179627d2bc7c26b30e4c0cc11e76cc"
+dependencies = [
+ "zerocopy-derive",
+]
+
+[[package]]
+name = "zerocopy-derive"
+version = "0.8.36"
+source = "registry+https://github.com/rust-lang/crates.io-index"
+checksum = "7cb7e4e8436d9db52fbd6625dbf2f45243ab84994a72882ec8227b99e72b439a"
+dependencies = [
+ "proc-macro2",
+ "quote",
+ "syn",
+]
+
 [[package]]
 name = "zerofrom"
 version = "0.1.6"
```
**Cargo.toml** (16 changed lines)

```diff
@@ -1,6 +1,6 @@
 [package]
 name = "lore"
-version = "0.1.0"
+version = "0.6.2"
 edition = "2024"
 description = "Gitlore - Local GitLab data management with semantic search"
 authors = ["Taylor Eernisse"]
@@ -21,6 +21,7 @@ serde_json = "1"
 
 # CLI
 clap = { version = "4", features = ["derive", "env"] }
+clap_complete = "4"
 dialoguer = "0.12"
 console = "0.16"
 indicatif = "0.18"
@@ -29,7 +30,7 @@ open = "5"
 
 # HTTP
 reqwest = { version = "0.12", features = ["json"] }
-tokio = { version = "1", features = ["rt-multi-thread", "macros", "time"] }
+tokio = { version = "1", features = ["rt-multi-thread", "macros", "time", "signal"] }
 
 # Async streaming for pagination
 async-stream = "0.3"
@@ -40,15 +41,22 @@ thiserror = "2"
 dirs = "6"
 url = "2"
 urlencoding = "2"
+rand = "0.8"
 sha2 = "0.10"
 flate2 = "1"
 chrono = { version = "0.4", features = ["serde"] }
+httpdate = "1"
 uuid = { version = "1", features = ["v4"] }
+regex = "1"
+strsim = "0.11"
+
+[target.'cfg(unix)'.dependencies]
+libc = "0.2"
 
 # Logging
 tracing = "0.1"
-tracing-subscriber = { version = "0.3", features = ["env-filter"] }
+tracing-subscriber = { version = "0.3", features = ["env-filter", "json"] }
-tracing-indicatif = "0.3"
+tracing-appender = "0.2"
 
 [dev-dependencies]
 tempfile = "3"
```
**PERFORMANCE_AUDIT.md** (new file, 467 lines)

# Gitlore Performance Audit Report

**Date**: 2026-02-05
**Auditor**: Claude Code (Opus 4.5)
**Scope**: Core system performance - ingestion, embedding, search, and document regeneration

## Executive Summary

This audit identifies 12 high-impact optimization opportunities across the Gitlore codebase. The most significant findings center on:

1. **SQL query patterns** with N+1 issues and inefficient correlated subqueries
2. **Memory allocation patterns** in hot paths (embedding, chunking, ingestion)
3. **Change detection queries** using triple-EXISTS patterns instead of JOINs

**Estimated overall improvement potential**: 30-50% reduction in latency for filtered searches, 2-5x improvement in ingestion throughput for issues/MRs with many labels.

---

## Methodology

- **Codebase analysis**: Full read of all modules in `src/`
- **SQL pattern analysis**: All queries checked for N+1, missing indexes, unbounded results
- **Memory allocation analysis**: Clone patterns, unnecessary collections, missing capacity hints
- **Test baseline**: All tests pass (`cargo test --release`)

Note: Without access to a live GitLab instance or populated database, profiling is code-analysis based rather than runtime measured.

---
## Opportunity Matrix

| ID | Issue | Location | Impact | Confidence | Effort | ICE Score | Status |
|----|-------|----------|--------|------------|--------|-----------|--------|
| 1 | Triple-EXISTS change detection | `change_detector.rs:19-46` | HIGH | 95% | LOW | **9.5** | **DONE** |
| 2 | N+1 label/assignee inserts | `issues.rs:270-285`, `merge_requests.rs:242-272` | HIGH | 95% | MEDIUM | **9.0** | Pending |
| 3 | Clone in embedding batch loop | `pipeline.rs:165` | HIGH | 90% | LOW | **9.0** | Pending |
| 4 | Correlated GROUP_CONCAT in list | `list.rs:341-348` | HIGH | 90% | MEDIUM | **8.5** | Pending |
| 5 | Multiple EXISTS per label filter | `filters.rs:100-107` | HIGH | 85% | MEDIUM | **8.0** | **DONE** |
| 6 | String allocation in chunking | `chunking.rs:7-49` | MEDIUM | 95% | MEDIUM | **7.5** | Pending |
| 7 | Multiple COUNT queries | `count.rs:44-56` | MEDIUM | 95% | LOW | **7.0** | **DONE** |
| 8 | Collect-then-concat pattern | `truncation.rs:60-61` | MEDIUM | 90% | LOW | **7.0** | **DONE** |
| 9 | Box<dyn ToSql> allocations | `filters.rs:67-135` | MEDIUM | 80% | HIGH | **6.0** | Pending |
| 10 | Missing Vec::with_capacity | `pipeline.rs:106`, multiple | LOW | 95% | LOW | **5.5** | **DONE** |
| 11 | FTS token collect-join | `fts.rs:26-41` | LOW | 90% | LOW | **5.0** | **DONE** |
| 12 | Transformer string clones | `merge_request.rs:51-77` | MEDIUM | 85% | HIGH | **5.0** | Pending |

ICE Score = (Impact x Confidence) / Effort, scaled 1-10

---
## Detailed Findings

### 1. Triple-EXISTS Change Detection Query (ICE: 9.5)

**Location**: `src/embedding/change_detector.rs:19-46`

**Current Code**:
```sql
SELECT d.id, d.content_text, d.content_hash
FROM documents d
WHERE d.id > ?1
  AND (
    NOT EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0)
    OR EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0 AND em.document_hash != d.content_hash)
    OR EXISTS (SELECT 1 FROM embedding_metadata em WHERE em.document_id = d.id AND em.chunk_index = 0 AND (...))
  )
ORDER BY d.id
LIMIT ?2
```

**Problem**: Three separate EXISTS subqueries, each scanning `embedding_metadata`. SQLite cannot short-circuit across OR'd EXISTS efficiently.

**Proposed Fix**:
```sql
SELECT d.id, d.content_text, d.content_hash
FROM documents d
LEFT JOIN embedding_metadata em
  ON em.document_id = d.id AND em.chunk_index = 0
WHERE d.id > ?1
  AND (
    em.document_id IS NULL                  -- no embedding
    OR em.document_hash != d.content_hash   -- hash mismatch
    OR em.chunk_max_bytes IS NULL
    OR em.chunk_max_bytes != ?3
    OR em.model != ?4
    OR em.dims != ?5
  )
ORDER BY d.id
LIMIT ?2
```

**Isomorphism Proof**: Both queries return documents needing embedding when:
- No embedding exists for chunk_index=0 (NULL check)
- Hash changed (direct comparison)
- Config mismatch (model/dims/chunk_max_bytes)

The LEFT JOIN + NULL check is semantically identical to NOT EXISTS. The OR conditions inside WHERE match the EXISTS predicates exactly.

**Expected Impact**: 2-3x faster for large document sets. Single scan of embedding_metadata instead of three.

---
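The NOT EXISTS / LEFT JOIN equivalence claimed above can be checked on a reduced schema. The sketch below (Python's stdlib `sqlite3`, columns trimmed to the ones the predicates touch, config-mismatch clauses omitted) confirms both forms select the same documents:

```python
import sqlite3

# Reduced in-memory schema: doc 1 is up to date, doc 2 has a stale hash,
# doc 3 was never embedded. Both query forms should return docs 2 and 3.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE documents (id INTEGER PRIMARY KEY, content_hash TEXT);
CREATE TABLE embedding_metadata (document_id INTEGER, chunk_index INTEGER, document_hash TEXT);
INSERT INTO documents VALUES (1, 'aaa'), (2, 'bbb'), (3, 'ccc');
INSERT INTO embedding_metadata VALUES (1, 0, 'aaa'), (2, 0, 'old');
""")

exists_rows = conn.execute("""
  SELECT d.id FROM documents d
  WHERE NOT EXISTS (SELECT 1 FROM embedding_metadata em
                    WHERE em.document_id = d.id AND em.chunk_index = 0)
     OR EXISTS (SELECT 1 FROM embedding_metadata em
                WHERE em.document_id = d.id AND em.chunk_index = 0
                  AND em.document_hash != d.content_hash)
  ORDER BY d.id""").fetchall()

join_rows = conn.execute("""
  SELECT d.id FROM documents d
  LEFT JOIN embedding_metadata em
    ON em.document_id = d.id AND em.chunk_index = 0
  WHERE em.document_id IS NULL OR em.document_hash != d.content_hash
  ORDER BY d.id""").fetchall()
```

Both queries yield `[(2,), (3,)]` on this fixture.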

### 2. N+1 Label/Assignee Inserts (ICE: 9.0)

**Location**:
- `src/ingestion/issues.rs:270-285`
- `src/ingestion/merge_requests.rs:242-272`

**Current Code**:
```rust
for label_name in label_names {
    let label_id = upsert_label_tx(tx, project_id, label_name, &mut labels_created)?;
    link_issue_label_tx(tx, local_issue_id, label_id)?;
}
```

**Problem**: Each label triggers 2+ SQL statements. With 20 labels × 100 issues = 4000+ queries per batch.

**Proposed Fix**: Batch insert using prepared statements with multi-row VALUES:

```rust
// Build batch: INSERT INTO issue_labels VALUES (?, ?), (?, ?), ...
let mut values = String::new();
let mut params: Vec<Box<dyn ToSql>> = Vec::with_capacity(label_ids.len() * 2);
for (i, label_id) in label_ids.iter().enumerate() {
    if i > 0 { values.push_str(","); }
    values.push_str("(?,?)");
    params.push(Box::new(local_issue_id));
    params.push(Box::new(*label_id));
}
let sql = format!("INSERT OR IGNORE INTO issue_labels (issue_id, label_id) VALUES {}", values);
```

Or use the `prepare_cached()` pattern from `events_db.rs`.

**Isomorphism Proof**: Both approaches insert identical rows. OR IGNORE handles duplicates identically.

**Expected Impact**: 5-10x faster ingestion for issues/MRs with many labels.

---
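The batching idea is language-agnostic; the sketch below shows it with stdlib `sqlite3` on a reduced stand-in for `issue_labels`, reusing one prepared statement for all links instead of one round-trip per label, with `OR IGNORE` dropping duplicates exactly as in the loop version:

```python
import sqlite3

# Reduced stand-in for issue_labels; the UNIQUE constraint makes
# INSERT OR IGNORE deduplicate repeated links.
conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE issue_labels (
    issue_id INTEGER, label_id INTEGER, UNIQUE(issue_id, label_id))""")

local_issue_id = 42
label_ids = [7, 8, 9, 7]  # duplicate 7 is ignored, as in the per-row loop
conn.executemany(
    "INSERT OR IGNORE INTO issue_labels (issue_id, label_id) VALUES (?, ?)",
    [(local_issue_id, lid) for lid in label_ids],
)
linked = conn.execute(
    "SELECT label_id FROM issue_labels WHERE issue_id = ? ORDER BY label_id",
    (local_issue_id,),
).fetchall()
```

The resulting rows are identical to what the one-statement-per-label loop would produce.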

### 3. Clone in Embedding Batch Loop (ICE: 9.0)

**Location**: `src/embedding/pipeline.rs:165`

**Current Code**:
```rust
let texts: Vec<String> = batch.iter().map(|c| c.text.clone()).collect();
```

**Problem**: Every batch iteration clones all chunk texts. With BATCH_SIZE=32 and thousands of chunks, this doubles memory allocation in the hot path.

**Proposed Fix**: Transfer ownership instead of cloning:

```rust
// Option A: Drain chunks from all_chunks instead of iterating
let texts: Vec<String> = batch.into_iter().map(|c| c.text).collect();

// Option B: Store references in ChunkWork, clone only at the API boundary
struct ChunkWork<'a> {
    text: &'a str,
    // ...
}
```

**Isomorphism Proof**: Same texts sent to Ollama, same embeddings returned. Order and content identical.

**Expected Impact**: 30-50% reduction in embedding pipeline memory allocation.

---

### 4. Correlated GROUP_CONCAT in List Queries (ICE: 8.5)

**Location**: `src/cli/commands/list.rs:341-348`

**Current Code**:
```sql
SELECT i.*,
  (SELECT GROUP_CONCAT(l.name, X'1F') FROM issue_labels il JOIN labels l ... WHERE il.issue_id = i.id) AS labels_csv,
  (SELECT COUNT(*) FROM discussions WHERE issue_id = i.id) AS discussion_count
FROM issues i
```

**Problem**: Each correlated subquery executes per row. With LIMIT 50, that's 100+ subquery executions.

**Proposed Fix**: Use window functions or pre-aggregated CTEs:

```sql
WITH label_agg AS (
  SELECT il.issue_id, GROUP_CONCAT(l.name, X'1F') AS labels_csv
  FROM issue_labels il JOIN labels l ON il.label_id = l.id
  GROUP BY il.issue_id
),
discussion_agg AS (
  SELECT issue_id, COUNT(*) AS cnt
  FROM discussions WHERE issue_id IS NOT NULL
  GROUP BY issue_id
)
SELECT i.*, la.labels_csv, da.cnt
FROM issues i
LEFT JOIN label_agg la ON la.issue_id = i.id
LEFT JOIN discussion_agg da ON da.issue_id = i.id
WHERE ...
LIMIT 50
```

**Isomorphism Proof**: Same data returned - labels concatenated, discussion counts accurate. JOIN preserves NULL when no labels/discussions exist.

**Expected Impact**: 3-5x faster list queries with discussion/label data.

---
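The correlated and CTE forms can be compared directly on a reduced schema. In the sketch below (stdlib `sqlite3`; separator simplified from `X'1F'` to `,` for readability, and `COALESCE` added so the CTE's NULL count matches the correlated subquery's 0):

```python
import sqlite3

# Issue 1 has two labels and two discussions; issue 2 has neither.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE labels (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE issue_labels (issue_id INTEGER, label_id INTEGER);
CREATE TABLE discussions (id INTEGER PRIMARY KEY, issue_id INTEGER);
INSERT INTO issues VALUES (1, 'a'), (2, 'b');
INSERT INTO labels VALUES (10, 'bug'), (11, 'p1');
INSERT INTO issue_labels VALUES (1, 10), (1, 11);
INSERT INTO discussions VALUES (100, 1), (101, 1);
""")

correlated = conn.execute("""
  SELECT i.id,
    (SELECT GROUP_CONCAT(l.name, ',') FROM issue_labels il
      JOIN labels l ON il.label_id = l.id WHERE il.issue_id = i.id),
    (SELECT COUNT(*) FROM discussions WHERE issue_id = i.id)
  FROM issues i ORDER BY i.id""").fetchall()

cte = conn.execute("""
  WITH label_agg AS (
    SELECT il.issue_id, GROUP_CONCAT(l.name, ',') AS labels_csv
    FROM issue_labels il JOIN labels l ON il.label_id = l.id
    GROUP BY il.issue_id),
  discussion_agg AS (
    SELECT issue_id, COUNT(*) AS cnt FROM discussions GROUP BY issue_id)
  SELECT i.id, la.labels_csv, COALESCE(da.cnt, 0)
  FROM issues i
  LEFT JOIN label_agg la ON la.issue_id = i.id
  LEFT JOIN discussion_agg da ON da.issue_id = i.id
  ORDER BY i.id""").fetchall()
```

Both result sets match row for row, including the NULL `labels_csv` and zero count for the issue with no labels or discussions.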
|
||||||
|
|
||||||
|
### 5. Multiple EXISTS Per Label Filter (ICE: 8.0)
|
||||||
|
|
||||||
|
**Location**: `src/search/filters.rs:100-107`
|
||||||
|
|
||||||
|
**Current Code**:
|
||||||
|
```sql
|
||||||
|
WHERE EXISTS (SELECT 1 ... AND label_name = ?)
|
||||||
|
AND EXISTS (SELECT 1 ... AND label_name = ?)
|
||||||
|
AND EXISTS (SELECT 1 ... AND label_name = ?)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Problem**: Filtering by 3 labels generates 3 EXISTS subqueries. Each scans document_labels.
|
||||||
|
|
||||||
|
**Proposed Fix**: Single EXISTS with GROUP BY/HAVING:
|
||||||
|
|
||||||
|
```sql
|
||||||
|
WHERE EXISTS (
|
||||||
|
SELECT 1 FROM document_labels dl
|
||||||
|
WHERE dl.document_id = d.id
|
||||||
|
AND dl.label_name IN (?, ?, ?)
|
||||||
|
GROUP BY dl.document_id
|
||||||
|
HAVING COUNT(DISTINCT dl.label_name) = 3
|
||||||
|
)
|
||||||
|
```
|
||||||
|
|
||||||
|
**Isomorphism Proof**: Both return documents with ALL specified labels. AND of EXISTS = document has label1 AND label2 AND label3. GROUP BY + HAVING COUNT(DISTINCT) = 3 is mathematically equivalent.
|
||||||
|
|
||||||
|
**Expected Impact**: 2-4x faster filtered search with multiple labels.

---

### 6. String Allocation in Chunking (ICE: 7.5)

**Location**: `src/embedding/chunking.rs:7-49`

**Current Code**:

```rust
chunks.push((chunk_index, remaining.to_string()));
```

**Problem**: Converts `&str` slices to owned `String` for every chunk. The input is already a `&str`.

**Proposed Fix**: Return borrowed slices or use `Cow`:

```rust
pub fn split_into_chunks(content: &str) -> Vec<(usize, &str)> {
    // Return slices into original content
}
```

Or if ownership is needed later:

```rust
pub fn split_into_chunks(content: &str) -> Vec<(usize, Cow<'_, str>)>
```

**Isomorphism Proof**: Same chunk boundaries, same text content. Only allocation behavior changes.

**Expected Impact**: Reduces allocations by ~50% in chunking hot path.

---

### 7. Multiple COUNT Queries (ICE: 7.0)

**Location**: `src/cli/commands/count.rs:44-56`

**Current Code**:

```rust
let count = conn.query_row("SELECT COUNT(*) FROM issues", ...)?;
let opened = conn.query_row("SELECT COUNT(*) FROM issues WHERE state = 'opened'", ...)?;
let closed = conn.query_row("SELECT COUNT(*) FROM issues WHERE state = 'closed'", ...)?;
```

**Problem**: 5 separate queries for the MR state breakdown, 3 for issues.

**Proposed Fix**: Single query with CASE aggregation:

```sql
SELECT
  COUNT(*) AS total,
  SUM(CASE WHEN state = 'opened' THEN 1 ELSE 0 END) AS opened,
  SUM(CASE WHEN state = 'closed' THEN 1 ELSE 0 END) AS closed
FROM issues
```

**Isomorphism Proof**: Identical counts returned. CASE WHEN with SUM is standard SQL for conditional counting.

**Expected Impact**: 3-5x fewer round trips for count command.

---

### 8. Collect-then-Concat Pattern (ICE: 7.0)

**Location**: `src/documents/truncation.rs:60-61`

**Current Code**:

```rust
let formatted: Vec<String> = notes.iter().map(format_note).collect();
let total: String = formatted.concat();
```

**Problem**: Allocates an intermediate `Vec<String>`, then allocates again for the concat.

**Proposed Fix**: Use fold or format directly:

```rust
let total = notes.iter().fold(String::new(), |mut acc, note| {
    acc.push_str(&format_note(note));
    acc
});
```

Or with a capacity hint:

```rust
let total_len: usize = notes.iter().map(|n| estimate_note_len(n)).sum();
let mut total = String::with_capacity(total_len);
for note in notes {
    total.push_str(&format_note(note));
}
```

**Isomorphism Proof**: Same concatenated string output. Order preserved.

**Expected Impact**: 50% reduction in allocations for document regeneration.

---

### 9. Box<dyn ToSql> Allocations (ICE: 6.0)

**Location**: `src/search/filters.rs:67-135`

**Current Code**:

```rust
let mut params: Vec<Box<dyn rusqlite::types::ToSql>> = vec![Box::new(ids_json)];
// ... more Box::new() calls
let param_refs: Vec<&dyn rusqlite::types::ToSql> = params.iter().map(|p| p.as_ref()).collect();
```

**Problem**: Boxing each parameter, then collecting references. Two allocations per parameter.

**Proposed Fix**: Use rusqlite's `params!` macro or typed parameter arrays:

```rust
// For known parameter counts, use arrays
let params: [&dyn ToSql; 4] = [&ids_json, &author, &state, &limit];

// Or build SQL with named parameters and use params! directly
```

**Expected Impact**: Eliminates ~15 allocations per filtered search.

---

### 10. Missing Vec::with_capacity (ICE: 5.5)

**Locations**:

- `src/embedding/pipeline.rs:106`
- `src/embedding/pipeline.rs:162`
- Multiple other locations

**Current Code**:

```rust
let mut all_chunks: Vec<ChunkWork> = Vec::new();
```

**Proposed Fix**:

```rust
// Estimate: average 3 chunks per document
let mut all_chunks = Vec::with_capacity(pending.len() * 3);
```

**Expected Impact**: Eliminates reallocation overhead during vector growth.

---

### 11. FTS Token Collect-Join (ICE: 5.0)

**Location**: `src/search/fts.rs:26-41`

**Current Code**:

```rust
let tokens: Vec<String> = trimmed.split_whitespace().map(...).collect();
tokens.join(" ")
```

**Proposed Fix**: Use itertools or avoid the intermediate vec:

```rust
use itertools::Itertools;
trimmed.split_whitespace().map(...).join(" ")
```

**Expected Impact**: Minor - search queries are typically short.

---

### 12. Transformer String Clones (ICE: 5.0)

**Location**: `src/gitlab/transformers/merge_request.rs:51-77`

**Problem**: Multiple `.clone()` calls on String fields during transformation.

**Proposed Fix**: Use `std::mem::take()` where possible, or restructure to avoid cloning.

**Expected Impact**: Moderate - depends on MR volume.

---

## Regression Guardrails

For any optimization implemented:

1. **Test Coverage**: All existing tests must pass
2. **Output Equivalence**: For SQL changes, verify identical result sets with test data
3. **Benchmark Suite**: Add benchmarks for affected paths before/after

Suggested benchmark targets:

```rust
#[bench] fn bench_change_detection_1k_docs(b: &mut Bencher) { ... }
#[bench] fn bench_label_insert_50_labels(b: &mut Bencher) { ... }
#[bench] fn bench_hybrid_search_filtered(b: &mut Bencher) { ... }
```

---

## Implementation Priority

**Phase 1 (Quick Wins)** - COMPLETE:

1. ~~Change detection query rewrite (#1)~~ **DONE**
2. ~~Multiple COUNT consolidation (#7)~~ **DONE**
3. ~~Collect-concat pattern (#8)~~ **DONE**
4. ~~Vec::with_capacity hints (#10)~~ **DONE**
5. ~~FTS token collect-join (#11)~~ **DONE**
6. ~~Multiple EXISTS per label (#5)~~ **DONE**

**Phase 2 (Medium Effort)**:

1. Embedding batch clone removal (#3)
2. Chunking string allocation (#6)

**Phase 3 (Higher Effort)**:

1. N+1 batch inserts (#2)
2. List query CTEs (#4)
3. Parameter boxing (#9)

---

## Appendix: Test Baseline

```
cargo test --release
running 127 tests
test result: ok. 127 passed; 0 failed; 0 ignored; 0 measured; 0 filtered out
```

All tests pass. Any optimization must maintain this baseline.

README.md (745 lines changed)

@@ -1,6 +1,6 @@
 # Gitlore
 
-Local GitLab data management with semantic search. Syncs issues, MRs, discussions, and notes from GitLab to a local SQLite database for fast, offline-capable querying and filtering.
+Local GitLab data management with semantic search, people intelligence, and temporal analysis. Syncs issues, MRs, discussions, notes, and work item statuses from GitLab to a local SQLite database for fast, offline-capable querying, filtering, hybrid search, chronological event reconstruction, and expert discovery.
 
 ## Features
 
@@ -8,9 +8,19 @@ Local GitLab data management with semantic search. Syncs issues, MRs, discussion
 - **Incremental sync**: Cursor-based sync only fetches changes since last sync
 - **Full re-sync**: Reset cursors and fetch all data from scratch when needed
 - **Multi-project**: Track issues and MRs across multiple GitLab projects
-- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches
+- **Rich filtering**: Filter by state, author, assignee, labels, milestone, due date, draft status, reviewer, branches, work item status
+- **Hybrid search**: Combines FTS5 lexical search with Ollama-powered vector embeddings via Reciprocal Rank Fusion
+- **People intelligence**: Expert discovery, workload analysis, review patterns, active discussions, and code ownership overlap
+- **Timeline pipeline**: Reconstructs chronological event histories by combining search, graph traversal, and event aggregation across related entities
+- **Git history linking**: Tracks merge and squash commit SHAs to connect MRs with git history
+- **File change tracking**: Records which files each MR touches, enabling file-level history queries
 - **Raw payload storage**: Preserves original GitLab API responses for debugging
 - **Discussion threading**: Full support for issue and MR discussions including inline code review comments
+- **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
+- **Work item status enrichment**: Fetches issue statuses (e.g., "To do", "In progress", "Done") from GitLab's GraphQL API with adaptive page sizing, color-coded display, and case-insensitive filtering
+- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
+- **Robot mode**: Machine-readable JSON output with structured errors, meaningful exit codes, and actionable recovery steps
+- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing
 
 ## Installation
 
@@ -32,25 +42,37 @@ cargo build --release
 lore init
 
 # Verify authentication
-lore auth-test
+lore auth
 
-# Sync issues from GitLab
-lore ingest --type issues
+# Sync everything from GitLab (issues + MRs + docs + embeddings)
+lore sync
 
-# Sync merge requests from GitLab
-lore ingest --type mrs
 # List recent issues
-lore list issues --limit 10
+lore issues -n 10
 
 # List open merge requests
-lore list mrs --state opened
+lore mrs -s opened
 
 # Show issue details
-lore show issue 123 --project group/repo
+lore issues 123
 
 # Show MR details with discussions
-lore show mr 456 --project group/repo
+lore mrs 456
+
+# Search across all indexed data
+lore search "authentication bug"
+
+# Who knows about this code area?
+lore who src/features/auth/
+
+# What is @asmith working on?
+lore who @asmith
+
+# Timeline of events related to deployments
+lore timeline "deployment"
+
+# Robot mode (machine-readable JSON)
+lore -J issues -n 5 | jq .
 ```
 
 ## Configuration
@@ -69,16 +91,24 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
     { "path": "group/project" },
     { "path": "other-group/other-project" }
   ],
+  "defaultProject": "group/project",
   "sync": {
     "backfillDays": 14,
     "staleLockMinutes": 10,
     "heartbeatIntervalSeconds": 30,
     "cursorRewindSeconds": 2,
     "primaryConcurrency": 4,
-    "dependentConcurrency": 2
+    "dependentConcurrency": 2,
+    "fetchWorkItemStatus": true
   },
   "storage": {
     "compressRawPayloads": true
+  },
+  "embedding": {
+    "provider": "ollama",
+    "model": "nomic-embed-text",
+    "baseUrl": "http://localhost:11434",
+    "concurrency": 4
   }
 }
 ```
@@ -87,15 +117,17 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
 
 | Section | Field | Default | Description |
 |---------|-------|---------|-------------|
-| `gitlab` | `baseUrl` | — | GitLab instance URL (required) |
+| `gitlab` | `baseUrl` | -- | GitLab instance URL (required) |
 | `gitlab` | `tokenEnvVar` | `GITLAB_TOKEN` | Environment variable containing API token |
-| `projects` | `path` | — | Project path (e.g., `group/project`) |
+| `projects` | `path` | -- | Project path (e.g., `group/project`) |
+| *(top-level)* | `defaultProject` | none | Fallback project path used when `-p` is omitted. Must match a configured project path (exact or suffix). CLI `-p` always overrides. |
 | `sync` | `backfillDays` | `14` | Days to backfill on initial sync |
 | `sync` | `staleLockMinutes` | `10` | Minutes before sync lock considered stale |
 | `sync` | `heartbeatIntervalSeconds` | `30` | Frequency of lock heartbeat updates |
 | `sync` | `cursorRewindSeconds` | `2` | Seconds to rewind cursor for overlap safety |
 | `sync` | `primaryConcurrency` | `4` | Concurrent GitLab requests for primary resources |
 | `sync` | `dependentConcurrency` | `2` | Concurrent requests for dependent resources |
+| `sync` | `fetchWorkItemStatus` | `true` | Enrich issues with work item status via GraphQL (requires GitLab Premium/Ultimate) |
 | `storage` | `dbPath` | `~/.local/share/lore/lore.db` | Database file path |
 | `storage` | `backupDir` | `~/.local/share/lore/backups` | Backup directory |
 | `storage` | `compressRawPayloads` | `true` | Compress stored API responses with gzip |
@@ -107,7 +139,7 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
 ### Config File Resolution
 
 The config file is resolved in this order:
-1. `--config` CLI flag
+1. `--config` / `-c` CLI flag
 2. `LORE_CONFIG_PATH` environment variable
 3. `~/.config/lore/config.json` (XDG default)
 4. `./lore.config.json` (local fallback for development)
@@ -116,7 +148,7 @@ The config file is resolved in this order:
 
 Create a personal access token with `read_api` scope:
 
-1. Go to GitLab → Settings → Access Tokens
+1. Go to GitLab > Settings > Access Tokens
 2. Create token with `read_api` scope
 3. Export it: `export GITLAB_TOKEN=glpat-xxxxxxxxxxxx`
 
@@ -126,12 +158,348 @@ Create a personal access token with `read_api` scope:
|
|||||||
|----------|---------|----------|
|
|----------|---------|----------|
|
||||||
| `GITLAB_TOKEN` | GitLab API authentication token (name configurable via `gitlab.tokenEnvVar`) | Yes |
|
| `GITLAB_TOKEN` | GitLab API authentication token (name configurable via `gitlab.tokenEnvVar`) | Yes |
|
||||||
| `LORE_CONFIG_PATH` | Override config file location | No |
|
| `LORE_CONFIG_PATH` | Override config file location | No |
|
||||||
|
| `LORE_ROBOT` | Enable robot mode globally (set to `true` or `1`) | No |
|
||||||
| `XDG_CONFIG_HOME` | XDG Base Directory for config (fallback: `~/.config`) | No |
|
| `XDG_CONFIG_HOME` | XDG Base Directory for config (fallback: `~/.config`) | No |
|
||||||
| `XDG_DATA_HOME` | XDG Base Directory for data (fallback: `~/.local/share`) | No |
|
| `XDG_DATA_HOME` | XDG Base Directory for data (fallback: `~/.local/share`) | No |
|
||||||
|
| `NO_COLOR` | Disable color output when set (any value) | No |
|
||||||
|
| `CLICOLOR` | Standard color control (0 to disable) | No |
|
||||||
| `RUST_LOG` | Logging level filter (e.g., `lore=debug`) | No |
|
| `RUST_LOG` | Logging level filter (e.g., `lore=debug`) | No |
|
||||||
|
|
||||||
## Commands
|
## Commands
|
||||||
|
|
||||||
|
### `lore issues`
|
||||||
|
|
||||||
|
Query issues from local database, or show a specific issue.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore issues # Recent issues (default 50)
|
||||||
|
lore issues 123 # Show issue #123 with discussions
|
||||||
|
lore issues 123 -p group/repo # Disambiguate by project
|
||||||
|
lore issues -n 100 # More results
|
||||||
|
lore issues -s opened # Only open issues
|
||||||
|
lore issues -s closed # Only closed issues
|
||||||
|
lore issues -a username # By author (@ prefix optional)
|
||||||
|
lore issues -A username # By assignee (@ prefix optional)
|
||||||
|
lore issues -l bug # By label (AND logic)
|
||||||
|
lore issues -l bug -l urgent # Multiple labels
|
||||||
|
lore issues -m "v1.0" # By milestone title
|
||||||
|
lore issues --since 7d # Updated in last 7 days
|
||||||
|
lore issues --since 2w # Updated in last 2 weeks
|
||||||
|
lore issues --since 1m # Updated in last month
|
||||||
|
lore issues --since 2024-01-01 # Updated since date
|
||||||
|
lore issues --due-before 2024-12-31 # Due before date
|
||||||
|
lore issues --has-due # Only issues with due dates
|
||||||
|
lore issues --status "In progress" # By work item status (case-insensitive)
|
||||||
|
lore issues --status "To do" --status "In progress" # Multiple statuses (OR)
|
||||||
|
lore issues -p group/repo # Filter by project
|
||||||
|
lore issues --sort created --asc # Sort by created date, ascending
|
||||||
|
lore issues -o # Open first result in browser
|
||||||
|
|
||||||
|
# Field selection (robot mode)
|
||||||
|
lore -J issues --fields minimal # Compact: iid, title, state, updated_at_iso
|
||||||
|
lore -J issues --fields iid,title,labels,state # Custom fields
|
||||||
|
```
|
||||||
|
|
||||||
|
When listing, output includes: IID, title, state, status (when any issue has one), assignee, labels, and update time. Status values display with their configured color. In robot mode, the `--fields` flag controls which fields appear in the JSON response.
|
||||||
|
|
||||||
|
When showing a single issue (e.g., `lore issues 123`), output includes: title, description, state, work item status (with color and category), author, assignees, labels, milestone, due date, web URL, and threaded discussions.
|
||||||
|
|
||||||
|
#### Project Resolution
|
||||||
|
|
||||||
|
When `-p` / `--project` is omitted, the `defaultProject` from config is used as a fallback. If neither is set, results span all configured projects. When a project is specified (via `-p` or config default), it uses cascading match logic across all commands:
|
||||||
|
|
||||||
|
1. **Exact match**: `group/project`
|
||||||
|
2. **Case-insensitive**: `Group/Project`
|
||||||
|
3. **Suffix match**: `project` matches `group/project` (if unambiguous)
|
||||||
|
4. **Substring match**: `typescript` matches `vs/typescript-code` (if unambiguous)
|
||||||
|
|
||||||
|
If multiple projects match, an error lists the candidates with a hint to use the full path.
|
||||||
|
|
||||||
|
### `lore mrs`
|
||||||
|
|
||||||
|
Query merge requests from local database, or show a specific MR.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore mrs # Recent MRs (default 50)
|
||||||
|
lore mrs 456 # Show MR !456 with discussions
|
||||||
|
lore mrs 456 -p group/repo # Disambiguate by project
|
||||||
|
lore mrs -n 100 # More results
|
||||||
|
lore mrs -s opened # Only open MRs
|
||||||
|
lore mrs -s merged # Only merged MRs
|
||||||
|
lore mrs -s closed # Only closed MRs
|
||||||
|
lore mrs -s locked # Only locked MRs
|
||||||
|
lore mrs -s all # All states
|
||||||
|
lore mrs -a username # By author (@ prefix optional)
|
||||||
|
lore mrs -A username # By assignee (@ prefix optional)
|
||||||
|
lore mrs -r username # By reviewer (@ prefix optional)
|
||||||
|
lore mrs -d # Only draft/WIP MRs
|
||||||
|
lore mrs -D # Exclude draft MRs
|
||||||
|
lore mrs --target main # By target branch
|
||||||
|
lore mrs --source feature/foo # By source branch
|
||||||
|
lore mrs -l needs-review # By label (AND logic)
|
||||||
|
lore mrs --since 7d # Updated in last 7 days
|
||||||
|
lore mrs -p group/repo # Filter by project
|
||||||
|
lore mrs --sort created --asc # Sort by created date, ascending
|
||||||
|
lore mrs -o # Open first result in browser
|
||||||
|
|
||||||
|
# Field selection (robot mode)
|
||||||
|
lore -J mrs --fields minimal # Compact: iid, title, state, updated_at_iso
|
||||||
|
lore -J mrs --fields iid,title,draft,target_branch # Custom fields
|
||||||
|
```
|
||||||
|
|
||||||
|
When listing, output includes: IID, title (with [DRAFT] prefix if applicable), state, author, assignee, labels, and update time.
|
||||||
|
|
||||||
|
When showing a single MR (e.g., `lore mrs 456`), output includes: title, description, state, draft status, author, assignees, reviewers, labels, source/target branches, merge status, web URL, and threaded discussions. Inline code review comments (DiffNotes) display file context in the format `[src/file.ts:45]`.
|
||||||
|
|
||||||
|
### `lore search`
|
||||||
|
|
||||||
|
Search across indexed documents using hybrid (lexical + semantic), lexical-only, or semantic-only modes.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore search "authentication bug" # Hybrid search (default)
|
||||||
|
lore search "login flow" --mode lexical # FTS5 lexical only
|
||||||
|
lore search "login flow" --mode semantic # Vector similarity only
|
||||||
|
lore search "auth" --type issue # Filter by source type
|
||||||
|
lore search "auth" --type mr # MR documents only
|
||||||
|
lore search "auth" --type discussion # Discussion documents only
|
||||||
|
lore search "deploy" --author username # Filter by author
|
||||||
|
lore search "deploy" -p group/repo # Filter by project
|
||||||
|
lore search "deploy" --label backend # Filter by label (AND logic)
|
||||||
|
lore search "deploy" --path src/ # Filter by file path (trailing / for prefix)
|
||||||
|
lore search "deploy" --after 7d # Created after (7d, 2w, 1m, or YYYY-MM-DD)
|
||||||
|
lore search "deploy" --updated-after 2w # Updated after
|
||||||
|
lore search "deploy" -n 50 # Limit results (default 20, max 100)
|
||||||
|
lore search "deploy" --explain # Show ranking explanation per result
|
||||||
|
lore search "deploy" --fts-mode raw # Raw FTS5 query syntax (advanced)
|
||||||
|
```
|
||||||
|
|
||||||
|
The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. Use `raw` for advanced FTS5 query syntax (AND, OR, NOT, phrase matching, prefix queries).
|
||||||
|
|
||||||
|
Requires `lore generate-docs` (or `lore sync`) to have been run at least once. Semantic and hybrid modes require `lore embed` (or `lore sync`) to have generated vector embeddings via Ollama.
|
||||||
|
|
||||||
|
### `lore who`
|
||||||
|
|
||||||
|
People intelligence: discover experts, analyze workloads, review patterns, active discussions, and code overlap.
|
||||||
|
|
||||||
|
#### Expert Mode
|
||||||
|
|
||||||
|
Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis).
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore who src/features/auth/ # Who knows about this directory?
|
||||||
|
lore who src/features/auth/login.ts # Who knows about this file?
|
||||||
|
lore who --path README.md # Root files need --path flag
|
||||||
|
lore who --path Makefile # Dotless root files too
|
||||||
|
lore who src/ --since 3m # Limit to recent 3 months
|
||||||
|
lore who src/ -p group/repo # Scope to project
|
||||||
|
```
|
||||||
|
|
||||||
|
The target is auto-detected as a path when it contains `/`. For root files without `/` (e.g., `README.md`), use the `--path` flag. Default time window: 6 months.
|
||||||
|
|
||||||
|
#### Workload Mode
|
||||||
|
|
||||||
|
See what someone is currently working on.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore who @asmith # Full workload summary
|
||||||
|
lore who @asmith -p group/repo # Scoped to one project
|
||||||
|
```
|
||||||
|
|
||||||
|
Shows: assigned open issues, authored MRs, MRs under review, and unresolved discussions.
|
||||||
|
|
||||||
|
#### Reviews Mode
|
||||||
|
|
||||||
|
Analyze someone's code review patterns by area.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore who @asmith --reviews # Review activity breakdown
|
||||||
|
lore who @asmith --reviews --since 3m # Recent review patterns
|
||||||
|
```
|
||||||
|
|
||||||
|
Shows: total DiffNotes, categorized by code area with percentage breakdown.
|
||||||
|
|
||||||
|
#### Active Mode
|
||||||
|
|
||||||
|
Surface unresolved discussions needing attention.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore who --active # Unresolved discussions (last 7 days)
|
||||||
|
lore who --active --since 30d # Wider time window
|
||||||
|
lore who --active -p group/repo # Scoped to project
|
||||||
|
```
|
||||||
|
|
||||||
|
Shows: discussion threads with participants and last activity timestamps.
|
||||||
|
|
||||||
|
#### Overlap Mode
|
||||||
|
|
||||||
|
Find who else is touching a file or directory.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore who --overlap src/features/auth/ # Who else works here?
|
||||||
|
lore who --overlap src/lib.rs # Single file overlap
|
||||||
|
```
|
||||||
|
|
||||||
|
Shows: users with touch counts (author vs. review), linked MR references. Default time window: 6 months.
|
||||||
|
|
||||||
|
#### Common Flags
|
||||||
|
|
||||||
|
| Flag | Description |
|
||||||
|
|------|-------------|
|
||||||
|
| `-p` / `--project` | Scope to a project (fuzzy match) |
|
||||||
|
| `--since` | Time window (7d, 2w, 6m, YYYY-MM-DD). Default varies by mode. |
|
||||||
|
| `-n` / `--limit` | Max results per section (1-500, default 20) |
|
||||||
|
|
||||||
|
### `lore timeline`
|
||||||
|
|
||||||
|
Reconstruct a chronological timeline of events matching a keyword query. The pipeline discovers related entities through cross-reference graph traversal and assembles a unified, time-ordered event stream.
|
||||||
|
|
||||||
|
```bash
|
||||||
|
lore timeline "deployment" # Events related to deployments
|
||||||
|
lore timeline "auth" -p group/repo # Scoped to a project
|
||||||
|
lore timeline "auth" --since 30d # Only recent events
|
||||||
|
lore timeline "migration" --depth 2 # Deeper cross-reference expansion
|
||||||
|
lore timeline "migration" --expand-mentions # Follow 'mentioned' edges (high fan-out)
|
||||||
|
lore timeline "deploy" -n 50 # Limit event count
|
||||||
|
lore timeline "auth" --max-seeds 5 # Fewer seed entities
|
||||||
|
```
|
||||||
|
|
||||||
|
#### Flags
|
||||||
|
|
||||||
|
| Flag | Default | Description |
|
||||||
|
|------|---------|-------------|
|
||||||
|
| `-p` / `--project` | all | Scope to a specific project (fuzzy match) |
|
||||||
|
| `--since` | none | Only events after this date (7d, 2w, 6m, YYYY-MM-DD) |
|
||||||
|
| `--depth` | `1` | Cross-reference expansion depth (0 = seeds only) |
|
||||||
|
| `--expand-mentions` | off | Also follow "mentioned" edges during expansion |
|
||||||
|
| `-n` / `--limit` | `100` | Maximum events to display |
|
||||||
|
| `--max-seeds` | `10` | Maximum seed entities from search |
|
||||||
|
| `--max-entities` | `50` | Maximum entities discovered via cross-references |
|
||||||
|
| `--max-evidence` | `10` | Maximum evidence notes included |
|
||||||
|
|
||||||
|
#### Pipeline Stages
|
||||||
|
|
||||||
|
1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents are ranked by BM25 relevance.
|
||||||
|
2. **HYDRATE** -- Evidence notes are extracted: the top FTS-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced.
|
||||||
|
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and optionally "mentioned" references up to the configured depth.
|
||||||
|
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking.
|
||||||
|
5. **RENDER** -- Events are formatted as human-readable text or structured JSON (robot mode).
|
||||||
|
|
||||||
|
#### Event Types

| Event | Description |
|-------|-------------|
| `Created` | Entity creation |
| `StateChanged` | State transitions (opened, closed, reopened) |
| `LabelAdded` | Label applied to entity |
| `LabelRemoved` | Label removed from entity |
| `MilestoneSet` | Milestone assigned |
| `MilestoneRemoved` | Milestone removed |
| `Merged` | MR merged (deduplicated against state events) |
| `NoteEvidence` | Discussion note matched by FTS, with snippet |
| `CrossReferenced` | Reference to another entity |
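In Rust terms the event types above form a natural tagged enum. A hypothetical sketch; the variant payloads and `describe` renderer are illustrative, not the tool's actual internal types:

```rust
// Hypothetical sketch of the timeline event variants listed above.
// Payload fields are illustrative only.
enum TimelineEvent {
    Created,
    StateChanged { to: String }, // opened, closed, reopened
    LabelAdded { label: String },
    LabelRemoved { label: String },
    MilestoneSet { title: String },
    Merged,
    NoteEvidence { snippet: String }, // FTS match plus snippet
    CrossReferenced { target: String },
}

// One-line rendering, as the RENDER stage might produce in text mode.
fn describe(e: &TimelineEvent) -> String {
    match e {
        TimelineEvent::Created => "created".into(),
        TimelineEvent::StateChanged { to } => format!("state -> {to}"),
        TimelineEvent::LabelAdded { label } => format!("+label {label}"),
        TimelineEvent::LabelRemoved { label } => format!("-label {label}"),
        TimelineEvent::MilestoneSet { title } => format!("milestone {title}"),
        TimelineEvent::Merged => "merged".into(),
        TimelineEvent::NoteEvidence { snippet } => format!("note: {snippet}"),
        TimelineEvent::CrossReferenced { target } => format!("ref {target}"),
    }
}

fn main() {
    let e = TimelineEvent::LabelAdded { label: "bug".into() };
    assert_eq!(describe(&e), "+label bug");
    assert_eq!(describe(&TimelineEvent::Merged), "merged");
}
```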
#### Unresolved References

When graph expansion encounters cross-project references to entities not yet synced locally, they are collected as unresolved references in the output. This enables discovery of external dependencies and can inform future sync targets.
### `lore sync`

Run the full sync pipeline: ingest from GitLab (including work item status enrichment via GraphQL), generate searchable documents, and compute embeddings.

```bash
lore sync              # Full pipeline
lore sync --full       # Reset cursors, fetch everything
lore sync --force      # Override stale lock
lore sync --no-embed   # Skip embedding step
lore sync --no-docs    # Skip document regeneration
lore sync --no-events  # Skip resource event fetching
lore sync --dry-run    # Preview what would be synced
```

The sync command displays animated progress bars for each stage and outputs timing metrics on completion. In robot mode (`-J`), detailed stage timing is included in the JSON response.
### `lore ingest`

Sync data from GitLab to the local database. Runs only the ingestion step (no doc generation or embeddings). For issue ingestion, this includes a status enrichment phase that fetches work item statuses via the GitLab GraphQL API.

```bash
lore ingest                      # Ingest everything (issues + MRs)
lore ingest issues               # Issues only (includes status enrichment)
lore ingest mrs                  # MRs only
lore ingest issues -p group/repo # Single project
lore ingest --force              # Override stale lock
lore ingest --full               # Full re-sync (reset cursors)
lore ingest --dry-run            # Preview what would change
```

The `--full` flag resets sync cursors and discussion watermarks, then fetches all data from scratch. Useful when:

- Assignee data or other fields were missing from earlier syncs
- You want to ensure complete data after schema changes
- You're troubleshooting sync issues

Status enrichment uses adaptive page sizing (100 → 50 → 25 → 10) to handle GitLab GraphQL complexity limits. It gracefully handles instances without GraphQL support or Premium/Ultimate licensing. Disable it via `sync.fetchWorkItemStatus: false` in config.
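The adaptive page sizing described above amounts to a fallback ladder: on a complexity error, retry at the next smaller page size. A sketch under that assumption; `fetch` here is a stand-in closure, not the real GraphQL client:

```rust
// Illustrative fallback ladder for GraphQL complexity limits: try each
// page size in order until a fetch succeeds. `fetch` is a stand-in.
const PAGE_SIZES: [usize; 4] = [100, 50, 25, 10];

fn fetch_with_fallback<F>(mut fetch: F) -> Result<Vec<String>, String>
where
    F: FnMut(usize) -> Result<Vec<String>, String>,
{
    let mut last_err = String::from("no page size attempted");
    for &size in &PAGE_SIZES {
        match fetch(size) {
            Ok(items) => return Ok(items),
            Err(e) => last_err = e, // e.g. "query complexity exceeded"
        }
    }
    Err(last_err)
}

fn main() {
    // Simulated server that rejects pages larger than 30 items.
    let result = fetch_with_fallback(|size| {
        if size > 30 {
            Err(format!("complexity exceeded at page size {size}"))
        } else {
            Ok(vec![format!("page of {size}")])
        }
    });
    assert_eq!(result.unwrap(), vec!["page of 25".to_string()]);
}
```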
### `lore generate-docs`

Extract searchable documents from ingested issues, MRs, and discussions for the FTS5 index.

```bash
lore generate-docs               # Incremental (dirty items only)
lore generate-docs --full        # Full rebuild
lore generate-docs -p group/repo # Single project
```
### `lore embed`

Generate vector embeddings for documents via Ollama. Requires Ollama running with the configured embedding model.

```bash
lore embed                # Embed new/changed documents
lore embed --full         # Re-embed all documents (clears existing)
lore embed --retry-failed # Retry previously failed embeddings
```
### `lore count`

Count entities in the local database.

```bash
lore count issues                  # Total issues
lore count mrs                     # Total MRs (with state breakdown)
lore count discussions             # Total discussions
lore count discussions --for issue # Issue discussions only
lore count discussions --for mr    # MR discussions only
lore count notes                   # Total notes (system vs user breakdown)
lore count notes --for issue       # Issue notes only
lore count events                  # Total resource events
lore count events --for issue      # Issue events only
lore count events --for mr         # MR events only
```
### `lore stats`

Show document and index statistics, with optional integrity checks.

```bash
lore stats                  # Document and index statistics
lore stats --check          # Run integrity checks
lore stats --check --repair # Repair integrity issues
lore stats --dry-run        # Preview repairs without saving
```
### `lore status`

Show current sync state and watermarks.

```bash
lore status
```

Displays:

- Last sync run details (status, timing)
- Cursor positions per project and resource type (issues and MRs)
- Data summary counts
### `lore init`

Initialize configuration and database interactively.

```bash
lore init --force           # Overwrite existing config
lore init --non-interactive # Fail if prompts needed
```

When multiple projects are configured, `init` prompts whether to set a default project (used when `-p` is omitted). This can also be set via the `--default-project` flag.

In robot mode, `init` supports non-interactive setup via flags:

```bash
lore -J init --gitlab-url https://gitlab.com \
  --token-env-var GITLAB_TOKEN \
  --projects "group/project,other/project" \
  --default-project group/project
```
### `lore auth`

Verify GitLab authentication is working.

```bash
lore auth
# Authenticated as @username (Full Name)
# GitLab: https://gitlab.com
```
### `lore doctor`

Check environment health and configuration.

```bash
lore doctor
```

Checks performed:

- Project accessibility
- Ollama connectivity (optional)
### `lore migrate`

Run pending database migrations.

```bash
lore migrate
```

Shows current schema version and applies any pending migrations.
### `lore health`

Quick pre-flight check for config, database, and schema version. Exits 0 if healthy, 19 if unhealthy.

```bash
lore health
```

Useful as a fast gate before running queries or syncs. For a more thorough check including authentication and project access, use `lore doctor`.
### `lore robot-docs`

Machine-readable command manifest for agent self-discovery. Returns a JSON schema of all commands, flags, exit codes, and example workflows.

```bash
lore robot-docs         # Pretty-printed JSON
lore --robot robot-docs # Compact JSON for parsing
```
### `lore version`

Show version information including the git commit hash.

```bash
lore version
# lore version 0.1.0 (abc1234)
```
## Robot Mode

Machine-readable JSON output for scripting and AI agent consumption. All responses use compact (single-line) JSON with a uniform envelope and timing metadata.

### Activation

```bash
# Global flag
lore --robot issues -n 5

# JSON shorthand (-J)
lore -J issues -n 5

# Environment variable
LORE_ROBOT=1 lore issues -n 5

# Auto-detection (when stdout is not a TTY)
lore issues -n 5 | jq .
```
### Response Format

All commands return a consistent JSON envelope to stdout:

```json
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```

Every response includes `meta.elapsed_ms` (wall-clock milliseconds for the command).

Errors return structured JSON to stderr with machine-actionable recovery steps:

```json
{"error":{"code":"CONFIG_NOT_FOUND","message":"...","suggestion":"Run 'lore init'","actions":["lore init"]}}
```

The `actions` array contains executable shell commands an agent can run to recover from the error. It is omitted when empty (e.g., for generic I/O errors).
### Field Selection

The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response, reducing token usage for AI agent workflows:

```bash
# Minimal preset (~60% fewer tokens)
lore -J issues --fields minimal

# Custom field list
lore -J issues --fields iid,title,state,labels,updated_at_iso

# Available presets
# minimal: iid, title, state, updated_at_iso
```

Valid fields for issues: `iid`, `title`, `state`, `author_username`, `labels`, `assignees`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at_iso`

Valid fields for MRs: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`
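Conceptually, field selection is a projection applied to each record before serialization. A toy sketch, with a string map standing in for the real JSON object:

```rust
use std::collections::BTreeMap;

// Toy projection: keep only the requested fields of a record.
// A BTreeMap of strings stands in for the real JSON object.
fn project(record: &BTreeMap<String, String>, fields: &[&str]) -> BTreeMap<String, String> {
    record
        .iter()
        .filter(|(k, _)| fields.contains(&k.as_str()))
        .map(|(k, v)| (k.clone(), v.clone()))
        .collect()
}

fn main() {
    let mut record = BTreeMap::new();
    record.insert("iid".to_string(), "42".to_string());
    record.insert("title".to_string(), "Fix login".to_string());
    record.insert("web_url".to_string(), "https://example.invalid".to_string());
    // Requested fields absent from the record are simply skipped.
    let minimal = project(&record, &["iid", "title", "state", "updated_at_iso"]);
    assert_eq!(minimal.len(), 2);
    assert!(!minimal.contains_key("web_url"));
}
```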
### Agent Self-Discovery

The `robot-docs` command provides a complete machine-readable manifest including response schemas for every command:

```bash
lore robot-docs | jq '.data.commands.issues.response_schema'
```

Each command entry includes `response_schema` describing the shape of its JSON response, `fields_presets` for commands supporting `--fields`, and copy-paste `example` invocations.
### Exit Codes

| Code | Meaning |
|------|---------|
| 0 | Success |
| 1 | Internal error / health check failed / not implemented |
| 2 | Usage error (invalid flags or arguments) |
| 3 | Config invalid |
| 4 | Token not set |
| 5 | GitLab auth failed |
| 6 | Resource not found |
| 7 | Rate limited |
| 8 | Network error |
| 9 | Database locked |
| 10 | Database error |
| 11 | Migration failed |
| 12 | I/O error |
| 13 | Transform error |
| 14 | Ollama unavailable |
| 15 | Ollama model not found |
| 16 | Embedding failed |
| 17 | Not found (entity does not exist) |
| 18 | Ambiguous match (use `-p` to specify project) |
| 19 | Health check failed |
| 20 | Config not found |
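For scripts and agents, the useful split over these codes is usually retryable vs. needs-intervention. A hedged sketch of one reasonable classification; the grouping is a suggestion of ours, not something the CLI itself defines:

```rust
// Suggested (not CLI-defined) grouping of exit codes into transient
// failures worth retrying vs. errors that need human/config intervention.
fn is_transient(code: i32) -> bool {
    // 7 rate limited, 8 network error, 9 database locked, 14 Ollama down
    matches!(code, 7 | 8 | 9 | 14)
}

fn main() {
    assert!(is_transient(7));   // rate limited: back off and retry
    assert!(is_transient(9));   // database locked: another process holds it
    assert!(!is_transient(4));  // token not set: fix configuration
    assert!(!is_transient(18)); // ambiguous match: rerun with -p <project>
}
```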
## Configuration Precedence

Settings are resolved in this order (highest to lowest priority):

1. CLI flags (`--robot`, `--config`, `--color`)
2. Environment variables (`LORE_ROBOT`, `GITLAB_TOKEN`, `LORE_CONFIG_PATH`)
3. Config file (`~/.config/lore/config.json`)
4. Built-in defaults
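The precedence chain maps directly onto `Option` chaining. A sketch for resolving a single setting; the function shape is illustrative, not the tool's actual config code:

```rust
// Illustrative resolution of one setting through the documented layers:
// CLI flag > environment variable > config file > built-in default.
fn resolve(flag: Option<&str>, env: Option<&str>, file: Option<&str>, default: &str) -> String {
    flag.or(env).or(file).unwrap_or(default).to_string()
}

fn main() {
    // Env var set, no flag: env wins over config file and default.
    assert_eq!(resolve(None, Some("1"), Some("0"), "0"), "1");
    // Flag beats everything.
    assert_eq!(resolve(Some("never"), Some("always"), None, "auto"), "never");
    // Nothing set: built-in default.
    assert_eq!(resolve(None, None, None, "auto"), "auto");
}
```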
## Global Options

```bash
lore -c /path/to/config.json <command> # Use alternate config
lore --robot <command>                 # Machine-readable JSON
lore -J <command>                      # JSON shorthand
lore --color never <command>           # Disable color output
lore --color always <command>          # Force color output
lore -q <command>                      # Suppress non-essential output
lore -v <command>                      # Debug logging
lore -vv <command>                     # More verbose debug logging
lore -vvv <command>                    # Trace-level logging
lore --log-format json <command>       # JSON-formatted log output to stderr
```

Color output respects the `NO_COLOR` and `CLICOLOR` environment variables in `auto` mode (the default).
## Shell Completions

Generate shell completions for tab-completion support:

```bash
# Bash (add to ~/.bashrc)
lore completions bash > ~/.local/share/bash-completion/completions/lore

# Zsh (add to ~/.zshrc: fpath=(~/.zfunc $fpath))
lore completions zsh > ~/.zfunc/_lore

# Fish
lore completions fish > ~/.config/fish/completions/lore.fish

# PowerShell (add to $PROFILE)
lore completions powershell >> $PROFILE
```
## Database Schema

Data is stored in SQLite with WAL mode and foreign keys enabled. Main tables:

| Table | Purpose |
|-------|---------|
| `projects` | Tracked GitLab projects with metadata |
| `issues` | Issue metadata (title, state, author, due date, milestone, work item status) |
| `merge_requests` | MR metadata (title, state, draft, branches, merge status, commit SHAs) |
| `milestones` | Project milestones with state and due dates |
| `labels` | Project labels with colors |
| `issue_labels` | Many-to-many issue-label relationships |
| `mr_labels` | Many-to-many MR-label relationships |
| `mr_assignees` | Many-to-many MR-assignee relationships |
| `mr_reviewers` | Many-to-many MR-reviewer relationships |
| `mr_file_changes` | Files touched by each MR (path, change type, renames) |
| `discussions` | Issue/MR discussion threads |
| `notes` | Individual notes within discussions (with system note flag and DiffNote position data) |
| `resource_state_events` | Issue/MR state change history (opened, closed, merged, reopened) |
| `resource_label_events` | Label add/remove events with actor and timestamp |
| `resource_milestone_events` | Milestone add/remove events with actor and timestamp |
| `entity_references` | Cross-references between entities (MR closes issue, mentioned in, etc.) |
| `documents` | Extracted searchable text for FTS and embedding |
| `documents_fts` | FTS5 full-text search index |
| `embeddings` | Vector embeddings for semantic search |
| `dirty_sources` | Entities needing document regeneration after ingest |
| `pending_discussion_fetches` | Queue for discussion fetch operations |
| `sync_runs` | Audit trail of sync operations |
| `sync_cursors` | Cursor positions for incremental sync |
| `app_locks` | Crash-safe single-flight lock |
The database is stored at `~/.local/share/lore/lore.db` by default (XDG compliant).
## Development

```bash
cargo test

# Run with debug logging
RUST_LOG=lore=debug lore issues

# Run with trace logging
RUST_LOG=lore=trace lore ingest issues

# Check formatting
cargo fmt --check

cargo clippy
```
## Tech Stack

- **Rust** (2024 edition)
- **SQLite** via rusqlite (bundled) with FTS5 and sqlite-vec
- **Ollama** for vector embeddings (nomic-embed-text)
- **clap** for CLI parsing
- **reqwest** for HTTP
- **tokio** for async runtime
- **tracing** for logging
- **indicatif** for progress bars
## License

MIT
---

`api-review.html` (new file, 1654 lines): diff suppressed because it is too large.

`build.rs` (new file, 21 lines):
```rust
fn main() {
    let hash = std::process::Command::new("git")
        .args(["rev-parse", "--short", "HEAD"])
        .output()
        .ok()
        .and_then(|o| String::from_utf8(o.stdout).ok())
        .unwrap_or_default();
    let hash = hash.trim();
    println!("cargo:rustc-env=GIT_HASH={hash}");

    // Combined version string for clap --version flag
    let pkg_version = std::env::var("CARGO_PKG_VERSION").unwrap_or_default();
    if hash.is_empty() {
        println!("cargo:rustc-env=LORE_VERSION={pkg_version}");
    } else {
        println!("cargo:rustc-env=LORE_VERSION={pkg_version} ({hash})");
    }

    println!("cargo:rerun-if-changed=.git/HEAD");
    println!("cargo:rerun-if-changed=.git/refs/heads");
}
```
`docs/api-efficiency-findings.md` (new file, 366 lines):
---
plan: true
title: "api-efficiency-findings"
status: drafting
iteration: 0
target_iterations: 8
beads_revision: 0
related_plans: []
created: 2026-02-07
updated: 2026-02-07
---
# API Efficiency & Observability Findings

> **Status:** Draft - working through items
> **Context:** Audit of gitlore's GitLab API usage, data processing, and observability gaps
> **Interactive reference:** `api-review.html` (root of repo, open in browser)

---

## Checkpoint 3 Alignment

Checkpoint 3 (`docs/prd/checkpoint-3.md`) introduces `lore sync` orchestration, document generation, and search. Several findings here overlap with that work. This section maps the relationship so effort isn't duplicated and so the CP3 implementation can absorb the right instrumentation as it's built.

### Direct overlaps (CP3 partially addresses)

| Finding | CP3 coverage | Remaining gap |
|---------|--------------|---------------|
| **P0-1** `sync_runs` never written | `lore sync` step 7 says "record sync_run". `SyncResult` struct defined with counts. | Only covers the new `lore sync` command. The existing `lore ingest` still won't write `sync_runs`. Either instrument `lore ingest` separately or have `lore sync` subsume it entirely. |
| **P0-2** No timing | `print_sync` captures wall-clock `elapsed_secs` / `elapsed_ms` in the robot-mode JSON `meta` envelope. | Wall-clock only. No per-phase, per-API-call, or per-DB-write breakdown. The `SyncResult` struct has counts but no duration fields. |
| **P2-1** Discussion full-refresh | CP3 introduces a `pending_discussion_fetches` queue with exponential backoff and bounded processing per sync. Structures the work better. | Same full-refresh strategy per entity. The queue adds retry resilience but doesn't reduce the number of API calls for unchanged discussions. |

### Different scope (complementary, no overlap)

| Finding | Why no overlap |
|---------|----------------|
| **P0-3** `metrics_json` schema | CP3 doesn't reference the `metrics_json` column. `SyncResult` is printed/returned but not persisted there. |
| **P0-4** Discussion sync telemetry columns | CP3's queue system (`pending_discussion_fetches`) is a replacement architecture. The existing per-MR telemetry columns (`discussions_sync_attempts`, `_last_error`) aren't referenced in CP3. Decide: use CP3's queue table or wire up the existing columns? |
| **P0-5** Progress events lack timing | CP3 lists "Progress visible during long syncs" as acceptance criteria but doesn't spec timing in events. |
| **P1-\*** Free data capture | CP3 doesn't touch GitLab API response field coverage at all. These are independent. |
| **P2-2** Keyset pagination (GitLab API) | CP3 uses keyset pagination for local SQLite queries (document seeding, embedding pipelines). Completely different from using GitLab API keyset pagination. |
| **P2-3** ETags | Not mentioned in CP3. |
| **P2-4** Labels enrichment | Not mentioned in CP3. |
| **P3-\*** Structural improvements | Not in CP3 scope. |

### Recommendation

CP3's `lore sync` orchestrator is the natural integration point for P0 instrumentation. Rather than retrofitting `lore ingest` separately, the most efficient path is:

1. Build P0 timing instrumentation as a reusable layer (e.g., a `SyncMetrics` struct that accumulates phase timings)
2. Wire it into the CP3 `run_sync` implementation as it's built
3. Have `run_sync` persist the full metrics (counts + timing) to `sync_runs.metrics_json`
4. Decide whether `lore ingest` becomes a thin wrapper around `lore sync --no-docs --no-embed` or stays separate with its own `sync_runs` recording

This avoids building instrumentation twice and ensures the new sync pipeline is observable from day one.
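A reusable phase-timing accumulator along the lines item 1 suggests could look like this. A sketch, not the planned implementation; serialization to `sync_runs.metrics_json` would sit on top:

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

// Sketch of a `SyncMetrics` accumulator: each phase is timed via a
// closure wrapper and totals are kept for later persistence.
struct SyncMetrics {
    phases: BTreeMap<&'static str, Duration>,
}

impl SyncMetrics {
    fn new() -> Self {
        Self { phases: BTreeMap::new() }
    }

    // Time a closure and record its duration under the phase name.
    fn time_phase<T>(&mut self, name: &'static str, f: impl FnOnce() -> T) -> T {
        let start = Instant::now();
        let out = f();
        *self.phases.entry(name).or_default() += start.elapsed();
        out
    }

    fn total(&self) -> Duration {
        self.phases.values().sum()
    }
}

fn main() {
    let mut m = SyncMetrics::new();
    let n = m.time_phase("ingest_issues", || (0u64..1000).sum::<u64>());
    assert_eq!(n, 499500);
    m.time_phase("generate_docs", || {});
    assert_eq!(m.phases.len(), 2);
    assert!(m.total() >= m.phases["ingest_issues"]);
}
```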
|
||||||
|
|
||||||
|
### Decision: `lore ingest` goes away

`lore sync` becomes the single command for all data fetching. The first run does a full fetch (equivalent to today's `lore ingest`); subsequent runs are incremental via cursors. `lore ingest` becomes a hidden, deprecated alias.

Implications:

- P0 instrumentation only needs to be built in one place (`run_sync`)
- CP3 Gate C owns the `sync_runs` lifecycle end-to-end
- The existing `lore ingest issues` / `lore ingest mrs` code becomes internal functions called by `run_sync`, not standalone CLI commands
- `lore sync` always syncs everything: issues, MRs, discussions, documents, embeddings (with `--no-embed` / `--no-docs` to opt out of later stages)

---
## Implementation Sequence

### Phase A: Before CP3 (independent, enriches data model)

**Do first.** Migration + struct changes only. No architectural dependency. Gets richer source data into the DB before CP3's document generation pipeline locks in its schema.

1. **P1 batch: free data capture** - All ~11 fields in a single migration: `user_notes_count`, `upvotes`, `downvotes`, `confidential`, `has_conflicts`, `blocking_discussions_resolved`, `merge_commit_sha`, `discussion_locked`, `task_completion_status`, `issue_type`, issue `references`.
2. **P1-10: MR milestones** - Reuse the existing issue milestone transformer. Slightly more work, same migration.

### Phase B: During CP3 Gate C (`lore sync`)

**Build instrumentation into the sync orchestrator as it's constructed.** Not a separate effort.

3. **P0-1 + P0-2 + P0-3** - `SyncMetrics` struct accumulating phase timings. `run_sync` writes to `sync_runs` with full `metrics_json` on completion.
4. **P0-4** - Decide: use CP3's `pending_discussion_fetches` queue or the existing per-MR telemetry columns. Wire up the winner.
5. **P0-5** - Add `elapsed_ms` to `*Complete` progress event variants.
6. **Deprecate `lore ingest`** - Hidden alias pointing to `lore sync`. Remove from help output.
### Phase C: After CP3 ships, informed by real metrics

**Only pursue items that P0 data proves matter.**

7. **P2-1: Discussion optimization** - Check `metrics_json` from real runs. If the discussion phase is <10% of wall-clock time, skip.
8. **P2-2: Keyset pagination** - Check primary-fetch timing on the largest project. If fast, skip.
9. **P2-4: Labels enrichment** - If label colors are needed for any UI surface.

### Phase D: Future (needs a forcing function)

10. **P3-1: Users table** - When a UI needs display names / avatars.
11. **P2-3: ETags** - Only if P2-1 doesn't sufficiently reduce discussion overhead.
12. **P3-2/3/4: GraphQL, Events API, Webhooks** - Architectural shifts. Only if pull-based sync hits a scaling wall.

---
## Priority 0: Observability (prerequisite for everything else)

We can't evaluate any efficiency question without measurement. Gitlore has no runtime performance instrumentation. The infrastructure for it was scaffolded (the `sync_runs` table, the `metrics_json` column, the discussion sync telemetry columns) but never wired up.

### P0-1: sync_runs table is never written to

**Location:** Schema in `migrations/001_initial.sql:25-34`, read in `src/cli/commands/sync_status.rs:69-72`

The table exists and `lore status` reads from it, but no code ever INSERTs or UPDATEs rows. The entire audit trail is empty.

```sql
-- Exists in schema, never populated
CREATE TABLE sync_runs (
    id INTEGER PRIMARY KEY,
    started_at INTEGER NOT NULL,
    heartbeat_at INTEGER NOT NULL,
    finished_at INTEGER,
    status TEXT NOT NULL,        -- 'running' | 'succeeded' | 'failed'
    command TEXT NOT NULL,
    error TEXT,
    metrics_json TEXT            -- never written
);
```

**What to do:** Instrument the ingest orchestrator to record sync runs. Each `lore ingest issues` / `lore ingest mrs` invocation should:

- INSERT a row with `status='running'` at start
- UPDATE with `status='succeeded'`/`'failed'` plus `finished_at` on completion
- Populate `metrics_json` with the `IngestProjectResult` / `IngestMrProjectResult` counters
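Against the existing schema, the run lifecycle is two statements (a sketch; the `:param` placeholders and the `command` value are illustrative):

```sql
-- at start
INSERT INTO sync_runs (started_at, heartbeat_at, status, command)
VALUES (:now, :now, 'running', 'ingest issues');

-- on completion (success or failure)
UPDATE sync_runs
SET status       = :status,       -- 'succeeded' or 'failed'
    finished_at  = :now,
    error        = :error,        -- NULL on success
    metrics_json = :metrics_json  -- serialized counters + timings
WHERE id = :run_id;
```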
### P0-2: No operation timing anywhere

**Location:** Rate limiter in `src/gitlab/client.rs:20-65`, orchestrator in `src/ingestion/orchestrator.rs`

`Instant::now()` is used only for rate limiter enforcement. No operation durations are measured or logged. We don't know:

- How long a full issue ingest takes
- How long discussion sync takes per entity
- How long individual API requests take (network latency)
- How long database writes take per batch
- How much rate limiter sleep accumulates per run
- How long pagination takes across pages

**What to do:** Add timing instrumentation at these levels:

| Level | What to time | Where |
|-------|-------------|-------|
| **Run** | Total ingest wall-clock time | orchestrator entry/exit |
| **Phase** | Primary fetch vs discussion sync | orchestrator phase boundaries |
| **API call** | Individual HTTP request round-trip | `client.rs` request method |
| **DB write** | Transaction duration per batch | ingestion store functions |
| **Rate limiter** | Cumulative sleep time per run | `client.rs` acquire() |

Store phase-level and run-level timing in `metrics_json`. Log API-call-level timing at debug level.
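The rate limiter row in the table is the least obvious one; a std-only sketch of how `acquire()` could accumulate its own sleep time (illustrative only — gitlore's real limiter in `client.rs` differs):

```rust
use std::time::{Duration, Instant};

/// Minimal fixed-interval rate limiter that tracks cumulative sleep
/// time per run, so the orchestrator can report `rate_limit_sleep_ms`.
struct RateLimiter {
    min_interval: Duration,
    last_call: Option<Instant>,
    total_sleep: Duration,
}

impl RateLimiter {
    fn new(min_interval: Duration) -> Self {
        Self { min_interval, last_call: None, total_sleep: Duration::ZERO }
    }

    /// Block until the next request is allowed, accumulating sleep time.
    fn acquire(&mut self) {
        if let Some(last) = self.last_call {
            let elapsed = last.elapsed();
            if elapsed < self.min_interval {
                let wait = self.min_interval - elapsed;
                std::thread::sleep(wait);
                self.total_sleep += wait;
            }
        }
        self.last_call = Some(Instant::now());
    }
}

fn main() {
    let mut limiter = RateLimiter::new(Duration::from_millis(10));
    for _ in 0..3 {
        limiter.acquire(); // pretend each acquire guards one API call
    }
    println!("cumulative sleep: {}ms", limiter.total_sleep.as_millis());
}
```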
### P0-3: metrics_json has no defined schema

**What to do:** Define what goes in there. Strawman based on existing `IngestProjectResult` fields plus timing:

```json
{
  "wall_clock_ms": 14200,
  "phases": {
    "primary_fetch": {
      "duration_ms": 8400,
      "api_calls": 12,
      "items_fetched": 1143,
      "items_upserted": 87,
      "pages": 12,
      "rate_limit_sleep_ms": 1200
    },
    "discussion_sync": {
      "duration_ms": 5800,
      "entities_checked": 87,
      "entities_synced": 14,
      "entities_skipped": 73,
      "api_calls": 22,
      "discussions_fetched": 156,
      "notes_upserted": 412,
      "rate_limit_sleep_ms": 2200
    }
  },
  "db": {
    "labels_created": 3,
    "raw_payloads_stored": 87,
    "raw_payloads_deduped": 42
  }
}
```
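If this strawman is adopted, it maps naturally onto plain Rust structs; a sketch mirroring the schema above field-for-field (in practice these would likely derive serde's `Serialize` to produce the JSON — the types here are assumptions, not gitlore's actual code):

```rust
/// Counters and timing for the primary fetch phase.
#[derive(Debug, Default)]
struct PrimaryFetchMetrics {
    duration_ms: u64,
    api_calls: u32,
    items_fetched: u32,
    items_upserted: u32,
    pages: u32,
    rate_limit_sleep_ms: u64,
}

/// Counters and timing for the discussion sync phase.
#[derive(Debug, Default)]
struct DiscussionSyncMetrics {
    duration_ms: u64,
    entities_checked: u32,
    entities_synced: u32,
    entities_skipped: u32,
    api_calls: u32,
    discussions_fetched: u32,
    notes_upserted: u32,
    rate_limit_sleep_ms: u64,
}

/// Local database write counters.
#[derive(Debug, Default)]
struct DbMetrics {
    labels_created: u32,
    raw_payloads_stored: u32,
    raw_payloads_deduped: u32,
}

/// Top-level shape of one metrics_json document.
#[derive(Debug, Default)]
struct SyncRunMetrics {
    wall_clock_ms: u64,
    primary_fetch: PrimaryFetchMetrics,
    discussion_sync: DiscussionSyncMetrics,
    db: DbMetrics,
}

fn main() {
    let mut m = SyncRunMetrics::default();
    m.wall_clock_ms = 14200;
    m.primary_fetch.items_fetched = 1143;
    m.discussion_sync.notes_upserted = 412;
    println!("{m:#?}");
}
```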
### P0-4: Discussion sync telemetry columns are dead code

**Location:** `merge_requests` table columns: `discussions_sync_last_attempt_at`, `discussions_sync_attempts`, `discussions_sync_last_error`

These exist in the schema but are never read or written. They were designed for tracking retry behavior on failed discussion syncs.

**What to do:** Wire these up during discussion sync. On attempt: set `last_attempt_at` and increment `attempts`. On failure: set `last_error`. On success: reset `attempts` to 0. This provides per-entity visibility into discussion sync health.
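A sketch of the three write paths against the existing columns (`:param` placeholders are illustrative; clearing `last_error` on success is an assumption beyond what the plan states):

```sql
-- on each attempt
UPDATE merge_requests
SET discussions_sync_last_attempt_at = :now,
    discussions_sync_attempts = discussions_sync_attempts + 1
WHERE id = :mr_id;

-- on failure
UPDATE merge_requests
SET discussions_sync_last_error = :error
WHERE id = :mr_id;

-- on success
UPDATE merge_requests
SET discussions_sync_attempts = 0,
    discussions_sync_last_error = NULL
WHERE id = :mr_id;
```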
### P0-5: Progress events carry no timing

**Location:** `src/ingestion/orchestrator.rs:28-53`

`ProgressEvent` variants (`IssueFetched`, `DiscussionSynced`, etc.) carry only counts. Adding `elapsed_ms` to at least the `*Complete` variants would give callers (CLI progress bars, robot mode output) real throughput numbers.
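A sketch of the proposed change (variant names are modeled on the ones mentioned above, not copied from `orchestrator.rs`), showing how a caller turns the timing into throughput:

```rust
/// Hypothetical subset of the progress event enum with the proposed field.
#[derive(Debug)]
enum ProgressEvent {
    IssueFetched { count: u64 },
    DiscussionSynced { count: u64 },
    // `elapsed_ms` on the phase-completion variant gives callers
    // items-per-second for free:
    IssuesComplete { count: u64, elapsed_ms: u64 },
}

/// Items per second for completion events; None for per-item events.
fn throughput(event: &ProgressEvent) -> Option<f64> {
    match event {
        ProgressEvent::IssuesComplete { count, elapsed_ms } if *elapsed_ms > 0 => {
            Some(*count as f64 * 1000.0 / *elapsed_ms as f64)
        }
        _ => None,
    }
}

fn main() {
    let done = ProgressEvent::IssuesComplete { count: 1143, elapsed_ms: 8400 };
    println!("{done:?} -> {:?} items/s", throughput(&done));
}
```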
---

## Priority 1: Free data capture (zero API cost)

These fields are already present in the API responses gitlore receives. Storing them requires only Rust struct additions and DB column migrations. No additional API calls.

### P1-1: user_notes_count (Issues + MRs)

**API field:** `user_notes_count` (integer)
**Value:** Could short-circuit discussion re-sync: if the count hasn't changed, discussions probably haven't changed either. Also useful for "most discussed" queries.
**Effort:** Add the field to the serde struct, add a DB column, store during transform.
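The migration half of that pattern is a one-liner per table; a hypothetical sketch (table names assume the existing `issues` / `merge_requests` tables, and the `NOT NULL DEFAULT 0` keeps already-synced rows valid under SQLite's `ADD COLUMN` rules):

```sql
ALTER TABLE issues         ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;
```

The other P1 fields below follow the same shape, which is why they batch into a single migration in Phase A.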
### P1-2: upvotes / downvotes (Issues + MRs)

**API field:** `upvotes`, `downvotes` (integers)
**Value:** Engagement metrics for triage. "Most upvoted open issues" is a common query.
**Effort:** Same pattern as above.

### P1-3: confidential (Issues)

**API field:** `confidential` (boolean)
**Value:** Security-sensitive filtering. Important to know when exposing issue data.
**Effort:** Low.

### P1-4: has_conflicts (MRs)

**API field:** `has_conflicts` (boolean)
**Value:** Identify MRs needing a rebase. Useful for "stale MR" detection.
**Effort:** Low.

### P1-5: blocking_discussions_resolved (MRs)

**API field:** `blocking_discussions_resolved` (boolean)
**Value:** MR readiness indicator without joining the discussions table.
**Effort:** Low.

### P1-6: merge_commit_sha (MRs)

**API field:** `merge_commit_sha` (string, nullable)
**Value:** Trace merged MRs to specific commits in git history.
**Effort:** Low.

### P1-7: discussion_locked (Issues + MRs)

**API field:** `discussion_locked` (boolean)
**Value:** Know whether new comments can be added. Useful for robot mode consumers.
**Effort:** Low.

### P1-8: task_completion_status (Issues + MRs)

**API field:** `task_completion_status` (object: `{count, completed_count}`)
**Value:** Track task-list checkbox progress without parsing markdown.
**Effort:** Low. Store as two integer columns or a small JSON blob.

### P1-9: issue_type (Issues)

**API field:** `issue_type` (string: "issue" | "incident" | "test_case")
**Value:** Distinguish issues vs incidents vs test cases for filtering.
**Effort:** Low.

### P1-10: MR milestone (MRs)

**API field:** `milestone` (object, same structure as on issues)
**Current state:** Milestones are fully stored for issues but completely ignored for MRs.
**Value:** "Which MRs are in milestone X?" is currently impossible to answer locally.
**Effort:** Medium - reuse the existing milestone transformer from the issue pipeline.

### P1-11: Issue references (Issues)

**API field:** `references` (object: `{short, relative, full}`)
**Current state:** Stored for MRs (`references_short`, `references_full`), dropped for issues.
**Value:** Cross-project issue references (e.g., `group/project#42`).
**Effort:** Low.

---
## Priority 2: Efficiency improvements (requires measurement from P0 first)

These are potential optimizations. **Do not implement until P0 instrumentation proves they matter.**

### P2-1: Discussion full-refresh strategy

**Current behavior:** When an issue/MR's `updated_at` advances, ALL of its discussions are deleted and re-fetched from scratch.

**Potential optimization:** Use `user_notes_count` (P1-1) to detect whether discussions actually changed. Skip the re-sync if the count is unchanged.

**Why we need P0 first:** The full refresh may be fast enough. Since we already fetch the data from GitLab, the DELETE+INSERT is just local SQLite I/O. If discussion sync for a typical entity takes <100ms locally, this isn't worth optimizing. We need the per-entity timing from P0-2 to know.

**Trade-offs to consider:**

- Full-refresh catches edited and deleted notes. Incremental sync would miss those.
- `user_notes_count` doesn't change when notes are edited, only when they're added or removed.
- Full-refresh is simpler to reason about for consistency.
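The skip check itself is tiny, which is part of why the timing data should decide; a sketch (function and parameter names are hypothetical — `stored` is the count from the local DB, `fresh` from the latest API response):

```rust
/// Decide whether an entity's discussions need a re-sync under the
/// P2-1 count-based strategy. Sketch only, not gitlore's actual code.
fn needs_discussion_resync(stored: Option<i64>, fresh: i64) -> bool {
    match stored {
        // Never synced (or the P1-1 column not yet populated): must fetch.
        None => true,
        // Count changed: notes were added or removed.
        Some(prev) => prev != fresh,
        // NOTE: edits to existing notes don't change user_notes_count,
        // so this check alone would miss them (the trade-off above).
    }
}

fn main() {
    assert!(needs_discussion_resync(None, 5));
    assert!(needs_discussion_resync(Some(4), 5));
    assert!(!needs_discussion_resync(Some(5), 5));
    println!("skip-check behaves as expected");
}
```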
### P2-2: Keyset pagination

**Current behavior:** Offset-based (`page=N&per_page=100`).
**Alternative:** Keyset pagination (`pagination=keyset`), O(1) per page instead of O(N).

**Why we need P0 first:** This only matters for large projects (>10K issues). Most projects will never hit enough pages for it to be measurable. P0 timing of pagination will show whether this is a bottleneck.

**Note:** Gitlore already parses `Link` headers for next-page detection, which is the client-side mechanism keyset pagination uses. So partial support exists.
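That shared client-side mechanism is just following `rel="next"`; with keyset the server encodes a cursor in that URL instead of a page number. A simplified std-only sketch of the extraction (no quoted-comma handling; this is not gitlore's actual parser):

```rust
/// Extract the rel="next" URL from an HTTP `Link` header value,
/// e.g. `<https://host/api?cursor=abc>; rel="next", <...>; rel="first"`.
fn next_link(link_header: &str) -> Option<String> {
    link_header.split(',').find_map(|part| {
        let (url_part, params) = part.split_once(';')?;
        if params.contains(r#"rel="next""#) {
            // Strip whitespace and the surrounding angle brackets.
            Some(url_part.trim().trim_matches(|c| c == '<' || c == '>').to_string())
        } else {
            None
        }
    })
}

fn main() {
    let header = r#"<https://gitlab.example.com/api/v4/projects/1/issues?cursor=abc&per_page=100>; rel="next", <https://gitlab.example.com/api/v4/projects/1/issues?per_page=100>; rel="first""#;
    println!("{:?}", next_link(header));
}
```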
### P2-3: ETag / conditional requests

**Current behavior:** All requests are unconditional.
**Alternative:** Cache ETags, send `If-None-Match`, and get 304s back.

**Why we need P0 first:** The cursor-based sync already avoids re-fetching unchanged data for primary resources. ETags would mainly help with discussion re-fetches where nothing changed. If P2-1 (the `user_notes_count` skip) is implemented, ETags become less valuable.

### P2-4: Labels API enrichment

**Current behavior:** Labels are extracted from the `labels[]` string array in issue/MR responses. The `labels` table has `color` and `description` columns that may not be populated.
**Alternative:** A single call to `GET /projects/:id/labels` per project per sync to populate label metadata.
**Cost:** 1 API call per project per sync run.
**Value:** Label colors for UI rendering, descriptions for tooltips.

---
## Priority 3: Structural improvements (future consideration)

### P3-1: Users table

**Current state:** Only `username` is stored. Author `name`, `avatar_url`, `web_url`, and `state` are in every API response but discarded.
**Proposal:** Create a `users` table and upsert on every encounter. Zero API cost.
**Value:** Richer user display; detect blocked/deactivated users.
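A hypothetical shape for that table (columns mirror the GitLab author object fields named above; `updated_at` is an added assumption for local bookkeeping):

```sql
CREATE TABLE users (
    id INTEGER PRIMARY KEY,      -- GitLab user id
    username TEXT NOT NULL,
    name TEXT,
    avatar_url TEXT,
    web_url TEXT,
    state TEXT,                  -- e.g. 'active' | 'blocked' | 'deactivated'
    updated_at INTEGER NOT NULL  -- last local upsert time
);
```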
### P3-2: GraphQL API for field-precise fetching

**Current state:** The REST API returns ~40-50 fields per entity. Gitlore uses ~15-23.
**Alternative:** The GraphQL API allows requesting exactly the fields needed.
**Trade-offs:** Different pagination model, a potentially less stable API, and more complex client code. The bandwidth savings are real but likely minor compared to discussion re-fetch overhead.

### P3-3: Events API for lightweight change detection

**Endpoint:** `GET /projects/:id/events`
**Value:** A lightweight "has anything changed?" check before running a full issue/MR sync. Could replace or supplement the cursor-based approach for very active projects.

### P3-4: Webhook-based push sync

**Endpoint:** `POST /projects/:id/hooks` (setup), then receive pushes.
**Value:** Near-real-time sync without polling cost. Eliminates all rate-limit concerns.
**Barrier:** Requires a listener endpoint, which changes the architecture from a pull-only CLI to something with a daemon/server component.

---

## Working notes

_Space for recording decisions as we work through items._
### Decisions made

| Item | Decision | Rationale |
|------|----------|-----------|
| `lore ingest` | Remove. `lore sync` is the single entry point. | No reason to separate the initial load from incremental updates. First run = full fetch; subsequent runs = cursor-based delta. |
| CP3 alignment | Build P0 instrumentation into CP3 Gate C, not separately. | Avoids building it in two places. `lore sync` owns the full lifecycle. |
| P2 timing | Defer all efficiency optimizations until P0 metrics from real runs are available. | Can't evaluate trade-offs without measurement. |

### Open questions

- What's the typical project size (issue/MR count) for gitlore users? This determines whether keyset pagination (P2-2) matters.
- Is there a plan for a web UI or TUI? That would increase the value of P3-1 (users table) and P2-4 (label colors).

---

**New file:** `docs/diagrams/01-human-flow-map.excalidraw` (+245 lines)
|
{
  "type": "excalidraw",
  "version": 2,
  "source": "https://excalidraw.com",
  "elements": [
    { "type": "text", "id": "title", "x": 300, "y": 15, "text": "Human User Flow Map", "fontSize": 28 },
    { "type": "text", "id": "subtitle", "x": 220, "y": 53, "text": "15 human workflows mapped to lore commands. Arrows show data dependency.", "fontSize": 14, "strokeColor": "#868e96" },

    { "type": "text", "id": "col-trigger", "x": 60, "y": 80, "text": "TRIGGER (Problem)", "fontSize": 16, "strokeColor": "#495057" },
    { "type": "text", "id": "col-flow", "x": 400, "y": 80, "text": "COMMAND FLOW", "fontSize": 16, "strokeColor": "#495057" },
    { "type": "text", "id": "col-gap", "x": 880, "y": 80, "text": "GAP", "fontSize": 16, "strokeColor": "#ef4444" },

    { "type": "rectangle", "id": "zone-daily", "x": 20, "y": 110, "width": 960, "height": 190,
      "backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 20 },
    { "type": "text", "id": "zone-daily-label", "x": 30, "y": 115, "text": "Daily Operations", "fontSize": 14, "strokeColor": "#1971c2" },

    { "type": "rectangle", "id": "h1-trigger", "x": 30, "y": 140, "width": 200, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "H1: Standup prep\n\"What moved overnight?\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h1-a1", "x": 230, "y": 165, "width": 50, "height": 0,
      "points": [[0,0],[50,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h1-cmd1", "x": 280, "y": 145, "width": 90, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "sync -q", "fontSize": 14 } },
    { "type": "arrow", "id": "h1-a2", "x": 370, "y": 165, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h1-cmd2", "x": 400, "y": 145, "width": 140, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "issues --since 1d", "fontSize": 14 } },
    { "type": "arrow", "id": "h1-a3", "x": 540, "y": 165, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h1-cmd3", "x": 570, "y": 145, "width": 130, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "mrs --since 1d", "fontSize": 14 } },
    { "type": "arrow", "id": "h1-a4", "x": 700, "y": 165, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h1-cmd4", "x": 730, "y": 145, "width": 100, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "who @me", "fontSize": 14 } },
    { "type": "arrow", "id": "h1-a5", "x": 830, "y": 165, "width": 40, "height": 0,
      "points": [[0,0],[40,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h1-gap", "x": 870, "y": 140, "width": 100, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "No @me\nNo feed", "fontSize": 14 } },
    { "type": "rectangle", "id": "h3-trigger", "x": 30, "y": 210, "width": 200, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "H3: Incident\n\"Deploy broke prod\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h3-a1", "x": 230, "y": 235, "width": 50, "height": 0,
      "points": [[0,0],[50,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h3-cmd1", "x": 280, "y": 215, "width": 130, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "timeline deploy", "fontSize": 14 } },
    { "type": "arrow", "id": "h3-a2", "x": 410, "y": 235, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h3-cmd2", "x": 440, "y": 215, "width": 160, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "search deploy --mr", "fontSize": 14 } },
    { "type": "arrow", "id": "h3-a3", "x": 600, "y": 235, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h3-cmd3", "x": 630, "y": 215, "width": 110, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "mrs <iid>", "fontSize": 14 } },
    { "type": "arrow", "id": "h3-a4", "x": 740, "y": 235, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h3-cmd4", "x": 770, "y": 215, "width": 100, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "who --overlap", "fontSize": 14 } },
    { "type": "rectangle", "id": "zone-planning", "x": 20, "y": 310, "width": 960, "height": 190,
      "backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
    { "type": "text", "id": "zone-planning-label", "x": 30, "y": 315, "text": "Planning & Assignment", "fontSize": 14, "strokeColor": "#15803d" },

    { "type": "rectangle", "id": "h2-trigger", "x": 30, "y": 340, "width": 200, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "H2: Sprint plan\n\"What's ready to pick?\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h2-a1", "x": 230, "y": 365, "width": 50, "height": 0,
      "points": [[0,0],[50,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h2-cmd1", "x": 280, "y": 345, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "issues -s opened -l ready", "fontSize": 13 } },
    { "type": "arrow", "id": "h2-a2", "x": 450, "y": 365, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h2-cmd2", "x": 480, "y": 345, "width": 150, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "issues --has-due", "fontSize": 14 } },
    { "type": "arrow", "id": "h2-a3", "x": 630, "y": 365, "width": 230, "height": 0,
      "points": [[0,0],[230,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h2-gap", "x": 860, "y": 340, "width": 110, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "No\n--no-assignee", "fontSize": 14 } },
    { "type": "rectangle", "id": "h8-trigger", "x": 30, "y": 410, "width": 200, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "H8: Assign work\n\"Who has bandwidth?\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h8-a1", "x": 230, "y": 435, "width": 50, "height": 0,
      "points": [[0,0],[50,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h8-cmd1", "x": 280, "y": 415, "width": 120, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "who @alice", "fontSize": 14 } },
    { "type": "arrow", "id": "h8-a2", "x": 400, "y": 435, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h8-cmd2", "x": 430, "y": 415, "width": 110, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "who @bob", "fontSize": 14 } },
    { "type": "arrow", "id": "h8-a3", "x": 540, "y": 435, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h8-cmd3", "x": 570, "y": 415, "width": 120, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "who @carol...", "fontSize": 14 } },
    { "type": "arrow", "id": "h8-a4", "x": 690, "y": 435, "width": 170, "height": 0,
      "points": [[0,0],[170,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h8-gap", "x": 860, "y": 410, "width": 110, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "No team\nworkload view", "fontSize": 14 } },
    { "type": "rectangle", "id": "zone-investigation", "x": 20, "y": 510, "width": 960, "height": 260,
      "backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 20 },
    { "type": "text", "id": "zone-invest-label", "x": 30, "y": 515, "text": "Investigation & Understanding", "fontSize": 14, "strokeColor": "#b45309" },

    { "type": "rectangle", "id": "h7-trigger", "x": 30, "y": 540, "width": 200, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "H7: Why this way?\n\"Understand a decision\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h7-a1", "x": 230, "y": 565, "width": 50, "height": 0,
      "points": [[0,0],[50,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h7-cmd1", "x": 280, "y": 545, "width": 160, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "search \"rationale\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h7-a2", "x": 440, "y": 565, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h7-cmd2", "x": 470, "y": 545, "width": 140, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "timeline --depth 2", "fontSize": 14 } },
    { "type": "arrow", "id": "h7-a3", "x": 610, "y": 565, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h7-cmd3", "x": 640, "y": 545, "width": 100, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "issues 234", "fontSize": 14 } },
    { "type": "arrow", "id": "h7-a4", "x": 740, "y": 565, "width": 120, "height": 0,
      "points": [[0,0],[120,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h7-gap", "x": 860, "y": 540, "width": 110, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "No per-note\nsearch", "fontSize": 14 } },
    { "type": "rectangle", "id": "h11-trigger", "x": 30, "y": 610, "width": 200, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "H11: Bug lifecycle\n\"Why does #321 reopen?\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h11-a1", "x": 230, "y": 635, "width": 50, "height": 0,
      "points": [[0,0],[50,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h11-cmd1", "x": 280, "y": 615, "width": 120, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "issues 321", "fontSize": 14 } },
    { "type": "arrow", "id": "h11-a2", "x": 400, "y": 635, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h11-cmd2", "x": 430, "y": 615, "width": 130, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "timeline ???", "fontSize": 14 } },
    { "type": "arrow", "id": "h11-a3", "x": 560, "y": 635, "width": 300, "height": 0,
      "points": [[0,0],[300,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h11-gap", "x": 860, "y": 610, "width": 110, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "No entity\ntimeline", "fontSize": 14 } },
    { "type": "rectangle", "id": "h14-trigger", "x": 30, "y": 680, "width": 200, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "H14: Prior art?\n\"Was this tried before?\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h14-a1", "x": 230, "y": 705, "width": 50, "height": 0,
      "points": [[0,0],[50,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h14-cmd1", "x": 280, "y": 685, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "search \"memory leak\"", "fontSize": 14 } },
    { "type": "arrow", "id": "h14-a2", "x": 450, "y": 705, "width": 30, "height": 0,
      "points": [[0,0],[30,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h14-cmd2", "x": 480, "y": 685, "width": 120, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "mrs --closed?", "fontSize": 14 } },
    { "type": "arrow", "id": "h14-a3", "x": 600, "y": 705, "width": 260, "height": 0,
      "points": [[0,0],[260,0]], "endArrowhead": "arrow" },
    { "type": "rectangle", "id": "h14-gap", "x": 860, "y": 680, "width": 110, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "No --state\non search", "fontSize": 14 } },
|
||||||
|
|
||||||
|
{ "type": "rectangle", "id": "zone-people", "x": 20, "y": 780, "width": 960, "height": 190,
|
||||||
|
"backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
|
||||||
|
"strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 20 },
|
||||||
|
{ "type": "text", "id": "zone-people-label", "x": 30, "y": 785, "text": "People & Expertise", "fontSize": 14, "strokeColor": "#7048e8" },
|
||||||
|
|
||||||
|
{ "type": "rectangle", "id": "h4-trigger", "x": 30, "y": 810, "width": 200, "height": 50,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
|
||||||
|
"label": { "text": "H4: Review prep\n\"Context for MR !789\"", "fontSize": 14 } },
|
||||||
|
{ "type": "arrow", "id": "h4-a1", "x": 230, "y": 835, "width": 50, "height": 0,
|
||||||
|
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
|
||||||
|
{ "type": "rectangle", "id": "h4-cmd1", "x": 280, "y": 815, "width": 100, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
|
||||||
|
"label": { "text": "mrs 789", "fontSize": 14 } },
|
||||||
|
{ "type": "arrow", "id": "h4-a2", "x": 380, "y": 835, "width": 30, "height": 0,
|
||||||
|
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
|
||||||
|
{ "type": "rectangle", "id": "h4-cmd2", "x": 410, "y": 815, "width": 120, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
|
||||||
|
"label": { "text": "who src/auth/", "fontSize": 14 } },
|
||||||
|
{ "type": "arrow", "id": "h4-a3", "x": 530, "y": 835, "width": 30, "height": 0,
|
||||||
|
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
|
||||||
|
{ "type": "rectangle", "id": "h4-cmd3", "x": 560, "y": 815, "width": 130, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
|
||||||
|
"label": { "text": "search \"auth\"", "fontSize": 14 } },
|
||||||
|
{ "type": "arrow", "id": "h4-a4", "x": 690, "y": 835, "width": 170, "height": 0,
|
||||||
|
"points": [[0,0],[170,0]], "endArrowhead": "arrow" },
|
||||||
|
{ "type": "rectangle", "id": "h4-gap", "x": 860, "y": 810, "width": 110, "height": 50,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
|
||||||
|
"label": { "text": "No MR file\nlist output", "fontSize": 14 } },
|
||||||
|
|
||||||
|
{ "type": "rectangle", "id": "h6-trigger", "x": 30, "y": 880, "width": 200, "height": 50,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
|
||||||
|
"label": { "text": "H6: Find reviewer\n\"Who should review?\"", "fontSize": 14 } },
|
||||||
|
{ "type": "arrow", "id": "h6-a1", "x": 230, "y": 905, "width": 50, "height": 0,
|
||||||
|
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
|
||||||
|
{ "type": "rectangle", "id": "h6-cmd1", "x": 280, "y": 885, "width": 130, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
|
||||||
|
"label": { "text": "who src/auth/", "fontSize": 14 } },
|
||||||
|
{ "type": "arrow", "id": "h6-a2", "x": 410, "y": 905, "width": 30, "height": 0,
|
||||||
|
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
|
||||||
|
{ "type": "rectangle", "id": "h6-cmd2", "x": 440, "y": 885, "width": 140, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
|
||||||
|
"label": { "text": "who src/pay/", "fontSize": 14 } },
|
||||||
|
{ "type": "arrow", "id": "h6-a3", "x": 580, "y": 905, "width": 30, "height": 0,
|
||||||
|
"points": [[0,0],[30,0]], "endArrowhead": "arrow" },
|
||||||
|
{ "type": "rectangle", "id": "h6-cmd3", "x": 610, "y": 885, "width": 140, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
|
||||||
|
"label": { "text": "who @candidate", "fontSize": 14 } },
|
||||||
|
{ "type": "arrow", "id": "h6-a4", "x": 750, "y": 905, "width": 110, "height": 0,
|
||||||
|
"points": [[0,0],[110,0]], "endArrowhead": "arrow" },
|
||||||
|
{ "type": "rectangle", "id": "h6-gap", "x": 860, "y": 880, "width": 110, "height": 50,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
|
||||||
|
"label": { "text": "No multi-\npath query", "fontSize": 14 } },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "callout-1", "x": 30, "y": 990, "text": "Pattern: Most human flows require 3-5 serial commands. Average gap rate: 73% of flows have at least one.", "fontSize": 14, "strokeColor": "#495057" },
|
||||||
|
{ "type": "text", "id": "callout-2", "x": 30, "y": 1015, "text": "Top optimization: Composite commands (activity feed, team workload) would reduce multi-command flows by ~40%.", "fontSize": 14, "strokeColor": "#15803d" },
|
||||||
|
{ "type": "text", "id": "callout-3", "x": 30, "y": 1040, "text": "Top missing data: MR file changes and entity references are stored but invisible to CLI users.", "fontSize": 14, "strokeColor": "#ef4444" }
|
||||||
|
],
|
||||||
|
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
|
||||||
|
"files": {}
|
||||||
|
}
|
||||||
BIN
docs/diagrams/01-human-flow-map.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 274 KiB |
204
docs/diagrams/02-agent-flow-map.excalidraw
Normal file
@@ -0,0 +1,204 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 320, "y": 15, "text": "AI Agent Flow Map", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 180, "y": 53, "text": "15 agent automation workflows. Agents need structured JSON (-J), exit codes, and field selection.", "fontSize": 14, "strokeColor": "#868e96" },

{ "type": "text", "id": "col-trigger", "x": 60, "y": 80, "text": "TRIGGER (Agent Goal)", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-flow", "x": 400, "y": 80, "text": "COMMAND PIPELINE", "fontSize": 16, "strokeColor": "#495057" },
{ "type": "text", "id": "col-gap", "x": 880, "y": 80, "text": "BLOCKED BY", "fontSize": 16, "strokeColor": "#ef4444" },

{ "type": "rectangle", "id": "zone-context", "x": 20, "y": 110, "width": 960, "height": 200,
"backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-context-label", "x": 30, "y": 115, "text": "Context Gathering (pre-action)", "fontSize": 14, "strokeColor": "#7048e8" },

{ "type": "rectangle", "id": "a1-trigger", "x": 30, "y": 140, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A1: Pre-edit context\nAbout to modify files", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a1", "x": 230, "y": 165, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd1", "x": 280, "y": 145, "width": 80, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J health", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a2", "x": 360, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd2", "x": 380, "y": 145, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who src/auth/", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a3", "x": 520, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd3", "x": 540, "y": 145, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search \"auth\" -n 10", "fontSize": 14 } },
{ "type": "arrow", "id": "a1-a4", "x": 710, "y": 165, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a1-cmd4", "x": 730, "y": 145, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who --overlap", "fontSize": 14 } },

{ "type": "rectangle", "id": "a6-trigger", "x": 30, "y": 210, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A6: Auto-assign reviewers\nBased on file expertise", "fontSize": 14 } },
{ "type": "arrow", "id": "a6-a1", "x": 230, "y": 235, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a6-cmd1", "x": 280, "y": 215, "width": 100, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "-J mrs 456", "fontSize": 14 } },
{ "type": "text", "id": "a6-block", "x": 390, "y": 218, "text": "file list not\nin response!", "fontSize": 12, "strokeColor": "#ef4444" },
{ "type": "arrow", "id": "a6-a2", "x": 380, "y": 245, "width": 480, "height": -10,
"points": [[0,0],[480,-10]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a6-gap", "x": 860, "y": 210, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "MR files\nnot exposed", "fontSize": 14 } },

{ "type": "rectangle", "id": "zone-report", "x": 20, "y": 320, "width": 960, "height": 200,
"backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-report-label", "x": 30, "y": 325, "text": "Reporting & Synthesis", "fontSize": 14, "strokeColor": "#15803d" },

{ "type": "rectangle", "id": "a3-trigger", "x": 30, "y": 350, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A3: Sprint status report\n7 queries for 1 report", "fontSize": 14 } },
{ "type": "arrow", "id": "a3-a1", "x": 230, "y": 375, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a3-cmd1", "x": 280, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues -s closed", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd2", "x": 390, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "issues --status", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd3", "x": 500, "y": 352, "width": 100, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs -s merged", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd4", "x": 610, "y": 352, "width": 80, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "mrs -s open", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd5", "x": 700, "y": 352, "width": 80, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "count x2", "fontSize": 12 } },
{ "type": "rectangle", "id": "a3-cmd6", "x": 790, "y": 352, "width": 60, "height": 36,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "who", "fontSize": 12 } },
{ "type": "arrow", "id": "a3-agap", "x": 850, "y": 370, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a3-gap", "x": 860, "y": 350, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No summary\ncommand", "fontSize": 14 } },
{ "type": "text", "id": "a3-note", "x": 280, "y": 395, "text": "7 sequential API calls for one report. A `lore summary` could reduce to 1.", "fontSize": 12, "strokeColor": "#868e96" },

{ "type": "rectangle", "id": "a7-trigger", "x": 30, "y": 430, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A7: Incident timeline\nPostmortem reconstruction", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a1", "x": 230, "y": 455, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd1", "x": 280, "y": 435, "width": 190, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J timeline --depth 2", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a2", "x": 470, "y": 455, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd2", "x": 490, "y": 435, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search --since 3d", "fontSize": 14 } },
{ "type": "arrow", "id": "a7-a3", "x": 660, "y": 455, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a7-cmd3", "x": 680, "y": 435, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J mrs -s merged", "fontSize": 14 } },

{ "type": "rectangle", "id": "zone-discover", "x": 20, "y": 530, "width": 960, "height": 200,
"backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-discover-label", "x": 30, "y": 535, "text": "Discovery & Correlation", "fontSize": 14, "strokeColor": "#b45309" },

{ "type": "rectangle", "id": "a5-trigger", "x": 30, "y": 560, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A5: PR description\nFind related issues to link", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a1", "x": 230, "y": 585, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a5-cmd1", "x": 280, "y": 565, "width": 170, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search keywords", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a2", "x": 450, "y": 585, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a5-cmd2", "x": 470, "y": 565, "width": 180, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues --fields iid,url", "fontSize": 14 } },
{ "type": "arrow", "id": "a5-a3", "x": 650, "y": 585, "width": 210, "height": 0,
"points": [[0,0],[210,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a5-gap", "x": 860, "y": 560, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs\nquery", "fontSize": 14 } },
{ "type": "text", "id": "a5-note", "x": 280, "y": 612, "text": "Agent can't ask \"which issues does MR !456 close?\" -- entity_references data exists but isn't queryable.", "fontSize": 12, "strokeColor": "#868e96" },

{ "type": "rectangle", "id": "a11-trigger", "x": 30, "y": 640, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A11: Knowledge graph\nMap entity relationships", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a1", "x": 230, "y": 665, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a11-cmd1", "x": 280, "y": 645, "width": 140, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J search -n 30", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a2", "x": 420, "y": 665, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a11-cmd2", "x": 440, "y": 645, "width": 190, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J timeline --depth 2", "fontSize": 14 } },
{ "type": "arrow", "id": "a11-a3", "x": 630, "y": 665, "width": 230, "height": 0,
"points": [[0,0],[230,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a11-gap", "x": 860, "y": 640, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs\nquery", "fontSize": 14 } },

{ "type": "rectangle", "id": "zone-maint", "x": 20, "y": 740, "width": 960, "height": 140,
"backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
"strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 20 },
{ "type": "text", "id": "zone-maint-label", "x": 30, "y": 745, "text": "Maintenance & Cleanup", "fontSize": 14, "strokeColor": "#1971c2" },

{ "type": "rectangle", "id": "a9-trigger", "x": 30, "y": 770, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A9: Stale issue cleanup\nWeekly backlog hygiene", "fontSize": 14 } },
{ "type": "arrow", "id": "a9-a1", "x": 230, "y": 795, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a9-cmd1", "x": 280, "y": 775, "width": 200, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues --sort updated --asc", "fontSize": 12 } },
{ "type": "arrow", "id": "a9-a2", "x": 480, "y": 795, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a9-cmd2", "x": 500, "y": 775, "width": 120, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
"label": { "text": "filter client-side", "fontSize": 14 } },
{ "type": "arrow", "id": "a9-a3", "x": 620, "y": 795, "width": 240, "height": 0,
"points": [[0,0],[240,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a9-gap", "x": 860, "y": 770, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No --before\nNo offset", "fontSize": 14 } },

{ "type": "rectangle", "id": "a15-trigger", "x": 30, "y": 840, "width": 200, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
"label": { "text": "A15: Conflict detect\n\"Safe to start work?\"", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a1", "x": 230, "y": 865, "width": 50, "height": 0,
"points": [[0,0],[50,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a15-cmd1", "x": 280, "y": 845, "width": 110, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J issues 123", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a2", "x": 390, "y": 865, "width": 20, "height": 0,
"points": [[0,0],[20,0]], "endArrowhead": "arrow" },
{ "type": "rectangle", "id": "a15-cmd2", "x": 410, "y": 845, "width": 130, "height": 40,
"roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
"label": { "text": "-J who --overlap", "fontSize": 14 } },
{ "type": "arrow", "id": "a15-a3", "x": 540, "y": 865, "width": 320, "height": 0,
"points": [[0,0],[320,0]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
{ "type": "rectangle", "id": "a15-gap", "x": 860, "y": 840, "width": 110, "height": 50,
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
"label": { "text": "No refs +\n--state", "fontSize": 14 } },

{ "type": "text", "id": "callout-1", "x": 30, "y": 910, "text": "Agent-specific pain: Agents always use -J and --fields minimal for token efficiency. Every extra query burns tokens.", "fontSize": 14, "strokeColor": "#495057" },
{ "type": "text", "id": "callout-2", "x": 30, "y": 935, "text": "Biggest ROI: `lore refs` command would unblock A5, A11, A12, A15 instantly. Data already exists in entity_references table.", "fontSize": 14, "strokeColor": "#15803d" },
{ "type": "text", "id": "callout-3", "x": 30, "y": 960, "text": "Token waste: Sprint report (A3) requires 7 calls. A composite `lore summary` could save ~85% of tokens.", "fontSize": 14, "strokeColor": "#ef4444" }
],
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
"files": {}
}
BIN
docs/diagrams/02-agent-flow-map.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 269 KiB |
203
docs/diagrams/03-command-coverage.excalidraw
Normal file
@@ -0,0 +1,203 @@
{
"type": "excalidraw",
"version": 2,
"source": "https://excalidraw.com",
"elements": [
{ "type": "text", "id": "title", "x": 280, "y": 15, "text": "Command Coverage Heatmap", "fontSize": 28 },
{ "type": "text", "id": "subtitle", "x": 220, "y": 53, "text": "Which commands serve which workflows? Darker = more essential to that flow.", "fontSize": 14, "strokeColor": "#868e96" },

{ "type": "text", "id": "col-issues", "x": 260, "y": 85, "text": "issues", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-mrs", "x": 330, "y": 85, "text": "mrs", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-search", "x": 390, "y": 85, "text": "search", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-who", "x": 465, "y": 85, "text": "who", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-timeline", "x": 520, "y": 85, "text": "timeline", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-sync", "x": 600, "y": 85, "text": "sync", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-count", "x": 660, "y": 85, "text": "count", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-status", "x": 720, "y": 85, "text": "status", "fontSize": 14, "strokeColor": "#1971c2" },
{ "type": "text", "id": "col-missing", "x": 790, "y": 85, "text": "MISSING?", "fontSize": 14, "strokeColor": "#ef4444" },

{ "type": "text", "id": "grp-human", "x": 15, "y": 108, "text": "HUMAN FLOWS", "fontSize": 14, "strokeColor": "#15803d" },

{ "type": "text", "id": "h1-label", "x": 15, "y": 135, "text": "H1 Standup prep", "fontSize": 14 },
{ "type": "rectangle", "id": "h1-issues", "x": 255, "y": 130, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-mrs", "x": 325, "y": 130, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-who", "x": 460, "y": 130, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h1-sync", "x": 595, "y": 130, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h1-gap", "x": 780, "y": 135, "text": "activity feed", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h2-label", "x": 15, "y": 170, "text": "H2 Sprint planning", "fontSize": 14 },
{ "type": "rectangle", "id": "h2-issues", "x": 255, "y": 165, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h2-count", "x": 655, "y": 165, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h2-gap", "x": 780, "y": 170, "text": "--no-assignee", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h3-label", "x": 15, "y": 205, "text": "H3 Incident response", "fontSize": 14 },
{ "type": "rectangle", "id": "h3-mrs", "x": 325, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-search", "x": 390, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-who", "x": 460, "y": 200, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-timeline", "x": 525, "y": 200, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h3-sync", "x": 595, "y": 200, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },

{ "type": "text", "id": "h4-label", "x": 15, "y": 240, "text": "H4 Code review prep", "fontSize": 14 },
{ "type": "rectangle", "id": "h4-mrs", "x": 325, "y": 235, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-search", "x": 390, "y": 235, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-who", "x": 460, "y": 235, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h4-timeline", "x": 525, "y": 235, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "text", "id": "h4-gap", "x": 780, "y": 240, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h5-label", "x": 15, "y": 275, "text": "H5 Onboarding", "fontSize": 14 },
{ "type": "rectangle", "id": "h5-issues", "x": 255, "y": 270, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-mrs", "x": 325, "y": 270, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-search", "x": 390, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-who", "x": 460, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h5-timeline", "x": 525, "y": 270, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },

{ "type": "text", "id": "h6-label", "x": 15, "y": 310, "text": "H6 Find reviewer", "fontSize": 14 },
{ "type": "rectangle", "id": "h6-who", "x": 460, "y": 305, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h6-gap", "x": 780, "y": 310, "text": "multi-path who", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h7-label", "x": 15, "y": 345, "text": "H7 Why was this built?", "fontSize": 14 },
{ "type": "rectangle", "id": "h7-issues", "x": 255, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-mrs", "x": 325, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-search", "x": 390, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h7-timeline", "x": 525, "y": 340, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h7-gap", "x": 780, "y": 345, "text": "per-note search", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h8-label", "x": 15, "y": 380, "text": "H8 Team workload", "fontSize": 14 },
{ "type": "rectangle", "id": "h8-who", "x": 460, "y": 375, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h8-gap", "x": 780, "y": 380, "text": "team view", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h9-label", "x": 15, "y": 415, "text": "H9 Release notes", "fontSize": 14 },
{ "type": "rectangle", "id": "h9-issues", "x": 255, "y": 410, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h9-mrs", "x": 325, "y": 410, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h9-gap", "x": 780, "y": 415, "text": "mrs --milestone", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h10-label", "x": 15, "y": 450, "text": "H10 Stale issues", "fontSize": 14 },
{ "type": "rectangle", "id": "h10-issues", "x": 255, "y": 445, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "text", "id": "h10-gap", "x": 780, "y": 450, "text": "--updated-before", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h11-label", "x": 15, "y": 485, "text": "H11 Bug lifecycle", "fontSize": 14 },
{ "type": "rectangle", "id": "h11-issues", "x": 255, "y": 480, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h11-timeline", "x": 525, "y": 480, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
{ "type": "text", "id": "h11-gap", "x": 780, "y": 485, "text": "entity timeline", "fontSize": 12, "strokeColor": "#ef4444" },

{ "type": "text", "id": "h12-label", "x": 15, "y": 520, "text": "H12 Who broke tests?", "fontSize": 14 },
{ "type": "rectangle", "id": "h12-search", "x": 390, "y": 515, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h12-who", "x": 460, "y": 515, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },

{ "type": "text", "id": "h13-label", "x": 15, "y": 555, "text": "H13 Feature tracking", "fontSize": 14 },
{ "type": "rectangle", "id": "h13-issues", "x": 255, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h13-mrs", "x": 325, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
{ "type": "rectangle", "id": "h13-timeline", "x": 525, "y": 550, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },

{ "type": "text", "id": "h14-label", "x": 15, "y": 590, "text": "H14 Prior art check", "fontSize": 14 },
{ "type": "rectangle", "id": "h14-search", "x": 390, "y": 585, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "h14-timeline", "x": 525, "y": 585, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "h14-gap", "x": 780, "y": 590, "text": "--state on search", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "h15-label", "x": 15, "y": 625, "text": "H15 My discussions", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "h15-who", "x": 460, "y": 620, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "h15-gap", "x": 780, "y": 625, "text": "participant filter", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "rectangle", "id": "divider", "x": 10, "y": 655, "width": 910, "height": 2, "backgroundColor": "#dee2e6", "fillStyle": "solid" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "grp-agent", "x": 15, "y": 668, "text": "AI AGENT FLOWS", "fontSize": 14, "strokeColor": "#7048e8" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a1-label", "x": 15, "y": 695, "text": "A1 Pre-edit context", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a1-mrs", "x": 325, "y": 690, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a1-search", "x": 390, "y": 690, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a1-who", "x": 460, "y": 690, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a2-label", "x": 15, "y": 730, "text": "A2 Auto-triage", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a2-issues", "x": 255, "y": 725, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a2-search", "x": 390, "y": 725, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a2-who", "x": 460, "y": 725, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a2-gap", "x": 780, "y": 730, "text": "detail --fields", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a3-label", "x": 15, "y": 765, "text": "A3 Sprint report", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a3-issues", "x": 255, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a3-mrs", "x": 325, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a3-who", "x": 460, "y": 760, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a3-count", "x": 655, "y": 760, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a3-gap", "x": 780, "y": 765, "text": "summary cmd", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a4-label", "x": 15, "y": 800, "text": "A4 Prior art", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a4-search", "x": 390, "y": 795, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a4-timeline", "x": 525, "y": 795, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a4-gap", "x": 780, "y": 800, "text": "per-note search", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a5-label", "x": 15, "y": 835, "text": "A5 PR description", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a5-issues", "x": 255, "y": 830, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a5-search", "x": 390, "y": 830, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a5-gap", "x": 780, "y": 835, "text": "entity refs query", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a6-label", "x": 15, "y": 870, "text": "A6 Reviewer assign", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a6-mrs", "x": 325, "y": 865, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a6-who", "x": 460, "y": 865, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a6-gap", "x": 780, "y": 870, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a7-label", "x": 15, "y": 905, "text": "A7 Incident timeline", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a7-mrs", "x": 325, "y": 900, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a7-search", "x": 390, "y": 900, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a7-timeline", "x": 525, "y": 900, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a8-label", "x": 15, "y": 940, "text": "A8 Cross-project", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a8-search", "x": 390, "y": 935, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a8-timeline", "x": 525, "y": 935, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a8-gap", "x": 780, "y": 940, "text": "group by project", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a9-label", "x": 15, "y": 975, "text": "A9 Stale cleanup", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a9-issues", "x": 255, "y": 970, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a9-search", "x": 390, "y": 970, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a9-gap", "x": 780, "y": 975, "text": "--updated-before", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a10-label", "x": 15, "y": 1010, "text": "A10 Review context", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a10-mrs", "x": 325, "y": 1005, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a10-who", "x": 460, "y": 1005, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a10-gap", "x": 780, "y": 1010, "text": "MR file list", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a11-label", "x": 15, "y": 1045, "text": "A11 Knowledge graph", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a11-search", "x": 390, "y": 1040, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a11-timeline", "x": 525, "y": 1040, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a11-gap", "x": 780, "y": 1045, "text": "entity refs query", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a12-label", "x": 15, "y": 1080, "text": "A12 Release check", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a12-issues", "x": 255, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a12-mrs", "x": 325, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a12-who", "x": 460, "y": 1075, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a12-gap", "x": 780, "y": 1080, "text": "mrs --milestone", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a13-label", "x": 15, "y": 1115, "text": "A13 What changed?", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a13-issues", "x": 255, "y": 1110, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a13-mrs", "x": 325, "y": 1110, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a13-gap", "x": 780, "y": 1115, "text": "state-change filter", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a14-label", "x": 15, "y": 1150, "text": "A14 Meeting prep", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a14-issues", "x": 255, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a14-mrs", "x": 325, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a14-who", "x": 460, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a14-count", "x": 655, "y": 1145, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a14-gap", "x": 780, "y": 1150, "text": "summary cmd", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "a15-label", "x": 15, "y": 1185, "text": "A15 Conflict detect", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "a15-issues", "x": 255, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a15-mrs", "x": 325, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "rectangle", "id": "a15-who", "x": 460, "y": 1180, "width": 50, "height": 28, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "a15-gap", "x": 780, "y": 1185, "text": "entity refs, --state", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "legend-title", "x": 15, "y": 1230, "text": "Legend:", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "leg-essential", "x": 80, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#22c55e", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "leg-essential-t", "x": 105, "y": 1230, "text": "Essential", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "leg-supporting", "x": 190, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "leg-supporting-t", "x": 215, "y": 1230, "text": "Supporting", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "leg-partial", "x": 310, "y": 1228, "width": 20, "height": 20, "backgroundColor": "#ffd8a8", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "leg-partial-t", "x": 335, "y": 1230, "text": "Partially blocked", "fontSize": 14 },
|
||||||
|
{ "type": "text", "id": "leg-gap-t", "x": 470, "y": 1230, "text": "Red text = gap", "fontSize": 14, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "insight-1", "x": 15, "y": 1270, "text": "Key insight: `issues` and `search` are the workhorses (used in 20+ flows).", "fontSize": 14, "strokeColor": "#495057" },
|
||||||
|
{ "type": "text", "id": "insight-2", "x": 15, "y": 1295, "text": "`who` is critical for people questions but siloed from file-change data.", "fontSize": 14, "strokeColor": "#495057" },
|
||||||
|
{ "type": "text", "id": "insight-3", "x": 15, "y": 1320, "text": "`timeline` is powerful but keyword-only seeding limits entity-specific queries.", "fontSize": 14, "strokeColor": "#495057" },
|
||||||
|
{ "type": "text", "id": "insight-4", "x": 15, "y": 1345, "text": "22/30 flows have at least one gap. Most gaps are filter additions, not new commands.", "fontSize": 14, "strokeColor": "#ef4444" }
|
||||||
|
],
|
||||||
|
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
|
||||||
|
"files": {}
|
||||||
|
}
|
||||||
BIN  docs/diagrams/03-command-coverage.png  Normal file
Binary file not shown. (217 KiB)

110  docs/diagrams/04-gap-priority-matrix.excalidraw  Normal file
@@ -0,0 +1,110 @@
{
  "type": "excalidraw",
  "version": 2,
  "source": "https://excalidraw.com",
  "elements": [
    { "type": "text", "id": "title", "x": 300, "y": 20, "text": "Lore CLI Gap Priority Matrix", "fontSize": 28 },
    { "type": "text", "id": "subtitle", "x": 310, "y": 58, "text": "20 identified gaps plotted by impact vs effort", "fontSize": 16, "strokeColor": "#868e96" },

    { "type": "rectangle", "id": "q1-zone", "x": 100, "y": 120, "width": 500, "height": 380,
      "backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 25 },
    { "type": "text", "id": "q1-label", "x": 110, "y": 126, "text": "QUICK WINS", "fontSize": 18, "strokeColor": "#15803d" },

    { "type": "rectangle", "id": "q2-zone", "x": 620, "y": 120, "width": 500, "height": 380,
      "backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 25 },
    { "type": "text", "id": "q2-label", "x": 630, "y": 126, "text": "STRATEGIC", "fontSize": 18, "strokeColor": "#b45309" },

    { "type": "rectangle", "id": "q3-zone", "x": 100, "y": 520, "width": 500, "height": 300,
      "backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 25 },
    { "type": "text", "id": "q3-label", "x": 110, "y": 526, "text": "FILL-IN", "fontSize": 18, "strokeColor": "#1971c2" },

    { "type": "rectangle", "id": "q4-zone", "x": 620, "y": 520, "width": 500, "height": 300,
      "backgroundColor": "#ffc9c9", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#ef4444", "strokeWidth": 1, "opacity": 25 },
    { "type": "text", "id": "q4-label", "x": 630, "y": 526, "text": "DEPRIORITIZE", "fontSize": 18, "strokeColor": "#c92a2a" },

    { "type": "text", "id": "y-axis-hi", "x": 30, "y": 130, "text": "HIGH\nIMPACT", "fontSize": 16, "strokeColor": "#495057", "textAlign": "center" },
    { "type": "text", "id": "y-axis-lo", "x": 30, "y": 550, "text": "LOW\nIMPACT", "fontSize": 16, "strokeColor": "#495057", "textAlign": "center" },
    { "type": "text", "id": "x-axis-lo", "x": 280, "y": 840, "text": "LOW EFFORT", "fontSize": 16, "strokeColor": "#495057" },
    { "type": "text", "id": "x-axis-hi", "x": 800, "y": 840, "text": "HIGH EFFORT", "fontSize": 16, "strokeColor": "#495057" },

    { "type": "arrow", "id": "y-arrow", "x": 85, "y": 810, "width": 0, "height": -680,
      "points": [[0,0],[0,-680]], "endArrowhead": "arrow", "strokeColor": "#495057", "strokeWidth": 1 },
    { "type": "arrow", "id": "x-arrow", "x": 85, "y": 810, "width": 1050, "height": 0,
      "points": [[0,0],[1050,0]], "endArrowhead": "arrow", "strokeColor": "#495057", "strokeWidth": 1 },

    { "type": "rectangle", "id": "g5", "x": 120, "y": 160, "width": 210, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "#5 @me alias", "fontSize": 16 } },
    { "type": "rectangle", "id": "g8", "x": 120, "y": 225, "width": 210, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "#8 --state on search", "fontSize": 16 } },
    { "type": "rectangle", "id": "g9", "x": 120, "y": 290, "width": 210, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "#9 mrs --milestone", "fontSize": 16 } },
    { "type": "rectangle", "id": "g10", "x": 120, "y": 355, "width": 210, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "#10 --no-assignee", "fontSize": 16 } },
    { "type": "rectangle", "id": "g11", "x": 350, "y": 160, "width": 230, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "#11 --updated-before", "fontSize": 16 } },
    { "type": "rectangle", "id": "g14", "x": 350, "y": 225, "width": 230, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "#14 detail --fields", "fontSize": 16 } },
    { "type": "rectangle", "id": "g18", "x": 350, "y": 290, "width": 230, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "#18 1y/12m duration", "fontSize": 16 } },
    { "type": "rectangle", "id": "g20", "x": 350, "y": 355, "width": 230, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "#20 sort by due date", "fontSize": 16 } },

    { "type": "rectangle", "id": "g1", "x": 640, "y": 160, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "#1 MR file changes", "fontSize": 16 } },
    { "type": "rectangle", "id": "g2", "x": 640, "y": 225, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "#2 entity refs query", "fontSize": 16 } },
    { "type": "rectangle", "id": "g3", "x": 640, "y": 290, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "#3 per-note search", "fontSize": 16 } },
    { "type": "rectangle", "id": "g4", "x": 880, "y": 160, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "#4 entity timeline", "fontSize": 16 } },
    { "type": "rectangle", "id": "g6", "x": 880, "y": 225, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "#6 activity feed", "fontSize": 16 } },
    { "type": "rectangle", "id": "g12", "x": 880, "y": 290, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffd8a8", "fillStyle": "solid",
      "label": { "text": "#12 team workload", "fontSize": 16 } },

    { "type": "rectangle", "id": "g13", "x": 120, "y": 570, "width": 210, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "#13 pagination/offset", "fontSize": 16 } },
    { "type": "rectangle", "id": "g15", "x": 120, "y": 635, "width": 210, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "#15 group by project", "fontSize": 16 } },
    { "type": "rectangle", "id": "g19", "x": 120, "y": 700, "width": 210, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "#19 participant filter", "fontSize": 16 } },

    { "type": "rectangle", "id": "g7", "x": 640, "y": 570, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
      "label": { "text": "#7 multi-path who", "fontSize": 16 } },
    { "type": "rectangle", "id": "g16", "x": 640, "y": 635, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
      "label": { "text": "#16 trend metrics", "fontSize": 16 } },
    { "type": "rectangle", "id": "g17", "x": 640, "y": 700, "width": 220, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid",
      "label": { "text": "#17 --for-issue on mrs", "fontSize": 16 } },

    { "type": "text", "id": "q1-count", "x": 180, "y": 430, "text": "8 gaps - lowest hanging fruit", "fontSize": 14, "strokeColor": "#15803d" },
    { "type": "text", "id": "q2-count", "x": 710, "y": 370, "text": "6 gaps - build deliberately", "fontSize": 14, "strokeColor": "#b45309" },
    { "type": "text", "id": "q3-count", "x": 160, "y": 770, "text": "3 gaps - fill as needed", "fontSize": 14, "strokeColor": "#1971c2" },
    { "type": "text", "id": "q4-count", "x": 680, "y": 770, "text": "3 gaps - defer or rethink", "fontSize": 14, "strokeColor": "#c92a2a" }
  ],
  "appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
  "files": {}
}
BIN  docs/diagrams/04-gap-priority-matrix.png  Normal file
Binary file not shown. (179 KiB)

184  docs/diagrams/05-data-flow-architecture.excalidraw  Normal file
@@ -0,0 +1,184 @@
{
  "type": "excalidraw",
  "version": 2,
  "source": "https://excalidraw.com",
  "elements": [
    { "type": "text", "id": "title", "x": 350, "y": 15, "text": "Lore Data Flow Architecture", "fontSize": 28 },
    { "type": "text", "id": "subtitle", "x": 280, "y": 53, "text": "Green = queryable via CLI | Red = stored but hidden | Gray = internal", "fontSize": 14, "strokeColor": "#868e96" },

    { "type": "rectangle", "id": "zone-gitlab", "x": 30, "y": 90, "width": 200, "height": 300,
      "backgroundColor": "#e5dbff", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#8b5cf6", "strokeWidth": 1, "opacity": 30 },
    { "type": "text", "id": "zone-gitlab-label", "x": 55, "y": 96, "text": "GitLab APIs", "fontSize": 16, "strokeColor": "#7048e8" },

    { "type": "rectangle", "id": "rest-api", "x": 50, "y": 130, "width": 160, "height": 60,
      "roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
      "label": { "text": "REST API\n(paginated)", "fontSize": 16 } },
    { "type": "rectangle", "id": "graphql-api", "x": 50, "y": 210, "width": 160, "height": 60,
      "roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
      "label": { "text": "GraphQL API\n(adaptive pages)", "fontSize": 16 } },
    { "type": "rectangle", "id": "ollama-api", "x": 50, "y": 310, "width": 160, "height": 60,
      "roundness": { "type": 3 }, "backgroundColor": "#d0bfff", "fillStyle": "solid",
      "label": { "text": "Ollama\n(embeddings)", "fontSize": 16 } },

    { "type": "rectangle", "id": "zone-ingest", "x": 270, "y": 90, "width": 180, "height": 300,
      "backgroundColor": "#dbe4ff", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#4a9eed", "strokeWidth": 1, "opacity": 30 },
    { "type": "text", "id": "zone-ingest-label", "x": 300, "y": 96, "text": "Ingestion", "fontSize": 16, "strokeColor": "#1971c2" },

    { "type": "rectangle", "id": "ingest-issues", "x": 285, "y": 130, "width": 150, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "Issue Sync", "fontSize": 16 } },
    { "type": "rectangle", "id": "ingest-mrs", "x": 285, "y": 195, "width": 150, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "MR Sync", "fontSize": 16 } },
    { "type": "rectangle", "id": "ingest-disc", "x": 285, "y": 260, "width": 150, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "Discussion Sync", "fontSize": 16 } },
    { "type": "rectangle", "id": "ingest-events", "x": 285, "y": 325, "width": 150, "height": 50,
      "roundness": { "type": 3 }, "backgroundColor": "#a5d8ff", "fillStyle": "solid",
      "label": { "text": "Event Sync", "fontSize": 16 } },

    { "type": "arrow", "id": "a-rest-issues", "x": 210, "y": 155, "width": 75, "height": 0,
      "points": [[0,0],[75,0]], "endArrowhead": "arrow", "strokeColor": "#495057" },
    { "type": "arrow", "id": "a-rest-mrs", "x": 210, "y": 165, "width": 75, "height": 50,
      "points": [[0,0],[75,50]], "endArrowhead": "arrow", "strokeColor": "#495057" },
    { "type": "arrow", "id": "a-graphql-issues", "x": 210, "y": 240, "width": 75, "height": -80,
      "points": [[0,0],[75,-80]], "endArrowhead": "arrow", "strokeColor": "#495057" },

    { "type": "rectangle", "id": "zone-sqlite", "x": 490, "y": 90, "width": 400, "height": 650,
      "backgroundColor": "#d3f9d8", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#22c55e", "strokeWidth": 1, "opacity": 20 },
    { "type": "text", "id": "zone-sqlite-label", "x": 570, "y": 96, "text": "SQLite (WAL mode)", "fontSize": 16, "strokeColor": "#15803d" },

    { "type": "text", "id": "grp-queryable", "x": 500, "y": 120, "text": "Queryable Tables", "fontSize": 14, "strokeColor": "#15803d" },

    { "type": "rectangle", "id": "t-projects", "x": 500, "y": 145, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "projects", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-issues", "x": 500, "y": 195, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "issues + assignees", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-mrs", "x": 500, "y": 245, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "merge_requests", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-discussions", "x": 500, "y": 295, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "discussions + notes", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-events", "x": 500, "y": 345, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "resource_*_events", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-docs", "x": 500, "y": 395, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "documents + FTS5", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-embed", "x": 500, "y": 445, "width": 170, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#b2f2bb", "fillStyle": "solid",
      "label": { "text": "embeddings (vec)", "fontSize": 14 } },

    { "type": "text", "id": "grp-hidden", "x": 700, "y": 120, "text": "Hidden Tables", "fontSize": 14, "strokeColor": "#c92a2a" },

    { "type": "rectangle", "id": "t-file-changes", "x": 695, "y": 145, "width": 180, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "mr_file_changes", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-entity-refs", "x": 695, "y": 195, "width": 180, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "entity_references", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-raw", "x": 695, "y": 245, "width": 180, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444",
      "label": { "text": "raw_payloads", "fontSize": 14 } },

    { "type": "text", "id": "grp-internal", "x": 700, "y": 310, "text": "Internal Only", "fontSize": 14, "strokeColor": "#868e96" },

    { "type": "rectangle", "id": "t-sync", "x": 695, "y": 340, "width": 180, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
      "label": { "text": "sync_runs + cursors", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-dirty", "x": 695, "y": 390, "width": 180, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
      "label": { "text": "dirty_sources", "fontSize": 14 } },
    { "type": "rectangle", "id": "t-locks", "x": 695, "y": 440, "width": 180, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96",
      "label": { "text": "app_locks", "fontSize": 14 } },

    { "type": "arrow", "id": "a-ingest-tables", "x": 435, "y": 200, "width": 55, "height": 0,
      "points": [[0,0],[55,0]], "endArrowhead": "arrow", "strokeColor": "#495057" },

    { "type": "rectangle", "id": "zone-cli", "x": 930, "y": 90, "width": 250, "height": 650,
      "backgroundColor": "#fff3bf", "fillStyle": "solid", "roundness": { "type": 3 },
      "strokeColor": "#f59e0b", "strokeWidth": 1, "opacity": 25 },
    { "type": "text", "id": "zone-cli-label", "x": 990, "y": 96, "text": "CLI Commands", "fontSize": 16, "strokeColor": "#b45309" },

    { "type": "rectangle", "id": "cmd-issues", "x": 950, "y": 130, "width": 210, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
      "label": { "text": "lore issues", "fontSize": 16 } },
    { "type": "rectangle", "id": "cmd-mrs", "x": 950, "y": 185, "width": 210, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
      "label": { "text": "lore mrs", "fontSize": 16 } },
    { "type": "rectangle", "id": "cmd-search", "x": 950, "y": 240, "width": 210, "height": 40,
      "roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
      "label": { "text": "lore search", "fontSize": 16 } },
    { "type": "rectangle", "id": "cmd-who", "x": 950, "y": 295, "width": 210, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
|
||||||
|
"label": { "text": "lore who", "fontSize": 16 } },
|
||||||
|
{ "type": "rectangle", "id": "cmd-timeline", "x": 950, "y": 350, "width": 210, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
|
||||||
|
"label": { "text": "lore timeline", "fontSize": 16 } },
|
||||||
|
{ "type": "rectangle", "id": "cmd-count", "x": 950, "y": 405, "width": 210, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
|
||||||
|
"label": { "text": "lore count", "fontSize": 16 } },
|
||||||
|
{ "type": "rectangle", "id": "cmd-sync", "x": 950, "y": 460, "width": 210, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
|
||||||
|
"label": { "text": "lore sync", "fontSize": 16 } },
|
||||||
|
{ "type": "rectangle", "id": "cmd-status", "x": 950, "y": 515, "width": 210, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#fff3bf", "fillStyle": "solid",
|
||||||
|
"label": { "text": "lore status", "fontSize": 16 } },
|
||||||
|
|
||||||
|
{ "type": "arrow", "id": "a-issues-cmd", "x": 670, "y": 215, "width": 270, "height": -65,
|
||||||
|
"points": [[0,0],[270,-65]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
|
||||||
|
{ "type": "arrow", "id": "a-mrs-cmd", "x": 670, "y": 265, "width": 270, "height": -60,
|
||||||
|
"points": [[0,0],[270,-60]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
|
||||||
|
{ "type": "arrow", "id": "a-docs-cmd", "x": 670, "y": 415, "width": 270, "height": -155,
|
||||||
|
"points": [[0,0],[270,-155]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
|
||||||
|
{ "type": "arrow", "id": "a-embed-cmd", "x": 670, "y": 465, "width": 270, "height": -200,
|
||||||
|
"points": [[0,0],[270,-200]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
|
||||||
|
{ "type": "arrow", "id": "a-events-cmd", "x": 670, "y": 365, "width": 270, "height": 5,
|
||||||
|
"points": [[0,0],[270,5]], "endArrowhead": "arrow", "strokeColor": "#22c55e", "strokeWidth": 2 },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "hidden-note-1", "x": 695, "y": 498, "text": "mr_file_changes: populated by\nMR sync but NOT queryable.\nBlocks H4, A6, A10 flows.", "fontSize": 14, "strokeColor": "#ef4444" },
|
||||||
|
{ "type": "text", "id": "hidden-note-2", "x": 695, "y": 568, "text": "entity_references: used by\ntimeline internally but NOT\nqueryable. Blocks A5, A11.", "fontSize": 14, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "arrow", "id": "a-hidden-who", "x": 875, "y": 165, "width": 65, "height": 148,
|
||||||
|
"points": [[0,0],[65,148]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeWidth": 2,
|
||||||
|
"strokeStyle": "dashed" },
|
||||||
|
{ "type": "text", "id": "hidden-who-label", "x": 880, "y": 240, "text": "who uses\nDiffNotes,\nnot file\nchanges", "fontSize": 12, "strokeColor": "#ef4444" },
|
||||||
|
|
||||||
|
{ "type": "arrow", "id": "a-hidden-timeline", "x": 875, "y": 215, "width": 65, "height": 155,
|
||||||
|
"points": [[0,0],[65,155]], "endArrowhead": "arrow", "strokeColor": "#ef4444", "strokeWidth": 2,
|
||||||
|
"strokeStyle": "dashed" },
|
||||||
|
|
||||||
|
{ "type": "rectangle", "id": "cmd-missing-refs", "x": 950, "y": 580, "width": 210, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
|
||||||
|
"label": { "text": "lore refs (missing)", "fontSize": 16 } },
|
||||||
|
{ "type": "rectangle", "id": "cmd-missing-files", "x": 950, "y": 635, "width": 210, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
|
||||||
|
"label": { "text": "lore files (missing)", "fontSize": 16 } },
|
||||||
|
{ "type": "rectangle", "id": "cmd-missing-activity", "x": 950, "y": 690, "width": 210, "height": 40,
|
||||||
|
"roundness": { "type": 3 }, "backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed",
|
||||||
|
"label": { "text": "lore activity (missing)", "fontSize": 16 } },
|
||||||
|
|
||||||
|
{ "type": "text", "id": "legend-title", "x": 30, "y": 430, "text": "Legend", "fontSize": 16 },
|
||||||
|
{ "type": "rectangle", "id": "leg-green", "x": 30, "y": 460, "width": 20, "height": 20,
|
||||||
|
"backgroundColor": "#b2f2bb", "fillStyle": "solid" },
|
||||||
|
{ "type": "text", "id": "leg-green-t", "x": 60, "y": 462, "text": "Queryable via CLI", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "leg-red", "x": 30, "y": 490, "width": 20, "height": 20,
|
||||||
|
"backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444" },
|
||||||
|
{ "type": "text", "id": "leg-red-t", "x": 60, "y": 492, "text": "Stored but hidden", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "leg-gray", "x": 30, "y": 520, "width": 20, "height": 20,
|
||||||
|
"backgroundColor": "#dee2e6", "fillStyle": "solid", "strokeColor": "#868e96" },
|
||||||
|
{ "type": "text", "id": "leg-gray-t", "x": 60, "y": 522, "text": "Internal bookkeeping", "fontSize": 14 },
|
||||||
|
{ "type": "rectangle", "id": "leg-dashed", "x": 30, "y": 550, "width": 20, "height": 20,
|
||||||
|
"backgroundColor": "#ffc9c9", "fillStyle": "solid", "strokeColor": "#ef4444", "strokeStyle": "dashed" },
|
||||||
|
{ "type": "text", "id": "leg-dashed-t", "x": 60, "y": 552, "text": "Missing command", "fontSize": 14 }
|
||||||
|
],
|
||||||
|
"appState": { "viewBackgroundColor": "#ffffff", "gridSize": null },
|
||||||
|
"files": {}
|
||||||
|
}
|
||||||
BIN
docs/diagrams/05-data-flow-architecture.png
Normal file
Binary file not shown.
After Width: | Height: | Size: 238 KiB |
308
docs/embedding-pipeline-hardening.md
Normal file
@@ -0,0 +1,308 @@
# Embedding Pipeline Hardening: Chunk Config Drift, Adaptive Dedup, Full Flag Wiring

> **Status:** Proposed
> **Date:** 2026-02-02
> **Context:** CHUNK_MAX_BYTES was reduced from 32KB to 6KB to prevent Ollama context window overflow. This plan addresses the downstream consequences of that change.

## Problem Statement

Three issues stem from the chunk size reduction:

1. **Broken `--full` wiring**: `handle_embed` in main.rs ignores `args.full` (it calls `run_embed` instead of `run_embed_full`). `run_sync` hardcodes `false` for `retry_failed` and never passes `options.full` to the embed step. Users running `lore sync --full` or `lore embed --full` don't get a full re-embed.

2. **Mixed chunk sizes in vector space**: Existing embeddings (32KB chunks) coexist with new embeddings (6KB chunks). These are semantically incomparable -- vectors of different granularity in the same KNN space degrade search quality. No mechanism detects this drift.

3. **Static dedup multiplier**: `search_vector` uses `limit * 8` to over-fetch for dedup. With smaller chunks producing 5-6 chunks per document, clustered search results can exhaust slots before reaching `limit` unique documents. The multiplier should adapt to the actual data.

## Decision Record

| Decision | Choice | Rationale |
|----------|--------|-----------|
| Detect chunk config drift | Store `chunk_max_bytes` in `embedding_metadata` | Allows automatic invalidation without user intervention. Self-heals on next sync. |
| Dedup multiplier strategy | Adaptive from DB with static floor | One cheap aggregate query per search. Self-adjusts as data grows. No wasted KNN budget. |
| `--full` propagation | `sync --full` passes full to embed step | Matches user expectation: "start fresh" means everything, not just ingest+docs. |
| Migration strategy | New migration 010 for `chunk_max_bytes` column | Non-breaking additive change. NULL values = "unknown config", treated as needing re-embed. |

---

## Changes

### Change 1: Wire `--full` flag through to embed

**Files:**
- `src/main.rs` (line 1116)
- `src/cli/commands/sync.rs` (line 105)

**main.rs `handle_embed`** (line 1116):

```rust
// BEFORE:
let result = run_embed(&config, retry_failed).await?;

// AFTER:
let result = run_embed_full(&config, args.full, retry_failed).await?;
```

Update the import at the top of main.rs from `run_embed` to `run_embed_full`.

**sync.rs `run_sync`** (line 105):

```rust
// BEFORE:
match run_embed(config, false).await {

// AFTER:
match run_embed_full(config, options.full, false).await {
```

Update the import at line 11 from `run_embed` to `run_embed_full`.

**Cleanup `embed.rs`**: Remove `run_embed` (the wrapper that hardcodes `full: false`); all callers should use `run_embed_full` directly. Then rename `run_embed_full` back to `run_embed` with the 3-arg signature `(config, full, retry_failed)`, and update the call sites above to use the new name.

Final signature:

```rust
pub async fn run_embed(
    config: &Config,
    full: bool,
    retry_failed: bool,
) -> Result<EmbedCommandResult>
```

---

### Change 2: Migration 010 -- add `chunk_max_bytes` to `embedding_metadata`

**New file:** `migrations/010_chunk_config.sql`

```sql
-- Migration 010: Chunk config tracking
-- Schema version: 10
-- Adds chunk_max_bytes to embedding_metadata for drift detection.
-- Existing rows get NULL, which the change detector treats as "needs re-embed".

ALTER TABLE embedding_metadata ADD COLUMN chunk_max_bytes INTEGER;

-- Bump the schema version with ONE of the following, whichever matches the
-- project's existing pattern. UPDATE pattern:
UPDATE schema_version SET version = 10
WHERE version = (SELECT MAX(version) FROM schema_version);
-- Or, if using the INSERT pattern:
INSERT INTO schema_version (version, applied_at, description)
VALUES (10, strftime('%s', 'now') * 1000, 'Add chunk_max_bytes to embedding_metadata for config drift detection');
```

Check the existing migration pattern in `src/core/db.rs` for how migrations are applied -- follow that exact pattern for consistency.

---

### Change 3: Store `chunk_max_bytes` when writing embeddings

**File:** `src/embedding/pipeline.rs`

**`store_embedding`** (lines 238-266): Add `chunk_max_bytes` to the INSERT:

```rust
// Add import at top:
use crate::embedding::chunking::CHUNK_MAX_BYTES;

// In store_embedding, update the SQL:
conn.execute(
    "INSERT OR REPLACE INTO embedding_metadata
     (document_id, chunk_index, model, dims, document_hash, chunk_hash,
      created_at, attempt_count, last_error, chunk_max_bytes)
     VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, 1, NULL, ?8)",
    rusqlite::params![
        doc_id, chunk_index as i64, model_name, EXPECTED_DIMS as i64,
        doc_hash, chunk_hash, now, CHUNK_MAX_BYTES as i64
    ],
)?;
```

**`record_embedding_error`** (lines 269-291): Also store `chunk_max_bytes` so error rows track which config they failed under:

```rust
conn.execute(
    "INSERT INTO embedding_metadata
     (document_id, chunk_index, model, dims, document_hash, chunk_hash,
      created_at, attempt_count, last_error, last_attempt_at, chunk_max_bytes)
     VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, 1, ?8, ?7, ?9)
     ON CONFLICT(document_id, chunk_index) DO UPDATE SET
         attempt_count = embedding_metadata.attempt_count + 1,
         last_error = ?8,
         last_attempt_at = ?7,
         chunk_max_bytes = ?9",
    rusqlite::params![
        doc_id, chunk_index as i64, model_name, EXPECTED_DIMS as i64,
        doc_hash, chunk_hash, now, error, CHUNK_MAX_BYTES as i64
    ],
)?;
```

---

### Change 4: Detect chunk config drift in change detector

**File:** `src/embedding/change_detector.rs`

Add a third condition to the pending detection: embeddings where `chunk_max_bytes` differs from the current `CHUNK_MAX_BYTES` constant (or is NULL, meaning pre-migration embeddings).

```rust
use crate::embedding::chunking::CHUNK_MAX_BYTES;

pub fn find_pending_documents(
    conn: &Connection,
    page_size: usize,
    last_id: i64,
) -> Result<Vec<PendingDocument>> {
    let sql = r#"
        SELECT d.id, d.content_text, d.content_hash
        FROM documents d
        WHERE d.id > ?1
          AND (
            -- Case 1: No embedding metadata (new document)
            NOT EXISTS (
                SELECT 1 FROM embedding_metadata em
                WHERE em.document_id = d.id AND em.chunk_index = 0
            )
            -- Case 2: Document content changed
            OR EXISTS (
                SELECT 1 FROM embedding_metadata em
                WHERE em.document_id = d.id AND em.chunk_index = 0
                  AND em.document_hash != d.content_hash
            )
            -- Case 3: Chunk config drift (different chunk size or pre-migration NULL)
            OR EXISTS (
                SELECT 1 FROM embedding_metadata em
                WHERE em.document_id = d.id AND em.chunk_index = 0
                  AND (em.chunk_max_bytes IS NULL OR em.chunk_max_bytes != ?3)
            )
          )
        ORDER BY d.id
        LIMIT ?2
    "#;

    let mut stmt = conn.prepare(sql)?;
    let rows = stmt
        .query_map(
            rusqlite::params![last_id, page_size as i64, CHUNK_MAX_BYTES as i64],
            |row| {
                Ok(PendingDocument {
                    document_id: row.get(0)?,
                    content_text: row.get(1)?,
                    content_hash: row.get(2)?,
                })
            },
        )?
        .collect::<std::result::Result<Vec<_>, _>>()?;

    Ok(rows)
}
```

Apply the same change to `count_pending_documents` -- add the third OR clause and the `?3` parameter.
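For reference, the COUNT query mirrors the same three OR clauses. A minimal sketch, not the real function body: `count_pending_sql` is a hypothetical helper, and the parameter numbering is an assumption (with no paging arguments, the current `CHUNK_MAX_BYTES` value would bind as `?1`):

```rust
/// Hypothetical sketch of the SQL for count_pending_documents with the
/// drift clause added. Parameter numbering is assumed: no paging args,
/// so CHUNK_MAX_BYTES binds as ?1.
fn count_pending_sql() -> &'static str {
    r#"
        SELECT COUNT(*)
        FROM documents d
        WHERE
            -- Case 1: no embedding metadata
            NOT EXISTS (
                SELECT 1 FROM embedding_metadata em
                WHERE em.document_id = d.id AND em.chunk_index = 0
            )
            -- Case 2: content changed
            OR EXISTS (
                SELECT 1 FROM embedding_metadata em
                WHERE em.document_id = d.id AND em.chunk_index = 0
                  AND em.document_hash != d.content_hash
            )
            -- Case 3: chunk config drift (the new clause)
            OR EXISTS (
                SELECT 1 FROM embedding_metadata em
                WHERE em.document_id = d.id AND em.chunk_index = 0
                  AND (em.chunk_max_bytes IS NULL OR em.chunk_max_bytes != ?1)
            )
    "#
}
```

Whatever the real signature looks like, the invariant to preserve is that both queries share the same WHERE clause, so the count and the page contents never disagree.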
---

### Change 5: Adaptive dedup multiplier in vector search

**File:** `src/search/vector.rs`

Replace the static `limit * 8` with an adaptive multiplier based on the actual maximum chunks-per-document in the database.

```rust
/// Query the max chunks any single document has in the embedding table.
/// Returns the max chunk count, or 1 if no data exists.
fn max_chunks_per_document(conn: &Connection) -> i64 {
    conn.query_row(
        "SELECT COALESCE(MAX(cnt), 1) FROM (
            SELECT COUNT(*) as cnt FROM embedding_metadata
            WHERE last_error IS NULL
            GROUP BY document_id
        )",
        [],
        |row| row.get(0),
    )
    .unwrap_or(1)
}

pub fn search_vector(
    conn: &Connection,
    query_embedding: &[f32],
    limit: usize,
) -> Result<Vec<VectorResult>> {
    if query_embedding.is_empty() || limit == 0 {
        return Ok(Vec::new());
    }

    let embedding_bytes: Vec<u8> = query_embedding
        .iter()
        .flat_map(|f| f.to_le_bytes())
        .collect();

    // Adaptive over-fetch: use the actual max chunks per doc, with a floor of 8x.
    // The 1.5x safety margin handles clustering in KNN results.
    let max_chunks = max_chunks_per_document(conn);
    let multiplier = (max_chunks as usize * 3 / 2).max(8);
    let k = limit * multiplier;

    // ... rest unchanged ...
}
```

**Why `max_chunks * 1.5` with a floor of 8:**

- `max_chunks` is the worst case for a single document dominating the results
- `* 1.5` adds margin for multiple clustered documents
- The floor of `8` ensures reasonable over-fetch even with single-chunk documents
- This is a single aggregate query on an indexed column -- sub-millisecond
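The arithmetic can be checked in isolation. A standalone sketch of the over-fetch formula (`adaptive_k` is an illustrative helper, not part of `search_vector`):

```rust
/// Mirrors the over-fetch arithmetic from Change 5: integer 1.5x of
/// max_chunks (via * 3 / 2), clamped to a floor of 8, then scaled by
/// the requested result limit.
fn adaptive_k(max_chunks: usize, limit: usize) -> usize {
    let multiplier = (max_chunks * 3 / 2).max(8);
    limit * multiplier
}
```

With legacy single-chunk documents, `adaptive_k(1, 20)` keeps the old 8x behavior (k = 160); with 6-chunk documents it grows to a 9x multiplier (k = 180).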
---

### Change 6: Update chunk_ids.rs comment

**File:** `src/embedding/chunk_ids.rs` (lines 1-3)

Update the comment to reflect current reality:

```rust
/// Multiplier for encoding (document_id, chunk_index) into a single rowid.
/// Supports up to 1000 chunks per document. At CHUNK_MAX_BYTES=6000,
/// a 2MB document (MAX_DOCUMENT_BYTES_HARD) produces ~333 chunks.
pub const CHUNK_ROWID_MULTIPLIER: i64 = 1000;
```

---

## Files Modified (Summary)

| File | Change |
|------|--------|
| `migrations/010_chunk_config.sql` | **NEW** -- Add `chunk_max_bytes` column |
| `src/embedding/pipeline.rs` | Store `CHUNK_MAX_BYTES` in metadata writes |
| `src/embedding/change_detector.rs` | Detect chunk config drift (3rd OR clause) |
| `src/search/vector.rs` | Adaptive dedup multiplier from DB |
| `src/cli/commands/embed.rs` | Consolidate to a single `run_embed(config, full, retry_failed)` |
| `src/cli/commands/sync.rs` | Pass `options.full` to embed, update import |
| `src/main.rs` | Call `run_embed` with `args.full`, update import |
| `src/embedding/chunk_ids.rs` | Comment update only |

## Verification

1. **Compile check**: `cargo build` -- no errors
2. **Unit tests**: `cargo test` -- all existing tests pass
3. **Migration test**: Run `lore doctor` or `lore migrate` -- migration 010 applies cleanly
4. **Full flag wiring**: `lore embed --full` should clear all embeddings and re-embed. Verify by checking `lore --robot stats` before and after (the embedded count should reset, then rebuild).
5. **Chunk config drift**: After the migration, existing embeddings have `chunk_max_bytes = NULL`. Running `lore embed` (without `--full`) should detect all existing embeddings as stale and re-embed them automatically.
6. **Sync propagation**: `lore sync --full` should produce the same embed behavior as `lore embed --full`
7. **Adaptive dedup**: Run `lore search "some query"` and verify the result count matches the requested limit (default 20). Check with `RUST_LOG=debug` that the computed `k` value scales with the actual chunk distribution.

## Decision Record (for future reference)

**Date:** 2026-02-02
**Trigger:** CHUNK_MAX_BYTES was reduced from 32KB to 6KB to prevent Ollama nomic-embed-text context window overflow (8192 tokens).

**Downstream consequences identified:**
1. Chunk ID headroom reduced (1000 slots, now ~333 used for 2MB docs) -- acceptable, no action needed
2. Vector search dedup pressure increased 5x -- fixed with the adaptive multiplier
3. Embedding DB grows ~5x -- acceptable at current scale (~7.5MB)
4. Mixed chunk sizes degrade search -- fixed with config drift detection
5. Ollama API call volume increases proportionally -- acceptable for a local model

**Rejected alternatives:**
- Two-phase KNN fetch (fetch, check, re-fetch with higher k): adds code complexity for marginal improvement over the adaptive multiplier. sqlite-vec doesn't support OFFSET in KNN queries, so this would require a full re-query.
- Generous static multiplier (15x): wastes KNN budget on datasets where documents are small. Over-allocates permanently instead of adapting.
- Manual `--full` as the only drift remedy: requires users to understand chunk config internals. Violates the principle of least surprise.
66
docs/ideas/README.md
Normal file
@@ -0,0 +1,66 @@
# Gitlore Feature Ideas

Central registry of potential features. Each idea leverages data already ingested
into the local SQLite database (issues, MRs, discussions, notes, resource events,
entity references, embeddings, file changes).

## Priority Tiers

**Tier 1 — High confidence, low effort, immediate value:**

| # | Idea | File | Confidence |
|---|------|------|------------|
| 9 | Similar Issues Finder | [similar-issues.md](similar-issues.md) | 95% |
| 17 | "What Changed?" Digest | [digest.md](digest.md) | 93% |
| 5 | Who Knows About X? | [experts.md](experts.md) | 92% |
| -- | Multi-Project Ergonomics | [project-ergonomics.md](project-ergonomics.md) | 90% |
| 27 | Weekly Digest Generator | [weekly-digest.md](weekly-digest.md) | 90% |
| 4 | Stale Discussion Finder | [stale-discussions.md](stale-discussions.md) | 90% |

**Tier 2 — Strong ideas, moderate effort:**

| # | Idea | File | Confidence |
|---|------|------|------------|
| 19 | MR-to-Issue Closure Gap | [closure-gaps.md](closure-gaps.md) | 88% |
| 1 | Contributor Heatmap | [contributors.md](contributors.md) | 88% |
| 21 | Knowledge Silo Detection | [silos.md](silos.md) | 87% |
| 2 | Review Bottleneck Detector | [bottlenecks.md](bottlenecks.md) | 85% |
| 14 | File Hotspot Report | [hotspots.md](hotspots.md) | 85% |
| 26 | Unlinked MR Finder | [unlinked.md](unlinked.md) | 83% |
| 6 | Decision Archaeology | [decisions.md](decisions.md) | 82% |
| 18 | Label Hygiene Audit | [label-audit.md](label-audit.md) | 82% |

**Tier 3 — Promising, needs more design work:**

| # | Idea | File | Confidence |
|---|------|------|------------|
| 29 | Entity Relationship Explorer | [graph.md](graph.md) | 80% |
| 12 | Milestone Risk Report | [milestone-risk.md](milestone-risk.md) | 78% |
| 3 | Label Velocity | [label-flow.md](label-flow.md) | 78% |
| 25 | MR Pipeline Efficiency | [mr-pipeline.md](mr-pipeline.md) | 78% |
| 24 | Recurring Bug Patterns | [recurring-patterns.md](recurring-patterns.md) | 76% |
| 7 | Cross-Project Impact Graph | [impact-graph.md](impact-graph.md) | 75% |
| 28 | DiffNote Coverage Map | [review-coverage.md](review-coverage.md) | 75% |
| 16 | Idle Work Detector | [idle.md](idle.md) | 73% |
| 8 | MR Churn Analysis | [churn.md](churn.md) | 72% |
| 15 | Author Collaboration Network | [collaboration.md](collaboration.md) | 70% |

## Rejected Ideas (with reasons)

| # | Idea | Reason |
|---|------|--------|
| 10 | Sprint Burndown from Labels | Too opinionated about label semantics |
| 11 | Code Review Quality Score | Subjective "quality" scoring creates perverse incentives |
| 13 | Discussion Sentiment Drift | Unreliable heuristic sentiment on technical text |
| 20 | Response Time Leaderboard | Toxic "leaderboard" framing; metric folded into #2 |
| 22 | Timeline Diff | Niche use case; timeline already interleaves events |
| 23 | Discussion Thread Summarizer | Requires LLM inference; out of scope for local-first tool |
| 30 | NL Query Interface | Over-engineered; existing filters cover this |

## How to use this list

1. Pick an idea from Tier 1 or Tier 2
2. Read its detail file for the implementation plan and SQL sketches
3. Create a bead (`br create`) referencing the idea file
4. Implement following TDD (test first, then minimal impl)
5. Update the idea file with `status: implemented` when done
555
docs/ideas/SYSTEM-PROPOSAL.md
Normal file
@@ -0,0 +1,555 @@
# Project Manager System — Design Proposal

## The Problem

We have a growing backlog of ideas and issues in markdown files. Agents can ship
features in under an hour. The constraint isn't execution speed — it's knowing
WHAT to execute NEXT, in what ORDER, and detecting when the plan needs to change.

We need a system that:
1. Automatically scores and sequences work items
2. Detects when scope changes during spec generation
3. Tracks the full lifecycle: idea → spec → beads → shipped
4. Re-triages instantly when the dependency graph changes
5. Runs in seconds, not minutes

## Architecture

```
┌─────────────────────────────────────────────────────────────┐
│                      docs/ideas/*.md                        │
│                     docs/issues/*.md                        │
│                    (YAML frontmatter)                       │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                    IDEA TRIAGE SKILL                        │
│                                                             │
│  Phase 1: INGEST — parse all frontmatter                    │
│  Phase 2: VALIDATE — check refs, detect staleness           │
│  Phase 3: EVALUATE — detect scope changes since last run    │
│  Phase 4: SCORE — compute priority with unlock graph        │
│  Phase 5: SEQUENCE — topological sort by dependency + score │
│  Phase 6: RECOMMEND — top 3 + unlock advisories + warnings  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                      HUMAN DECIDES                          │
│             (picks from top 3, takes seconds)               │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                SPEC GENERATION (Claude/GPT)                 │
│  Takes the idea doc, generates detailed implementation spec │
│  ALSO: re-evaluates frontmatter fields based on deeper      │
│  understanding. Updates effort, blocked-by, components.     │
│  This is the SCOPE CHANGE DETECTION point.                  │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                PLAN-TO-BEADS (existing skill)               │
│  Spec → granular beads with dependencies via br CLI         │
│  Links bead IDs back into the idea frontmatter              │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                  AGENT IMPLEMENTATION                       │
│  Works beads via br/bv workflow                             │
│  bv --robot-triage handles execution-phase prioritization   │
└──────────────────────────┬──────────────────────────────────┘
                           │
                           ▼
┌─────────────────────────────────────────────────────────────┐
│                 COMPLETION & RE-TRIAGE                      │
│  Beads close → idea status updates to implemented           │
│  Skill re-runs → newly unblocked ideas surface              │
│  Loop back to top                                           │
└─────────────────────────────────────────────────────────────┘
```
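The core of Phase 5 (SEQUENCE) can be sketched as a Kahn-style topological sort over `blocked-by` edges that breaks ties by descending score. A minimal sketch under assumptions: the function name, the `f64` score type, and the tuple representation are illustrative, not the skill's actual data model.

```rust
use std::collections::BTreeMap;

/// Sketch of Phase 5: order ideas so blockers come first, and among
/// ready ideas always pick the highest-scoring one. Each idea is
/// (id, score, blocked-by ids); panics on a dependency cycle.
fn sequence(ideas: &[(&str, f64, Vec<&str>)]) -> Vec<String> {
    // in-degree = number of not-yet-completed blockers
    let mut indeg: BTreeMap<&str, usize> = ideas
        .iter()
        .map(|(id, _, blockers)| (*id, blockers.len()))
        .collect();
    let mut order: Vec<String> = Vec::new();
    while order.len() < ideas.len() {
        // among ready ideas (in-degree 0, not yet emitted), pick the top score
        let next = ideas
            .iter()
            .filter(|(id, _, _)| indeg.get(id) == Some(&0))
            .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
            .map(|(id, _, _)| *id)
            .expect("cycle in blocked-by graph");
        indeg.remove(next);
        // everything that was waiting on `next` loses one blocker
        for (id, _, blockers) in ideas {
            if blockers.contains(&next) {
                *indeg.get_mut(id).unwrap() -= 1;
            }
        }
        order.push(next.to_string());
    }
    order
}
```

With three ideas where `b` (score 5.0) is blocked by `a` (1.0) and `c` (3.0) is free, the order is `c`, `a`, `b`: the high-scoring blocked idea cannot jump its blocker.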
## The Two Systems and Their Boundary

| Concern | Ideas System (new) | Beads System (existing) |
|---------|-------------------|------------------------|
| Phase | Pre-commitment (what to build) | Execution (how to build) |
| Data | docs/ideas/*.md, docs/issues/*.md | .beads/issues.jsonl |
| Triage | Idea triage skill | bv --robot-triage |
| Tracking | YAML frontmatter | JSONL records |
| Granularity | Feature-level | Task-level |
| Lifecycle | proposed → specced → promoted | open → in_progress → closed |

**The handoff point is promotion.** An idea becomes one or more beads. After that,
the ideas system only tracks the idea's status (promoted/implemented). Beads owns
execution.

An idea file is NEVER deleted. It's a permanent design record. Even after
implementation, it documents WHY the feature was built and what tradeoffs were made.

---

## Data Model

### Frontmatter Schema

```yaml
---
# ── Identity ──
id: idea-009              # stable unique identifier
title: Similar Issues Finder
type: idea                # idea | issue
status: proposed          # see lifecycle below

# ── Timestamps ──
created: 2026-02-09
updated: 2026-02-09
eval-hash: null           # SHA of scoring fields at last triage run

# ── Scoring Inputs ──
impact: high              # high | medium | low
effort: small             # small | medium | large | xlarge
severity: null            # critical | high | medium | low (issues only)
autonomy: full            # full | needs-design | needs-human

# ── Dependency Graph ──
blocked-by: []            # IDs of ideas/issues that must complete first
unlocks:                  # IDs that become possible/better after this ships
  - idea-recurring-patterns
requires: []              # external prerequisites (gate names)
related:                  # soft links, not blocking
  - issue-001

# ── Implementation Context ──
components:               # source code paths this will touch
  - src/search/
  - src/embedding/
command: lore similar     # proposed CLI command (null for issues)
has-spec: false           # detailed spec has been generated
spec-path: null           # path to spec doc if it exists
beads: []                 # bead IDs after promotion

# ── Classification ──
tags:
  - embeddings
  - search
---
```
|
||||||
|
### Status Lifecycle

```
IDEA lifecycle:
  proposed ──→ accepted ──→ specced ──→ promoted ──→ implemented
      │                        │
      └──→ rejected            └──→ (scope changed, back to accepted)

ISSUE lifecycle:
  open ──→ accepted ──→ specced ──→ promoted ──→ resolved
    │
    └──→ wontfix
```

Transitions:
- `proposed → accepted`: Human confirms this is worth building
- `accepted → specced`: Detailed implementation spec has been generated
- `specced → promoted`: Beads created from the spec
- `promoted → implemented`: All beads closed
- Any → `rejected`/`wontfix`: Decided not to build (with reason in body)
- `specced → accepted`: Scope changed during spec, needs re-evaluation
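The transition rules above can be captured as a small lookup table that Phase 2 validation could consult. A minimal sketch in Python; the map and function names are illustrative, not part of the spec:

```python
# Allowed idea status transitions, per the lifecycle above.
# "rejected" is reachable from any non-terminal state.
VALID_IDEA_TRANSITIONS = {
    "proposed": {"accepted", "rejected"},
    "accepted": {"specced", "rejected"},
    "specced": {"promoted", "accepted", "rejected"},  # back-edge: scope changed
    "promoted": {"implemented", "rejected"},
    "implemented": set(),
    "rejected": set(),
}

def is_valid_transition(old: str, new: str) -> bool:
    """True if an idea may move from status `old` to status `new`."""
    return new in VALID_IDEA_TRANSITIONS.get(old, set())
```

The issue lifecycle would get an analogous table with `open`/`wontfix`/`resolved` in place of the idea-specific states.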

### Effort Calibration (Agent-Executed)

| Level | Wall Clock | Autonomy | Example |
|-------|-----------|----------|---------|
| small | ~30 min | Agent ships end-to-end | stale-discussions, closure-gaps |
| medium | ~1 hour | Agent ships end-to-end | similar-issues, digest |
| large | 1-2 hours | May need one design decision | recurring-patterns, experts |
| xlarge | 2+ hours | Needs human architecture input | project groups |

### Gates Registry (docs/gates.yaml)

```yaml
gates:
  gate-1:
    title: Resource Events Ingestion
    status: complete
    completed: 2025-12-15

  gate-2:
    title: Cross-References & Entity Graph
    status: complete
    completed: 2026-01-10

  gate-3:
    title: Timeline Pipeline
    status: complete
    completed: 2026-01-25

  gate-4:
    title: MR File Changes Ingestion
    status: partial
    notes: Schema ready (migration 016), ingestion code exists but untested
    tracks: mr_file_changes table population

  gate-5:
    title: Code Trace (file:line → commit → MR → issue)
    status: not-started
    blocked-by: gate-4
    notes: Requires git log parsing + commit SHA matching
```

The skill reads this file to determine which `requires` entries are satisfied.
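That resolution step can be sketched in a few lines. A minimal Python sketch; it assumes gates.yaml has already been parsed into a dict shaped like the example above, and the helper name is illustrative:

```python
def requires_satisfied(requires: list, gates: dict) -> bool:
    """True if every named gate exists and has status 'complete'.

    `gates` is the parsed contents of the top-level `gates:` mapping.
    Missing, partial, or not-started gates count as unsatisfied.
    """
    return all(
        gates.get(name, {}).get("status") == "complete"
        for name in requires
    )

# Mirroring part of the example registry above:
gates = {
    "gate-3": {"status": "complete"},
    "gate-4": {"status": "partial"},
    "gate-5": {"status": "not-started"},
}
```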

---

## Scoring Algorithm

### Priority Score

```
For ideas:
  base      = impact_weight                 # high=3, medium=2, low=1
  unlock    = 1 + (0.5 × count_of_unlocks)  # items this directly enables
  readiness = 0 if blocked, 1 if ready
  priority  = base × unlock × readiness

For issues:
  base      = severity_weight × 1.5         # critical=6, high=4.5, medium=3, low=1.5
  unlock    = 1 + (0.5 × count_of_unlocks)  # (bugs rarely unlock, but can)
  readiness = 0 if blocked, 1 if ready
  priority  = base × unlock × readiness

Tiebreak (among equal priority):
  1. Prefer smaller effort (ships faster, starts next cycle sooner)
  2. Prefer autonomy:full over needs-design over needs-human
  3. Prefer older items (FIFO within same score)
```
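The formula translates directly to code. A minimal Python sketch with the weights as given above; the function and field names are illustrative:

```python
IMPACT = {"high": 3, "medium": 2, "low": 1}
SEVERITY = {"critical": 4, "high": 3, "medium": 2, "low": 1}  # × 1.5 below

def priority(item: dict) -> float:
    """Priority = base × unlock multiplier × readiness, per the formula above."""
    if item["type"] == "issue":
        base = SEVERITY[item["severity"]] * 1.5
    else:
        base = IMPACT[item["impact"]]
    unlock = 1 + 0.5 * item.get("unlock_count", 0)
    readiness = 0 if item.get("blocked", False) else 1
    return base * unlock * readiness
```

Plugging in the example rankings used elsewhere in this doc: high impact with 10 unlocks gives 3 × 6.0 = 18.0, and any blocked item scores 0 regardless of impact.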

### Why This Works

- High-impact items that unlock other items float to the top
- Blocked items score 0 regardless of impact (can't be worked)
- Effort is a tiebreaker, not a primary factor (since execution is fast)
- Issues with severity get a 1.5× multiplier (bugs degrade existing value)
- Unlock multiplier captures the "do Gate 4 first" insight automatically

### Example Rankings

| Item | Impact | Unlocks | Readiness | Score |
|------|--------|---------|-----------|-------|
| project-ergonomics | high(3) | 10 | ready(1) | 3 × 6.0 = 18.0 |
| gate-4-completion | med(2) | 5 | ready(1) | 2 × 3.5 = 7.0 |
| similar-issues | high(3) | 1 | ready(1) | 3 × 1.5 = 4.5 |
| stale-discussions | high(3) | 0 | ready(1) | 3 × 1.0 = 3.0 |
| hotspots | high(3) | 1 | blocked(0) | 0.0 |

Project-ergonomics dominates because it unlocks 10 downstream items. This is the correct recommendation: it's the highest-leverage work even though "stale-discussions" is simpler.

---

## Scope Change Detection

This is the hardest problem. An idea's scope can change in three ways:

### 1. During Spec Generation (Primary Detection Point)

When Claude/GPT generates a detailed implementation spec from an idea doc, it understands the idea more deeply than the original sketch. The spec process should be instructed to:

- Re-evaluate effort (now that implementation is understood in detail)
- Discover new dependencies (need to change schema first, need a new config option)
- Identify component changes (touches more modules than originally thought)
- Assess impact more accurately (this is actually higher/lower value than estimated)

**Mechanism:** The spec generation prompt includes an explicit "re-evaluate frontmatter" step. The spec output includes an updated frontmatter block. If scoring-relevant fields changed, the skill flags it:

```
SCOPE CHANGE DETECTED:
  idea-009 (Similar Issues Finder)
  - effort: small → medium (needs embedding aggregation strategy)
  - blocked-by: [] → [gate-embeddings-populated]
  - components: +src/cli/commands/similar.rs (new file)
  Previous score: 4.5 → New score: 3.0
  Recommendation: Still top-3, but sequencing may change.
```
### 2. During Implementation (Discovered Complexity)

An agent working on beads may discover the spec was wrong:
- "This requires a database migration I didn't anticipate"
- "This module doesn't expose the API I need"

**Mechanism:** When a bead is blocked or takes significantly longer than estimated, the agent should update the idea's frontmatter. The skill detects the change on the next triage run via eval-hash comparison.

### 3. External Changes (Gate Completion, New Ideas)

When a gate completes or a new idea is added that changes the dependency graph:
- Gate 4 completes → 5 ideas become unblocked
- New idea added that's higher priority than current top-3
- Two ideas discovered to be duplicates

**Mechanism:** The skill detects these automatically by re-computing the full graph on every run. The eval-hash tracks what the scoring fields looked like last time; if they haven't changed but the SCORE changed (because a dependency was resolved), the skill flags it as "newly unblocked."

### The eval-hash Field

```yaml
eval-hash: "a1b2c3d4"   # SHA-256 of: impact + effort + blocked-by + unlocks + requires
```

Computed by hashing the concatenation of all scoring-relevant fields. When the skill runs, it compares:
- If eval-hash matches AND score is same → no change, skip
- If eval-hash matches BUT score changed → external change (dependency resolved)
- If eval-hash differs → item was modified, re-evaluate

This avoids re-announcing unchanged items on every run.
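One way to compute the hash, sketched in Python. The exact serialization (field order, separator, sorted lists, 8-character truncation) is an assumption; the doc only specifies which fields participate:

```python
import hashlib

SCORING_FIELDS = ("impact", "effort", "blocked-by", "unlocks", "requires")

def eval_hash(frontmatter: dict) -> str:
    """Short SHA-256 over the scoring-relevant fields, in a fixed order.

    Lists are serialized sorted so reordering entries doesn't change the hash.
    """
    parts = []
    for field in SCORING_FIELDS:
        value = frontmatter.get(field)
        if isinstance(value, list):
            value = ",".join(sorted(value))
        parts.append(f"{field}={value}")
    digest = hashlib.sha256("|".join(parts).encode()).hexdigest()
    return digest[:8]  # short form, like "a1b2c3d4" above
```

Any change to a scoring field produces a different hash, which is all the skill needs for the three comparisons listed above.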

---

## Skill Design

### Location

`.claude/skills/idea-triage/SKILL.md` (project-local)

### Trigger Phrases

- "triage ideas" / "what should I build next?"
- "idea triage" / "prioritize ideas"
- "what's the highest value work?"
- `/idea-triage`

### Workflow Phases

**Phase 1: INGEST**
- Glob docs/ideas/*.md and docs/issues/*.md
- Parse YAML frontmatter from each file
- Read docs/gates.yaml for capability status
- Collect: id, title, type, status, impact, effort, severity, autonomy, blocked-by, unlocks, requires, has-spec, beads, eval-hash
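The frontmatter extraction in Phase 1 can be sketched as follows. This is hand-rolled to stay dependency-free and handles only the flat scalars and simple lists used by the schema above; a real implementation would likely reach for a YAML parser:

```python
def parse_frontmatter(text: str) -> dict:
    """Extract the YAML frontmatter between the leading '---' fences."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}
    fields, key = {}, None
    for line in lines[1:]:
        if line.strip() == "---":
            break
        line = line.split("#")[0].rstrip()   # drop trailing comments
        if not line.strip():
            continue
        if line.startswith("  - ") and key:  # list item under current key
            fields[key].append(line.strip()[2:])
        elif ":" in line:
            key, _, value = line.partition(":")
            key, value = key.strip(), value.strip()
            fields[key] = [] if value == "" else value
    return fields
```

Note that values containing `#` or nested structures would need the real parser; this is only the happy path for the schema shown earlier.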

**Phase 2: VALIDATE**
- Required fields present (id, title, type, status, impact, effort)
- All blocked-by IDs reference existing files
- All unlocks IDs reference existing files
- All requires entries exist in gates.yaml
- No dependency cycles (blocked-by graph is a DAG)
- Status transitions are valid (no "proposed" with beads linked)
- Output: list of validation errors/warnings
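The DAG check in Phase 2 is an ordinary depth-first search over blocked-by edges. A minimal sketch with illustrative names; dangling references are skipped here since a separate validation rule catches them:

```python
def find_cycle(blocked_by: dict) -> list:
    """Return one dependency cycle as a list of IDs (first == last),
    or None if the blocked-by graph is a DAG."""
    WHITE, GRAY, BLACK = 0, 1, 2
    color = {node: WHITE for node in blocked_by}
    stack = []

    def visit(node):
        color[node] = GRAY
        stack.append(node)
        for dep in blocked_by.get(node, []):
            if color.get(dep, WHITE) == GRAY:       # back edge: cycle found
                return stack[stack.index(dep):] + [dep]
            if color.get(dep, WHITE) == WHITE and dep in blocked_by:
                found = visit(dep)
                if found:
                    return found
        stack.pop()
        color[node] = BLACK
        return None

    for node in blocked_by:
        if color[node] == WHITE:
            found = visit(node)
            if found:
                return found
    return None
```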

**Phase 3: EVALUATE (Scope Change Detection)**
- For each item, compute current eval-hash from scoring fields
- Compare against stored eval-hash in frontmatter
- If different: flag as SCOPE_CHANGED with field-level diff
- If same but score changed (due to external dep resolution): flag as NEWLY_UNBLOCKED
- If status is specced but has-spec is false: flag as INCONSISTENT

**Phase 4: SCORE**
- Resolve requires against gates.yaml (is the gate complete?)
- Resolve blocked-by against other items (is the blocker done?)
- Compute readiness: 0 if any hard blocker is unresolved, 1 otherwise
- Compute unlock count: count items whose blocked-by includes this ID
- Apply scoring formula:
  - Ideas: impact_weight × (1 + 0.5 × unlock_count) × readiness
  - Issues: severity_weight × 1.5 × (1 + 0.5 × unlock_count) × readiness
- Apply tiebreak: effort_weight, autonomy, created date

**Phase 5: SEQUENCE**
- Separate into: actionable (score > 0) vs blocked (score = 0)
- Among actionable: sort by score descending with tiebreak
- Among blocked: sort by "what-if score" (score if blockers were resolved)
- Compute unlock advisories: "completing X unblocks Y items worth Z total score"

**Phase 6: RECOMMEND**
Output structured report:

```
== IDEA TRIAGE ==
Run: 2026-02-09T14:30:00Z
Items: 22 (18 proposed, 2 accepted, 1 specced, 1 implemented)

RECOMMENDED SEQUENCE:
  1. [idea-project-ergonomics] Multi-Project Ergonomics
     impact:high effort:medium autonomy:full score:18.0
     WHY FIRST: Unlocks 10 downstream ideas. Highest leverage.
     COMPONENTS: src/core/config.rs, src/core/project.rs, src/cli/

  2. [idea-009] Similar Issues Finder
     impact:high effort:small autonomy:full score:4.5
     WHY NEXT: Highest standalone impact. Ships in ~30 min.
     UNLOCKS: idea-recurring-patterns

  3. [idea-004] Stale Discussion Finder
     impact:high effort:small autonomy:full score:3.0
     WHY NEXT: Quick win, no dependencies, immediate user value.

BLOCKED (would rank high if unblocked):
  idea-014 File Hotspots     score-if-unblocked:4.5  BLOCKED BY: gate-4
  idea-021 Knowledge Silos   score-if-unblocked:3.0  BLOCKED BY: gate-4
  UNLOCK ADVISORY: Completing gate-4 unblocks 5 items (combined: 15.0)

SCOPE CHANGES DETECTED:
  idea-009: effort changed small→medium (eval-hash mismatch)
  idea-017: now has spec (has-spec flipped to true)

NEWLY UNBLOCKED:
  (none this run)

WARNINGS:
  idea-016: status=proposed, unchanged for 30+ days
  idea-008: blocked-by references "idea-gate4" which doesn't exist (typo?)

HEALTH:
  Proposed: 18 | Accepted: 2 | Specced: 1 | Promoted: 0 | Implemented: 1
  Blocked: 6 | Actionable: 16
  Backlog runway at ~5/day: ~3 days
```

### What the Skill Does NOT Do

- **Never modifies files.** Read-only triage. The agent or human updates frontmatter. Exception: the skill CAN update eval-hash after a triage run (opt-in).
- **Never creates beads.** That's plan-to-beads skill territory.
- **Never replaces bv.** Once work is in beads, bv --robot-triage handles execution prioritization. This skill owns pre-commitment only.
- **Never generates specs.** That's a separate step with Claude/GPT.

---

## Integration Points

### With Spec Generation

The spec generation prompt (separate from this skill) should include:

```
After generating the implementation spec, re-evaluate the idea's frontmatter:
1. Is the effort estimate still accurate? (small/medium/large/xlarge)
2. Did you discover new dependencies? (add to blocked-by)
3. Are there components not listed? (add to components)
4. Has the impact assessment changed?
5. Can an agent ship this autonomously? (autonomy: full/needs-design/needs-human)

Output an UPDATED frontmatter block at the end of the spec.
If any scoring field changed, explain what changed and why.
```

### With plan-to-beads

When promoting an idea to beads:
1. Run plan-to-beads on the spec
2. Capture the created bead IDs
3. Update the idea's frontmatter: status → promoted, beads → [bd-xxx, bd-yyy]
4. Run br sync --flush-only && git add .beads/

### With bv --robot-triage

These systems don't talk to each other directly. The boundary is:
- Idea triage skill → "build idea-009 next"
- Human/agent generates spec → plan-to-beads → beads created
- bv --robot-triage → "work on bd-xxx next"
- Beads close → human/agent updates idea frontmatter → idea triage re-runs

### With New Item Ingestion

When someone adds a new file to docs/ideas/ or docs/issues/:
- If it has valid frontmatter: picked up automatically on next triage run
- If it has no/invalid frontmatter: flagged in WARNINGS section
- Skill can suggest default frontmatter based on content analysis

---

## Failure Modes and Mitigations

### 1. Frontmatter Rot
**Risk:** Fields don't get updated. Status says "proposed" but it's actually shipped.
**Mitigation:** Cross-reference with beads. If an idea has beads and all beads are closed, flag that the idea should be "implemented" even if frontmatter says otherwise. The skill detects this inconsistency.

### 2. Score Gaming
**Risk:** Someone inflates impact or unlocks count to make their idea rank higher.
**Mitigation:** Unlocks are verified: the skill checks that the referenced items actually have this idea in their blocked-by. Impact is subjective but reviewed during spec generation (second opinion from a different model/session).

### 3. Stale Gates Registry
**Risk:** gate-4 is actually complete but gates.yaml wasn't updated.
**Mitigation:** Skill warns when a gate has been "partial" for a long time. Could also probe the codebase (check if mr_file_changes ingestion code exists and has tests).

### 4. Circular Dependencies
**Risk:** A blocks B blocks A.
**Mitigation:** Phase 2 validation explicitly checks for cycles in the blocked-by graph and reports them as errors.

### 5. Unlock Count Inflation
**Risk:** An item claims to unlock 20 things, making it score astronomically.
**Mitigation:** Unlock count is VERIFIED by checking reverse blocked-by references. If idea-X says it unlocks idea-Y, but idea-Y's blocked-by doesn't include idea-X, the claim is discounted. Both explicit unlocks and reverse blocked-by contribute to the count, but unverified claims are flagged.
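A sketch of that verification pass in Python. It assumes items have been parsed into dicts keyed by ID; names are illustrative, and unlike the doc's discounting scheme, this simplified version counts only the verified reverse edges and flags every unconfirmed claim:

```python
def verified_unlock_counts(items: dict):
    """Count unlocks from reverse blocked-by edges (the verified signal) and
    flag any `unlocks` claim not confirmed by the target's blocked-by list."""
    counts = {item_id: 0 for item_id in items}
    warnings = []
    # Reverse edges: if Y is blocked by X, then X verifiably unlocks Y.
    for blocked in items.values():
        for blocker in blocked.get("blocked-by", []):
            if blocker in counts:
                counts[blocker] += 1
    # Explicit claims: confirmed only when the target points back.
    for item_id, item in items.items():
        for claimed in item.get("unlocks", []):
            target = items.get(claimed)
            if target is None or item_id not in target.get("blocked-by", []):
                warnings.append(f"{item_id}: unverified unlock claim -> {claimed}")
    return counts, warnings
```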

### 6. Scope Creep During Spec
**Risk:** Spec generation reveals the idea is actually 5× harder than estimated. The score drops, but the human has already mentally committed.
**Mitigation:** The scope change detection makes this VISIBLE. The triage output explicitly shows "effort changed small→xlarge, score dropped from 4.5 to 0.75." The human can then decide: proceed anyway, or switch to a different top-3 pick.

### 7. Orphaned Ideas
**Risk:** Ideas get promoted to beads, beads get implemented, but the idea file never gets updated. It sits in "promoted" forever.
**Mitigation:** Skill checks: for each idea with status=promoted, look up the linked beads. If all beads are closed, flag: "idea-009 appears complete, update status to implemented."

---

## Implementation Plan

### Step 1: Create the Frontmatter Schema (this doc → applied to all files)
- Define the exact YAML schema (above)
- Create docs/gates.yaml
- Apply frontmatter to all 22 existing files in docs/ideas/ and docs/issues/

### Step 2: Build the Skill
- Create .claude/skills/idea-triage/SKILL.md
- Implement all 6 phases in the skill prompt
- The skill uses Glob, Read, and text processing; no external scripts needed (25 files is small enough for Claude to process directly)

### Step 3: Test the System
- Run the skill against current files
- Verify scoring matches manual expectations
- Check that project-ergonomics ranks #1 (it should, due to unlock count)
- Verify blocked items score 0
- Check validation catches intentional errors

### Step 4: Run One Full Cycle
- Pick the top recommendation
- Generate a spec (separate session)
- Verify scope change detection works (spec should update frontmatter)
- Promote to beads via plan-to-beads
- Implement
- Verify completion detection works

### Step 5: Iterate
- Run triage again after implementation
- Verify newly unblocked items surface
- Adjust scoring weights if rankings feel wrong
- Add new ideas as they emerge

---

`docs/ideas/bottlenecks.md` (new file, 88 lines)

# Review Bottleneck Detector

- **Command:** `lore bottlenecks [--since <date>]`
- **Confidence:** 85%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — join MRs with first review note, compute percentiles

## What

For MRs in a given time window, compute:
1. **Time to first review** — created_at to first non-author DiffNote
2. **Review cycles** — count of discussion resolution rounds
3. **Time to merge** — created_at to merged_at

Flag MRs above P90 thresholds as bottlenecks.

## Why

Review bottlenecks are the #1 developer productivity killer. Making them visible and measurable is the first step to fixing them. This provides data for process retrospectives.

## Data Required

All exists today:
- `merge_requests` (created_at, merged_at, author_username)
- `notes` (note_type='DiffNote', author_username, created_at)
- `discussions` (resolved, resolvable)

## Implementation Sketch

```sql
-- Time to first review per MR
-- (timestamps are epoch milliseconds, hence the ÷ 3600000.0 → hours)
SELECT
  mr.id,
  mr.iid,
  mr.title,
  mr.author_username,
  mr.created_at,
  mr.merged_at,
  p.path_with_namespace,
  MIN(n.created_at) as first_review_at,
  (MIN(n.created_at) - mr.created_at) / 3600000.0 as hours_to_first_review,
  (mr.merged_at - mr.created_at) / 3600000.0 as hours_to_merge
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN discussions d ON d.merge_request_id = mr.id
LEFT JOIN notes n ON n.discussion_id = d.id
  AND n.note_type = 'DiffNote'
  AND n.is_system = 0
  AND n.author_username != mr.author_username
WHERE mr.created_at >= ?1
  AND mr.state IN ('merged', 'opened')
GROUP BY mr.id
ORDER BY hours_to_first_review DESC NULLS FIRST;
```
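The P50/P90 thresholds can then be computed over the query results. A minimal Python sketch using the standard library; the helper name is illustrative, and MRs never reviewed (NULL first review) are excluded from the distribution:

```python
import statistics

def review_percentiles(hours_to_first_review):
    """P50/P90 over hours-to-first-review, ignoring un-reviewed MRs (None)."""
    reviewed = sorted(h for h in hours_to_first_review if h is not None)
    if not reviewed:
        return {"p50": None, "p90": None}
    # quantiles(n=10) yields the 9 decile cut points; index 4 is P50, index 8 is P90
    deciles = statistics.quantiles(reviewed, n=10)
    return {"p50": deciles[4], "p90": deciles[8]}
```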

## Human Output

```
Review Bottlenecks (last 30 days)

P50 time to first review: 4.2h
P90 time to first review: 28.1h
P50 time to merge: 2.1d
P90 time to merge: 8.3d

Slowest to review:
  !234  Refactor auth        72h to first review  (alice, still open)
  !228  Database migration   48h to first review  (bob, merged in 5d)

Most review cycles:
  !234  Refactor auth        8 discussion threads, 4 resolved
  !225  API versioning       6 discussion threads, 6 resolved
```

## Downsides

- Doesn't capture review done outside GitLab (Slack, in-person)
- DiffNote timestamp != when reviewer started reading
- Large MRs naturally take longer; no size normalization

## Extensions

- `lore bottlenecks --reviewer alice` — how fast does alice review?
- Per-project comparison: which project has the fastest review cycle?
- Trend line: is review speed improving or degrading over time?

---

`docs/ideas/churn.md` (new file, 77 lines)

# MR Churn Analysis

- **Command:** `lore churn [--since <date>]`
- **Confidence:** 72%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — multi-table aggregation with composite scoring

## What

For merged MRs, compute a "contentiousness score" based on: number of review discussions, number of DiffNotes, resolution cycles, file count. Flag high-churn MRs as candidates for architectural review.

## Why

High-churn MRs often indicate architectural disagreements, unclear requirements, or code that's hard to review. Surfacing them post-merge enables retrospectives and identifies areas that need better design upfront.

## Data Required

All exists today:
- `merge_requests` (state='merged')
- `discussions` (merge_request_id, resolved, resolvable)
- `notes` (note_type='DiffNote', discussion_id)
- `mr_file_changes` (file count per MR)

## Implementation Sketch

```sql
SELECT
  mr.iid,
  mr.title,
  mr.author_username,
  p.path_with_namespace,
  COUNT(DISTINCT d.id) as discussion_count,
  COUNT(DISTINCT CASE WHEN n.note_type = 'DiffNote' THEN n.id END) as diffnote_count,
  COUNT(DISTINCT CASE WHEN d.resolvable = 1 AND d.resolved = 1 THEN d.id END) as resolved_threads,
  COUNT(DISTINCT mfc.id) as files_changed,
  -- Composite score: discussions weighted 2×, plus DiffNotes and files changed
  (COUNT(DISTINCT d.id) * 2
   + COUNT(DISTINCT CASE WHEN n.note_type = 'DiffNote' THEN n.id END)
   + COUNT(DISTINCT mfc.id)) as churn_score
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN discussions d ON d.merge_request_id = mr.id AND d.noteable_type = 'MergeRequest'
LEFT JOIN notes n ON n.discussion_id = d.id AND n.is_system = 0
LEFT JOIN mr_file_changes mfc ON mfc.merge_request_id = mr.id
WHERE mr.state = 'merged'
  AND mr.merged_at >= ?1
GROUP BY mr.id
ORDER BY churn_score DESC
LIMIT ?2;
```
## Human Output

```
High-Churn MRs (last 90 days)

MR    Discussions  DiffNotes  Files  Score  Title
!234  12           28         8      60     Refactor auth middleware
!225  8            19         5      39     API versioning v2
!218  6            15         12     39     Database schema migration
!210  5            8          3      21     Update logging framework
```

## Downsides

- High discussion count could mean thorough review, not contention
- Composite scoring weights are arbitrary; needs calibration per team
- Large MRs naturally score higher regardless of contention

## Extensions

- Normalize by file count (discussions per file changed)
- Compare against team averages (flag outliers, not absolute values)
- `lore churn --author alice` — which of alice's MRs generate the most discussion?

---

`docs/ideas/closure-gaps.md` (new file, 73 lines)

# MR-to-Issue Closure Gap

- **Command:** `lore closure-gaps`
- **Confidence:** 88%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — single join query

## What

Find entity_references where reference_type='closes' AND the target issue is still open AND the source MR is merged. These represent broken auto-close links where a merge should have closed an issue but didn't.

## Why

Simple, definitive, actionable. If a merged MR says "closes #42" but #42 is still open, something is wrong. Either auto-close failed (wrong target branch), the reference was incorrect, or the issue needs manual attention.

## Data Required

All exists today:
- `entity_references` (reference_type='closes')
- `merge_requests` (state='merged')
- `issues` (state='opened')

## Implementation Sketch

```sql
SELECT
  mr.iid as mr_iid,
  mr.title as mr_title,
  mr.merged_at,
  mr.target_branch,
  i.iid as issue_iid,
  i.title as issue_title,
  i.state as issue_state,
  p.path_with_namespace
FROM entity_references er
JOIN merge_requests mr ON er.source_entity_type = 'merge_request'
  AND er.source_entity_id = mr.id
JOIN issues i ON er.target_entity_type = 'issue'
  AND er.target_entity_id = i.id
JOIN projects p ON er.project_id = p.id
WHERE er.reference_type = 'closes'
  AND mr.state = 'merged'
  AND i.state = 'opened';
```

## Human Output

```
Closure Gaps — merged MRs that didn't close their referenced issues

group/backend !234  merged 3d ago  → #42 still OPEN
  "Refactor auth middleware" should have closed "Login timeout bug"
  Target branch: develop (default: main) — possible branch mismatch

group/frontend !45  merged 1w ago  → #38 still OPEN
  "Update dashboard" should have closed "Dashboard layout broken"
```

## Downsides

- Could be intentional (MR merged to wrong branch, issue tracked across branches)
- Cross-project references may not be resolvable if target project not synced
- GitLab auto-close only works when merging to default branch

## Extensions

- Flag likely cause: branch mismatch (target_branch != project.default_branch)
- `lore closure-gaps --auto-close` — actually close the issues via API (dangerous, needs confirmation)

---

`docs/ideas/collaboration.md` (new file, 101 lines)

# Author Collaboration Network

- **Command:** `lore collaboration [--since <date>]`
- **Confidence:** 70%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — self-join on notes, graph construction

## What

Build a weighted graph of author pairs: (author_A, author_B, weight), where weight =
the number of times A reviewed B's MRs + the number of times B reviewed A's MRs +
the number of entities they both commented on.

## Why

Reveals team structure empirically. Shows who collaborates across team boundaries
and where knowledge transfer happens. Useful for re-orgs, onboarding planning,
and identifying isolated team members.

## Data Required

All of it exists today:

- `merge_requests` (author_username)
- `notes` (author_username, note_type='DiffNote')
- `discussions` (for co-participation)

## Implementation Sketch

```sql
-- Review relationships: who reviews whose MRs
SELECT
    mr.author_username AS author,
    n.author_username AS reviewer,
    COUNT(*) AS review_count
FROM merge_requests mr
JOIN discussions d ON d.merge_request_id = mr.id
JOIN notes n ON n.discussion_id = d.id
WHERE n.note_type = 'DiffNote'
  AND n.is_system = 0
  AND n.author_username != mr.author_username
  AND mr.created_at >= ?1
GROUP BY mr.author_username, n.author_username;

-- Co-participation: who comments on the same entities
WITH entity_participants AS (
    SELECT
        COALESCE(d.issue_id, d.merge_request_id) AS entity_id,
        d.noteable_type,
        n.author_username
    FROM discussions d
    JOIN notes n ON n.discussion_id = d.id
    WHERE n.is_system = 0
      AND n.created_at >= ?1
)
SELECT
    a.author_username AS person_a,
    b.author_username AS person_b,
    COUNT(DISTINCT a.entity_id) AS shared_entities
FROM entity_participants a
JOIN entity_participants b
  ON a.entity_id = b.entity_id
 AND a.noteable_type = b.noteable_type
 AND a.author_username < b.author_username  -- count each pair once
GROUP BY a.author_username, b.author_username;
```
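The co-participation self-join can be exercised end to end on toy data. This is a Python/sqlite3 sketch; the two-table schema is a minimal stand-in for gitlore's real tables.

```python
import sqlite3

# Toy data: alice and bob both comment on issue #42 and on MR !234,
# so they should come out as one pair sharing two entities.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE discussions (id INTEGER PRIMARY KEY, issue_id INTEGER,
                          merge_request_id INTEGER, noteable_type TEXT);
CREATE TABLE notes (id INTEGER PRIMARY KEY, discussion_id INTEGER,
                    author_username TEXT, is_system INTEGER, created_at INTEGER);
INSERT INTO discussions VALUES (1, 42, NULL, 'Issue'),
                               (2, NULL, 234, 'MergeRequest');
INSERT INTO notes VALUES
    (1, 1, 'alice', 0, 100), (2, 1, 'bob', 0, 110),
    (3, 2, 'bob', 0, 120), (4, 2, 'alice', 0, 130);
""")

pairs = conn.execute("""
WITH entity_participants AS (
    SELECT COALESCE(d.issue_id, d.merge_request_id) AS entity_id,
           d.noteable_type, n.author_username
    FROM discussions d JOIN notes n ON n.discussion_id = d.id
    WHERE n.is_system = 0 AND n.created_at >= ?1
)
SELECT a.author_username, b.author_username, COUNT(DISTINCT a.entity_id)
FROM entity_participants a
JOIN entity_participants b
  ON a.entity_id = b.entity_id
 AND a.noteable_type = b.noteable_type
 AND a.author_username < b.author_username
GROUP BY a.author_username, b.author_username
""", (0,)).fetchall()
print(pairs)  # one alice/bob edge with 2 shared entities
```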
## Output Formats

### JSON (for further analysis)

```json
{
  "nodes": ["alice", "bob", "charlie"],
  "edges": [
    { "source": "alice", "target": "bob", "reviews": 15, "co_participated": 8 },
    { "source": "bob", "target": "charlie", "reviews": 3, "co_participated": 12 }
  ]
}
```

### Human

```
Collaboration Network (last 90 days)

alice   <-> bob       15 reviews, 8 shared discussions   [strong]
bob     <-> charlie    3 reviews, 12 shared discussions  [moderate]
alice   <-> charlie    1 review, 2 shared discussions    [weak]
dave    <-> (none)     0 reviews, 0 shared discussions   [isolated]
```

## Downsides

- Interpretation requires context; high collaboration might mean dependency
- Doesn't capture collaboration outside GitLab
- The self-join can be slow with many notes

## Extensions

- `lore collaboration --format dot` — GraphViz network diagram
- `lore collaboration --isolated` — find team members with no collaboration edges
- Team boundary detection via graph clustering algorithms
86
docs/ideas/contributors.md
Normal file
@@ -0,0 +1,86 @@
# Contributor Heatmap

- **Command:** `lore contributors [--since <date>]`
- **Confidence:** 88%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — multiple aggregation queries

## What

Rank team members by activity across configurable time windows (7d, 30d, 90d). Shows
issues authored, MRs authored, MRs merged, review comments made, and discussions
participated in.

## Why

Team leads constantly ask "who's been active?" or "who's contributing to reviews?"
This answers it from local data without GitLab Premium analytics. Also useful for
identifying team members who may be overloaded or disengaged.

## Data Required

All of it exists today:

- `issues` (author_username, created_at)
- `merge_requests` (author_username, created_at, merged_at)
- `notes` (author_username, created_at, note_type, is_system)
- `discussions` (for participation counting)

## Implementation Sketch

```sql
-- Combined activity per author
WITH activity AS (
    SELECT author_username, 'issue_authored' AS activity_type, created_at
    FROM issues WHERE created_at >= ?1
    UNION ALL
    SELECT author_username, 'mr_authored', created_at
    FROM merge_requests WHERE created_at >= ?1
    UNION ALL
    SELECT author_username, 'mr_merged', merged_at
    FROM merge_requests WHERE merged_at >= ?1 AND state = 'merged'
    UNION ALL
    SELECT author_username, 'review_comment', created_at
    FROM notes WHERE created_at >= ?1 AND note_type = 'DiffNote' AND is_system = 0
    UNION ALL
    SELECT author_username, 'discussion_comment', created_at
    FROM notes WHERE created_at >= ?1 AND note_type != 'DiffNote' AND is_system = 0
)
SELECT
    author_username,
    COUNT(*) FILTER (WHERE activity_type = 'issue_authored') AS issues,
    COUNT(*) FILTER (WHERE activity_type = 'mr_authored') AS mrs_authored,
    COUNT(*) FILTER (WHERE activity_type = 'mr_merged') AS mrs_merged,
    COUNT(*) FILTER (WHERE activity_type = 'review_comment') AS reviews,
    COUNT(*) FILTER (WHERE activity_type = 'discussion_comment') AS comments,
    COUNT(*) AS total
FROM activity
GROUP BY author_username
ORDER BY total DESC;
```

Note: `FILTER` on aggregates requires SQLite 3.30+; on older versions, use
`SUM(CASE WHEN activity_type = '...' THEN 1 ELSE 0 END)` instead.
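The portable `SUM(CASE ...)` rewrite can be checked on toy data. This is a Python/sqlite3 sketch with a trimmed-down version of the activity CTE (issues and MRs only); the schema is a minimal stand-in for gitlore's real tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE issues (author_username TEXT, created_at INTEGER);
CREATE TABLE merge_requests (author_username TEXT, created_at INTEGER,
                             merged_at INTEGER, state TEXT);
INSERT INTO issues VALUES ('alice', 10), ('alice', 20), ('bob', 30);
INSERT INTO merge_requests VALUES ('alice', 15, 25, 'merged'),
                                  ('bob', 18, NULL, 'opened');
""")

# SUM(CASE ...) works on any SQLite version, unlike the FILTER clause.
rows = conn.execute("""
WITH activity AS (
    SELECT author_username, 'issue_authored' AS activity_type FROM issues
    WHERE created_at >= ?1
    UNION ALL
    SELECT author_username, 'mr_authored' FROM merge_requests
    WHERE created_at >= ?1
    UNION ALL
    SELECT author_username, 'mr_merged' FROM merge_requests
    WHERE merged_at >= ?1 AND state = 'merged'
)
SELECT author_username,
       SUM(CASE WHEN activity_type = 'issue_authored' THEN 1 ELSE 0 END) AS issues,
       SUM(CASE WHEN activity_type = 'mr_authored' THEN 1 ELSE 0 END) AS mrs,
       SUM(CASE WHEN activity_type = 'mr_merged' THEN 1 ELSE 0 END) AS merged,
       COUNT(*) AS total
FROM activity
GROUP BY author_username
ORDER BY total DESC, author_username
""", (0,)).fetchall()
print(rows)  # alice leads; bob's unmerged MR has NULL merged_at and is skipped
```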
## Human Output

```
Contributors (last 30 days)

Username   Issues   MRs   Merged   Reviews   Comments   Total
alice           3     8        7        23         12      53
bob             1     5        4        31          8      49
charlie         5     3        2         4         15      29
dave            0     1        0         2          3       6
```

## Downsides

- Could be used for surveillance; frame it as team health, not individual tracking
- Activity volume != productivity (one thoughtful review > ten "LGTM"s)
- Doesn't capture work done outside GitLab

## Extensions

- `lore contributors --project group/backend` — scoped to a project
- `lore contributors --type reviews` — focus on review activity only
- Trend comparison: `--compare 30d,90d` shows velocity changes
94
docs/ideas/decisions.md
Normal file
@@ -0,0 +1,94 @@
# Decision Archaeology

- **Command:** `lore decisions <query>`
- **Confidence:** 82%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — search pipeline + regex pattern matching on notes

## What

Search for discussion notes that contain decision-making language. Use the existing
search pipeline, but boost notes containing patterns like "decided", "agreed",
"will go with", "tradeoff", "because we", "rationale", "the approach is", "we chose".
Return the surrounding discussion context.

## Why

This is gitlore's unique value proposition — "why was this decision made?" is the
question that no other tool answers well. Architecture Decision Records are rarely
maintained; the real decisions live in discussion threads. This mines them.

## Data Required

All of it exists today:

- `documents` + search pipeline (for finding relevant entities)
- `notes` (body text for pattern matching)
- `discussions` (for thread context)

## Implementation Sketch

```
1. Run the existing hybrid search to find entities matching the query topic
2. For each result entity, query all discussion notes
3. Score each note against decision-language patterns:
   - Strong signals (weight 3): "decided to", "agreed on", "the decision is",
     "we will go with", "approved approach"
   - Medium signals (weight 2): "tradeoff", "because", "rationale", "chosen",
     "opted for", "rejected", "alternative"
   - Weak signals (weight 1): "should we", "proposal", "option A", "option B",
     "pros and cons"
4. Return notes scoring above a threshold, with surrounding context (the previous
   and next note in the discussion thread)
5. Sort by: search relevance * decision score
```

### Decision Patterns (regex)

```rust
const STRONG_PATTERNS: &[&str] = &[
    r"(?i)\b(decided|agreed|approved)\s+(to|on|that)\b",
    r"(?i)\bthe\s+(decision|approach|plan)\s+is\b",
    r"(?i)\bwe('ll| will| are going to)\s+(go with|use|implement)\b",
    r"(?i)\blet'?s\s+(go with|use|do)\b",
];

const MEDIUM_PATTERNS: &[&str] = &[
    r"(?i)\b(tradeoff|trade-off|rationale|because we|opted for)\b",
    r"(?i)\b(rejected|ruled out|won't work|not viable)\b",
    r"(?i)\b(chosen|selected|picked)\b.{0,20}\b(over|instead of)\b",
];
```
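The weighted scoring in step 3 can be sketched directly against these patterns. A Python sketch for illustration; the Rust version would compile the same patterns with the `regex` crate, and the `decision_score` helper name is hypothetical.

```python
import re

STRONG = [  # weight 3, mirroring STRONG_PATTERNS above
    r"(?i)\b(decided|agreed|approved)\s+(to|on|that)\b",
    r"(?i)\bthe\s+(decision|approach|plan)\s+is\b",
    r"(?i)\bwe('ll| will| are going to)\s+(go with|use|implement)\b",
    r"(?i)\blet'?s\s+(go with|use|do)\b",
]
MEDIUM = [  # weight 2, mirroring MEDIUM_PATTERNS above
    r"(?i)\b(tradeoff|trade-off|rationale|because we|opted for)\b",
    r"(?i)\b(rejected|ruled out|won't work|not viable)\b",
    r"(?i)\b(chosen|selected|picked)\b.{0,20}\b(over|instead of)\b",
]

def decision_score(body: str) -> int:
    """Sum of weights over matching pattern families: strong = 3, medium = 2."""
    score = sum(3 for p in STRONG if re.search(p, body))
    score += sum(2 for p in MEDIUM if re.search(p, body))
    return score

note = ("We decided to use JWT with short-lived tokens. "
        "The tradeoff is more complexity in the refresh flow.")
print(decision_score(note))  # 5: one strong match (3) + one medium match (2)
```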
## Human Output

```
Decisions related to "authentication"

group/backend !234 — "Refactor auth middleware"
  Discussion #a1b2c3 (alice, 3w ago):
    "We decided to use JWT with short-lived tokens instead of session cookies.
     The tradeoff is more complexity in the refresh flow, but we get stateless
     auth which scales better."
  Decision confidence: HIGH (3 strong pattern matches)

group/backend #42 — "Auth architecture review"
  Discussion #d4e5f6 (bob, 2mo ago):
    "After discussing with the security team, we'll go with bcrypt for password
     hashing. Argon2 is theoretically better but bcrypt has wider library support."
  Decision confidence: HIGH (2 strong pattern matches)
```

## Downsides

- Pattern matching is imperfect; it may miss decisions phrased differently
- May surface "discussion about deciding" rather than actual decisions
- Non-English discussions won't match
- Requires good search results as input (garbage in, garbage out)

## Extensions

- `lore decisions --recent` — decisions made in the last 30 days
- `lore decisions --author alice` — decisions made by a specific person
- Export as ADR (Architecture Decision Record) format
- Combine with timeline for a chronological decision history
131
docs/ideas/digest.md
Normal file
@@ -0,0 +1,131 @@
# "What Changed?" Digest

- **Command:** `lore digest --since <date>`
- **Confidence:** 93%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — multiple queries across event tables, formatting logic

## What

Generate a structured summary of all activity since a given date: issues
opened/closed, MRs merged, labels changed, milestones updated, key discussions.
Group by project and sort by significance (state changes > merges > label changes >
new comments).

Default `--since` is 1 day (the last 24 hours). Supports `7d`, `2w`, `YYYY-MM-DD`.

## Why

"What happened while I was on PTO?" is the most universal developer question. This
is a killer feature that leverages ALL the event data gitlore has ingested. No other
local tool provides this.

## Data Required

All of it exists today:

- `resource_state_events` (opened/closed/merged/reopened)
- `resource_label_events` (label add/remove)
- `resource_milestone_events` (milestone add/remove)
- `merge_requests` (merged_at for merge events)
- `issues` (created_at for new issues)
- `discussions` (last_note_at for active discussions)

## Implementation Sketch

```
1. Parse --since into a ms epoch timestamp
2. Query each event table WHERE created_at >= since
3. Query new issues WHERE created_at >= since
4. Query merged MRs WHERE merged_at >= since
5. Query active discussions WHERE last_note_at >= since
6. Group all events by project
7. Within each project, sort by: state changes first, then merges, then labels
8. Format as human-readable sections or robot JSON
```
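Step 1 of the sketch above can look like this. A Python sketch; the supported forms (`7d`, `2w`, `YYYY-MM-DD`) come from the spec, while the `parse_since` name, the explicit `now` parameter, and the midnight-UTC interpretation of absolute dates are illustrative assumptions.

```python
import re
from datetime import datetime, timedelta, timezone

def parse_since(spec: str, now: datetime) -> int:
    """Turn '7d', '2w', or 'YYYY-MM-DD' into a ms epoch timestamp."""
    m = re.fullmatch(r"(\d+)([dw])", spec)
    if m:
        n, unit = int(m.group(1)), m.group(2)
        delta = timedelta(days=n) if unit == "d" else timedelta(weeks=n)
        return int((now - delta).timestamp() * 1000)
    # Fall back to an absolute date, interpreted here as midnight UTC
    dt = datetime.strptime(spec, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return int(dt.timestamp() * 1000)

now = datetime(2025, 1, 27, tzinfo=timezone.utc)
print(parse_since("7d", now))          # same instant as "2025-01-20"
print(parse_since("2025-01-20", now))
```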
### SQL Queries

```sql
-- State changes in window
SELECT rse.*, i.iid AS issue_iid, mr.iid AS mr_iid,
       COALESCE(i.title, mr.title) AS title,
       p.path_with_namespace
FROM resource_state_events rse
LEFT JOIN issues i ON rse.issue_id = i.id
LEFT JOIN merge_requests mr ON rse.merge_request_id = mr.id
JOIN projects p ON rse.project_id = p.id
WHERE rse.created_at >= ?1
ORDER BY rse.created_at DESC;

-- Newly merged MRs
SELECT mr.iid, mr.title, mr.author_username, mr.merged_at,
       p.path_with_namespace
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
WHERE mr.merged_at >= ?1
ORDER BY mr.merged_at DESC;

-- New issues
SELECT i.iid, i.title, i.author_username, i.created_at,
       p.path_with_namespace
FROM issues i
JOIN projects p ON i.project_id = p.id
WHERE i.created_at >= ?1
ORDER BY i.created_at DESC;
```

## Human Output Format

```
=== What Changed (last 7 days) ===

group/backend (12 events)
  Merged:
    !234 Refactor auth middleware (alice, 2d ago)
    !231 Fix connection pool leak (bob, 5d ago)
  Closed:
    #89 Login timeout on slow networks (closed by alice, 3d ago)
  Opened:
    #95 Rate limiting returns 500 (charlie, 1d ago)
  Labels:
    #90 +priority::high (dave, 4d ago)

group/frontend (3 events)
  Merged:
    !45 Update dashboard layout (eve, 6d ago)
```

## Robot Mode Output

```json
{
  "ok": true,
  "data": {
    "since": "2025-01-20T00:00:00Z",
    "projects": [
      {
        "path": "group/backend",
        "merged": [ { "iid": 234, "title": "...", "author": "alice" } ],
        "closed": [ { "iid": 89, "title": "...", "actor": "alice" } ],
        "opened": [ { "iid": 95, "title": "...", "author": "charlie" } ],
        "label_changes": [ { "iid": 90, "label": "priority::high", "action": "add" } ]
      }
    ],
    "summary": { "total_events": 15, "projects_active": 2 }
  }
}
```

## Downsides

- Can be overwhelming for very active repos; needs a `--limit` per category
- Doesn't capture nuance (a 200-comment MR merge is more significant than a typo fix)
- Only shows what gitlore has synced; stale data = stale digest

## Extensions

- `lore digest --author alice` — personal activity digest
- `lore digest --project group/backend` — single-project scope
- `lore digest --format markdown` — paste-ready for Slack/email
- Combine with weekly-digest for scheduled summaries
120
docs/ideas/experts.md
Normal file
@@ -0,0 +1,120 @@
# Who Knows About X?

- **Command:** `lore experts <path-or-topic>`
- **Confidence:** 92%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — two query paths (file-based, topic-based)

## What

Given a file path, find the people who have authored MRs touching that file, left
DiffNotes on that file, or discussed issues referencing that file. Given a topic
string, use search to find relevant entities, then extract the active participants.

## Why

"Who should I ask about the auth module?" is one of the most common questions on
large teams. This answers it empirically from actual contribution and review data.
No guessing, no out-of-date wiki pages.

## Data Required

All of it exists today:

- `mr_file_changes` (new_path, merge_request_id) — who changed the file
- `notes` (position_new_path, author_username) — who reviewed the file
- `merge_requests` (author_username) — MR authorship
- `documents` + search pipeline — for topic-based queries
- `discussions` + `notes` — for participant extraction

## Implementation Sketch

### Path Mode: `lore experts src/auth/`

```
1. Query mr_file_changes WHERE new_path LIKE 'src/auth/%'
2. Join merge_requests to get author_username for each MR
3. Query notes WHERE position_new_path LIKE 'src/auth/%'
4. Collect all usernames with activity counts
5. Rank by: MR authorship (weight 3) + DiffNote authorship (weight 2) +
   discussion participation (weight 1)
6. Apply recency decay (recent activity weighted higher)
```
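Steps 5 and 6 can be sketched together. A Python sketch; the weights come from the spec, but the exponential half-life decay (and the 90-day half-life) is just one reasonable reading of "recent activity weighted higher", and the helper names are hypothetical.

```python
HALF_LIFE_DAYS = 90.0
WEIGHTS = {"mr_authored": 3, "diff_note": 2, "discussion": 1}

def decay(age_days: float) -> float:
    """1.0 for activity today, 0.5 at the half-life, tapering toward 0."""
    return 0.5 ** (age_days / HALF_LIFE_DAYS)

def expert_score(events) -> float:
    """events: iterable of (activity_type, age_in_days) for one username."""
    return sum(WEIGHTS[kind] * decay(age) for kind, age in events)

# alice has recent activity; dave has a single MR from nine months ago
alice = expert_score([("mr_authored", 3), ("diff_note", 1), ("discussion", 10)])
dave = expert_score([("mr_authored", 270)])
print(round(alice, 2), round(dave, 2))
```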
### Topic Mode: `lore experts "authentication timeout"`

```
1. Run the existing hybrid search for the topic
2. Collect the top N document results
3. For each document, extract author_username
4. For each document's entity, query discussions and collect note authors
5. Rank by frequency and recency
```

### SQL (Path Mode)

```sql
-- Authors who changed files matching the pattern
SELECT mr.author_username, COUNT(*) AS changes, MAX(mr.merged_at) AS last_active
FROM mr_file_changes mfc
JOIN merge_requests mr ON mfc.merge_request_id = mr.id
WHERE mfc.new_path LIKE ?1
  AND mr.state = 'merged'
GROUP BY mr.author_username
ORDER BY changes DESC;

-- Reviewers who commented on files matching the pattern
SELECT n.author_username, COUNT(*) AS reviews, MAX(n.created_at) AS last_active
FROM notes n
WHERE n.position_new_path LIKE ?1
  AND n.note_type = 'DiffNote'
  AND n.is_system = 0
GROUP BY n.author_username
ORDER BY reviews DESC;
```

## Human Output Format

```
Experts for: src/auth/

alice     12 changes, 8 reviews   (last active 3d ago)   [top contributor]
bob        3 changes, 15 reviews  (last active 1d ago)   [top reviewer]
charlie    5 changes, 2 reviews   (last active 2w ago)
dave       1 change, 0 reviews    (last active 3mo ago)  [stale]
```

## Robot Mode Output

```json
{
  "ok": true,
  "data": {
    "query": "src/auth/",
    "query_type": "path",
    "experts": [
      {
        "username": "alice",
        "changes": 12,
        "reviews": 8,
        "discussions": 3,
        "score": 62,
        "last_active": "2025-01-25T10:00:00Z",
        "role": "top_contributor"
      }
    ]
  }
}
```

## Downsides

- Historical data may be stale (people leave teams, change roles)
- Path mode requires `mr_file_changes` to be populated (Gate 4 ingestion)
- Topic mode quality depends on search quality
- Doesn't account for the org chart / actual ownership

## Extensions

- `lore experts --since 90d` — recency filter
- `lore experts --min-activity 3` — noise filter
- Combine with `lore silos` to highlight when an expert is the ONLY expert
75
docs/ideas/graph.md
Normal file
@@ -0,0 +1,75 @@
# Entity Relationship Explorer

- **Command:** `lore graph <entity-type> <iid>`
- **Confidence:** 80%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — BFS traversal (similar to timeline expand), output formatting

## What

Given an issue or MR, traverse `entity_references` and display all connected
entities with relationship types and depths. Output as a tree, JSON, or Mermaid
diagram.

## Why

The entity_references graph is already built (Gate 2) but has no dedicated
exploration command. Timeline shows events over time; this shows the relationship
structure. "What's connected to this issue?" is a different question from "what
happened to this issue?"

## Data Required

All of it exists today:

- `entity_references` (source/target entity, reference_type)
- `issues` / `merge_requests` (for entity context)
- The timeline expand stage already implements BFS over this graph

## Implementation Sketch

```
1. Resolve entity type + iid to a local ID
2. BFS over entity_references:
   - Follow source→target AND target→source (bidirectional)
   - Track depth (--depth flag, default 2)
   - Track reference_type for edge labels
3. Hydrate each discovered entity with title, state, URL
4. Format as tree / JSON / Mermaid
```
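The depth-limited bidirectional BFS in step 2 can be sketched over a plain edge list. A Python sketch; the `(source, target, reference_type)` tuples stand in for `entity_references` rows, and the string entity keys are illustrative.

```python
from collections import deque

def bfs(edges, start, max_depth=2):
    # Treat every reference edge as traversable in both directions
    adj = {}
    for src, tgt, ref in edges:
        adj.setdefault(src, []).append((tgt, ref))
        adj.setdefault(tgt, []).append((src, ref))
    seen = {start: 0}
    queue = deque([start])
    result = []  # (entity, depth, reference_type) in discovery order
    while queue:
        node = queue.popleft()
        for neighbor, ref in adj.get(node, []):
            if neighbor not in seen and seen[node] < max_depth:
                seen[neighbor] = seen[node] + 1
                result.append((neighbor, seen[neighbor], ref))
                queue.append(neighbor)
    return result

edges = [
    ("issue:42", "mr:234", "closes"),
    ("mr:234", "issue:38", "mentioned"),
    ("issue:38", "issue:12", "related"),  # depth 3 from issue:42: pruned
]
print(bfs(edges, "issue:42", max_depth=2))
```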
## Human Output (Tree)

```
#42 Login timeout bug (CLOSED)
├── closes ── !234 Refactor auth middleware (MERGED)
│   ├── mentioned ── #38 Connection timeout in auth flow (CLOSED)
│   └── mentioned ── #51 Token refresh improvements (OPEN)
├── related ── #45 Auth module documentation (OPEN)
└── mentioned ── !228 Database migration (MERGED)
    └── closes ── #35 Schema version drift (CLOSED)
```

## Mermaid Output

```mermaid
graph LR
    I42["#42 Login timeout"] -->|closes| MR234["!234 Refactor auth"]
    MR234 -->|mentioned| I38["#38 Connection timeout"]
    MR234 -->|mentioned| I51["#51 Token refresh"]
    I42 -->|related| I45["#45 Auth docs"]
    I42 -->|mentioned| MR228["!228 DB migration"]
    MR228 -->|closes| I35["#35 Schema drift"]
```

## Downsides

- Overlaps somewhat with timeline (but a different focus: structure vs chronology)
- High fan-out for popular entities (needs depth + limit controls)
- Unresolved cross-project references appear as dead ends

## Extensions

- `lore graph --format dot` — GraphViz DOT output
- `lore graph --format mermaid` — Mermaid diagram
- `lore graph --include-discussions` — show discussion threads as nodes
- Interactive HTML visualization (future web UI)
70
docs/ideas/hotspots.md
Normal file
@@ -0,0 +1,70 @@
# File Hotspot Report

- **Command:** `lore hotspots [--since <date>]`
- **Confidence:** 85%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — single query on mr_file_changes (requires Gate 4 population)

## What

Rank files by frequency of appearance in merged MRs over a time window. Show a
change_type breakdown (modified vs added vs deleted). Optionally filter by project.

## Why

Hot files are where bugs live. This is a proven engineering metric (see "Your Code
as a Crime Scene" by Adam Tornhill). High-churn files deserve extra test coverage,
better documentation, and architectural review.

## Data Required

- `mr_file_changes` (new_path, change_type, merge_request_id) — needs Gate 4 population
- `merge_requests` (merged_at, state='merged')

## Implementation Sketch

```sql
SELECT
    mfc.new_path,
    p.path_with_namespace,
    COUNT(*) AS total_changes,
    SUM(CASE WHEN mfc.change_type = 'modified' THEN 1 ELSE 0 END) AS modifications,
    SUM(CASE WHEN mfc.change_type = 'added' THEN 1 ELSE 0 END) AS additions,
    SUM(CASE WHEN mfc.change_type = 'deleted' THEN 1 ELSE 0 END) AS deletions,
    SUM(CASE WHEN mfc.change_type = 'renamed' THEN 1 ELSE 0 END) AS renames,
    COUNT(DISTINCT mr.author_username) AS unique_authors
FROM mr_file_changes mfc
JOIN merge_requests mr ON mfc.merge_request_id = mr.id
JOIN projects p ON mfc.project_id = p.id
WHERE mr.state = 'merged'
  AND mr.merged_at >= ?1
GROUP BY mfc.new_path, p.path_with_namespace
ORDER BY total_changes DESC
LIMIT ?2;
```
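The config-file noise problem has a cheap post-query fix: the `--exclude "*.toml,*.yml"` idea in the extensions can be a glob filter over the result rows. A Python sketch; matching with `fnmatch` is an assumption, and the `(new_path, total_changes)` tuples stand in for rows from the query above.

```python
from fnmatch import fnmatch

def filter_hotspots(rows, exclude: str):
    """Drop rows whose path matches any comma-separated glob pattern."""
    patterns = [p.strip() for p in exclude.split(",") if p.strip()]
    return [
        (path, changes) for path, changes in rows
        if not any(fnmatch(path, pat) for pat in patterns)
    ]

rows = [
    ("src/auth/middleware.rs", 18),
    ("Cargo.toml", 16),
    ("src/api/routes.rs", 15),
    (".gitlab-ci.yml", 13),
]
print(filter_hotspots(rows, "*.toml,*.yml"))  # only the .rs files remain
```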
## Human Output

```
File Hotspots (last 90 days, top 20)

File                              Changes   Authors   Type Breakdown
src/auth/middleware.rs                 18         4   14 mod, 3 add, 1 del
src/api/routes.rs                      15         3   12 mod, 2 add, 1 rename
src/db/migrations.rs                   12         2   8 mod, 4 add
tests/integration/auth_test.rs         11         3   9 mod, 2 add
```

## Downsides

- Requires `mr_file_changes` to be populated (Gate 4 ingestion)
- Doesn't distinguish meaningful changes from trivial ones (formatting, imports)
- Configuration files (CI, Cargo.toml) will rank high but aren't risky

## Extensions

- `lore hotspots --exclude "*.toml,*.yml"` — filter out config files
- `lore hotspots --dir src/auth/` — scope to a directory
- Combine with `lore silos` for risk scoring: high churn + bus factor 1 = critical
- Complexity trend: correlate with discussion count (churn + many discussions = problematic)
69
docs/ideas/idle.md
Normal file
@@ -0,0 +1,69 @@
# Idle Work Detector

- **Command:** `lore idle [--days <N>] [--labels <pattern>]`
- **Confidence:** 73%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — label event querying with configurable patterns

## What

Find entities that received an "in progress" or similar label but have had no
discussion activity for N days. Cross-reference with the assignee to show who might
have forgotten about something.

## Why

Forgotten WIP is invisible waste. Developers start work, get pulled onto something
urgent, and the original task sits idle. This makes it visible before it becomes
a problem.

## Data Required

All of it exists today:

- `resource_label_events` (label_name, action='add', created_at)
- `discussions` (last_note_at for entity activity)
- `issues` / `merge_requests` (state, assignees)
- `issue_assignees` / `mr_assignees`

## Implementation Sketch

```
1. Query resource_label_events for labels matching "in progress" patterns
   Default patterns: "in-progress", "in_progress", "doing", "wip",
   "workflow::in-progress", "status::in-progress"
   Configurable via the --labels flag
2. For each entity with an "in progress" label still applied:
   a. Check whether the label was subsequently removed (if so, skip)
   b. Get last_note_at from discussions for that entity
   c. Flag if last_note_at is older than the threshold
3. Join with assignees for attribution
```
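Steps 2a to 2c amount to replaying the add/remove events for a label and comparing the last note time against the threshold. A Python sketch; the `(action, created_at)` tuples stand in for `resource_label_events` rows, and the helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone

def label_still_applied(events) -> bool:
    """The most recent add/remove event for this label wins."""
    if not events:
        return False
    last = max(events, key=lambda e: e[1])
    return last[0] == "add"

def is_idle(events, last_note_at, now, threshold_days=14) -> bool:
    return (label_still_applied(events)
            and now - last_note_at >= timedelta(days=threshold_days))

now = datetime(2025, 1, 27, tzinfo=timezone.utc)
t = lambda day: datetime(2025, 1, day, tzinfo=timezone.utc)
added = [("add", t(1))]                      # still labeled, quiet for 18 days
removed = [("add", t(1)), ("remove", t(5))]  # label was taken off again
print(is_idle(added, last_note_at=t(9), now=now))    # True
print(is_idle(removed, last_note_at=t(9), now=now))  # False
```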
## Human Output

```
Idle Work (labeled "in progress" but no activity for 14+ days)

group/backend
  #90 Rate limiting design      assigned to: charlie   idle 18 days
      Last activity: label +priority::high by dave
  #85 Cache invalidation fix    assigned to: alice     idle 21 days
      Last activity: discussion comment by bob

group/frontend
  !230 Dashboard redesign       assigned to: eve       idle 14 days
      Last activity: DiffNote by dave
```

## Downsides

- Requires label naming conventions; there is no universal standard
- Work may be happening outside GitLab (local branch, design doc)
- The "idle" threshold is subjective; 14 days may be normal for large features

## Extensions

- `lore idle --assignee alice` — personal idle-work check
- `lore idle --notify` — generate message templates for nudging owners
- Configurable label patterns in config.json for team-specific workflows
92
docs/ideas/impact-graph.md
Normal file
@@ -0,0 +1,92 @@
# Cross-Project Impact Graph

- **Command:** `lore impact-graph [--format json|dot|mermaid]`
- **Confidence:** 75%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — aggregation over entity_references, graph output formatting

## What

Aggregate `entity_references` by project pair to produce a weighted adjacency matrix
showing how projects reference each other. Output as JSON, DOT, or Mermaid for
visualization.

## Why

Makes invisible architectural coupling visible. "The backend and frontend repos have
47 cross-references this quarter" tells you about tight coupling that may need
architectural attention.

## Data Required

All of it exists today:

- `entity_references` (source/target entity IDs)
- `issues` / `merge_requests` (project_id for source/target)
- `projects` (path_with_namespace)

## Implementation Sketch

```sql
-- Project-to-project reference counts
WITH ref_projects AS (
    SELECT
        CASE er.source_entity_type
            WHEN 'issue' THEN i_src.project_id
            WHEN 'merge_request' THEN mr_src.project_id
        END AS source_project_id,
        CASE er.target_entity_type
            WHEN 'issue' THEN i_tgt.project_id
            WHEN 'merge_request' THEN mr_tgt.project_id
        END AS target_project_id,
        er.reference_type
    FROM entity_references er
    LEFT JOIN issues i_src ON er.source_entity_type = 'issue' AND er.source_entity_id = i_src.id
    LEFT JOIN merge_requests mr_src ON er.source_entity_type = 'merge_request' AND er.source_entity_id = mr_src.id
    LEFT JOIN issues i_tgt ON er.target_entity_type = 'issue' AND er.target_entity_id = i_tgt.id
    LEFT JOIN merge_requests mr_tgt ON er.target_entity_type = 'merge_request' AND er.target_entity_id = mr_tgt.id
    WHERE er.target_entity_id IS NOT NULL  -- resolved references only
)
SELECT
    p_src.path_with_namespace AS source_project,
    p_tgt.path_with_namespace AS target_project,
    rp.reference_type,
    COUNT(*) AS weight
FROM ref_projects rp
JOIN projects p_src ON rp.source_project_id = p_src.id
JOIN projects p_tgt ON rp.target_project_id = p_tgt.id
WHERE rp.source_project_id != rp.target_project_id  -- cross-project only
GROUP BY p_src.path_with_namespace, p_tgt.path_with_namespace, rp.reference_type
ORDER BY weight DESC;
```
## Output Formats

### Mermaid

```mermaid
graph LR
    Backend -->|closes 23| Frontend
    Backend -->|mentioned 47| Infrastructure
    Frontend -->|mentioned 12| Backend
```

### DOT

```dot
digraph impact {
    "group/backend" -> "group/frontend" [label="closes: 23"];
    "group/backend" -> "group/infra" [label="mentioned: 47"];
}
```
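The formatter step for these outputs is mechanical; a minimal Rust sketch, where `Edge` is a hypothetical row type standing in for one row of the weighted-edge query above:

```rust
// Hypothetical row type mirroring (source_project, target_project, reference_type, weight).
pub struct Edge {
    pub source: String,
    pub target: String,
    pub ref_type: String,
    pub weight: u64,
}

pub fn to_mermaid(edges: &[Edge]) -> String {
    let mut out = String::from("graph LR\n");
    for e in edges {
        // Mermaid node ids cannot contain '/', so slashes are replaced.
        out.push_str(&format!(
            "    {} -->|{} {}| {}\n",
            e.source.replace('/', "_"),
            e.ref_type,
            e.weight,
            e.target.replace('/', "_"),
        ));
    }
    out
}

pub fn to_dot(edges: &[Edge]) -> String {
    let mut out = String::from("digraph impact {\n");
    for e in edges {
        out.push_str(&format!(
            "    \"{}\" -> \"{}\" [label=\"{}: {}\"];\n",
            e.source, e.target, e.ref_type, e.weight
        ));
    }
    out.push_str("}\n");
    out
}
```

JSON output would be the same edge list serialized directly, so it needs no dedicated formatter.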
## Downsides

- Requires multiple projects synced; limited value for single-project users
- "Mentioned" references are noisy (high volume, low signal)
- Doesn't capture coupling through shared libraries or APIs (code-level coupling)

## Extensions

- `lore impact-graph --since 90d` — time-scoped coupling analysis
- `lore impact-graph --type closes` — only meaningful reference types
- Include unresolved references to show dependencies on un-synced projects
- Coupling trend: is cross-project coupling increasing over time?
97
docs/ideas/label-audit.md
Normal file
@@ -0,0 +1,97 @@
# Label Hygiene Audit

- **Command:** `lore label-audit`
- **Confidence:** 82%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — straightforward aggregation queries

## What

Report on label health:
- Labels used only once (may be typos or abandoned experiments)
- Labels applied and removed within 1 hour (likely mistakes)
- Labels with no active issues/MRs (orphaned)
- Label name collisions across projects (same name, different meaning)
- Labels never used at all (defined but not applied)

## Why

Label sprawl is real and makes filtering useless over time. Teams create labels ad-hoc and never clean them up. This simple audit surfaces maintenance tasks.

## Data Required

All exists today:
- `labels` (name, project_id)
- `issue_labels` / `mr_labels` (usage counts)
- `resource_label_events` (add/remove pairs for mistake detection)
- `issues` / `merge_requests` (state for "active" filtering)
## Implementation Sketch

```sql
-- Labels used only once
-- (DISTINCT guards against row fan-out from the two LEFT JOINs)
SELECT l.name, p.path_with_namespace,
       COUNT(DISTINCT il.issue_id) + COUNT(DISTINCT ml.merge_request_id) as usage
FROM labels l
JOIN projects p ON l.project_id = p.id
LEFT JOIN issue_labels il ON il.label_id = l.id
LEFT JOIN mr_labels ml ON ml.label_id = l.id
GROUP BY l.id
HAVING COUNT(DISTINCT il.issue_id) + COUNT(DISTINCT ml.merge_request_id) = 1;

-- Flash labels (applied and removed within 1 hour; epoch-millisecond timestamps)
SELECT
    rle1.label_name,
    rle1.created_at as added_at,
    rle2.created_at as removed_at,
    (rle2.created_at - rle1.created_at) / 60000 as minutes_active
FROM resource_label_events rle1
JOIN resource_label_events rle2
    ON rle1.issue_id = rle2.issue_id
    AND rle1.label_name = rle2.label_name
    AND rle1.action = 'add'
    AND rle2.action = 'remove'
    AND rle2.created_at > rle1.created_at
    AND (rle2.created_at - rle1.created_at) < 3600000;

-- Unused labels (defined but never applied)
SELECT l.name, p.path_with_namespace
FROM labels l
JOIN projects p ON l.project_id = p.id
LEFT JOIN issue_labels il ON il.label_id = l.id
LEFT JOIN mr_labels ml ON ml.label_id = l.id
WHERE il.issue_id IS NULL AND ml.merge_request_id IS NULL;
```
## Human Output

```
Label Audit

Unused Labels (4):
  group/backend: deprecated-v1, needs-triage, wontfix-maybe
  group/frontend: old-design

Single-Use Labels (3):
  group/backend: perf-regression (1 issue)
  group/frontend: ux-debt (1 MR), mobile-only (1 issue)

Flash Labels (applied < 1hr, 2):
  group/backend #90: +priority::critical then -priority::critical (12 min)
  group/backend #85: +blocked then -blocked (5 min)

Cross-Project Collisions (1):
  "needs-review" used in group/backend (32 uses) AND group/frontend (8 uses)
```
## Downsides

- Low glamour; this is janitorial work
- Single-use labels may be legitimate (one-off categorization)
- Cross-project collisions may be intentional (shared vocabulary)

## Extensions

- `lore label-audit --fix` — suggest deletions for unused labels
- Trend: label count over time (is sprawl increasing?)
74
docs/ideas/label-flow.md
Normal file
@@ -0,0 +1,74 @@
# Label Velocity

- **Command:** `lore label-flow <from-label> <to-label>`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — self-join on resource_label_events, percentile computation

## What

For a given label pair (e.g., "needs-review" to "approved"), compute median and P90 transition times using `resource_label_events`. Shows how fast work moves through your process labels.

Also supports: single label dwell time (how long does "in-progress" stay applied?).

## Why

Process bottlenecks become quantifiable. "Our code review takes a median of 3 days" is actionable data for retrospectives and process improvement.

## Data Required

All exists today:
- `resource_label_events` (label_name, action, created_at, issue_id, merge_request_id)
## Implementation Sketch

```sql
-- Label A → Label B transition time
WITH add_a AS (
    SELECT issue_id, merge_request_id, MIN(created_at) as added_at
    FROM resource_label_events
    WHERE label_name = ?1 AND action = 'add'
    GROUP BY issue_id, merge_request_id
),
add_b AS (
    SELECT issue_id, merge_request_id, MIN(created_at) as added_at
    FROM resource_label_events
    WHERE label_name = ?2 AND action = 'add'
    GROUP BY issue_id, merge_request_id
)
SELECT
    (b.added_at - a.added_at) / 3600000.0 as hours_transition
FROM add_a a
JOIN add_b b ON a.issue_id = b.issue_id OR a.merge_request_id = b.merge_request_id
WHERE b.added_at > a.added_at;
```

Then compute percentiles in Rust (median, P75, P90).
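That percentile step can be sketched with nearest-rank percentiles over the `hours_transition` values; this is an illustrative implementation, not the tool's actual code:

```rust
// Nearest-rank percentile: ceil(p/100 * n)-th smallest value, clamped to valid indices.
pub fn percentile(sorted: &[f64], p: f64) -> Option<f64> {
    if sorted.is_empty() {
        return None;
    }
    let rank = ((p / 100.0) * sorted.len() as f64).ceil() as usize;
    Some(sorted[rank.saturating_sub(1).min(sorted.len() - 1)])
}

// Returns (median, P75, P90) over the raw hour values from the query above.
pub fn summarize(mut hours: Vec<f64>) -> Option<(f64, f64, f64)> {
    hours.sort_by(|a, b| a.partial_cmp(b).unwrap());
    Some((
        percentile(&hours, 50.0)?,
        percentile(&hours, 75.0)?,
        percentile(&hours, 90.0)?,
    ))
}
```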
## Human Output

```
Label Flow: "needs-review" → "approved"

Transitions: 42 issues/MRs in last 90 days
Median:  18.5 hours
P75:     36.2 hours
P90:     72.8 hours
Slowest: !234 Refactor auth (168 hours)
```
## Downsides

- Only works if teams use label-based workflows consistently
- Labels may be applied out of order or skipped
- Self-join performance could be slow with many events

## Extensions

- `lore label-flow --dwell "in-progress"` — how long does a label stay?
- `lore label-flow --all` — auto-discover common transitions from event data
- Visualization: label state machine with median transition times on edges
81
docs/ideas/milestone-risk.md
Normal file
@@ -0,0 +1,81 @@
# Milestone Risk Report

- **Command:** `lore milestone-risk [title]`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — milestone + issue aggregation with scope change detection

## What

For each active milestone (or a specific one): show total issues, % closed, issues added after milestone creation (scope creep), issues with no assignee, issues with overdue due_date. Flag milestones where completion rate is below expected trajectory.

## Why

Milestone health is usually assessed by gut feel. This provides objective signals from data already ingested. Project managers can spot risks early.

## Data Required

All exists today:
- `milestones` (title, state, due_date)
- `issues` (milestone_id, state, created_at, due_date, assignee)
- `issue_assignees` (for unassigned detection)
## Implementation Sketch

```sql
SELECT
    m.title,
    m.state,
    m.due_date,
    COUNT(DISTINCT i.id) as total_issues,  -- DISTINCT: the assignee join can fan out rows
    SUM(CASE WHEN i.state = 'closed' THEN 1 ELSE 0 END) as closed,
    SUM(CASE WHEN i.state = 'opened' THEN 1 ELSE 0 END) as open,
    SUM(CASE WHEN i.created_at > m.created_at THEN 1 ELSE 0 END) as scope_creep,
    SUM(CASE WHEN ia.username IS NULL AND i.state = 'opened' THEN 1 ELSE 0 END) as unassigned,
    SUM(CASE WHEN i.due_date < DATE('now') AND i.state = 'opened' THEN 1 ELSE 0 END) as overdue
FROM milestones m
JOIN issues i ON i.milestone_id = m.id
LEFT JOIN issue_assignees ia ON ia.issue_id = i.id
WHERE m.state = 'active'
GROUP BY m.id;
```

Caveat: multi-assignee issues appear once per assignee after the join, so the SUM columns can over-count them; deduplicate per issue if that matters.

Note: the `created_at` comparison for scope creep is only an approximation of when an issue joined the milestone. We do have `resource_milestone_events`, though — use those for precise scope change detection.
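The ON TRACK / AT RISK flag in the report can be derived by comparing completion rate against elapsed schedule. A sketch with illustrative thresholds (the 10% grace margin and the scope-creep cutoff are assumptions, not part of this design):

```rust
// Classify a milestone from aggregates produced by the query above.
// time_elapsed_frac: fraction of the milestone's schedule already used (0.0..=1.0).
pub fn milestone_status(
    closed: u32,
    total: u32,
    time_elapsed_frac: f64,
    scope_creep: u32,
) -> &'static str {
    if total == 0 {
        return "NO ISSUES";
    }
    let done = closed as f64 / total as f64;
    // On track if completion keeps pace with the schedule (within a 10% grace
    // margin) and scope growth stays below a quarter of the milestone.
    if done + 0.1 >= time_elapsed_frac && scope_creep <= total / 4 {
        "ON TRACK"
    } else {
        "AT RISK"
    }
}
```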
## Human Output

```
Milestone Risk Report

v2.0 (due Feb 15, 2025)
  Progress: 14/20 closed (70%)
  Scope:    +3 issues added after milestone start
  Risks:    2 issues overdue, 1 issue unassigned
  Status:   ON TRACK (70% complete, 60% time elapsed)

v2.1 (due Mar 30, 2025)
  Progress: 2/15 closed (13%)
  Scope:    +8 issues added after milestone start
  Risks:    5 issues unassigned
  Status:   AT RISK (13% complete, scope still growing)
```
## Downsides

- Milestone semantics vary wildly between teams
- "Scope creep" detection is noisy if teams batch-add issues to milestones
- due_date comparison assumes consistent timezone handling

## Extensions

- `lore milestone-risk --history` — show scope changes over time
- Velocity estimation: at current closure rate, will the milestone finish on time?
- Combine with label-flow for "how fast are milestone issues moving through workflow"
67
docs/ideas/mr-pipeline.md
Normal file
@@ -0,0 +1,67 @@
# MR Pipeline Efficiency

- **Command:** `lore mr-pipeline [--since <date>]`
- **Confidence:** 78%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — builds on bottleneck detector with more stages

## What

Track the full MR lifecycle: creation, first review, all reviews complete (threads resolved), approval, merge. Compute time spent in each stage across all MRs. Identify which stage is the bottleneck.

## Why

"Our merge process is slow" is vague. This breaks it into stages so teams can target the actual bottleneck. Maybe creation-to-review is fast but review-to-merge is slow (merge queue issues). Maybe first review is fast but resolution takes forever (contentious code).

## Data Required

All exists today:
- `merge_requests` (created_at, merged_at)
- `notes` (note_type='DiffNote', created_at, author_username)
- `discussions` (resolved, resolvable, merge_request_id)
- `resource_state_events` (state changes with timestamps)
## Implementation Sketch

For each merged MR, compute:
1. **Created → First Review**: MIN(DiffNote.created_at) - mr.created_at
2. **First Review → All Resolved**: MAX(discussion.resolved_at) - MIN(DiffNote.created_at)
3. **All Resolved → Merged**: mr.merged_at - MAX(discussion.resolved_at)

Note: "resolved_at" isn't directly stored but can be approximated from the last note in resolved discussions, or from state events.
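The three stage durations above reduce to subtractions over per-MR timestamps; a sketch assuming epoch-millisecond values and a hypothetical `MrTimestamps` row assembled from those queries:

```rust
// Hypothetical per-MR timestamps (epoch milliseconds), one per lifecycle point.
pub struct MrTimestamps {
    pub created: i64,
    pub first_review: i64,
    pub resolved: i64,
    pub merged: i64,
}

fn median(mut v: Vec<f64>) -> f64 {
    v.sort_by(|a, b| a.partial_cmp(b).unwrap());
    v[v.len() / 2]
}

// Name the stage with the largest median duration, in hours.
pub fn bottleneck(mrs: &[MrTimestamps]) -> (&'static str, f64) {
    let to_hours = |ms: i64| ms as f64 / 3_600_000.0;
    let stages: [(&'static str, Vec<f64>); 3] = [
        ("Created -> First Review",
         mrs.iter().map(|m| to_hours(m.first_review - m.created)).collect()),
        ("First Review -> Resolved",
         mrs.iter().map(|m| to_hours(m.resolved - m.first_review)).collect()),
        ("Resolved -> Merged",
         mrs.iter().map(|m| to_hours(m.merged - m.resolved)).collect()),
    ];
    stages
        .into_iter()
        .map(|(name, v)| (name, median(v)))
        .max_by(|a, b| a.1.partial_cmp(&b.1).unwrap())
        .unwrap()
}
```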
## Human Output

```
MR Pipeline (last 30 days, 24 merged MRs)

Stage                        Median   P75     P90
Created → First Review       4.2h     12.1h   28.3h
First Review → Resolved      8.1h     24.5h   72.0h   <-- BOTTLENECK
Resolved → Merged            0.5h     1.2h    3.1h

Total (Created → Merged)     18.4h    48.2h   96.1h

Biggest bottleneck: Review resolution (median 8.1h)
Suggestion: Consider breaking large MRs into smaller reviewable chunks
```
## Downsides

- "Resolved" timestamp approximation may be inaccurate
- Pipeline assumes linear flow; real MRs have back-and-forth cycles
- Draft MRs skew metrics (created early, reviewed late intentionally)

## Extensions

- `lore mr-pipeline --exclude-drafts` — cleaner metrics
- Per-project comparison: which project has the fastest pipeline?
- Trend line: weekly pipeline speed over time
- Break down by MR size (files changed) to normalize
265
docs/ideas/project-ergonomics.md
Normal file
@@ -0,0 +1,265 @@
# Multi-Project Ergonomics

- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium (multiple small improvements that compound)

## The Problem

Every command that touches project-scoped data requires `-p group/subgroup/project` to disambiguate. For users with 5+ projects synced, this is:

- Repetitive: typing `-p infra/platform/auth-service` on every query
- Error-prone: mistyping long paths
- Discoverable only by failure: you don't know you need `-p` until you hit an ambiguous error

The fuzzy matching in `resolve_project` is already good (suffix, substring, case-insensitive) but it only kicks in on the `-p` value itself. There's no way to set a default, group projects, or scope a whole session.
## Proposed Improvements

### 1. Project Aliases in Config

Let users define short aliases for long project paths.

```json
{
  "projects": [
    { "path": "infra/platform/auth-service", "alias": "auth" },
    { "path": "infra/platform/billing-service", "alias": "billing" },
    { "path": "frontend/customer-portal", "alias": "portal" },
    { "path": "frontend/admin-dashboard", "alias": "admin" }
  ]
}
```

Then: `lore issues -p auth` resolves via alias before falling through to fuzzy match.

**Implementation:** Add optional `alias` field to `ProjectConfig`. In `resolve_project`, check aliases before the existing exact/suffix/substring cascade.

```rust
#[derive(Debug, Clone, Deserialize)]
pub struct ProjectConfig {
    pub path: String,
    #[serde(default)]
    pub alias: Option<String>,
}
```

Resolution order becomes:
1. Exact alias match (new)
2. Exact path match
3. Case-insensitive path match
4. Suffix match
5. Substring match
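The cascade above can be sketched as a chain of `find` calls; this uses a plain `ProjectConfig` (without the serde derive) so the sketch stands alone, and stubs the suffix/substring steps as simple string matches rather than the real `resolve_project` logic:

```rust
// Plain stand-in for the config type above, minus serde.
pub struct ProjectConfig {
    pub path: String,
    pub alias: Option<String>,
}

pub fn resolve<'a>(configs: &'a [ProjectConfig], query: &str) -> Option<&'a str> {
    // 1. Exact alias match (new)
    if let Some(c) = configs.iter().find(|c| c.alias.as_deref() == Some(query)) {
        return Some(&c.path);
    }
    // 2. Exact path match
    if let Some(c) = configs.iter().find(|c| c.path == query) {
        return Some(&c.path);
    }
    // 3. Case-insensitive path match
    if let Some(c) = configs.iter().find(|c| c.path.eq_ignore_ascii_case(query)) {
        return Some(&c.path);
    }
    // 4. Suffix match ("auth-service" matches "infra/platform/auth-service")
    if let Some(c) = configs.iter().find(|c| c.path.ends_with(query)) {
        return Some(&c.path);
    }
    // 5. Substring match
    configs.iter().find(|c| c.path.contains(query)).map(|c| c.path.as_str())
}
```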
### 2. Default Project (`LORE_PROJECT` env var)

Set a default project for your shell session so you don't need `-p` at all.

```bash
export LORE_PROJECT=auth
lore issues                 # scoped to auth-service
lore mrs --state opened     # scoped to auth-service
lore search "timeout bug"   # scoped to auth-service
lore issues -p billing      # explicit -p overrides the env var
```

**Implementation:** In every command that accepts `-p`, fall back to `std::env::var("LORE_PROJECT")` when the flag is absent. The `-p` flag always wins.

Could also support a config-level default:

```json
{
  "defaultProject": "auth"
}
```

Precedence: CLI flag > env var > config default > (no filter).
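That precedence collapses to a chain of `Option` fallbacks; a minimal sketch where each argument stands in for one source (the real command would read the env var and config itself):

```rust
// First Some wins: CLI flag, then LORE_PROJECT, then the config default.
pub fn effective_project(
    cli_flag: Option<String>,
    env_var: Option<String>,
    config_default: Option<String>,
) -> Option<String> {
    cli_flag.or(env_var).or(config_default)
}
```

A `None` result means no filter is applied and queries run across all synced projects.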
### 3. `lore use <project>` — Session Context Switcher

A command that pins the session's project context by writing it to a state file.

```bash
lore use auth
# writes ~/.local/state/lore/current-project containing "auth"

lore issues        # reads current-project file, scopes to auth
lore use --clear   # removes the file, back to all-project mode
lore use           # shows current project context
```

This is similar to `kubectl config use-context`, `nvm use`, or `tfenv use`.

**Implementation:** Write a one-line file at a known state path. Each command reads it as the lowest-priority default (below env var and CLI flag).

Precedence: CLI flag > env var > `lore use` state file > config default > (no filter).
### 4. `lore projects` — Project Listing and Discovery

A dedicated command to see what's synced, with aliases and activity stats.

```bash
$ lore projects

Alias     Path                              Issues   MRs   Last Sync
auth      infra/platform/auth-service       142      87    2h ago
billing   infra/platform/billing-service    56       34    2h ago
portal    frontend/customer-portal          203      112   2h ago
admin     frontend/admin-dashboard          28       15    3d ago
-         data/ml-pipeline                  89       45    2h ago
```

Robot mode returns the same as JSON with alias, path, counts, and last sync time.

**Implementation:** Query `projects` joined with `COUNT(issues)`, `COUNT(mrs)`, and `MAX(sync_runs.finished_at)`. Overlay aliases from config.
### 5. Project Groups in Config

Let users define named groups of projects for batch scoping.

```json
{
  "projectGroups": {
    "backend": ["auth", "billing", "data/ml-pipeline"],
    "frontend": ["portal", "admin"],
    "all-infra": ["auth", "billing"]
  }
}
```

Then: `lore issues -p @backend` (or `--group backend`) queries across all projects in the group.

**Implementation:** When the `-p` value starts with `@`, look up the group and resolve each member project. Pass as a `Vec<i64>` of project IDs to the query layer.

This is especially powerful for:
- `lore search "auth bug" -p @backend` — search across related repos
- `lore digest --since 7d -p @frontend` — team-scoped activity digest
- `lore timeline "deployment" -p @all-infra` — cross-repo timeline
### 6. Git-Aware Project Detection

When running `lore` from inside a git repo that matches a synced project, auto-scope to that project without any flags.

```bash
cd ~/code/auth-service
lore issues    # auto-detects this is infra/platform/auth-service
```

**Implementation:** Read `.git/config` for the remote URL, extract the project path, check if it matches a synced project. Only activate when exactly one project matches.

Detection logic:
```
1. Check if cwd is inside a git repo (find .git)
2. Parse git remote origin URL
3. Extract path component (e.g., "infra/platform/auth-service.git" → "infra/platform/auth-service")
4. Match against synced projects
5. If exactly one match, use as implicit -p
6. If ambiguous or no match, do nothing (fall through to normal behavior)
```
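Step 3 is the only fiddly part of that logic; a sketch that handles the two common remote shapes (SSH and HTTPS), assuming a single known GitLab host to strip (`gitlab.example.com` below is a placeholder):

```rust
// Extract "group/sub/project" from a git remote URL for a known host.
// Handles "git@host:path.git" and "http(s)://host/path.git"; returns None
// for any other host or shape so detection falls through harmlessly.
pub fn project_path_from_remote(url: &str, host: &str) -> Option<String> {
    let rest = if let Some(r) = url.strip_prefix(&format!("git@{}:", host)) {
        r
    } else if let Some(r) = url
        .strip_prefix(&format!("https://{}/", host))
        .or_else(|| url.strip_prefix(&format!("http://{}/", host)))
    {
        r
    } else {
        return None;
    };
    Some(rest.strip_suffix(".git").unwrap_or(rest).to_string())
}
```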
Precedence: CLI flag > env var > `lore use` > config default > git detection > (no filter).

This is similar to how `gh` (GitHub CLI) auto-detects the repo you're in.

### 7. Prompt Integration / Shell Function

Provide a shell function that shows the current project context in the prompt.

```bash
# In .bashrc / .zshrc
eval "$(lore completions zsh)"
PROMPT='$(lore-prompt)%~ %# '
```

Output: `[lore:auth] ~/code/auth-service %`

Shows which project `lore` commands will scope to, using the same precedence chain. Helps users understand what context they're in before running a query.
### 8. Short Project References in Output

Once aliases exist, use them everywhere in output for brevity:

**Before:**
```
infra/platform/auth-service#42   Login timeout bug
infra/platform/auth-service!234  Refactor auth middleware
```

**After:**
```
auth#42   Login timeout bug
auth!234  Refactor auth middleware
```

With a `--full-paths` flag to get the verbose form when needed.
## Combined UX Flow

With all improvements, a typical session looks like:

```bash
# One-time config
lore init                    # sets up aliases during interactive setup

# Daily use
lore use auth                # set context
lore issues --state opened   # no -p needed
lore search "timeout"        # scoped to auth
lore timeline "login flow"   # scoped to auth
lore issues -p @backend      # cross-repo query via group
lore mrs -p billing          # quick alias switch
lore use --clear             # back to global
```

Or for the power user who never wants to type `lore use`:

```bash
cd ~/code/auth-service
lore issues                  # git-aware auto-detection
```

Or for the scripter:

```bash
LORE_PROJECT=auth lore --robot issues -n 50   # env var for automation
```
## Priority Order

Implement in this order for maximum incremental value:

1. **Project aliases** — smallest change, biggest daily friction reduction
2. **`LORE_PROJECT` env var** — trivial to implement, enables scripting
3. **`lore projects` command** — discoverability, completes the alias story
4. **`lore use` context** — nice-to-have for heavy users
5. **Project groups** — high value for multi-repo teams
6. **Git-aware detection** — polish, "it just works" feel
7. **Short refs in output** — ties into timeline issue #001
8. **Prompt integration** — extra polish

## Relationship to Issue #001

The timeline entity-ref ambiguity (issue #001) is solved naturally by items 7 and 8 here. Once aliases exist, `format_entity_ref` can use the alias as the short project identifier in multi-project output:

```
auth#42 instead of infra/platform/auth-service#42
```

And in single-project timelines (detected via `lore use` or git-aware), the project prefix is omitted entirely — matching the current behavior but now intentionally.
81
docs/ideas/recurring-patterns.md
Normal file
@@ -0,0 +1,81 @@
# Recurring Bug Pattern Detector

- **Command:** `lore recurring-patterns [--min-cluster <N>]`
- **Confidence:** 76%
- **Tier:** 3
- **Status:** proposed
- **Effort:** high — vector clustering, threshold tuning

## What

Cluster closed issues by embedding similarity. Identify clusters of 3+ issues that are semantically similar — these represent recurring problems that need a systemic fix rather than one-off patches.

## Why

Finding the same bug filed 5 different ways is one of the most impactful things you can surface. This is a sophisticated use of the embedding pipeline that no competing tool offers. It turns "we keep having auth issues" from a gut feeling into data.

## Data Required

All exists today:
- `documents` (source_type='issue', content_text)
- `embeddings` (768-dim vectors)
- `issues` (state='closed' for filtering)
## Implementation Sketch

```
1. Collect all embeddings for closed issue documents
2. For each issue, find K nearest neighbors (K=10)
3. Build adjacency graph: edge exists if similarity > threshold (e.g., 0.80)
4. Find connected components (simple DFS/BFS)
5. Filter to components with >= min-cluster members (default 3)
6. For each cluster:
   a. Extract common terms (TF-IDF or simple word frequency)
   b. Sort by recency (most recent issue first)
   c. Report cluster with: theme, member issues, time span
```
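Steps 3 through 5 are a standard connected-components pass; a union-find sketch over precomputed above-threshold edges (issue indices stand in for document IDs; the KNN and similarity steps are assumed done upstream):

```rust
use std::collections::HashMap;

// Group n issues into components induced by similarity edges, keeping only
// components with at least `min_cluster` members. Output is sorted for
// deterministic reporting.
pub fn clusters(n: usize, edges: &[(usize, usize)], min_cluster: usize) -> Vec<Vec<usize>> {
    // Union-find with path halving.
    let mut parent: Vec<usize> = (0..n).collect();
    fn find(parent: &mut Vec<usize>, mut x: usize) -> usize {
        while parent[x] != x {
            parent[x] = parent[parent[x]];
            x = parent[x];
        }
        x
    }
    for &(a, b) in edges {
        let (ra, rb) = (find(&mut parent, a), find(&mut parent, b));
        if ra != rb {
            parent[ra] = rb;
        }
    }
    // Collect members under each root.
    let mut groups: HashMap<usize, Vec<usize>> = HashMap::new();
    for i in 0..n {
        let r = find(&mut parent, i);
        groups.entry(r).or_default().push(i);
    }
    let mut out: Vec<Vec<usize>> = groups
        .into_values()
        .filter(|g| g.len() >= min_cluster)
        .collect();
    out.sort();
    out
}
```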
### Similarity Threshold Tuning

This is the critical parameter. Too low = noise, too high = misses.
- Start at 0.80 cosine similarity
- Expose as `--threshold` flag for user tuning
- Report cluster cohesion score for transparency
## Human Output

```
Recurring Patterns (3+ similar closed issues)

Cluster 1: "Authentication timeout errors" (5 issues, spanning 6 months)
  #89 Login timeout on slow networks (closed 3d ago)
  #72 Auth flow hangs on cellular (closed 2mo ago)
  #58 Token refresh timeout (closed 3mo ago)
  #45 SSO login timeout for remote users (closed 5mo ago)
  #31 Connection timeout in auth middleware (closed 6mo ago)
  Avg similarity: 0.87 | Suggested: systemic fix for auth timeout handling

Cluster 2: "Cache invalidation issues" (3 issues, spanning 2 months)
  #85 Stale cache after deploy (closed 2w ago)
  #77 Cache headers not updated (closed 1mo ago)
  #69 Dashboard shows old data after settings change (closed 2mo ago)
  Avg similarity: 0.82 | Suggested: review cache invalidation strategy
```
## Downsides

- Clustering quality depends on embedding quality and threshold tuning
- May produce false clusters (issues that mention similar terms but are different problems)
- Computationally expensive for large issue counts (N^2 comparisons)
- Need to handle multi-chunk documents (aggregate embeddings)

## Extensions

- `lore recurring-patterns --open` — find clusters in open issues (duplicates to merge)
- `lore recurring-patterns --cross-project` — patterns across repos
- Trend detection: are cluster sizes growing? (escalating problem)
- Export as report for engineering retrospectives
78
docs/ideas/review-coverage.md
Normal file
@@ -0,0 +1,78 @@
# DiffNote Coverage Map

- **Command:** `lore review-coverage <mr-iid>`
- **Confidence:** 75%
- **Tier:** 3
- **Status:** proposed
- **Effort:** medium — join DiffNote positions with mr_file_changes

## What

For a specific MR, show which files received review comments (DiffNotes) vs. which files were changed but received no review attention. Highlights blind spots in code review.

## Why

Large MRs often have files that get reviewed thoroughly and files that slip through with no comments. This makes the review coverage visible so teams can decide if un-reviewed files need a second look.

## Data Required

All exists today:
- `mr_file_changes` (new_path per MR)
- `notes` (position_new_path, note_type='DiffNote', discussion_id)
- `discussions` (merge_request_id)
## Implementation Sketch
|
||||||
|
|
||||||
|
```sql
|
||||||
|
SELECT
|
||||||
|
mfc.new_path,
|
||||||
|
mfc.change_type,
|
||||||
|
COUNT(DISTINCT n.id) as review_comments,
|
||||||
|
COUNT(DISTINCT d.id) as review_threads,
|
||||||
|
CASE WHEN COUNT(n.id) = 0 THEN 'NOT REVIEWED' ELSE 'REVIEWED' END as status
|
||||||
|
FROM mr_file_changes mfc
|
||||||
|
LEFT JOIN notes n ON n.position_new_path = mfc.new_path
|
||||||
|
AND n.note_type = 'DiffNote'
|
||||||
|
AND n.is_system = 0
|
||||||
|
LEFT JOIN discussions d ON n.discussion_id = d.id
|
||||||
|
AND d.merge_request_id = mfc.merge_request_id
|
||||||
|
WHERE mfc.merge_request_id = ?1
|
||||||
|
GROUP BY mfc.new_path
|
||||||
|
ORDER BY review_comments DESC;
|
||||||
|
```
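
Once the query returns per-file comment counts, the summary line of the human output is a simple fold. A minimal sketch of that fold; the `FileCoverage` struct and `coverage_summary` helper are illustrative names, not existing gitlore types:

```rust
// Fold per-file review-comment counts (as returned by the query above)
// into the "Coverage: X/Y files (Z%)" summary line.
struct FileCoverage {
    #[allow(dead_code)]
    path: String,
    review_comments: u32,
}

/// Returns (reviewed_files, total_files, coverage_percent).
fn coverage_summary(files: &[FileCoverage]) -> (usize, usize, f64) {
    let reviewed = files.iter().filter(|f| f.review_comments > 0).count();
    let total = files.len();
    let pct = if total == 0 { 0.0 } else { reviewed as f64 * 100.0 / total as f64 };
    (reviewed, total, pct)
}

fn main() {
    let files = vec![
        FileCoverage { path: "src/auth/middleware.rs".into(), review_comments: 12 },
        FileCoverage { path: "src/auth/types.rs".into(), review_comments: 0 },
    ];
    let (reviewed, total, pct) = coverage_summary(&files);
    println!("Coverage: {reviewed}/{total} files ({pct:.1}%)");
}
```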

## Human Output

```
Review Coverage for !234 — Refactor auth middleware

REVIEWED (5 files, 23 comments)
  src/auth/middleware.rs          12 comments, 4 threads
  src/auth/jwt.rs                  6 comments, 2 threads
  src/auth/session.rs              3 comments, 1 thread
  tests/auth/middleware_test.rs    1 comment, 1 thread
  src/auth/mod.rs                  1 comment, 1 thread

NOT REVIEWED (3 files)
  src/auth/types.rs    modified  [no review comments]
  src/api/routes.rs    modified  [no review comments]
  Cargo.toml           modified  [no review comments]

Coverage: 5/8 files (62.5%)
```

## Downsides

- Reviewers may have reviewed a file without leaving comments (approval by silence)
- position_new_path matching may not cover all DiffNote position formats
- Config files (Cargo.toml) not being reviewed is usually fine

## Extensions

- `lore review-coverage --all --since 30d` — aggregate coverage across all MRs
- Per-reviewer breakdown: which reviewers cover which files?
- Coverage heatmap: files that consistently escape review across multiple MRs

90  docs/ideas/silos.md  Normal file
@@ -0,0 +1,90 @@
# Knowledge Silo Detection

- **Command:** `lore silos [--min-changes <N>]`
- **Confidence:** 87%
- **Tier:** 2
- **Status:** proposed
- **Effort:** medium — requires mr_file_changes population (Gate 4)

## What

For each file path (or directory), count unique MR authors. Flag paths where only one person has ever authored changes (bus factor = 1). Aggregate by directory to show silo areas.

## Why

Bus factor analysis is critical for team resilience. If only one person has ever touched the auth module, that's a risk. This uses data already ingested to surface knowledge concentration that's otherwise invisible.

## Data Required

- `mr_file_changes` (new_path, merge_request_id) — needs Gate 4 ingestion
- `merge_requests` (author_username, state='merged')
- `projects` (path_with_namespace)

## Implementation Sketch

```sql
-- Find directories with bus factor = 1
WITH file_authors AS (
  SELECT
    mfc.new_path,
    mr.author_username,
    p.path_with_namespace,
    mfc.project_id
  FROM mr_file_changes mfc
  JOIN merge_requests mr ON mfc.merge_request_id = mr.id
  JOIN projects p ON mfc.project_id = p.id
  WHERE mr.state = 'merged'
),
directory_authors AS (
  SELECT
    project_id,
    path_with_namespace,
    -- Extract directory: everything before the last '/'.
    -- Inner RTRIM strips the trailing file name (all non-'/' chars),
    -- outer RTRIM strips the trailing '/' itself.
    CASE
      WHEN INSTR(new_path, '/') > 0
      THEN RTRIM(RTRIM(new_path, REPLACE(new_path, '/', '')), '/')
      ELSE '.'
    END as directory,
    COUNT(DISTINCT author_username) as unique_authors,
    COUNT(*) as total_changes,
    GROUP_CONCAT(DISTINCT author_username) as authors
  FROM file_authors
  GROUP BY project_id, directory
)
SELECT * FROM directory_authors
WHERE unique_authors = 1
  AND total_changes >= ?1  -- min-changes threshold
ORDER BY total_changes DESC;
```
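
The directory-extraction rule the SQL encodes is easier to see in plain code: everything before the last '/', or `.` for top-level files. A minimal stdlib sketch; `parent_dir` is an illustrative helper, not an existing gitlore function:

```rust
// Directory of a repo-relative path: text before the last '/',
// or "." when the path has no '/' (top-level file).
fn parent_dir(path: &str) -> &str {
    match path.rfind('/') {
        Some(idx) => &path[..idx],
        None => ".",
    }
}

fn main() {
    println!("{}", parent_dir("src/auth/jwt.rs")); // src/auth
    println!("{}", parent_dir("Cargo.toml"));      // .
}
```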

## Human Output

```
Knowledge Silos (bus factor = 1, min 3 changes)

group/backend
  src/auth/           alice    (8 changes)   HIGH RISK
  src/billing/        bob      (5 changes)   HIGH RISK
  src/utils/cache/    charlie  (3 changes)   MODERATE RISK

group/frontend
  src/admin/          dave     (12 changes)  HIGH RISK
```

## Downsides

- Historical authors may have left the team; needs recency weighting
- Requires `mr_file_changes` to be populated (Gate 4)
- Single-author directories may be intentional (ownership model)
- Directory aggregation heuristic is imperfect for deep nesting

## Extensions

- `lore silos --since 180d` — only count recent activity
- `lore silos --depth 2` — aggregate at directory depth N
- Combine with `lore experts` to show both silos and experts in one view
- Risk scoring: weight by directory size, change frequency, recency

95  docs/ideas/similar-issues.md  Normal file
@@ -0,0 +1,95 @@
# Similar Issues Finder

- **Command:** `lore similar <iid>`
- **Confidence:** 95%
- **Tier:** 1
- **Status:** proposed
- **Effort:** low — infrastructure exists, needs one new query path

## What

Given an issue IID, find the N most semantically similar issues using the existing vector embeddings. Show similarity score and overlapping keywords.

Can also work with MRs: `lore similar --mr <iid>`.

## Why

Duplicate detection is a constant problem on active projects. "Is this bug already filed?" becomes a one-liner. This is the most natural use of the embedding pipeline and the feature people expect when they hear "semantic search."

## Data Required

All exists today:
- `documents` table (source_type, source_id, content_text)
- `embeddings` virtual table (768-dim vectors via sqlite-vec)
- `embedding_metadata` (document_hash for staleness check)

## Implementation Sketch

```
1. Resolve IID → issue.id → document.id (via source_type='issue', source_id)
2. Look up embedding vector(s) for that document
3. Query sqlite-vec for K nearest neighbors (K = limit * 2 for headroom)
4. Filter to source_type='issue' (or 'merge_request' if --include-mrs)
5. Exclude self
6. Rank by cosine similarity
7. Return top N with: iid, title, project, similarity_score, url
```
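
Steps 5 and 6 can be sketched in a few lines of stdlib Rust. This is illustrative only: the real pipeline gets distances straight from sqlite-vec, and `rank_similar` is a hypothetical helper:

```rust
/// Cosine similarity of two equal-length vectors; 0.0 for zero vectors.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Exclude the query document (step 5), rank the rest by similarity (step 6).
fn rank_similar(
    query_id: i64,
    query: &[f32],
    candidates: &[(i64, Vec<f32>)],
    top_n: usize,
) -> Vec<(i64, f32)> {
    let mut scored: Vec<(i64, f32)> = candidates
        .iter()
        .filter(|(id, _)| *id != query_id) // step 5: exclude self
        .map(|(id, v)| (*id, cosine_similarity(query, v)))
        .collect();
    scored.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap()); // step 6: rank
    scored.truncate(top_n); // step 7: top N
    scored
}

fn main() {
    let cands = vec![(1i64, vec![1.0f32, 0.0]), (2, vec![1.0, 0.0]), (3, vec![0.0, 1.0])];
    for (id, sim) in rank_similar(1, &[1.0, 0.0], &cands, 2) {
        println!("doc {id}: similarity {sim:.2}");
    }
}
```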

### SQL Core

```sql
-- Get the embedding for target document (chunk 0 = representative)
SELECT embedding FROM embeddings WHERE rowid = ?1 * 1000;

-- Find nearest neighbors
SELECT
  rowid,
  distance
FROM embeddings
WHERE embedding MATCH ?1
  AND k = ?2
ORDER BY distance;

-- Resolve back to entities
SELECT d.source_type, d.source_id, d.title, d.url, i.iid, i.state
FROM documents d
JOIN issues i ON d.source_id = i.id AND d.source_type = 'issue'
WHERE d.id = ?;
```

## Robot Mode Output

```json
{
  "ok": true,
  "data": {
    "query_issue": { "iid": 42, "title": "Login timeout on slow networks" },
    "similar": [
      {
        "iid": 38,
        "title": "Connection timeout in auth flow",
        "project": "group/backend",
        "similarity": 0.87,
        "state": "closed",
        "url": "https://gitlab.com/group/backend/-/issues/38"
      }
    ]
  },
  "meta": { "elapsed_ms": 45, "candidates_scanned": 200 }
}
```

## Downsides

- Embedding quality depends on description quality; short issues may not match well
- Multi-chunk documents need aggregation strategy (use chunk 0 or average?)
- Requires embeddings to be generated first (`lore embed`)

## Extensions

- `lore similar --open-only` to filter to unresolved issues (duplicate triage)
- `lore similar --text "free text query"` to find issues similar to arbitrary text
- Batch mode: find all potential duplicate clusters across the entire database

100  docs/ideas/stale-discussions.md  Normal file
@@ -0,0 +1,100 @@
# Stale Discussion Finder

- **Command:** `lore stale-discussions [--days <N>]`
- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** low — single query, minimal formatting

## What

List unresolved, resolvable discussions where `last_note_at` is older than a threshold (default 14 days), grouped by parent entity. Prioritize by discussion count per entity (more stale threads = more urgent).

## Why

Unresolved discussions are silent blockers. They prevent MR merges, stall decision-making, and represent forgotten conversations. This surfaces them so teams can take action: resolve, respond, or explicitly mark as won't-fix.

## Data Required

All exists today:
- `discussions` (resolved, resolvable, last_note_at)
- `issues` / `merge_requests` (for parent entity context)

## Implementation Sketch

```sql
-- ?1 = now (epoch ms), ?2 = cutoff = now - threshold (epoch ms)
SELECT
  d.id,
  d.noteable_type,
  CASE WHEN d.issue_id IS NOT NULL THEN i.iid ELSE mr.iid END as entity_iid,
  CASE WHEN d.issue_id IS NOT NULL THEN i.title ELSE mr.title END as entity_title,
  p.path_with_namespace,
  d.last_note_at,
  ((?1 - d.last_note_at) / 86400000) as days_stale,
  COUNT(*) OVER (PARTITION BY COALESCE(d.issue_id, d.merge_request_id), d.noteable_type) as stale_count_for_entity
FROM discussions d
JOIN projects p ON d.project_id = p.id
LEFT JOIN issues i ON d.issue_id = i.id
LEFT JOIN merge_requests mr ON d.merge_request_id = mr.id
WHERE d.resolved = 0
  AND d.resolvable = 1
  AND d.last_note_at < ?2
ORDER BY days_stale DESC;
```
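
The `days_stale` expression assumes epoch-millisecond timestamps (86,400,000 ms per day, integer division). The same arithmetic as a tiny sketch; `days_stale` here is an illustrative helper, not gitlore code:

```rust
/// Whole days between two epoch-millisecond timestamps.
fn days_stale(now_ms: i64, last_note_at_ms: i64) -> i64 {
    (now_ms - last_note_at_ms) / 86_400_000
}

fn main() {
    // A thread last touched 28 days before "now" is 28 days stale.
    println!("{}", days_stale(28 * 86_400_000, 0));
}
```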

## Human Output Format

```
Stale Discussions (14+ days without activity)

group/backend !234 — Refactor auth middleware (3 stale threads)
  Discussion #a1b2c3 (28d stale) "Should we use JWT or session tokens?"
  Discussion #d4e5f6 (21d stale) "Error handling for expired tokens"
  Discussion #g7h8i9 (14d stale) "Performance implications of per-request validation"

group/backend #90 — Rate limiting design (1 stale thread)
  Discussion #j0k1l2 (18d stale) "Redis vs in-memory rate counter"
```

## Robot Mode Output

```json
{
  "ok": true,
  "data": {
    "threshold_days": 14,
    "total_stale": 4,
    "entities": [
      {
        "type": "merge_request",
        "iid": 234,
        "title": "Refactor auth middleware",
        "project": "group/backend",
        "stale_discussions": [
          {
            "discussion_id": "a1b2c3",
            "days_stale": 28,
            "first_note_preview": "Should we use JWT or session tokens?"
          }
        ]
      }
    ]
  }
}
```

## Downsides

- Some discussions are intentionally left open (design docs, long-running threads)
- Could produce noise in repos with loose discussion hygiene
- Doesn't distinguish "stale and blocking" from "stale and irrelevant"

## Extensions

- `lore stale-discussions --mr-only` — focus on MR review threads (most actionable)
- `lore stale-discussions --author alice` — "threads I started that went quiet"
- `lore stale-discussions --assignee bob` — "threads on my MRs that need attention"

82  docs/ideas/unlinked.md  Normal file
@@ -0,0 +1,82 @@
# Unlinked MR Finder

- **Command:** `lore unlinked [--since <date>]`
- **Confidence:** 83%
- **Tier:** 2
- **Status:** proposed
- **Effort:** low — LEFT JOIN queries

## What

Two reports:
1. Merged MRs with no entity_references at all (no "closes", no "mentioned", no "related") — orphan MRs with no issue traceability
2. Closed issues with no MR reference — issues closed manually without code change

## Why

Process compliance metric. Unlinked MRs mean lost traceability — you can't trace a code change back to a requirement. Manually closed issues might mean work was done outside the tracked process, or issues were closed prematurely.

## Data Required

All exists today:
- `merge_requests` (state, merged_at)
- `issues` (state, closed/updated_at)
- `entity_references` (for join/anti-join)

## Implementation Sketch

```sql
-- Orphan merged MRs (no references at all)
SELECT mr.iid, mr.title, mr.author_username, mr.merged_at,
       p.path_with_namespace
FROM merge_requests mr
JOIN projects p ON mr.project_id = p.id
LEFT JOIN entity_references er
  ON er.source_entity_type = 'merge_request' AND er.source_entity_id = mr.id
WHERE mr.state = 'merged'
  AND mr.merged_at >= ?1
  AND er.id IS NULL
ORDER BY mr.merged_at DESC;

-- Closed issues with no MR reference
SELECT i.iid, i.title, i.author_username, i.updated_at,
       p.path_with_namespace
FROM issues i
JOIN projects p ON i.project_id = p.id
LEFT JOIN entity_references er
  ON er.target_entity_type = 'issue' AND er.target_entity_id = i.id
  AND er.source_entity_type = 'merge_request'
WHERE i.state = 'closed'
  AND i.updated_at >= ?1
  AND er.id IS NULL
ORDER BY i.updated_at DESC;
```
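
The `LEFT JOIN ... WHERE er.id IS NULL` pattern above is a classic anti-join: keep rows that have no match on the other side. The same idea in plain Rust, as a sketch with illustrative types (an MR is an orphan when its id never appears as a reference source):

```rust
use std::collections::HashSet;

/// Return the merged-MR ids that never appear as a reference source.
fn orphan_mrs(merged_mr_ids: &[i64], reference_source_ids: &[i64]) -> Vec<i64> {
    let referenced: HashSet<i64> = reference_source_ids.iter().copied().collect();
    merged_mr_ids
        .iter()
        .copied()
        .filter(|id| !referenced.contains(id)) // anti-join: no match => orphan
        .collect()
}

fn main() {
    let orphans = orphan_mrs(&[245, 239, 200], &[200]);
    println!("{orphans:?}");
}
```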

## Human Output

```
Unlinked MRs (merged with no issue reference, last 30 days)

!245 Fix typo in README (alice, merged 2d ago)
!239 Update CI pipeline (bob, merged 1w ago)
!236 Bump dependency versions (charlie, merged 2w ago)

Orphan Closed Issues (closed without any MR, last 30 days)

#92 Update documentation for v2 (closed by dave, 3d ago)
#88 Investigate memory usage (closed by eve, 2w ago)
```

## Downsides

- Some MRs legitimately don't reference issues (chores, CI fixes, dependency bumps)
- Some issues are legitimately closed without code (questions, duplicates, won't-fix)
- Noise level depends on team discipline

## Extensions

- `lore unlinked --ignore-labels "chore,ci"` — filter out expected orphans
- Compliance score: % of MRs with issue links over time (trend metric)

102  docs/ideas/weekly-digest.md  Normal file
@@ -0,0 +1,102 @@
# Weekly Digest Generator

- **Command:** `lore weekly [--since <date>]`
- **Confidence:** 90%
- **Tier:** 1
- **Status:** proposed
- **Effort:** medium — builds on digest infrastructure, adds markdown formatting

## What

Auto-generate a markdown document summarizing the week: MRs merged (grouped by project), issues closed, new issues opened, ongoing discussions, milestone progress. Formatted for pasting into Slack, email, or team standup notes.

Default window is 7 days. `--since` overrides.

## Why

Every team lead writes a weekly status update. This writes itself from the data. Leverages everything gitlore has ingested. Saves 30-60 minutes of manual summarization per week.

## Data Required

Same as digest (all exists today):
- `resource_state_events`, `merge_requests`, `issues`, `discussions`
- `milestones` for progress tracking

## Implementation Sketch

This is essentially `lore digest --since 7d --format markdown` with:
1. Section headers for each category
2. Milestone progress bars (X/Y issues closed)
3. "Highlights" section with the most-discussed items
4. "Risks" section with overdue issues and stale MRs
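
Item 2, the milestone progress line, is a one-liner once the counts are in hand. A minimal sketch reproducing the "X/Y issues closed (Z%)" shape; `milestone_line` is an illustrative helper, not gitlore code:

```rust
/// Render one "Milestone Progress" bullet for the markdown digest.
fn milestone_line(name: &str, closed: u32, total: u32, due: &str) -> String {
    let pct = if total == 0 { 0 } else { closed * 100 / total };
    format!("- **{name}** — {closed}/{total} issues closed ({pct}%) — due {due}")
}

fn main() {
    println!("{}", milestone_line("v2.0", 14, 20, "Feb 15"));
}
```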

### Markdown Template

```markdown
# Weekly Summary — Jan 20-27, 2025

## Highlights
- **!234** Refactor auth middleware merged (12 discussions, 4 reviewers)
- **#95** New critical bug: Rate limiting returns 500

## Merged (3)
| MR | Title | Author | Reviewers |
|----|-------|--------|-----------|
| !234 | Refactor auth middleware | alice | bob, charlie |
| !231 | Fix connection pool leak | bob | alice |
| !45 | Update dashboard layout | eve | dave |

## Closed Issues (2)
- **#89** Login timeout on slow networks (closed by alice)
- **#87** Stale cache headers (closed by bob)

## New Issues (3)
- **#95** Rate limiting returns 500 (priority::high, assigned to charlie)
- **#94** Add rate limit documentation (priority::low)
- **#93** Flaky test in CI pipeline (assigned to dave)

## Milestone Progress
- **v2.0** — 14/20 issues closed (70%) — due Feb 15
- **v1.9-hotfix** — 3/3 issues closed (100%) — COMPLETE

## Active Discussions
- **#90** 8 new comments this week (needs-review)
- **!230** 5 review threads unresolved
```

## Robot Mode Output

```json
{
  "ok": true,
  "data": {
    "period": { "from": "2025-01-20", "to": "2025-01-27" },
    "merged_count": 3,
    "closed_count": 2,
    "opened_count": 3,
    "highlights": [...],
    "merged": [...],
    "closed": [...],
    "opened": [...],
    "milestones": [...],
    "active_discussions": [...]
  }
}
```

## Downsides

- Formatting preferences vary by team; hard to please everyone
- "Highlights" ranking is heuristic (discussion count as proxy for importance)
- Doesn't capture work done outside GitLab

## Extensions

- `lore weekly --project group/backend` — single project scope
- `lore weekly --author alice` — personal weekly summary
- `lore weekly --output weekly.md` — write to file
- Scheduled generation via cron + robot mode

140  docs/issues/001-timeline-missing-project-in-entity-ref.md  Normal file
@@ -0,0 +1,140 @@
# 001: Timeline human output omits project path from entity references

- **Severity:** medium
- **Component:** `src/cli/commands/timeline.rs`
- **Status:** open

## Problem

The `lore timeline` human-readable output renders entity references as bare `#42` or `!234` without the project path. When multiple projects are synced, this makes the output ambiguous — issue `#42` in `group/backend` and `#42` in `group/frontend` are indistinguishable.

### Affected code

`format_entity_ref` at `src/cli/commands/timeline.rs:201-207`:

```rust
fn format_entity_ref(entity_type: &str, iid: i64) -> String {
    match entity_type {
        "issue" => format!("#{iid}"),
        "merge_request" => format!("!{iid}"),
        _ => format!("{entity_type}:{iid}"),
    }
}
```

This function is called in three places:

1. **Event lines** (`print_timeline_event`, line 130) — each event row shows `#42` with no project context
2. **Footer seed list** (`print_timeline_footer`, line 161) — seed entities listed as `#42, !234` with no project disambiguation
3. **Collect stage summaries** (`timeline_collect.rs:107`) — the `summary` field itself bakes in `"Issue #42 created: ..."` without project

### Current output (ambiguous)

```
2025-01-20  CREATED  #42   Issue #42 created: Login timeout bug       @alice
2025-01-21  LABEL+   #42   Label added: priority::high                @dave
2025-01-22  CREATED  !234  MR !234 created: Refactor auth middleware  @alice
2025-01-25  MERGED   !234  MR !234 merged                             @bob

Seed entities: #42, !234
```

When multiple projects are synced, a reader cannot tell which project `#42` belongs to.

## Robot mode is partially affected

The robot JSON output (`EventJson`, lines 387-416) DOES include a `project` field per event, so programmatic consumers can disambiguate. However, the `summary` string field still bakes in bare `#42` without project context, which is misleading if an agent uses the summary for display.

## Proposed fix

### 1. Add project to `format_entity_ref`

Pass `project_path` into `format_entity_ref` and use GitLab's full reference format:

```rust
fn format_entity_ref(entity_type: &str, iid: i64, project_path: &str) -> String {
    match entity_type {
        "issue" => format!("{project_path}#{iid}"),
        "merge_request" => format!("{project_path}!{iid}"),
        _ => format!("{project_path}/{entity_type}:{iid}"),
    }
}
```

### 2. Smart elision for single-project timelines

When all events belong to the same project, the full path is visual noise. Detect this and fall back to bare `#42` / `!234`:

```rust
fn should_show_project(events: &[TimelineEvent]) -> bool {
    let projects = events.iter().map(|e| &e.project_path).collect::<HashSet<_>>();
    projects.len() > 1
}
```

Then conditionally format:

```rust
let entity_ref = if show_project {
    format_entity_ref(&event.entity_type, event.entity_iid, &event.project_path)
} else {
    format_entity_ref_short(&event.entity_type, event.entity_iid)
};
```

### 3. Fix summary strings in collect stage

`timeline_collect.rs:107` bakes the summary as `"Issue #42 created: title"`. This should include the project when multi-project:

```rust
let prefix = if multi_project {
    format!("{type_label} {project_path}#{iid}")
} else {
    format!("{type_label} #{iid}")
};
summary = format!("{prefix} created: {title_str}");
```

Same pattern for the merge summary at lines 317 and 347.

### 4. Update footer seed list

`print_timeline_footer` (lines 155-164) should also use the project-aware format:

```rust
result.seed_entities.iter()
    .map(|e| format_entity_ref(&e.entity_type, e.entity_iid, &e.project_path))
```

## Expected output after fix

### Single project (no change)

```
2025-01-20  CREATED  #42  Issue #42 created: Login timeout bug  @alice
```

### Multi-project (project path added)

```
2025-01-20  CREATED  group/backend#42   Issue group/backend#42 created: Login timeout   @alice
2025-01-22  CREATED  group/frontend#42  Issue group/frontend#42 created: Broken layout  @eve
```

## Impact

- Human output: ambiguous for multi-project users (the primary use case for gitlore)
- Robot output: summary field misleading, but `project` field provides a workaround
- Timeline footer: seed entity list ambiguous
- Collect-stage summaries: baked-in bare references propagate to both renderers

179  docs/performance-audit-2026-02-12.md  Normal file
@@ -0,0 +1,179 @@
# Deep Performance Audit Report

**Date:** 2026-02-12
**Branch:** `perf-audit` (e9bacc94)
**Parent:** `039ab1c2` (master, v0.6.1)

---

## Methodology

1. **Baseline** — measured p50/p95 latency for all major commands with warm cache
2. **Profile** — used macOS `sample` profiler and `EXPLAIN QUERY PLAN` to identify hotspots
3. **Golden output** — captured exact numeric outputs before changes as an equivalence oracle
4. **One lever per change** — each optimization isolated and independently benchmarked
5. **Revert threshold** — any optimization <1.1x speedup reverted per audit rules

---

## Baseline Measurements (warm cache, release build)

| Command | Latency | Notes |
|---------|---------|-------|
| `who --path src/core/db.rs` (expert) | 2200ms | **Hotspot** |
| `who --active` | 83-93ms | Acceptable |
| `who workload` | 22ms | Fast |
| `stats` | 107-112ms | **Hotspot** |
| `search "authentication"` | 1030ms | **Hotspot** (library-level) |
| `list issues -n 50` | ~40ms | Fast |

---

## Optimization 1: INDEXED BY for DiffNote Queries

**Target:** `src/cli/commands/who.rs` — expert and reviews query paths

**Problem:** SQLite's query planner chose `idx_notes_system` (38% selectivity, 106K rows) over `idx_notes_diffnote_path_created` (9.3% selectivity, 26K rows) for path-filtered DiffNote queries. The partial index `WHERE noteable_type = 'MergeRequest' AND type = 'DiffNote'` is far more selective, but the planner's cost model didn't pick it.

**Change:** Added `INDEXED BY idx_notes_diffnote_path_created` to all 8 SQL queries across `query_expert`, `query_expert_details`, `query_reviews`, `build_path_query` (probes 1 & 2), and `suffix_probe`.

**Results:**

| Query | Before | After | Speedup |
|-------|--------|-------|---------|
| expert (specific path) | 2200ms | 56-58ms | **38x** |
| expert (broad path) | 2200ms | 83ms | **26x** |
| reviews | 1800ms | 24ms | **75x** |

**Isomorphism proof:** `INDEXED BY` only changes which index the planner uses, not the query semantics. Same rows matched, same ordering, same output. Verified by golden output comparison across 5+ runs.

---

## Optimization 2: Conditional Aggregates in Stats

**Target:** `src/cli/commands/stats.rs`

**Problem:** 12+ sequential `COUNT(*)` queries, each requiring a full table scan of `documents` (61K rows). Each scan touched the same pages but couldn't share work.

**Changes:**
- Documents: 5 sequential COUNTs -> 1 query with `SUM(CASE WHEN ... THEN 1 END)`
- FTS count: `SELECT COUNT(*) FROM documents_fts` (virtual table, slow) -> `SELECT COUNT(*) FROM documents_fts_docsize` (shadow B-tree table, 19x faster)
- Embeddings: 2 queries -> 1 with `COUNT(DISTINCT document_id), COUNT(*)`
- Dirty sources: 2 queries -> 1 with conditional aggregates
- Pending fetches: 2 queries -> 1 each (discussions, dependents)
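
The shape of the rewrite is one pass accumulating several conditional counts instead of one scan per count. A sketch of the same idea in Rust, with illustrative document kinds:

```rust
/// One scan over source types, many aggregates — the Rust analogue of
/// replacing N sequential COUNT(*) queries with SUM(CASE WHEN ...).
fn count_kinds(source_types: &[&str]) -> (u64, u64, u64) {
    let mut issues = 0;
    let mut mrs = 0;
    let mut discussions = 0;
    for t in source_types {
        match *t {
            "issue" => issues += 1,
            "merge_request" => mrs += 1,
            "discussion" => discussions += 1,
            _ => {}
        }
    }
    (issues, mrs, discussions)
}

fn main() {
    let (i, m, d) = count_kinds(&["issue", "merge_request", "issue", "discussion"]);
    println!("issues:{i} mrs:{m} discussions:{d}");
}
```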
|
||||||
|
|
||||||
|
**Results:**
|
||||||
|
|
||||||
|
| Metric | Before | After | Speedup |
|
||||||
|
|--------|--------|-------|---------|
|
||||||
|
| Warm median | 112ms | 66ms | **1.70x** |
|
||||||
|
| Cold | 1220ms | ~700ms | ~1.7x |
|
||||||
|
|
||||||
|
**Golden output verified:**
|
||||||
|
|
||||||
|
```
|
||||||
|
total:61652, issues:8241, mrs:10018, discussions:43393, truncated:63
|
||||||
|
fts:61652, embedded:61652, chunks:88161
|
||||||
|
```
|
||||||
|
|
||||||
|
All values match exactly across before/after runs.
|
||||||
|
|
||||||
|
**Isomorphism proof:** `SUM(CASE WHEN x THEN 1 END)` is algebraically identical to `COUNT(*) WHERE x`. The FTS5 shadow table `documents_fts_docsize` has exactly one row per FTS document by SQLite specification, so `COUNT(*)` on it equals the virtual table count.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Investigation: Two-Phase FTS Search (REVERTED)
|
||||||
|
|
||||||
|
**Target:** `src/search/fts.rs`, `src/cli/commands/search.rs`
|
||||||
|
|
||||||
|
**Hypothesis:** FTS5 `snippet()` generation is expensive. Splitting search into Phase 1 (score-only MATCH+bm25) and Phase 2 (snippet for filtered results only) should reduce work.
|
||||||
|
|
||||||
|
**Implementation:** Created `fetch_fts_snippets()` that retrieves snippets only for post-filter document IDs via `json_each()` join.
**Results:**

| Metric | Before | After | Improvement |
|--------|--------|-------|-------------|
| search (limit 20) | 1030ms | 995ms | 3.5% |

**Decision:** Reverted. Per audit rules, a <1.1x speedup does not justify added code complexity.

**Root cause:** The bottleneck is not snippet generation but `MATCH` + `bm25()` scoring itself. Profiling showed `strspn` (FTS5 tokenizer) and `memmove` as the top CPU consumers. The same query runs in 30ms on system sqlite3 but 1030ms in rusqlite's bundled SQLite — a ~34x gap with snippets (~125x against the 8ms snippet-free baseline), despite both being SQLite 3.51.x compiled at -O3.
---

## Library-Level Finding: Bundled SQLite FTS5 Performance

**Observation:** FTS5 MATCH+bm25 queries are ~125x slower in rusqlite's bundled SQLite than in system sqlite3.

| Environment | Query Time | Notes |
|-------------|-----------|-------|
| System sqlite3 (macOS) | 30ms (with snippet), 8ms (without) | Same .db file |
| rusqlite bundled | 1030ms | `features = ["bundled"]`, OPT_LEVEL=3 |

**Profiler data (macOS `sample`):**

- Top hotspot: `strspn` in FTS5 tokenizer
- Secondary: `memmove` in FTS5 internals
- Scaling: roughly linear in result count (limit 5 = 497ms, limit 20 = 995ms; ≈33ms per additional result)

**Possible causes:**

- Bundled SQLite compiled without platform-specific optimizations (SIMD, etc.)
- Different memory allocator behavior
- Missing compile-time tuning flags

**Recommendation for future:** Investigate switching from `features = ["bundled"]` to system SQLite linkage, or audit the bundled compile flags in the `libsqlite3-sys` build script.
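A sketch of what the dependency change might look like (version number illustrative; with the `bundled` feature off, `libsqlite3-sys` locates the system library via pkg-config, so the sqlite3 development headers must be present at build time):

```toml
[dependencies]
# Before: compiles and statically links libsqlite3-sys' vendored SQLite copy.
# rusqlite = { version = "0.31", features = ["bundled"] }

# After: link the system SQLite instead.
rusqlite = "0.31"
```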
---

## Exploration Agent Findings (Informational)

Four parallel exploration agents surveyed the entire codebase. Key findings beyond what was already addressed:

### Ingestion Pipeline

- Serial DB writes in async context (acceptable — rusqlite is synchronous)
- Label ingestion uses individual inserts (potential batch optimization, low priority)

### CLI / GitLab Client

- GraphQL client recreated per call (`client.rs:98-100`) — defeats connection-pool reuse; minor
- Double JSON deserialization in GraphQL responses — medium priority
- N+1 subqueries in `list` command (`list.rs:408-423`) — 4 correlated subqueries per row

### Search / Embedding

- No N+1 patterns, no O(n^2) algorithms
- Chunking is O(n) single-pass with proper UTF-8 safety
- Ollama concurrency model is sound (parallel HTTP, serial DB writes)

### Database / Documents

- O(n^2) prefix sum in `truncation.rs` — low-traffic path
- String allocation patterns in extractors — micro-optimization territory
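For reference, the standard fix for an O(n²) running-sum pattern is a single accumulation pass. A generic sketch (not the actual `truncation.rs` code, which has not been inspected here):

```python
from itertools import accumulate

lengths = [5, 3, 8, 2, 7]

# O(n^2): recompute the sum of the whole prefix at every position.
quadratic = [sum(lengths[: i + 1]) for i in range(len(lengths))]

# O(n): carry the running total forward instead.
linear = list(accumulate(lengths))

assert quadratic == linear  # [5, 8, 16, 18, 25]
```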
---

## Opportunity Matrix

| Candidate | Impact | Confidence | Effort | Score | Status |
|-----------|--------|------------|--------|-------|--------|
| INDEXED BY for DiffNote | Very High | High | Low | **9.0** | Shipped |
| Stats conditional aggregates | Medium | High | Low | **7.0** | Shipped |
| Bundled SQLite FTS5 | Very High | Medium | High | 5.0 | Documented |
| List N+1 subqueries | Medium | Medium | Medium | 4.0 | Backlog |
| GraphQL double deser | Low | Medium | Low | 3.5 | Backlog |
| Truncation O(n^2) | Low | High | Low | 3.0 | Backlog |

---

## Files Modified

| File | Change |
|------|--------|
| `src/cli/commands/who.rs` | INDEXED BY hints on 8 SQL queries |
| `src/cli/commands/stats.rs` | Conditional aggregates, FTS5 shadow table, merged queries |

---

## Quality Gates

- All 603 tests pass
- `cargo clippy --all-targets -- -D warnings` clean
- `cargo fmt --check` clean
- Golden output verified for both optimizations
456
docs/phase-a-spec.md
Normal file
@@ -0,0 +1,456 @@
# Phase A: Complete API Field Capture
> **Status:** Draft
> **Guiding principle:** Mirror everything GitLab gives us, in two representations:
> - **Lossless mirror:** the raw API JSON stored behind `raw_payload_id`. This is the true, complete representation of every API response.
> - **Relational projection:** a stable, query-optimized subset of fields we commit to keeping current on every re-sync.
>
> This preserves maximum context for processing and analysis while avoiding unbounded schema growth.
>
> **Migration:** 007_complete_field_capture.sql
> **Prerequisite:** None (independent of CP3)

---

## Scope

One migration. Three categories of work:

1. **New columns** on `issues` and `merge_requests` for fields currently dropped by serde or dropped during transform
2. **New serde fields** on `GitLabIssue` and `GitLabMergeRequest` to deserialize currently-silently-dropped JSON fields
3. **Transformer + insert updates** to pass the new fields through to the DB

No new tables. No new API calls. No new endpoints. All data comes from responses we already receive.

---

## Issues: Field Gap Inventory

### Currently stored

id, iid, project_id, title, description, state, author_username, created_at, updated_at, web_url, due_date, milestone_id, milestone_title, raw_payload_id, last_seen_at, discussions_synced_for_updated_at, labels (junction), assignees (junction)
### Currently deserialized but dropped during transform

| API Field | Status | Action |
|-----------|--------|--------|
| `closed_at` | Deserialized in serde struct, but no DB column exists and the transformer never populates it | Add column in migration 007, wire up in IssueRow + transform + INSERT |
| `author.id` | Deserialized | Store as `author_id` column |
| `author.name` | Deserialized | Store as `author_name` column |

### Currently silently dropped by serde (not in GitLabIssue struct)

| API Field | Type | DB Column | Notes |
|-----------|------|-----------|-------|
| `issue_type` | Option\<String\> | `issue_type` | Canonical field (lowercase, e.g. "issue"); preferred for DB storage |
| `upvotes` | i64 | `upvotes` | |
| `downvotes` | i64 | `downvotes` | |
| `user_notes_count` | i64 | `user_notes_count` | Useful for discussion sync optimization |
| `merge_requests_count` | i64 | `merge_requests_count` | Count of linked MRs |
| `confidential` | bool | `confidential` | 0/1 |
| `discussion_locked` | bool | `discussion_locked` | 0/1 |
| `weight` | Option\<i64\> | `weight` | Premium/Ultimate, null on Free |
| `time_stats.time_estimate` | i64 | `time_estimate` | Seconds |
| `time_stats.total_time_spent` | i64 | `time_spent` | Seconds |
| `time_stats.human_time_estimate` | Option\<String\> | `human_time_estimate` | e.g. "3h 30m" |
| `time_stats.human_total_time_spent` | Option\<String\> | `human_time_spent` | e.g. "1h 15m" |
| `task_completion_status.count` | i64 | `task_count` | Checkbox total |
| `task_completion_status.completed_count` | i64 | `task_completed_count` | Checkboxes checked |
| `has_tasks` | bool | `has_tasks` | 0/1 |
| `severity` | Option\<String\> | `severity` | Incident severity |
| `closed_by` | Option\<object\> | `closed_by_username` | Who closed it (username only, consistent with author pattern) |
| `imported` | bool | `imported` | 0/1 |
| `imported_from` | Option\<String\> | `imported_from` | Import source |
| `moved_to_id` | Option\<i64\> | `moved_to_id` | Target issue if moved |
| `references.short` | String | `references_short` | e.g. "#42" |
| `references.relative` | String | `references_relative` | e.g. "#42" or "group/proj#42" |
| `references.full` | String | `references_full` | e.g. "group/project#42" |
| `health_status` | Option\<String\> | `health_status` | Ultimate only |
| `type` | Option\<String\> | (transform-only) | Uppercase category (e.g. "ISSUE"); fallback for `issue_type` -- lowercased before storage. Not stored as a separate column; raw JSON remains lossless. |
| `epic.id` | Option\<i64\> | `epic_id` | Premium/Ultimate, null on Free |
| `epic.iid` | Option\<i64\> | `epic_iid` | |
| `epic.title` | Option\<String\> | `epic_title` | |
| `epic.url` | Option\<String\> | `epic_url` | |
| `epic.group_id` | Option\<i64\> | `epic_group_id` | |
| `iteration.id` | Option\<i64\> | `iteration_id` | Premium/Ultimate, null on Free |
| `iteration.iid` | Option\<i64\> | `iteration_iid` | |
| `iteration.title` | Option\<String\> | `iteration_title` | |
| `iteration.state` | Option\<i64\> | `iteration_state` | Enum: 1=upcoming, 2=current, 3=closed |
| `iteration.start_date` | Option\<String\> | `iteration_start_date` | ISO date |
| `iteration.due_date` | Option\<String\> | `iteration_due_date` | ISO date |
---

## Merge Requests: Field Gap Inventory

### Currently stored

id, iid, project_id, title, description, state, draft, author_username, source_branch, target_branch, head_sha, references_short, references_full, detailed_merge_status, merge_user_username, created_at, updated_at, merged_at, closed_at, last_seen_at, web_url, raw_payload_id, discussions_synced_for_updated_at, discussions_sync_last_attempt_at, discussions_sync_attempts, discussions_sync_last_error, labels (junction), assignees (junction), reviewers (junction)

### Currently deserialized but dropped during transform

| API Field | Status | Action |
|-----------|--------|--------|
| `author.id` | Deserialized | Store as `author_id` column |
| `author.name` | Deserialized | Store as `author_name` column |
| `work_in_progress` | Used transiently for `draft` fallback | Already handled, no change needed |
| `merge_status` (legacy) | Used transiently for `detailed_merge_status` fallback | Already handled, no change needed |
| `merged_by` | Used transiently for `merge_user` fallback | Already handled, no change needed |

### Currently silently dropped by serde (not in GitLabMergeRequest struct)

| API Field | Type | DB Column | Notes |
|-----------|------|-----------|-------|
| `upvotes` | i64 | `upvotes` | |
| `downvotes` | i64 | `downvotes` | |
| `user_notes_count` | i64 | `user_notes_count` | |
| `source_project_id` | i64 | `source_project_id` | Fork source |
| `target_project_id` | i64 | `target_project_id` | Fork target |
| `milestone` | Option\<object\> | `milestone_id`, `milestone_title` | Reuse issue milestone pattern |
| `merge_when_pipeline_succeeds` | bool | `merge_when_pipeline_succeeds` | 0/1, auto-merge flag |
| `merge_commit_sha` | Option\<String\> | `merge_commit_sha` | Commit ref after merge |
| `squash_commit_sha` | Option\<String\> | `squash_commit_sha` | Commit ref after squash |
| `discussion_locked` | bool | `discussion_locked` | 0/1 |
| `should_remove_source_branch` | Option\<bool\> | `should_remove_source_branch` | 0/1 |
| `force_remove_source_branch` | Option\<bool\> | `force_remove_source_branch` | 0/1 |
| `squash` | bool | `squash` | 0/1 |
| `squash_on_merge` | bool | `squash_on_merge` | 0/1 |
| `has_conflicts` | bool | `has_conflicts` | 0/1 |
| `blocking_discussions_resolved` | bool | `blocking_discussions_resolved` | 0/1 |
| `time_stats.time_estimate` | i64 | `time_estimate` | Seconds |
| `time_stats.total_time_spent` | i64 | `time_spent` | Seconds |
| `time_stats.human_time_estimate` | Option\<String\> | `human_time_estimate` | |
| `time_stats.human_total_time_spent` | Option\<String\> | `human_time_spent` | |
| `task_completion_status.count` | i64 | `task_count` | |
| `task_completion_status.completed_count` | i64 | `task_completed_count` | |
| `closed_by` | Option\<object\> | `closed_by_username` | |
| `prepared_at` | Option\<String\> | `prepared_at` | ISO datetime in API; store as ms epoch via `iso_to_ms()`, nullable |
| `merge_after` | Option\<String\> | `merge_after` | ISO datetime in API; store as ms epoch via `iso_to_ms()`, nullable (scheduled merge) |
| `imported` | bool | `imported` | 0/1 |
| `imported_from` | Option\<String\> | `imported_from` | |
| `approvals_before_merge` | Option\<i64\> | `approvals_before_merge` | Deprecated, scheduled for removal in GitLab API v5; store best-effort, keep nullable |
| `references.relative` | String | `references_relative` | Currently only short + full stored |
| `confidential` | bool | `confidential` | 0/1 (MRs can be confidential too) |
| `iteration.id` | Option\<i64\> | `iteration_id` | Premium/Ultimate, null on Free |
| `iteration.iid` | Option\<i64\> | `iteration_iid` | |
| `iteration.title` | Option\<String\> | `iteration_title` | |
| `iteration.state` | Option\<i64\> | `iteration_state` | |
| `iteration.start_date` | Option\<String\> | `iteration_start_date` | ISO date |
| `iteration.due_date` | Option\<String\> | `iteration_due_date` | ISO date |
---

## Migration 007: complete_field_capture.sql

```sql
-- Migration 007: Capture all remaining GitLab API response fields.
-- Principle: mirror everything GitLab returns. No field left behind.

-- ============================================================
-- ISSUES: new columns
-- ============================================================

-- Fields currently deserialized but not stored
ALTER TABLE issues ADD COLUMN closed_at INTEGER;  -- ms epoch, deserialized but never stored until now
ALTER TABLE issues ADD COLUMN author_id INTEGER;  -- GitLab user ID
ALTER TABLE issues ADD COLUMN author_name TEXT;   -- Display name

-- Issue metadata
ALTER TABLE issues ADD COLUMN issue_type TEXT;    -- 'issue' | 'incident' | 'test_case'
ALTER TABLE issues ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN discussion_locked INTEGER NOT NULL DEFAULT 0;

-- Engagement
ALTER TABLE issues ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN downvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN merge_requests_count INTEGER NOT NULL DEFAULT 0;

-- Time tracking
ALTER TABLE issues ADD COLUMN time_estimate INTEGER NOT NULL DEFAULT 0;  -- seconds
ALTER TABLE issues ADD COLUMN time_spent INTEGER NOT NULL DEFAULT 0;     -- seconds
ALTER TABLE issues ADD COLUMN human_time_estimate TEXT;
ALTER TABLE issues ADD COLUMN human_time_spent TEXT;

-- Task lists
ALTER TABLE issues ADD COLUMN task_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN task_completed_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN has_tasks INTEGER NOT NULL DEFAULT 0;

-- References (MRs already have short + full)
ALTER TABLE issues ADD COLUMN references_short TEXT;     -- e.g. "#42"
ALTER TABLE issues ADD COLUMN references_relative TEXT;  -- context-dependent
ALTER TABLE issues ADD COLUMN references_full TEXT;      -- e.g. "group/project#42"

-- Close/move tracking
ALTER TABLE issues ADD COLUMN closed_by_username TEXT;

-- Premium/Ultimate fields (nullable, null on Free tier)
ALTER TABLE issues ADD COLUMN weight INTEGER;
ALTER TABLE issues ADD COLUMN severity TEXT;
ALTER TABLE issues ADD COLUMN health_status TEXT;

-- Import tracking
ALTER TABLE issues ADD COLUMN imported INTEGER NOT NULL DEFAULT 0;
ALTER TABLE issues ADD COLUMN imported_from TEXT;
ALTER TABLE issues ADD COLUMN moved_to_id INTEGER;

-- Epic (Premium/Ultimate, null on Free)
ALTER TABLE issues ADD COLUMN epic_id INTEGER;
ALTER TABLE issues ADD COLUMN epic_iid INTEGER;
ALTER TABLE issues ADD COLUMN epic_title TEXT;
ALTER TABLE issues ADD COLUMN epic_url TEXT;
ALTER TABLE issues ADD COLUMN epic_group_id INTEGER;

-- Iteration (Premium/Ultimate, null on Free)
ALTER TABLE issues ADD COLUMN iteration_id INTEGER;
ALTER TABLE issues ADD COLUMN iteration_iid INTEGER;
ALTER TABLE issues ADD COLUMN iteration_title TEXT;
ALTER TABLE issues ADD COLUMN iteration_state INTEGER;
ALTER TABLE issues ADD COLUMN iteration_start_date TEXT;
ALTER TABLE issues ADD COLUMN iteration_due_date TEXT;

-- ============================================================
-- MERGE REQUESTS: new columns
-- ============================================================

-- Author enrichment
ALTER TABLE merge_requests ADD COLUMN author_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN author_name TEXT;

-- Engagement
ALTER TABLE merge_requests ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN downvotes INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN user_notes_count INTEGER NOT NULL DEFAULT 0;

-- Fork tracking
ALTER TABLE merge_requests ADD COLUMN source_project_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN target_project_id INTEGER;

-- Milestone (parity with issues)
ALTER TABLE merge_requests ADD COLUMN milestone_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN milestone_title TEXT;

-- Merge behavior
ALTER TABLE merge_requests ADD COLUMN merge_when_pipeline_succeeds INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN merge_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN squash_on_merge INTEGER NOT NULL DEFAULT 0;

-- Merge readiness
ALTER TABLE merge_requests ADD COLUMN has_conflicts INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN blocking_discussions_resolved INTEGER NOT NULL DEFAULT 0;

-- Branch cleanup
ALTER TABLE merge_requests ADD COLUMN should_remove_source_branch INTEGER;
ALTER TABLE merge_requests ADD COLUMN force_remove_source_branch INTEGER;

-- Discussion lock
ALTER TABLE merge_requests ADD COLUMN discussion_locked INTEGER NOT NULL DEFAULT 0;

-- Time tracking
ALTER TABLE merge_requests ADD COLUMN time_estimate INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN time_spent INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN human_time_estimate TEXT;
ALTER TABLE merge_requests ADD COLUMN human_time_spent TEXT;

-- Task lists
ALTER TABLE merge_requests ADD COLUMN task_count INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN task_completed_count INTEGER NOT NULL DEFAULT 0;

-- Close tracking
ALTER TABLE merge_requests ADD COLUMN closed_by_username TEXT;

-- Scheduling (API returns ISO datetimes; we store ms epoch for consistency)
ALTER TABLE merge_requests ADD COLUMN prepared_at INTEGER;  -- ms epoch after iso_to_ms()
ALTER TABLE merge_requests ADD COLUMN merge_after INTEGER;  -- ms epoch after iso_to_ms()

-- References (add relative; short + full already exist)
ALTER TABLE merge_requests ADD COLUMN references_relative TEXT;

-- Import tracking
ALTER TABLE merge_requests ADD COLUMN imported INTEGER NOT NULL DEFAULT 0;
ALTER TABLE merge_requests ADD COLUMN imported_from TEXT;

-- Premium/Ultimate
ALTER TABLE merge_requests ADD COLUMN approvals_before_merge INTEGER;
ALTER TABLE merge_requests ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;

-- Iteration (Premium/Ultimate, null on Free)
ALTER TABLE merge_requests ADD COLUMN iteration_id INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_iid INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_title TEXT;
ALTER TABLE merge_requests ADD COLUMN iteration_state INTEGER;
ALTER TABLE merge_requests ADD COLUMN iteration_start_date TEXT;
ALTER TABLE merge_requests ADD COLUMN iteration_due_date TEXT;

-- Record migration version
INSERT INTO schema_version (version, applied_at, description)
VALUES (7, strftime('%s', 'now') * 1000, 'Complete API field capture for issues and merge requests');
```
---

## Serde Struct Changes

### Existing type changes

```
GitLabReferences     // Add: relative: Option<String> (with #[serde(default)])
                     // Existing fields short + full remain unchanged
GitLabIssue          // Add #[derive(Default)] for test ergonomics
GitLabMergeRequest   // Add #[derive(Default)] for test ergonomics
```

### New helper types needed

```
GitLabTimeStats { time_estimate, total_time_spent, human_time_estimate, human_total_time_spent }
GitLabTaskCompletionStatus { count, completed_count }
GitLabClosedBy (reuse GitLabAuthor shape: id, username, name)
GitLabEpic { id, iid, title, url, group_id }
GitLabIteration { id, iid, title, state, start_date, due_date }
```
### GitLabIssue: add fields

```
type: Option<String>          // #[serde(rename = "type")] -- fallback-only (uppercase category); "type" is reserved in Rust
upvotes: i64                  // #[serde(default)]
downvotes: i64                // #[serde(default)]
user_notes_count: i64         // #[serde(default)]
merge_requests_count: i64     // #[serde(default)]
confidential: bool            // #[serde(default)]
discussion_locked: bool       // #[serde(default)]
weight: Option<i64>
time_stats: Option<GitLabTimeStats>
task_completion_status: Option<GitLabTaskCompletionStatus>
has_tasks: bool               // #[serde(default)]
references: Option<GitLabReferences>
closed_by: Option<GitLabAuthor>
severity: Option<String>
health_status: Option<String>
imported: bool                // #[serde(default)]
imported_from: Option<String>
moved_to_id: Option<i64>
issue_type: Option<String>    // canonical field (lowercase); preferred for DB storage over `type`
epic: Option<GitLabEpic>
iteration: Option<GitLabIteration>
```
### GitLabMergeRequest: add fields

```
upvotes: i64                          // #[serde(default)]
downvotes: i64                        // #[serde(default)]
user_notes_count: i64                 // #[serde(default)]
source_project_id: Option<i64>
target_project_id: Option<i64>
milestone: Option<GitLabMilestone>    // reuse existing type
merge_when_pipeline_succeeds: bool    // #[serde(default)]
merge_commit_sha: Option<String>
squash_commit_sha: Option<String>
squash: bool                          // #[serde(default)]
squash_on_merge: bool                 // #[serde(default)]
has_conflicts: bool                   // #[serde(default)]
blocking_discussions_resolved: bool   // #[serde(default)]
should_remove_source_branch: Option<bool>
force_remove_source_branch: Option<bool>
discussion_locked: bool               // #[serde(default)]
time_stats: Option<GitLabTimeStats>
task_completion_status: Option<GitLabTaskCompletionStatus>
closed_by: Option<GitLabAuthor>
prepared_at: Option<String>
merge_after: Option<String>
imported: bool                        // #[serde(default)]
imported_from: Option<String>
approvals_before_merge: Option<i64>
confidential: bool                    // #[serde(default)]
iteration: Option<GitLabIteration>
```
---

## Transformer Changes

### IssueRow: add fields

All new fields map 1:1 from the serde struct except:

- `closed_at` -> `iso_to_ms()` conversion (already in serde struct, just not passed through)
- `time_stats` -> flatten to 4 individual fields
- `task_completion_status` -> flatten to 2 individual fields
- `references` -> flatten to 3 individual fields
- `closed_by` -> extract `username` only (consistent with author pattern)
- `author` -> additionally extract `id` and `name` (currently only `username`)
- `issue_type` -> store as-is (canonical, lowercase); fall back to the lowercased `type` field if `issue_type` is absent
- `epic` -> flatten to 5 individual fields (id, iid, title, url, group_id)
- `iteration` -> flatten to 6 individual fields (id, iid, title, state, start_date, due_date)
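The `issue_type` / `type` precedence can be pinned down with a small sketch (function name hypothetical, Python for illustration; it mirrors the rule above, not the actual transformer code):

```python
def resolve_issue_type(issue_type, type_field):
    """Prefer the canonical lowercase issue_type; fall back to lowercased type."""
    if issue_type is not None:
        return issue_type
    if type_field is not None:
        return type_field.lower()
    return None

assert resolve_issue_type("incident", "ISSUE") == "incident"  # canonical wins
assert resolve_issue_type(None, "ISSUE") == "issue"           # fallback, lowercased
assert resolve_issue_type(None, None) is None
```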
### NormalizedMergeRequest: add fields

Same patterns as issues, plus:

- `milestone` -> reuse `upsert_milestone_tx` from the issue pipeline; add `milestone_id` + `milestone_title`
- `prepared_at`, `merge_after` -> `iso_to_ms()` conversion (API provides ISO datetimes)
- `source_project_id`, `target_project_id` -> direct pass-through
- `iteration` -> flatten to 6 individual fields (same as issues)
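What `iso_to_ms()` presumably does — parse an ISO-8601 datetime and emit a millisecond epoch — can be sketched as follows (illustrative Python, not the `core::time` implementation):

```python
from datetime import datetime

def iso_to_ms(s):
    """ISO-8601 datetime string -> milliseconds since the Unix epoch."""
    # fromisoformat rejects a trailing 'Z' before Python 3.11, so normalize
    # it to an explicit UTC offset first.
    dt = datetime.fromisoformat(s.replace("Z", "+00:00"))
    return int(dt.timestamp() * 1000)

assert iso_to_ms("1970-01-01T00:00:01Z") == 1000
assert iso_to_ms("2026-02-04T00:00:00+00:00") == iso_to_ms("2026-02-04T00:00:00Z")
```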
### Insert statement changes

Both `process_issue_in_transaction` and `process_mr_in_transaction` need their INSERT and ON CONFLICT DO UPDATE statements extended with all new columns. The ON CONFLICT clause should update all new fields on re-sync.

**Implementation note (reliability):** Define a single authoritative list of persisted columns per entity and generate/compose both SQL fragments from it:

- INSERT column list + VALUES placeholders
- ON CONFLICT DO UPDATE assignments

This prevents drift where a new field is added to one clause but not the other -- the most likely bug class with 40+ new columns.
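A minimal sketch of that composition (names hypothetical, Python for illustration; the real code would do the same in Rust):

```python
def upsert_sql(table, key_cols, cols):
    """Build INSERT ... ON CONFLICT DO UPDATE from one authoritative column list."""
    all_cols = key_cols + cols
    placeholders = ", ".join("?" for _ in all_cols)
    # excluded.<col> refers to the row that would have been inserted.
    assignments = ", ".join(f"{c} = excluded.{c}" for c in cols)
    return (
        f"INSERT INTO {table} ({', '.join(all_cols)}) VALUES ({placeholders}) "
        f"ON CONFLICT({', '.join(key_cols)}) DO UPDATE SET {assignments}"
    )

sql = upsert_sql("issues", ["id"], ["title", "upvotes"])
assert sql == (
    "INSERT INTO issues (id, title, upvotes) VALUES (?, ?, ?) "
    "ON CONFLICT(id) DO UPDATE SET title = excluded.title, upvotes = excluded.upvotes"
)
```

Because both clauses derive from one list, adding a column is a single-line change that cannot miss either clause.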
---

## Prerequisite refactors (prep commits before main Phase A work)

### 1. Align issue transformer on `core::time`

The issue transformer (`transformers/issue.rs`) has a local `parse_timestamp()` that duplicates `iso_to_ms_strict()` from `core::time`. The MR transformer already uses the shared module. Before adding Phase A's optional timestamp fields (especially `closed_at` as `Option<String>`), migrate the issue transformer to use `iso_to_ms_strict()` and `iso_to_ms_opt_strict()` from `core::time`. This avoids duplicating the `opt` variant locally and establishes one timestamp-parsing path across the codebase.

**Changes:** Replace `parse_timestamp()` calls with `iso_to_ms_strict()`; adapt or remove `TransformError::TimestampParse` (the MR transformer uses `String` errors; align on that or on a shared error type).

### 2. Extract shared ingestion helpers

`upsert_milestone_tx` (in `ingestion/issues.rs`) and `upsert_label_tx` (duplicated in both `ingestion/issues.rs` and `ingestion/merge_requests.rs`) should be moved to a shared module (e.g., `src/ingestion/shared.rs`). MR ingestion needs `upsert_milestone_tx` for Phase A milestone support, and the label helper is already copy-pasted between files.

**Changes:** Create `src/ingestion/shared.rs`; move `upsert_milestone_tx`, `upsert_label_tx`, and `MilestoneRow` there. Update imports in both the issue and MR ingestion modules.
---

## Files touched

| File | Change |
|------|--------|
| `migrations/007_complete_field_capture.sql` | New file |
| `src/gitlab/types.rs` | Add `#[derive(Default)]` to `GitLabIssue` and `GitLabMergeRequest`; add `relative: Option<String>` to `GitLabReferences`; add fields to both structs; add `GitLabTimeStats`, `GitLabTaskCompletionStatus`, `GitLabEpic`, `GitLabIteration` |
| `src/gitlab/transformers/issue.rs` | Remove local `parse_timestamp()`, switch to `core::time`; extend IssueRow, IssueWithMetadata, transform_issue() |
| `src/gitlab/transformers/merge_request.rs` | Extend NormalizedMergeRequest, MergeRequestWithMetadata, transform_merge_request(); extract `references_relative` |
| `src/ingestion/shared.rs` | New file: shared `upsert_milestone_tx`, `upsert_label_tx`, `MilestoneRow` |
| `src/ingestion/issues.rs` | Extend INSERT/UPSERT SQL; import from shared module |
| `src/ingestion/merge_requests.rs` | Extend INSERT/UPSERT SQL; import from shared module; add milestone upsert |
| `src/core/db.rs` | Register migration 007 in `MIGRATIONS` array |

---

## What this does NOT include

- No new API endpoints called
- No new tables (except reusing the existing `milestones` table for MRs)
- No CLI changes (new fields are stored but not yet surfaced in `lore issues` / `lore mrs` output)
- No changes to discussion/note ingestion (Phase A is issues + MRs only)
- No observability instrumentation (that's Phase B)

---

## Rollout / Backfill Note

After applying migration 007 and shipping the transformer + UPSERT updates, **existing rows will not have the new columns populated** until issues/MRs are reprocessed. Plan on a **one-time full re-sync** (`lore ingest --type issues --full` and `lore ingest --type mrs --full`) to backfill the new fields. Until then, queries on the new columns will return NULL/default values for previously-synced entities.
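The "NULL/default until re-sync" behavior follows directly from how SQLite's `ALTER TABLE ... ADD COLUMN` treats pre-existing rows; a quick check with stdlib `sqlite3` (schema illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE issues (id INTEGER PRIMARY KEY, title TEXT)")
conn.execute("INSERT INTO issues VALUES (1, 'pre-migration row')")

# Migration-style column adds: existing rows pick up DEFAULT (or NULL).
conn.execute("ALTER TABLE issues ADD COLUMN upvotes INTEGER NOT NULL DEFAULT 0")
conn.execute("ALTER TABLE issues ADD COLUMN closed_at INTEGER")

row = conn.execute("SELECT upvotes, closed_at FROM issues WHERE id = 1").fetchone()
assert row == (0, None)  # defaults apply until the row is re-synced
```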
|
||||||
|
|
||||||
|
---

## Resolved decisions

| Field | Decision | Rationale |
|-------|----------|-----------|
| `subscribed` | **Excluded** | User-relative field (reflects the token holder's subscription state, not an entity property). Changes meaning if the token is rotated to a different user. Not entity data. |
| `_links` | **Excluded** | HATEOAS API navigation metadata, not entity data. Every URL is deterministically constructable from `project_id` + `iid` + the GitLab base URL. Note: `closed_as_duplicate_of` inside `_links` contains a real entity reference -- extracting it is deferred to a future phase. |
| `epic` / `iteration` | **Flatten to columns** | Same denormalization pattern as milestones. Epic gets 5 columns (`epic_id`, `epic_iid`, `epic_title`, `epic_url`, `epic_group_id`). Iteration gets 6 columns (`iteration_id`, `iteration_iid`, `iteration_title`, `iteration_state`, `iteration_start_date`, `iteration_due_date`). Both nullable (null on Free tier). |
| `approvals_before_merge` | **Store best-effort** | Deprecated and scheduled for removal in GitLab API v5. Keep as `Option<i64>` / nullable column. Never depend on it for correctness -- it may disappear in a future GitLab release. |

---

docs/phase-b-temporal-intelligence.md (new file, 1009 lines; diff suppressed because it is too large)

docs/prd-observability.md (new file, 866 lines):

# PRD: Observability Infrastructure for lore CLI

**Status:** Draft
**Author:** Taylor + Claude
**Date:** 2026-02-04

---

## 1. Problem Statement

lore currently has minimal observability. Logging is ephemeral (stderr only): there are no persistent log files, no performance metrics, no structured JSON log output, no verbosity controls beyond `RUST_LOG`, and no way to diagnose issues after the fact. When a sync fails at 3 AM in a cron job, or an embedding run takes 10x longer than usual, there is zero forensic data available.

### Current State

| Capability | Status |
|---|---|
| Log destination | stderr only, ephemeral |
| Log persistence | None |
| Structured output | Human-readable fmt only |
| Verbosity control | `RUST_LOG` env var (no CLI flag) |
| Performance metrics | Ad-hoc `Instant::now()` in 2 commands |
| Timing in robot JSON | `elapsed_ms` in search and sync `meta` only |
| Spans / correlation | None |
| Log rotation | None |
| Per-stage timing | None |
| Rate limit / retry visibility | `tracing::warn!` only |
| Error aggregation | None |
| Historical comparison | None |

### What's Already in Place (to build on)

- `tracing` (0.1) + `tracing-subscriber` (0.3) with the `env-filter` feature
- Registry-based subscriber initialized in `src/main.rs:44-58` with a single `fmt::layer()` using `SuspendingWriter`
- `SuspendingWriter` (`src/cli/progress.rs:25-73`), which coordinates log output with indicatif's `MultiProgress`: it buffers each log line and calls `MULTI.suspend()` on drop to clear progress bars before writing to stderr
- `IngestDisplay` struct (`src/cli/commands/ingest.rs:65-104`) controlling UI verbosity with three modes: `interactive()` / `silent()` / `progress_only()`
- Robot mode JSON envelope: `{ "ok": true, "data": {...}, "meta": {...} }`, used consistently in the sync, search, sync-status, and doctor commands
- XDG-compliant data directory at `~/.local/share/lore/`
- `sync_runs` table (migration 001) with schema: `id`, `started_at`, `heartbeat_at`, `finished_at`, `status`, `command`, `error`, `metrics_json` -- **exists but is never written to** (no INSERT anywhere in the codebase; `sync_status.rs` reads from it but always gets zero rows)
- `uuid` crate (v1, v4 feature) already a dependency
- Structured fields used in tracing calls (e.g., `info!(owner = %self.owner, ...)`)
- `EnvFilter` currently hardcoded: `lore=info` + `warn` default directives
- Global CLI flags in `src/cli/mod.rs:9-43`: `--config`, `--robot`, `-J`, `--color`, `--quiet` (all `global = true`)

---

## 2. Goals

### Primary

1. **Post-mortem debugging**: Any failed or slow run can be diagnosed after the fact from persistent, structured log files.
2. **Performance visibility**: Every sync/ingest/embed/search operation reports granular stage-level timing, both to the terminal and to persistent storage.
3. **Ergonomic verbosity**: Users and agents control log verbosity through CLI flags (`-v`, `-vv`, `-vvv`) without needing to know `RUST_LOG` syntax.
4. **Machine-parseable logs**: A JSON log mode for piping into log aggregators (jq, Datadog, Loki, etc.).
5. **Agent-friendly metrics**: Robot mode JSON output includes comprehensive timing breakdowns for every command, enabling automated monitoring.

### Secondary

6. **Log rotation and retention**: Log files don't grow unbounded; old logs are cleaned up automatically.
7. **Correlation IDs**: Every sync run gets a unique ID that connects log lines, database records, and robot output.
8. **Rate limit and retry transparency**: Every rate-limited request and retry is visible in logs with full context.
9. **Sync history with metrics**: The `sync_runs` table is enriched with per-stage timing, item counts, and error counts so `lore sync-status` becomes a real dashboard.

### Non-Goals

- External telemetry export (OpenTelemetry, Prometheus) -- out of scope for v1.
- Real-time log streaming / tailing UI.
- Alerting or notification systems.
- Distributed tracing across multiple lore instances.

---

## 3. Research Foundation

### 3.1 The Three Pillars of Observability

Academic and industry consensus (Gholamian & Ward 2021, "A Comprehensive Survey of Logging in Software") identifies three pillars:

1. **Logs** -- Discrete events with context. The foundation.
2. **Metrics** -- Numerical measurements over time (counters, gauges, histograms).
3. **Traces** -- Causally ordered spans representing operations.

For a CLI tool (not a long-running service), the mapping is:

| Pillar | CLI Equivalent |
|---|---|
| Logs | Structured log files per invocation |
| Metrics | Per-stage timing, item counts, error counts stored in DB |
| Traces | Span hierarchy within a single invocation (sync -> ingest issues -> fetch page N -> sync discussions) |

### 3.2 Structured Logging Best Practices

From Duan et al. 2025 ("PDLogger: Automated Logging Framework for Practical Software Development") and industry practice:

- **Always structured**: JSON or key=value, never free-form prose in production logs.
- **Contextual fields propagate**: A `sync_run_id` set at the top level appears in every downstream log line.
- **Levels have semantic meaning**:
  - `ERROR`: Operation failed, requires attention.
  - `WARN`: Degraded behavior (rate limited, retry, skip).
  - `INFO`: Significant state transitions (stage start/complete, items processed).
  - `DEBUG`: Detailed operational data (page boundaries, individual API calls).
  - `TRACE`: Wire-level detail (request/response bodies, SQL queries).

### 3.3 CLI Verbosity Conventions

From the GNU Coding Standards, POSIX conventions, and modern Rust CLI tools (ripgrep, fd, cargo):

| Pattern | Meaning | Precedent |
|---|---|---|
| (default) | INFO for app, WARN for deps | cargo, rustc |
| `-q` / `--quiet` | Suppress non-error output | ripgrep, fd, cargo |
| `-v` | DEBUG for app | ripgrep, fd |
| `-vv` | DEBUG for app + deps | cargo |
| `-vvv` | TRACE for everything | cargo, curl |
| `RUST_LOG=...` | Fine-grained override | Universal in Rust |

The `-v` flag should feel familiar to anyone who has used cargo, curl, or ssh.

### 3.4 Log File Rotation

`tracing-appender` (from the tokio-rs/tracing ecosystem) provides:

- **Daily rotation**: New file per day, named `lore.2026-02-04.log`.
- **Non-blocking writes**: A dedicated writer thread, with zero impact on the main async runtime.
- **Configurable retention**: Delete files older than N days.

This is the canonical solution in the Rust tracing ecosystem and requires no custom code.

### 3.5 Performance Metrics for CLI Tools

Inspired by hyperfine's approach to benchmarking and cargo's `--timings` flag:

- Report wall-clock time per stage.
- Report item throughput (items/sec).
- Store historical runs for trend comparison.
- Present timing data in both human-readable and machine-readable formats.

---

## 4. Design

### 4.1 Architecture Overview

```
                 CLI Invocation
                       |
            +----------+----------+
            |                     |
     Interactive Mode        Robot Mode
            |                     |
  +--- stderr (human fmt)   stdout (JSON envelope)
  |         |                     |
  |   progress bars         { ok, data, meta: {
  |   colored output            elapsed_ms,
  |                             stages: [...],
  |                             run_id
  |                         }}
  |
  Log Subscribers (layered)
            |
       +----+----+--------+
       |         |        |
    stderr     file    (future:
    (fmt)     (JSON)    OTLP)
```

### 4.2 Subscriber Stack

Replace the current single-layer subscriber with a layered registry. Each layer has its own filter:

```
registry()
    .with(stderr_layer.with_filter(stderr_filter))  // Human-readable, SuspendingWriter, -v controlled
    .with(file_layer.with_filter(file_filter))      // JSON, daily rotation, always DEBUG+
```

**stderr layer**: The same `fmt::layer()` as today with `SuspendingWriter`, but with its level controlled by the `-v` flags. When `--log-format json` is passed, this layer switches to `fmt::layer().json()` (the same JSON format as the file layer, but still routed through `SuspendingWriter` for progress bar coordination).

**file layer**: Always-on JSON output to `~/.local/share/lore/logs/`, with daily rotation via `tracing-appender`. Uses its own `EnvFilter` set to `lore=debug,warn` regardless of `-v` flags, ensuring post-mortem data is always available. The file layer does NOT use `SuspendingWriter`: it writes to a file, not stderr, so progress bar coordination is unnecessary.

**Filter architecture**: Per-layer filtering (not a single shared `EnvFilter`) is required because the file layer must always be at DEBUG+ while stderr follows `-v`. `tracing-subscriber`'s `Layer::with_filter()` method enables this.

**`RUST_LOG` override**: When `RUST_LOG` is set, it overrides BOTH layer filters. This is the expert escape hatch.

**Current subscriber** (`src/main.rs:44-58`):

```rust
tracing_subscriber::registry()
    .with(
        tracing_subscriber::fmt::layer()
            .with_target(false)
            .with_writer(lore::cli::progress::SuspendingWriter),
    )
    .with(
        EnvFilter::from_default_env()
            .add_directive("lore=info".parse().unwrap())
            .add_directive("warn".parse().unwrap()),
    )
    .init();
```

This will be replaced by the dual-layer setup. The `SuspendingWriter` integration and `with_target(false)` on the stderr layer remain unchanged.

### 4.3 Verbosity Levels

#### stderr layer (controlled by `-v` flags)

| Flags | App Level | Dep Level | Behavior |
|---|---|---|---|
| (none) | INFO | WARN | Default. Stage transitions, summaries. |
| `-q` | WARN | ERROR | Errors and warnings only. |
| `-v` | DEBUG | WARN | Detailed app behavior. API pages, skip reasons. |
| `-vv` | DEBUG | INFO | App + dependency detail. HTTP client, SQLite. |
| `-vvv` | TRACE | DEBUG | Everything. Wire-level detail. |
| `RUST_LOG=...` | (overrides all) | (overrides all) | Expert escape hatch. |

Precedence: `RUST_LOG` > `-v` flags > defaults. This matches cargo's behavior.

#### file layer (independent of `-v` flags)

| Condition | App Level | Dep Level |
|---|---|---|
| Always (default) | DEBUG | WARN |
| `RUST_LOG=...` set | (overrides) | (overrides) |

The file layer always captures DEBUG+ for the `lore` crate and WARN+ for dependencies. This ensures post-mortem data is available even when the user ran with default stderr verbosity. `RUST_LOG` overrides both layers when set.

#### New CLI flags

Add to the `Cli` struct (`src/cli/mod.rs`):

```rust
/// Increase log verbosity (-v, -vv, -vvv)
#[arg(short = 'v', long = "verbose", action = clap::ArgAction::Count, global = true)]
pub verbose: u8,

/// Log format for stderr output: text (default) or json
#[arg(long = "log-format", global = true, value_parser = ["text", "json"], default_value = "text")]
pub log_format: String,
```

The `-v` flag uses `clap::ArgAction::Count` to support `-v`, `-vv`, `-vvv` as a single flag with an increasing count. The `--log-format` flag controls whether stderr emits human-readable or JSON-formatted log lines.

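The verbosity table above reduces to a pure mapping from the `-v` count (and `-q`) to an `EnvFilter` directive string. A minimal sketch, assuming the level table in this section; `verbosity_directives` is a hypothetical helper name, and the exact directive strings are illustrative:

```rust
// Sketch: map the -v count and -q flag to an EnvFilter directive string.
// `verbosity_directives` is a hypothetical helper; the directive strings
// mirror the stderr-layer table above.
fn verbosity_directives(verbose: u8, quiet: bool) -> String {
    // -q wins over -v for the stderr layer: warnings and errors only.
    if quiet {
        return "lore=warn,error".to_string();
    }
    match verbose {
        0 => "lore=info,warn".to_string(),   // default: INFO app, WARN deps
        1 => "lore=debug,warn".to_string(),  // -v: DEBUG app
        2 => "lore=debug,info".to_string(),  // -vv: DEBUG app, INFO deps
        _ => "lore=trace,debug".to_string(), // -vvv+: TRACE everything
    }
}

fn main() {
    assert_eq!(verbosity_directives(0, false), "lore=info,warn");
    assert_eq!(verbosity_directives(3, false), "lore=trace,debug");
    assert_eq!(verbosity_directives(2, true), "lore=warn,error");
    println!("{}", verbosity_directives(1, false));
}
```

The resulting string would be fed to `EnvFilter::try_new()` only when `RUST_LOG` is unset, preserving the precedence rule above.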
### 4.4 Structured Log File Output

**Location**: `~/.local/share/lore/logs/lore.YYYY-MM-DD.log`

**Format**: One JSON object per line (JSONL), produced by `tracing-subscriber`'s `fmt::layer().json()`:

```json
{"timestamp":"2026-02-04T14:32:01.123Z","level":"INFO","target":"lore::ingestion","fields":{"message":"Discussion sync complete","project":"group/repo","issues_synced":42,"elapsed_ms":1234},"span":{"name":"ingest_issues","run_id":"a1b2c3"}}
```

**Rotation**: Daily, via `tracing-appender::rolling::daily()`.

**Retention**: Configurable, default 30 days, via a `logs.retention_days` config field. Cleanup runs at startup (scan the directory, delete files older than N days).

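Because the daily file names embed an ISO date, the retention check needs no date-parsing crate: `YYYY-MM-DD` strings compare correctly lexicographically. A std-only sketch of the per-file decision, where `is_expired` is a hypothetical helper name:

```rust
// Sketch of the startup retention check, assuming the `lore.YYYY-MM-DD.log`
// naming scheme above. ISO dates sort lexicographically, so plain string
// comparison against a cutoff date suffices. `is_expired` is hypothetical.
fn is_expired(file_name: &str, cutoff_date: &str) -> bool {
    // Expect names like "lore.2026-02-04.log"; ignore anything else.
    let date = match file_name
        .strip_prefix("lore.")
        .and_then(|rest| rest.strip_suffix(".log"))
    {
        Some(d) if d.len() == 10 => d,
        _ => return false,
    };
    date < cutoff_date
}

fn main() {
    // With 30-day retention and "today" = 2026-02-04, the cutoff is 2026-01-05.
    assert!(is_expired("lore.2026-01-04.log", "2026-01-05"));
    assert!(!is_expired("lore.2026-02-04.log", "2026-01-05"));
    assert!(!is_expired("notes.txt", "2026-01-05")); // non-log files untouched
    println!("retention check ok");
}
```

The startup cleanup would walk `log_dir` with `std::fs::read_dir` and delete entries for which this predicate returns true.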
### 4.5 Tracing Spans

Introduce spans for causal correlation within a single invocation:

```
sync (run_id=uuid)
+-- ingest_issues
|   +-- fetch_pages (project="group/repo")
|   +-- sync_discussions (project="group/repo")
|   +-- fetch_resource_events (project="group/repo")
+-- ingest_mrs
|   +-- fetch_pages (project="group/repo")
|   +-- sync_discussions (project="group/repo")
+-- generate_docs
+-- embed
```

Each span records `elapsed_ms` on close. The `run_id` propagates to all child spans and log events, so `jq 'select(.span.run_id == "a1b2c3")' lore.2026-02-04.log` can extract an entire run.

### 4.6 Performance Metrics

#### 4.6.1 Per-Stage Timing

Every command collects a `Vec<StageTiming>`:

```rust
#[derive(Debug, Clone, Serialize)]
pub struct StageTiming {
    pub name: String,                  // "ingest_issues", "fetch_pages", etc.
    #[serde(skip_serializing_if = "Option::is_none")]
    pub project: Option<String>,       // Which project, if applicable
    pub elapsed_ms: u64,
    pub items_processed: usize,
    #[serde(skip_serializing_if = "is_zero")]
    pub items_skipped: usize,
    #[serde(skip_serializing_if = "is_zero")]
    pub errors: usize,
    #[serde(skip_serializing_if = "Vec::is_empty")]
    pub sub_stages: Vec<StageTiming>,  // Nested child stages
}
```

**Collection mechanism**: Stage timing is materialized from tracing spans, not plumbed manually through function signatures. Phase 2 adds `#[instrument]` spans to each sync stage. Phase 3 adds a custom `tracing-subscriber` layer that records span enter/exit times and structured fields, then extracts the span tree into a `Vec<StageTiming>` when the root span closes.

This means:

- No mutable timing collector threaded through `run_ingest` → `fetch_pages` → `sync_discussions`
- Spans are the single source of truth for timing
- `StageTiming` is a materialized view of the span tree
- The custom layer implements `on_close` to capture `elapsed` and `on_record` to capture structured fields like `items_processed`

**Where to define**: `src/core/metrics.rs` (new file; genuinely new functionality that doesn't fit in any existing file)
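The struct above references an `is_zero` predicate in its `skip_serializing_if` attributes; serde requires that helper to exist with a `&T -> bool` signature next to the struct. A sketch of it, together with the items/sec throughput derivation mentioned in section 3.5 (`items_per_sec` is a hypothetical helper name):

```rust
// The serde predicate assumed by the `skip_serializing_if = "is_zero"`
// attributes above: serde passes a reference to the field.
fn is_zero(n: &usize) -> bool {
    *n == 0
}

// Throughput for a stage: items processed per second of wall-clock time.
// `items_per_sec` is a hypothetical helper for the reports in this PRD.
fn items_per_sec(items_processed: usize, elapsed_ms: u64) -> f64 {
    if elapsed_ms == 0 {
        return 0.0; // avoid division by zero for instantaneous stages
    }
    items_processed as f64 / (elapsed_ms as f64 / 1000.0)
}

fn main() {
    assert!(is_zero(&0));
    assert!(!is_zero(&3));
    // 150 items over 5200 ms is roughly 28.8 items/sec.
    let rate = items_per_sec(150, 5200);
    assert!((rate - 28.846).abs() < 0.01);
    println!("{rate:.1} items/sec");
}
```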
#### 4.6.2 Robot JSON Meta Enhancement

Currently:

```json
{ "ok": true, "data": {...}, "meta": { "elapsed_ms": 1234 } }
```

Proposed:

```json
{
  "ok": true,
  "data": { ... },
  "meta": {
    "run_id": "a1b2c3d4",
    "elapsed_ms": 45230,
    "stages": [
      {
        "name": "ingest_issues",
        "elapsed_ms": 12340,
        "items_processed": 150,
        "items_skipped": 30,
        "errors": 0,
        "sub_stages": [
          { "name": "fetch_pages", "project": "group/repo", "elapsed_ms": 5200, "items_processed": 150 },
          { "name": "sync_discussions", "project": "group/repo", "elapsed_ms": 6800, "items_processed": 42, "items_skipped": 108 }
        ]
      },
      {
        "name": "ingest_mrs",
        "elapsed_ms": 18900,
        "items_processed": 85,
        "items_skipped": 12,
        "errors": 1
      },
      { "name": "generate_docs", "elapsed_ms": 8500, "items_processed": 235 },
      { "name": "embed", "elapsed_ms": 5490, "items_processed": 1024 }
    ]
  }
}
```

#### 4.6.3 Sync History Enrichment

**Prerequisite bug fix**: The `sync_runs` table (migration 001) exists with columns `id`, `started_at`, `heartbeat_at`, `finished_at`, `status`, `command`, `error`, `metrics_json`, but **no code ever writes to it**. The `sync_status.rs` command reads from it but always gets zero rows. This must be fixed before enrichment.

**Step 1: Wire up the sync_runs lifecycle** (prerequisite, in Phase 4)

Add INSERT/UPDATE calls to the sync and ingest command handlers:

```sql
-- On sync/ingest start:
INSERT INTO sync_runs (started_at, heartbeat_at, status, command)
VALUES (?now_ms, ?now_ms, 'running', ?command_name)
RETURNING id;

-- On sync/ingest success:
UPDATE sync_runs
SET finished_at = ?now_ms, status = 'succeeded', metrics_json = ?metrics
WHERE id = ?run_id;

-- On sync/ingest failure:
UPDATE sync_runs
SET finished_at = ?now_ms, status = 'failed', error = ?error_msg, metrics_json = ?metrics
WHERE id = ?run_id;
```

**Where**: Add a `SyncRunRecorder` helper in `src/core/db.rs` or `src/core/sync_run.rs` that encapsulates the INSERT/UPDATE lifecycle. Called from `run_sync()` in `src/cli/commands/sync.rs` and `run_ingest()` in `src/cli/commands/ingest.rs`.
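The `SyncRunRecorder` lifecycle can be sketched as a small state machine. This is a hypothetical std-only reduction that yields the SQL it would run rather than executing it against SQLite; the struct name comes from the text above, but the method names and shape are assumptions:

```rust
// Hypothetical sketch of the SyncRunRecorder lifecycle: start() issues the
// INSERT and remembers the row id, finish() issues the matching UPDATE.
// A real implementation would execute these against SQLite; here they are
// returned as strings so the lifecycle is visible in isolation.
struct SyncRunRecorder {
    run_db_id: Option<i64>,
}

impl SyncRunRecorder {
    fn start(command: &str) -> (Self, String) {
        let sql = format!(
            "INSERT INTO sync_runs (started_at, heartbeat_at, status, command) \
             VALUES (?now_ms, ?now_ms, 'running', '{command}') RETURNING id;"
        );
        // The id would come back from RETURNING; 1 stands in for it here.
        (Self { run_db_id: Some(1) }, sql)
    }

    fn finish(&self, ok: bool) -> String {
        let status = if ok { "succeeded" } else { "failed" };
        format!(
            "UPDATE sync_runs SET finished_at = ?now_ms, status = '{status}', \
             metrics_json = ?metrics WHERE id = {};",
            self.run_db_id.unwrap_or(0)
        )
    }
}

fn main() {
    let (recorder, insert_sql) = SyncRunRecorder::start("sync");
    assert!(insert_sql.contains("'running'"));
    let update_sql = recorder.finish(true);
    assert!(update_sql.contains("'succeeded'"));
    println!("{insert_sql}\n{update_sql}");
}
```

Encapsulating both statements in one type keeps the failure-path UPDATE from being forgotten in any individual command handler.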
**Step 2: Schema migration** (migration 014)

Add dedicated queryable columns alongside the existing `metrics_json`:

```sql
-- Migration 014: sync_runs enrichment for observability
ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;

-- Index for correlation queries
CREATE INDEX idx_sync_runs_run_id ON sync_runs(run_id);
```

The existing `metrics_json` column stores the detailed `Vec<StageTiming>` as a JSON array, so there is no need for a separate `stages_json` column.

**Step 3: Enhanced sync-status display**

`lore sync-status` (`src/cli/commands/sync_status.rs`) currently shows only the last run. Enhance it to show recent runs with metrics:

```
Recent sync runs:
  Run a1b2c3 | 2026-02-04 14:32 | 45.2s | 235 items | 1 error
  Run d4e5f6 | 2026-02-03 14:30 | 38.1s | 220 items | 0 errors
  Run g7h8i9 | 2026-02-02 14:29 | 42.7s | 228 items | 0 errors
```

Robot mode (`lore --robot sync-status`):

```json
{
  "ok": true,
  "data": {
    "runs": [
      {
        "run_id": "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d",
        "started_at": "2026-02-04T14:32:01.123Z",
        "elapsed_ms": 45230,
        "status": "succeeded",
        "command": "sync",
        "total_items_processed": 235,
        "total_errors": 1,
        "stages": [...]
      }
    ],
    "cursors": [...],
    "summary": {...}
  }
}
```

The `stages` array is parsed from `metrics_json` and included in the robot output. Interactive mode shows the summary table above; `lore --robot sync-status --run a1b2c3` shows a single run's full stage breakdown.

#### 4.6.4 Human-Readable Timing

At the end of `lore sync` (interactive mode), print a timing summary:

```
Sync complete in 45.2s
  Ingest issues .... 12.3s (150 items, 42 discussions)
  Ingest MRs ....... 18.9s (85 items, 1 error)
  Generate docs .... 8.5s (235 documents)
  Embed ............ 5.5s (1024 chunks)
```

Gated behind `display.show_text` so it doesn't appear in progress_only or silent modes.

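The dot-padded summary lines above are a simple width calculation. A sketch, where `format_stage_line` and the 18-column pad width are illustrative assumptions, not the planned implementation:

```rust
// Sketch of one dotted summary line from the timing display above.
// `format_stage_line` is a hypothetical helper; the pad width of 18
// columns is illustrative.
fn format_stage_line(name: &str, elapsed_ms: u64, detail: &str) -> String {
    // Pad the stage name with dots out to a fixed column.
    let dots = ".".repeat(18usize.saturating_sub(name.len()));
    let secs = elapsed_ms as f64 / 1000.0;
    format!("  {name} {dots} {secs:.1}s ({detail})")
}

fn main() {
    let line = format_stage_line("Ingest issues", 12340, "150 items, 42 discussions");
    assert!(line.contains("12.3s"));
    assert!(line.starts_with("  Ingest issues ...."));
    println!("{line}");
}
```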
### 4.7 Rate Limit and Retry Transparency

Currently, rate limits emit a `tracing::warn!`. Enhance to:

- Log at INFO level (not just WARN) with structured fields: `info!(path, attempt, retry_after_secs, "Rate limited, retrying")`.
- Count total rate-limit hits per run and include them in stage timing.
- In `-v` mode, show retry progress on stderr: `  Retrying /api/v4/projects/123/issues (429, waiting 2s)`.

### 4.8 Configuration

Add a new `logging` section to `Config` (`src/core/config.rs`):

```rust
#[derive(Debug, Clone, Deserialize)]
pub struct LoggingConfig {
    /// Directory for log files. Default: ~/.local/share/lore/logs/
    #[serde(default)]
    pub log_dir: Option<String>,

    /// Days to retain log files. Default: 30. Set to 0 to disable file logging.
    #[serde(default = "default_retention_days")]
    pub retention_days: u32,

    /// Enable JSON log files. Default: true.
    #[serde(default = "default_true")]
    pub file_logging: bool,
}

fn default_retention_days() -> u32 { 30 }
fn default_true() -> bool { true }

// Needed because `#[serde(default)]` on the `logging` field below requires
// `LoggingConfig: Default`; a derived Default would wrongly zero retention_days.
impl Default for LoggingConfig {
    fn default() -> Self {
        Self { log_dir: None, retention_days: 30, file_logging: true }
    }
}
```

Add to the `Config` struct:

```rust
#[serde(default)]
pub logging: LoggingConfig,
```

With `config.json`:

```json
{
  "logging": {
    "log_dir": null,
    "retention_days": 30,
    "file_logging": true
  }
}
```

Defaults are sane, so existing configs continue working with zero changes.

**CLI flags** (added to the `Cli` struct in `src/cli/mod.rs`):

| Flag | Type | Default | Description |
|---|---|---|---|
| `-v` / `--verbose` | count (u8) | 0 | Increase stderr log verbosity. Stacks: `-v`, `-vv`, `-vvv`. |
| `--log-format` | text \| json | text | Stderr log format. `json` emits one JSON object per log line (same schema as the file layer). |

These are global flags (`global = true`), consistent with the existing `--quiet`, `--robot`, etc.

---

## 5. Implementation Plan

### Phase 1: Verbosity Flags + Structured File Logging

**Scope**: CLI flags, dual-layer subscriber, file logging, rotation, retention, `--log-format`.

**Files touched**:

- `Cargo.toml` -- add `tracing-appender` dependency
- `src/cli/mod.rs` -- add `-v`/`--verbose` (count) and `--log-format` flags to the `Cli` struct
- `src/main.rs` -- replace subscriber initialization (lines 44-58) with the dual-layer setup
- `src/core/config.rs` -- add `LoggingConfig` struct and `logging` field to `Config`
- `src/core/paths.rs` -- add `get_log_dir()` helper (XDG data dir + `/logs/`)
- `src/cli/commands/doctor.rs` -- add log file location and disk usage check

**Implementation steps**:

1. Add `-v` / `--verbose` (count, `u8`) and `--log-format` (text|json) flags to the `Cli` struct.
2. Add the `tracing-appender` dependency to `Cargo.toml`.
3. Add `LoggingConfig` to `Config` with `#[serde(default)]`.
4. Add `get_log_dir()` to `src/core/paths.rs` (mirrors the `get_db_path()` pattern).
5. Replace the subscriber init in `main.rs`:
   - Build `stderr_filter` from the `-v` count (or `RUST_LOG` if set).
   - Build `file_filter` as `lore=debug,warn` (or `RUST_LOG` if set).
   - stderr layer: `fmt::layer().with_writer(SuspendingWriter)` with `stderr_filter`. When `--log-format json`, chain `.json()`.
   - file layer: `fmt::layer().json().with_writer(tracing_appender::rolling::daily(log_dir, "lore"))` with `file_filter`.
   - Combine via `registry().with(stderr_layer.with_filter(stderr_filter)).with(file_layer.with_filter(file_filter))`.
6. Implement log retention at startup: scan `log_dir` and delete files matching the `lore.*.log` pattern older than `retention_days`. Run this before subscriber init so deleted files aren't held open.
7. Add a log file check to `lore doctor`: report the log directory path, number of log files, and total disk usage. In robot mode, add a `logging` field to `DoctorChecks` with `log_dir`, `file_count`, `total_bytes`, `oldest_file`.

**New dependencies**: `tracing-appender` (0.2)

**Interaction with `-q`/`--quiet`**: The existing `--quiet` flag suppresses non-error terminal output via `IngestDisplay::silent()`. It should NOT affect file logging (the file layer is always on). When `-q` and `-v` are both passed, `-q` wins for stderr (set the stderr filter to WARN+). The file layer remains at DEBUG+.

**Tests** (see Section 6.1 for details):

- Unit: `EnvFilter` construction from verbosity count (0→INFO, 1→DEBUG, 2→DEBUG+deps, 3→TRACE)
- Unit: `RUST_LOG` overrides `-v` flags
- Unit: `-q` + `-v` interaction (quiet wins)
- Unit: `LoggingConfig` deserialization with missing/partial/full fields
- Unit: Log retention deletes old files, preserves recent ones
- Integration: Subscriber produces JSON lines to a test file
- Integration: `SuspendingWriter` still works with the dual-layer stack (no garbled output)

### Phase 2: Spans + Correlation IDs

**Scope**: Tracing spans, UUID-based `run_id`, span recording for JSON logs.

**Depends on**: Phase 1 (the subscriber must support span recording).

**Files touched**:

- `src/cli/commands/sync.rs` -- add a root span with a `run_id` field to `run_sync()`
- `src/cli/commands/ingest.rs` -- add `#[instrument]` spans to `run_ingest()` and its stages
- `src/ingestion/orchestrator.rs` -- add spans for `fetch_pages`, `sync_discussions`, `fetch_resource_events`
- `src/documents/regenerator.rs` -- add a span for the `generate_docs` stage
- `src/embedding/pipeline.rs` -- add a span for the `embed` stage
- `src/main.rs` -- generate `run_id` before calling the command handler, pass it as a field

**Implementation steps**:

1. Generate `run_id` from the first 8 characters of `Uuid::new_v4().to_string()` at command entry in `main.rs`. No new dependency needed -- `uuid` v1 with the v4 feature is already in `Cargo.toml`.
2. Create the root span: `let _root = tracing::info_span!("sync", run_id = %run_id).entered();` (or the equivalent for each command).
3. Add `#[instrument(skip_all, fields(stage = "ingest_issues"))]` to ingest stages.
4. Add `#[instrument(skip_all, fields(project = %project_path))]` to per-project functions.
5. Ensure the file layer's JSON formatter includes span context. `tracing-subscriber`'s `fmt::layer().json()` includes the current span chain by default when the registry has span storage enabled.
6. Verify: parse a log file and confirm every line includes `span.run_id`.

**New dependencies**: None (`uuid` already present).

**Tests**:

- Unit: `run_id` is a valid 8-character hex string
- Integration: Run a sync-like operation with spans, parse the JSON log output, verify every line contains `run_id` in span context
- Integration: Nested spans produce correct parent-child relationships in JSON output

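The short-id derivation in step 1 relies on the hyphenated UUIDv4 text form, whose first 8 characters are the hex `time_low` field. A std-only sketch using a fixed example string in place of `Uuid::new_v4()`; `short_run_id` is a hypothetical helper name:

```rust
// Sketch of run_id derivation, assuming the hyphenated UUIDv4 text form
// (e.g. "a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d"). The first 8 characters
// are the hex time_low field, so the short id is always 8 hex chars.
fn short_run_id(uuid_string: &str) -> String {
    uuid_string.chars().take(8).collect()
}

fn main() {
    // A fixed example stands in for Uuid::new_v4().to_string().
    let id = short_run_id("a1b2c3d4-e5f6-4a7b-8c9d-0e1f2a3b4c5d");
    assert_eq!(id, "a1b2c3d4");
    assert_eq!(id.len(), 8);
    assert!(id.chars().all(|c| c.is_ascii_hexdigit()));
    println!("{id}");
}
```

This is also the property the Phase 2 unit test above checks: 8 characters, all hex digits.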
### Phase 3: Performance Metrics Collection
|
||||||
|
|
||||||
|
**Scope**: `StageTiming` struct, span-to-metrics extraction, robot JSON enrichment, timing summary.
|
||||||
|
|
||||||
|
**Depends on**: Phase 2 (spans must exist to extract timing from).
|
||||||
|
|
||||||
|
**Files touched**:
|
||||||
|
- `src/core/metrics.rs` — new file: `StageTiming` struct, `MetricsLayer` (custom tracing layer), span-to-timing extraction
|
||||||
|
- `src/cli/commands/sync.rs` — consume `Vec<StageTiming>` from `MetricsLayer`, include in `SyncMeta`
|
||||||
|
- `src/cli/commands/ingest.rs` — same pattern for standalone ingest
|
||||||
|
- `src/main.rs` — register `MetricsLayer` in the subscriber stack
|
||||||
|
|
||||||
|
**Implementation steps**:
|
||||||
|
1. Define `StageTiming` struct with `sub_stages: Vec<StageTiming>` in `src/core/metrics.rs`.
|
||||||
|
2. Implement `MetricsLayer` as a custom `tracing_subscriber::Layer`:
|
||||||
|
- `on_new_span`: Record span ID, name, parent, start time.
|
||||||
|
- `on_record`: Capture structured fields (`items_processed`, `items_skipped`, `errors`) recorded via `Span::record()`.
|
||||||
|
- `on_close`: Calculate `elapsed_ms`, build `StageTiming` entry, attach to parent.
|
||||||
|
- Provide `fn extract_timings(&self, run_id: &str) -> Vec<StageTiming>` to materialize the span tree after the root span closes.
|
||||||
|
3. Store `MetricsLayer` reference (behind `Arc`) so command handlers can call `extract_timings()` after `run_sync()` completes.
|
||||||
|
4. Extend `SyncMeta` and `SyncJsonOutput` to include `run_id: String` and `stages: Vec<StageTiming>`.
|
||||||
|
5. Print human-readable timing summary at end of interactive sync (gated behind `IngestDisplay::show_text`).
|
||||||
|
|
||||||
|
**Span field recording**: Sync stages must record item counts as span fields for `MetricsLayer` to capture:
|
||||||
|
```rust
// Fields must be declared (as `Empty`) at span creation; `Span::record()`
// on an undeclared field is silently ignored by tracing.
let span = tracing::info_span!(
    "ingest_issues",
    items_processed = tracing::field::Empty,
    items_skipped = tracing::field::Empty,
);
let _guard = span.enter();

// ... do work ...

span.record("items_processed", count);
span.record("items_skipped", skipped);
```
|
||||||
|
|
||||||
|
**Tests**:
|
||||||
|
- Unit: `StageTiming` serialization matches expected JSON (including nested `sub_stages`)
|
||||||
|
- Unit: `MetricsLayer` correctly builds span tree from synthetic span events
|
||||||
|
- Unit: `MetricsLayer` handles spans with no children (leaf stages like `embed`)
|
||||||
|
- Unit: `MetricsLayer` handles concurrent spans (multiple projects in parallel)
|
||||||
|
- Integration: `lore --robot sync` output includes `meta.stages` array with correct nesting
|
||||||
|
- Integration: Interactive sync prints timing summary table to stderr
|
||||||
|
|
||||||
|
### Phase 4: Sync History Enrichment
|
||||||
|
|
||||||
|
**Scope**: Wire up `sync_runs` INSERT/UPDATE lifecycle, schema migration, enhanced sync-status.
|
||||||
|
|
||||||
|
**Depends on**: Phase 3 (needs `Vec<StageTiming>` to store in `metrics_json`).
|
||||||
|
|
||||||
|
**Files touched**:
|
||||||
|
- `migrations/014_sync_runs_enrichment.sql` — new migration: add `run_id`, `total_items_processed`, `total_errors` columns + index
|
||||||
|
- `src/core/sync_run.rs` — new file: `SyncRunRecorder` struct encapsulating INSERT on start, UPDATE on finish
|
||||||
|
- `src/cli/commands/sync.rs` — create `SyncRunRecorder` before pipeline, finalize after
|
||||||
|
- `src/cli/commands/ingest.rs` — same pattern for standalone ingest
|
||||||
|
- `src/cli/commands/sync_status.rs` — enhance to show recent runs with metrics, parse `metrics_json`
|
||||||
|
|
||||||
|
**Implementation steps**:
|
||||||
|
1. Create migration `014_sync_runs_enrichment.sql`:
|
||||||
|
```sql
ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;
CREATE INDEX idx_sync_runs_run_id ON sync_runs(run_id);
```
|
||||||
|
Note: Migration number 014 assumes no other migration is added before this phase. If concurrent work adds migration 014, renumber accordingly.
|
||||||
|
2. Implement `SyncRunRecorder`:
|
||||||
|
```rust
// Signature sketch — the recorder borrows the connection rather than owning it.
pub struct SyncRunRecorder<'c> { id: i64, conn: &'c Connection }

impl<'c> SyncRunRecorder<'c> {
    pub fn start(conn: &'c Connection, command: &str, run_id: &str) -> Result<Self>;
    pub fn succeed(self, metrics: &[StageTiming], total_items: usize, total_errors: usize) -> Result<()>;
    pub fn fail(self, error: &str, metrics: Option<&[StageTiming]>) -> Result<()>;
}
```
|
||||||
|
3. In `run_sync()`: create `SyncRunRecorder::start()` before pipeline, call `.succeed()` or `.fail()` after.
|
||||||
|
4. In `run_ingest()`: same pattern.
|
||||||
|
5. Enhance `sync_status.rs`:
|
||||||
|
- Query last N runs (default 10) instead of just the last 1.
|
||||||
|
- Parse `metrics_json` column to extract stage breakdown.
|
||||||
|
- Show `run_id`, duration, item counts, error counts in both interactive and robot modes.
|
||||||
|
- Add `--run <run_id>` flag to `sync-status` for single-run detail view.
|
||||||
|
|
||||||
|
**Tests**:
|
||||||
|
- Unit: `SyncRunRecorder::start` inserts a row with status='running'
|
||||||
|
- Unit: `SyncRunRecorder::succeed` updates status, sets finished_at, writes metrics_json
|
||||||
|
- Unit: `SyncRunRecorder::fail` updates status, sets error, sets finished_at
|
||||||
|
- Unit: Migration 014 applies cleanly on top of migration 013
|
||||||
|
- Integration: `lore sync` creates a sync_runs row; `lore sync-status` displays it
|
||||||
|
- Integration: `lore --robot sync-status` JSON includes `runs` array with stage breakdowns
|
||||||
|
- Integration: Failed sync records error in sync_runs with partial metrics
|
||||||
|
|
||||||
|
### Phase 5: Rate Limit + Retry Instrumentation
|
||||||
|
|
||||||
|
**Scope**: Enhanced logging in GitLab client, retry counters in stage timing.
|
||||||
|
|
||||||
|
**Depends on**: Phase 2 (spans for context), Phase 3 (StageTiming for counters).
|
||||||
|
|
||||||
|
**Files touched**:
|
||||||
|
- `src/gitlab/client.rs` (or wherever the HTTP client with retry logic lives) — add structured fields to retry/rate-limit log events
|
||||||
|
- `src/core/metrics.rs` — add `rate_limit_hits` and `retries` fields to `StageTiming`
|
||||||
|
|
||||||
|
**Implementation steps**:
|
||||||
|
1. Find the retry/rate-limit handling code (likely in the GitLab HTTP client). Add structured tracing fields:
|
||||||
|
```rust
info!(
    path = %request_path,
    attempt = attempt_number,
    retry_after_secs = retry_after,
    status_code = 429,
    "Rate limited, retrying"
);
```
|
||||||
|
2. Add `rate_limit_hits: usize` and `retries: usize` fields to `StageTiming` (with `#[serde(skip_serializing_if = "is_zero")]`).
|
||||||
|
3. In `MetricsLayer`, count rate-limit and retry events within each span and include in `StageTiming`.
|
||||||
|
4. In `-v` mode, the existing stderr layer already shows INFO+ events, so retry activity becomes visible automatically. No additional work needed beyond step 1.
|
||||||
|
|
||||||
|
**Tests**:
|
||||||
|
- Unit: Rate-limit log events include all required structured fields
|
||||||
|
- Unit: `StageTiming` serialization includes `rate_limit_hits` and `retries` when non-zero, omits when zero
|
||||||
|
- Integration: Simulate 429 response, verify log line has `path`, `attempt`, `retry_after_secs` fields
|
||||||
|
- Integration: After simulated retries, `StageTiming` counts match expected values
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 6. Acceptance Criteria
|
||||||
|
|
||||||
|
### 6.1 Phase 1: Verbosity Flags + Structured File Logging
|
||||||
|
|
||||||
|
**Functional criteria**:
|
||||||
|
- [ ] `lore sync` writes JSON log lines to `~/.local/share/lore/logs/lore.YYYY-MM-DD.log` with zero configuration.
|
||||||
|
- [ ] `lore -v sync` shows DEBUG-level `lore::*` output on stderr; dependency output stays at WARN.
|
||||||
|
- [ ] `lore -vv sync` shows DEBUG-level `lore::*` + INFO-level dependency output on stderr.
|
||||||
|
- [ ] `lore -vvv sync` shows TRACE-level output for everything on stderr.
|
||||||
|
- [ ] `RUST_LOG=lore::gitlab=trace lore sync` overrides `-v` flags for both stderr and file layers.
|
||||||
|
- [ ] `lore --log-format json sync` emits JSON-formatted log lines on stderr (same schema as file layer).
|
||||||
|
- [ ] Log files rotate daily (new file per calendar day).
|
||||||
|
- [ ] Files matching `lore.*.log` older than `retention_days` are deleted on startup.
|
||||||
|
- [ ] Existing behavior is unchanged when no new flags are passed (INFO on stderr, human-readable format).
|
||||||
|
- [ ] `--quiet` suppresses non-error stderr output. `-q` + `-v` together: `-q` wins (stderr at WARN+).
|
||||||
|
- [ ] `--quiet` does NOT affect file logging (file layer remains at DEBUG+).
|
||||||
|
- [ ] `lore doctor` reports: log directory path, number of log files, total disk usage in bytes. Robot mode includes a `logging` field in the checks JSON.
|
||||||
|
- [ ] File layer always logs at DEBUG+ for `lore::*` crate regardless of `-v` flags.
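The criteria above amount to a verbosity-to-filter mapping; a sketch of it, assuming `EnvFilter`-style directive strings (the exact strings must be confirmed against the directives table):

```rust
// Maps the -v count and --quiet to a directive string for the stderr layer.
// RUST_LOG, when set, overrides this mapping entirely.
fn stderr_directives(verbosity: u8, quiet: bool) -> &'static str {
    if quiet {
        return "warn"; // -q wins over -v: warnings and errors only
    }
    match verbosity {
        0 => "warn,lore=info",  // default: INFO for lore, WARN for dependencies
        1 => "warn,lore=debug", // -v: DEBUG for lore, dependencies stay at WARN
        2 => "info,lore=debug", // -vv: DEBUG for lore + INFO for dependencies
        _ => "trace",           // -vvv: TRACE for everything
    }
}
```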
|
||||||
|
|
||||||
|
**Test specifications**:
|
||||||
|
- `test_verbosity_filter_construction`: Given verbosity count 0/1/2/3, assert the resulting `EnvFilter` matches the expected directives table.
|
||||||
|
- `test_rust_log_overrides_verbose`: Set `RUST_LOG=lore=trace`, pass `-v` (count=1), assert the filter uses TRACE (not DEBUG).
|
||||||
|
- `test_quiet_overrides_verbose`: Pass `-q` and `-v` together, assert stderr filter is WARN+.
|
||||||
|
- `test_logging_config_defaults`: Deserialize an empty `{}` JSON as `LoggingConfig`, assert `retention_days=30`, `file_logging=true`, `log_dir=None`.
|
||||||
|
- `test_logging_config_partial`: Deserialize `{"retention_days": 7}`, assert `file_logging=true` default preserved.
|
||||||
|
- `test_log_retention_cleanup`: Create temp dir with files named `lore.2026-01-01.log` through `lore.2026-02-04.log`. Run retention with `retention_days=7`. Assert files older than 7 days are deleted, recent files preserved.
|
||||||
|
- `test_log_retention_ignores_non_log_files`: Create temp dir with `lore.2026-01-01.log` and `other.txt`. Run retention. Assert `other.txt` is NOT deleted.
|
||||||
|
- `test_json_log_output_format`: Capture file layer output, parse each line as JSON, assert keys: `timestamp`, `level`, `target`, `fields`, `span`.
|
||||||
|
- `test_suspending_writer_dual_layer`: Run a tracing event with both layers active and a progress bar. Assert no garbled output on stderr (no interleaved progress bar fragments in log lines).
|
||||||
|
|
||||||
|
### 6.2 Phase 2: Spans + Correlation IDs
|
||||||
|
|
||||||
|
**Functional criteria**:
|
||||||
|
- [ ] Every log line within a sync run includes `run_id` in the JSON span context.
|
||||||
|
- [ ] `jq 'select(any(.spans[]?; .run_id == "a1b2c3d4"))' lore.2026-02-04.log` extracts all log lines belonging to a given run.
|
||||||
|
- [ ] Nested spans produce a chain: log lines inside `fetch_pages` include both the `fetch_pages` span and the parent `ingest_issues` span in their span context.
|
||||||
|
- [ ] `run_id` is an 8-character hex string (truncated UUIDv4).
|
||||||
|
- [ ] Spans are visible in `-vv` stderr output as bracketed context.
|
||||||
|
|
||||||
|
**Test specifications**:
|
||||||
|
- `test_run_id_format`: Generate 100 run_ids, assert each is 8 chars, all hex characters.
|
||||||
|
- `test_run_id_uniqueness`: Generate 1000 run_ids, assert no duplicates.
|
||||||
|
- `test_span_context_in_json_logs`: Run a mock sync with spans, capture JSON log output, parse and verify each line has `spans` array containing `run_id`.
|
||||||
|
- `test_nested_span_chain`: Create parent span "sync" with child "ingest_issues" with child "fetch_pages". Emit a log event inside "fetch_pages". Assert the JSON log line's span chain includes all three span names.
|
||||||
|
- `test_span_elapsed_on_close`: Create a span, sleep 10ms, close it. Verify the close event records `elapsed_ms >= 10`.
|
||||||
|
|
||||||
|
### 6.3 Phase 3: Performance Metrics Collection
|
||||||
|
|
||||||
|
**Functional criteria**:
|
||||||
|
- [ ] `lore --robot sync` JSON includes `meta.run_id` (string) and `meta.stages` (array).
|
||||||
|
- [ ] Each stage in `meta.stages` has: `name`, `elapsed_ms`, `items_processed`.
|
||||||
|
- [ ] Top-level stages (ingest_issues, ingest_mrs, generate_docs, embed) have `sub_stages` arrays.
|
||||||
|
- [ ] Sub-stages include `project` field when applicable.
|
||||||
|
- [ ] `lore sync` (interactive) prints a timing summary table on stderr, gated behind `IngestDisplay::show_text`.
|
||||||
|
- [ ] `lore -q sync` does NOT print the timing summary.
|
||||||
|
- [ ] Zero-value fields (`items_skipped: 0`, `errors: 0`) are omitted from JSON output.
|
||||||
|
|
||||||
|
**Test specifications**:
|
||||||
|
- `test_stage_timing_serialization`: Create a `StageTiming` with sub_stages, serialize to JSON, assert structure matches PRD example.
|
||||||
|
- `test_stage_timing_zero_fields_omitted`: Create `StageTiming` with `errors: 0`, serialize, assert no `errors` key in output.
|
||||||
|
- `test_metrics_layer_single_span`: Create `MetricsLayer`, enter/exit one span with recorded fields, extract timings, assert one `StageTiming` entry.
|
||||||
|
- `test_metrics_layer_nested_spans`: Create parent + child spans, extract timings, assert parent has child in `sub_stages`.
|
||||||
|
- `test_metrics_layer_parallel_spans`: Create two sibling spans (simulating two projects), extract timings, assert both appear as sub_stages of parent.
|
||||||
|
- `test_sync_meta_includes_stages`: Mock a sync pipeline, verify robot JSON output parses correctly with `meta.stages`.
|
||||||
|
- `test_timing_summary_format`: Capture stderr during interactive sync, verify timing table format matches PRD example.
|
||||||
|
|
||||||
|
### 6.4 Phase 4: Sync History Enrichment
|
||||||
|
|
||||||
|
**Functional criteria**:
|
||||||
|
- [ ] `lore sync` creates a row in `sync_runs` with status='running' at start, updated to 'succeeded'/'failed' at finish.
|
||||||
|
- [ ] `lore ingest issues` also creates a `sync_runs` row.
|
||||||
|
- [ ] `sync_runs.run_id` matches the `run_id` in log files and robot JSON.
|
||||||
|
- [ ] `sync_runs.metrics_json` contains the serialized `Vec<StageTiming>`.
|
||||||
|
- [ ] `sync_runs.total_items_processed` and `total_errors` are populated.
|
||||||
|
- [ ] `lore sync-status` shows the last 10 runs with: run_id, timestamp, duration, item count, error count.
|
||||||
|
- [ ] `lore --robot sync-status` JSON includes `runs` array with `stages` parsed from `metrics_json`.
|
||||||
|
- [ ] Failed syncs record the error message and any partial metrics collected before failure.
|
||||||
|
- [ ] Migration 014 applies cleanly and is idempotent (safe to re-run).
|
||||||
|
|
||||||
|
**Test specifications**:
|
||||||
|
- `test_sync_run_recorder_start`: Call `start()`, query sync_runs, assert one row with status='running'.
|
||||||
|
- `test_sync_run_recorder_succeed`: Call `start()` then `succeed()`, assert row has status='succeeded', finished_at set, metrics_json parseable.
|
||||||
|
- `test_sync_run_recorder_fail`: Call `start()` then `fail()`, assert row has status='failed', error set.
|
||||||
|
- `test_sync_run_recorder_fail_with_partial_metrics`: Call `start()`, collect some metrics, then `fail()`. Assert metrics_json contains partial data.
|
||||||
|
- `test_migration_014_applies`: Apply all migrations 001-014 on a fresh DB. Assert `sync_runs` has `run_id`, `total_items_processed`, `total_errors` columns.
|
||||||
|
- `test_migration_014_idempotent`: Apply migration 014 twice. Assert no error on second apply.
|
||||||
|
- `test_sync_status_shows_runs`: Insert 3 sync_runs rows, run `print_sync_status()`, assert output includes all 3 with correct formatting.
|
||||||
|
- `test_sync_status_json_includes_stages`: Insert a sync_runs row with metrics_json, run robot-mode sync-status, parse JSON, assert `runs[0].stages` is an array.
|
||||||
|
|
||||||
|
### 6.5 Phase 5: Rate Limit + Retry Instrumentation
|
||||||
|
|
||||||
|
**Functional criteria**:
|
||||||
|
- [ ] Rate-limit events (HTTP 429) log at INFO with structured fields: `path`, `attempt`, `retry_after_secs`, `status_code`.
|
||||||
|
- [ ] Retry events (non-429 transient errors) log with: `path`, `attempt`, `error`.
|
||||||
|
- [ ] `StageTiming` includes `rate_limit_hits` and `retries` counts (omitted when zero).
|
||||||
|
- [ ] `lore -v sync` shows retry activity on stderr (visible because it's INFO+).
|
||||||
|
- [ ] Rate limit counts are included in `metrics_json` stored in `sync_runs`.
|
||||||
|
|
||||||
|
**Test specifications**:
|
||||||
|
- `test_rate_limit_log_fields`: Simulate a 429 response, capture log output, parse JSON, assert fields: `path`, `attempt`, `retry_after_secs`, `status_code`.
|
||||||
|
- `test_retry_log_fields`: Simulate a transient error + retry, capture log, assert fields: `path`, `attempt`, `error`.
|
||||||
|
- `test_stage_timing_rate_limit_counts`: Simulate 3 rate-limit hits within a span, extract `StageTiming`, assert `rate_limit_hits == 3`.
|
||||||
|
- `test_stage_timing_retry_counts`: Simulate 2 retries, extract `StageTiming`, assert `retries == 2`.
|
||||||
|
- `test_rate_limit_fields_omitted_when_zero`: Create `StageTiming` with zero rate limits, serialize, assert no `rate_limit_hits` key.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 7. Resolved Decisions
|
||||||
|
|
||||||
|
1. **Log format**: Use `tracing-subscriber`'s built-in JSON formatter (`fmt::layer().json()`). Zero custom code, battle-tested, and ecosystem tools (Grafana Loki, Datadog) already parse this format. The schema difference from our robot JSON envelope is cosmetic and not worth the maintenance burden of a custom formatter.
|
||||||
|
|
||||||
|
2. **Span recording**: Always-on. lore is I/O-bound (GitLab API + SQLite), so the nanosecond-level overhead of span storage and chain lookup is unmeasurable against our millisecond-scale operations. Conditional recording would add subscriber construction complexity for zero practical benefit.
|
||||||
|
|
||||||
|
3. **Log file location**: `~/.local/share/lore/logs/` (XDG data directory). Logs are NOT reproducible — you can generate new logs, but you cannot regenerate the exact diagnostic output from a past run. They are forensic artifacts that users would notice missing, so they belong in data, not cache.
|
||||||
|
|
||||||
|
4. **Retention**: In scope for Phase 1. Startup cleanup: scan log directory, delete files matching `lore.*.log` older than `retention_days` (default 30). Simple, no background threads, no external dependencies. Runs before subscriber initialization so deleted file handles aren't held.
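A sketch of that startup cleanup under the stated constraints. It matches `lore.*.log` by file name and uses mtime as the age signal; that last part is an assumption, since the real implementation could instead parse the date embedded in the file name:

```rust
use std::fs;
use std::path::Path;
use std::time::{Duration, SystemTime};

// Delete files matching `lore.*.log` older than `retention_days`.
// No background threads, no external dependencies; run before the
// subscriber is initialized so no deleted handle is held open.
fn clean_old_logs(dir: &Path, retention_days: u64) -> std::io::Result<()> {
    let cutoff = SystemTime::now() - Duration::from_secs(retention_days * 86_400);
    for entry in fs::read_dir(dir)? {
        let entry = entry?;
        let name = entry.file_name();
        let name = name.to_string_lossy();
        if name.starts_with("lore.") && name.ends_with(".log") {
            if entry.metadata()?.modified()? < cutoff {
                fs::remove_file(entry.path())?;
            }
        }
    }
    Ok(())
}
```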
|
||||||
|
|
||||||
|
5. **Stage timing granularity**: Per-project with nested sub-stages. When one project has 500 MRs and another has 3, knowing which one consumed the time budget is the difference between "sync was slow" and actionable diagnosis. The `StageTiming` struct includes an optional `project` field and a `sub_stages: Vec<StageTiming>` field for nesting.
|
||||||
|
|
||||||
|
6. **Stage timing collection mechanism**: Materialized from tracing spans, not plumbed manually. A custom `MetricsLayer` in the subscriber stack records span enter/exit/record events and builds the `StageTiming` tree. This avoids threading a mutable collector through every function signature and makes spans the single source of truth for timing data. Phase 2 adds spans; Phase 3 adds the layer that reads them.
|
||||||
|
|
||||||
|
7. **run_id format**: First 8 characters of `Uuid::new_v4().to_string()` (e.g., `"a1b2c3d4"`). The `uuid` crate (v1, v4 feature) is already a dependency. No new crate needed. 8 characters provide ~4 billion unique values — more than sufficient for local CLI invocations.
|
||||||
|
|
||||||
|
8. **File log level**: Always DEBUG+ for `lore::*` crate, WARN+ for dependencies, regardless of `-v` flags. This ensures post-mortem data is always richer than what was shown on stderr. `RUST_LOG` overrides both layers when set.
|
||||||
|
|
||||||
|
9. **sync_runs lifecycle**: The table exists (migration 001) but nothing writes to it. Phase 4 wires up the INSERT (on start) / UPDATE (on finish) lifecycle AND adds enrichment columns in a single migration. The existing `metrics_json` column stores the detailed `Vec<StageTiming>` array — no need for a separate `stages_json` column.
|
||||||
|
|
||||||
|
10. **JSON stderr via --log-format**: A `--log-format text|json` global flag controls stderr log format. Default is `text` (human-readable). When `json`, stderr uses the same JSON formatter as the file layer, routed through `SuspendingWriter` for progress bar coordination. This enables `lore sync 2>&1 | jq` workflows without reading log files.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 8. Phase Dependency Graph
|
||||||
|
|
||||||
|
```
Phase 1 (Subscriber + Flags)
        |
        v
Phase 2 (Spans + run_id)
        |
 +------+---------+
 |                |
 v                v
Phase 3         Phase 5
(Metrics)       (Rate Limit Logging)
 |                |
 v                |
Phase 4           |
(Sync History) <--+
```
|
||||||
|
|
||||||
|
**Parallelization opportunities**:
|
||||||
|
- Phase 1 must complete before anything else.
|
||||||
|
- Phase 2 must complete before Phase 3 or Phase 5.
|
||||||
|
- Phase 3 and Phase 5 can run in parallel (Phase 5 only needs spans from Phase 2, not MetricsLayer from Phase 3).
|
||||||
|
- Phase 4 depends on Phase 3 (needs `Vec<StageTiming>` to store). Phase 5's `rate_limit_hits`/`retries` fields on `StageTiming` can be added to Phase 4's stored data after Phase 5 completes, or Phase 4 can store them as zero initially.
|
||||||
|
|
||||||
|
**Agent assignment suggestion**:
|
||||||
|
- Agent A: Phase 1 → Phase 2 (sequential, foundational infrastructure)
|
||||||
|
- Agent B: Phase 3 (after Phase 2 completes)
|
||||||
|
- Agent C: Phase 5 (after Phase 2 completes, parallel with Phase 3)
|
||||||
|
- Agent B or D: Phase 4 (after Phase 3 completes)
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## 9. References
|
||||||
|
|
||||||
|
- Gholamian, S. & Ward, P. (2021). "A Comprehensive Survey of Logging in Software." arXiv:2110.12489.
|
||||||
|
- Duan, S. et al. (2025). "PDLogger: Automated Logging Framework for Practical Software Development." arXiv:2507.19951.
|
||||||
|
- tokio-rs/tracing ecosystem: `tracing`, `tracing-subscriber`, `tracing-appender`.
|
||||||
|
- GNU Coding Standards: Verbosity and diagnostic output conventions.
|
||||||
|
- Rust CLI Working Group: Recommendations for error reporting and verbosity.
|
||||||
docs/prd-per-note-search.feedback-1.md (new file, 174 lines):
Highest-impact gaps I see in the current plan:
|
||||||
|
|
||||||
|
1. `for-issue` / `for-mr` filtering is ambiguous across projects and can return incorrect rows.
|
||||||
|
2. `lore notes` has no pagination contract, so large exports and deterministic resumption are weak.
|
||||||
|
3. Migration `022` is high-risk (table rebuild + FTS + junction tables) without explicit integrity gates.
|
||||||
|
4. Note-doc freshness is incomplete for upstream note deletions and parent metadata changes (labels/title).
|
||||||
|
|
||||||
|
Below are my best revisions, each with rationale and a git-diff-style plan edit.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
1. **Add gated rollout + rollback controls**
|
||||||
|
Rationale: You can still “ship together” while reducing blast radius. This makes recovery fast if note-doc generation causes DB/embedding pressure.
|
||||||
|
|
||||||
|
```diff
@@ ## Design
-Two phases, shipped together as one feature:
+Two phases, shipped together as one feature, but with runtime gates:
+
+- `feature.notes_cli` (Phase 1 surface)
+- `feature.note_documents` (Phase 2 indexing/extraction path)
+
+Rollout order:
+1) Enable `notes_cli`
+2) Run note-doc backfill in bounded batches
+3) Enable `note_documents` for continuous updates
+
+Rollback:
+- Disabling `feature.note_documents` stops new note-doc generation without affecting issue/MR/discussion docs.
```
|
||||||
|
|
||||||
|
2. **Add keyset pagination + deterministic ordering**
|
||||||
|
Rationale: Needed for year-long reviewer analysis and reliable “continue where I left off” behavior under concurrent updates.
|
||||||
|
|
||||||
|
```diff
@@ pub struct NoteListFilters<'a> {
     pub limit: usize,
+    pub cursor: Option<&'a str>,   // keyset token "<sort_ms>:<id>"
+    pub include_total_count: bool, // avoid COUNT(*) in hot paths
@@
-    pub sort: &'a str, // "created" (default) | "updated"
+    pub sort: &'a str, // "created" | "updated"
@@ query_notes SQL
-ORDER BY {sort_column} {order}
+ORDER BY {sort_column} {order}, n.id {order}
 LIMIT ?
```
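The keyset token proposed above could be parsed with a small helper (a sketch; the `"<sort_ms>:<id>"` format is taken from the diff comment, and the helper name is illustrative):

```rust
// Parse the keyset cursor token "<sort_ms>:<id>".
// Returns None on any malformed input so callers can reject bad cursors.
fn parse_cursor(token: &str) -> Option<(i64, i64)> {
    let (sort_ms, id) = token.split_once(':')?;
    Some((sort_ms.parse().ok()?, id.parse().ok()?))
}
```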
|
||||||
|
|
||||||
|
3. **Make `for-issue` / `for-mr` project-scoped**
|
||||||
|
Rationale: IIDs are not globally unique. Requiring project avoids false positives and hard-to-debug cross-project leakage.
|
||||||
|
|
||||||
|
```diff
@@ pub struct NotesArgs {
-    #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr")]
+    #[arg(long = "for-issue", help_heading = "Filters", conflicts_with = "for_mr", requires = "project")]
     pub for_issue: Option<i64>,
@@
-    #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue")]
+    #[arg(long = "for-mr", help_heading = "Filters", conflicts_with = "for_issue", requires = "project")]
     pub for_mr: Option<i64>,
```
|
||||||
|
|
||||||
|
4. **Upgrade path filtering semantics**
|
||||||
|
Rationale: Review comments often reference renames/moves. Restricting to `position_new_path` misses relevant notes.
|
||||||
|
|
||||||
|
```diff
@@ pub struct NotesArgs {
-    /// Filter by file path (trailing / for prefix match)
+    /// Filter by file path
     #[arg(long, help_heading = "Filters")]
     pub path: Option<String>,
+    /// Path mode: exact|prefix|glob
+    #[arg(long = "path-mode", value_parser = ["exact","prefix","glob"], default_value = "exact", help_heading = "Filters")]
+    pub path_mode: String,
+    /// Match against old path as well as new path
+    #[arg(long = "match-old-path", help_heading = "Filters")]
+    pub match_old_path: bool,
@@ query_notes filter mappings
-- `path` ... n.position_new_path ...
+- `path` applies to `n.position_new_path` and optionally `n.position_old_path`.
+- `glob` mode translates `*`/`?` to SQL LIKE with escaping.
```
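The glob-to-LIKE translation mentioned in the last diff line can be sketched as follows (assumes the query uses `... LIKE ? ESCAPE '\'`; the helper name is illustrative):

```rust
// Translate a shell-style glob (`*`, `?`) into a SQL LIKE pattern,
// escaping LIKE's own metacharacters (%, _, and the escape char itself).
fn glob_to_like(glob: &str) -> String {
    let mut out = String::new();
    for c in glob.chars() {
        match c {
            '*' => out.push('%'),
            '?' => out.push('_'),
            '%' | '_' | '\\' => {
                out.push('\\');
                out.push(c);
            }
            other => out.push(other),
        }
    }
    out
}
```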
|
||||||
|
|
||||||
|
5. **Add explicit performance indexes (new migration)**
|
||||||
|
Rationale: `notes` becomes a first-class query surface; without indexes, filters degrade quickly at 10k+ note scale.
|
||||||
|
|
||||||
|
```diff
@@ ## Phase 1: `lore notes` Command
+### Work Chunk 1E: Query Performance Indexes
+**Files:** `migrations/023_notes_query_indexes.sql`, `src/core/db.rs`
+
+Add indexes:
+- `notes(project_id, created_at DESC, id DESC)`
+- `notes(author_username, created_at DESC, id DESC) WHERE is_system = 0`
+- `notes(discussion_id)`
+- `notes(position_new_path)`
+- `notes(position_old_path)`
+- `discussions(issue_id)`
+- `discussions(merge_request_id)`
```
|
||||||
|
|
||||||
|
6. **Harden migration 022 with transactional integrity checks**
|
||||||
|
Rationale: This is the riskiest part of the plan. Add hard fail-fast checks so corruption cannot silently pass.
|
||||||
|
|
||||||
|
```diff
@@ ### Work Chunk 2A: Schema Migration (022)
+Migration safety requirements:
+- Execute in a single `BEGIN IMMEDIATE ... COMMIT` transaction.
+- Capture and compare pre/post row counts for `documents`, `document_labels`, `document_paths`, `dirty_sources`.
+- Run `PRAGMA foreign_key_check` and abort on any violation.
+- Run `PRAGMA integrity_check` and abort on non-`ok`.
+- Rebuild FTS and assert `documents_fts` rowcount equals `documents` rowcount.
```
|
||||||
|
|
||||||
|
7. **Add note deletion + parent-change propagation**
|
||||||
|
Rationale: Current plan handles create/update ingestion but not all staleness paths. Without this, note documents drift.
|
||||||
|
|
||||||
|
```diff
@@ ## Phase 2: Per-Note Documents
+### Work Chunk 2G: Freshness Propagation
+**Files:** `src/ingestion/discussions.rs`, `src/ingestion/mr_discussions.rs`, `src/documents/regenerator.rs`
+
+Rules:
+- If a previously stored note is missing from upstream payload, delete local note row and enqueue `(note, id)` for document deletion.
+- When parent issue/MR title or labels change, enqueue descendant note docs dirty (notes inherit parent metadata).
+- Keep idempotent behavior for repeated syncs.
```
|
||||||
|
|
||||||
|
8. **Separate FTS coverage from embedding coverage**
|
||||||
|
Rationale: Biggest cost/perf risk is embeddings. Index all notes in FTS, but embed selectively with policy knobs.
|
||||||
|
|
||||||
|
```diff
@@ ## Estimated Document Volume Impact
-FTS5 handles this comfortably. Embedding generation time scales linearly (~4x increase).
+FTS5 handles this comfortably. Embedding generation is policy-controlled:
+- FTS: index all non-system note docs
+- Embeddings default: only notes with body length >= 40 chars (configurable)
+- Add config: `documents.note_embeddings.min_chars`, `documents.note_embeddings.enabled`
+- Prioritize unresolved DiffNotes before other notes during embedding backfill
```
|
||||||
|
|
||||||
|
9. **Bring structured reviewer profiling into scope (not narrative reporting)**
|
||||||
|
Rationale: This directly serves the stated use case and makes the feature compelling immediately.
|
||||||
|
|
||||||
|
```diff
@@ ## Non-Goals
-- Adding a "reviewer profile" report command (that's a downstream use case built on this infrastructure)
+- Generating free-form narrative reviewer reports.
+  A structured profiling command is in scope.
+
+## Phase 3: Structured Reviewer Profiling
+Add `lore notes profile --author <user> --since <window>` returning:
+- top commented paths
+- top parent labels
+- unresolved-comment ratio
+- note-type distribution
+- median comment length
```
|
||||||
|
|
||||||
|
10. **Add operational SLOs + robot-mode status for note pipeline**
|
||||||
|
Rationale: Reliability improves when regressions are observable, not inferred from failures.
|
||||||
|
|
||||||
|
```diff
@@ ## Verification Checklist
+Operational checks:
+- `lore -J stats` includes per-`source_type` document counts (including `note`)
+- Add queue lag metrics: oldest dirty note age, retry backlog size
+- Add extraction error breakdown by `source_type`
+- Add smoke assertion: disabling `feature.note_documents` leaves other source regeneration unaffected
```
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
If you want, I can produce a single consolidated revised PRD draft (fully merged text, not just diffs) as the next step.
|
||||||
docs/prd-per-note-search.feedback-2.md (new file, 200 lines):
Below are the strongest revisions I’d make, excluding everything in your `## Rejected Recommendations` list.

1. **Add a Phase 0 for stable note identity before any note-doc generation**

Rationale: your current plan still allows note document churn because Issue discussion ingestion is delete/reinsert-based. That makes local `notes.id` unstable, causing unnecessary dirtying/regeneration and potential stale-doc edge cases. Stabilizing identity first (upsert-by-GitLab-ID + sweep stale) improves correctness and cuts repeated work.

```diff
@@ ## Design
-Two phases, shipped together as one feature:
+Three phases, shipped together as one feature:
+- **Phase 0 (Foundation):** Stable note identity in local DB (upsert + sweep, no delete/reinsert churn)
- **Phase 1 (Option A):** `lore notes` command — direct SQL query over the `notes` table with rich filtering
- **Phase 2 (Option B):** Per-note documents — each non-system note becomes its own searchable document in the FTS/embedding pipeline
@@
+## Phase 0: Stable Note Identity
+
+### Work Chunk 0A: Upsert/Sweep for Issue Discussion Notes
+**Files:** `src/ingestion/discussions.rs`, `migrations/022_notes_identity_index.sql`, `src/core/db.rs`
+**Implementation:**
+- Add unique index: `UNIQUE(project_id, gitlab_id)` on `notes`
+- Replace delete/reinsert issue-note flow with upsert + `last_seen_at` sweep (same durability model as MR note sweep)
+- Ensure `insert_note/upsert_note` returns the stable local row id for both insert and update paths
```

2. **Replace `source_type` CHECK constraints with a registry table + FK in migration**

Rationale: table CHECKs force a full table rebuild for every new source type forever. A `source_types` table with FK keeps DB-level integrity and future extensibility without rebuilding `documents`/`dirty_sources` every time. This is a major architecture hardening win.

```diff
@@ ### Work Chunk 2A: Schema Migration (023)
 Current migration ... CHECK constraints limiting `source_type` ...
+Revision: migrate to a `source_types` registry table + FK constraints.
@@
-1. `dirty_sources` — add `'note'` to source_type CHECK
-2. `documents` — add `'note'` to source_type CHECK
+1. Create `source_types(name TEXT PRIMARY KEY)` and seed: `issue, merge_request, discussion, note`
+2. Rebuild `dirty_sources` and `documents` to replace CHECK with `REFERENCES source_types(name)`
+3. Future source-type additions become `INSERT INTO source_types(name) VALUES (?)` (no table rebuild)
@@
+#### Additional integrity tests
+#[test]
+fn test_source_types_registry_contains_note() { ... }
+#[test]
+fn test_documents_source_type_fk_enforced() { ... }
+#[test]
+fn test_dirty_sources_source_type_fk_enforced() { ... }
```

3. **Mark note documents dirty only when note semantics actually changed**

Rationale: the current loops mark every non-system note dirty on every sync. With 8k+ notes this creates avoidable queue pressure and regeneration time. Change-aware dirtying (inserted/changed only) gives major performance and stability improvements.

```diff
@@ ### Work Chunk 2D: Regenerator & Dirty Tracking Integration
-for note in notes {
-    let local_note_id = insert_note(&tx, local_discussion_id, &note, None)?;
-    if !note.is_system {
-        dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, local_note_id)?;
-    }
-}
+for note in notes {
+    let outcome = upsert_note(&tx, local_discussion_id, &note, None)?;
+    if !note.is_system && outcome.changed_semantics {
+        dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?;
+    }
+}
@@
+// changed_semantics should include: body, note_type, path/line positions, resolvable/resolved/resolved_by, updated_at
```

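The change detection above can be sketched std-only. `NoteSemantics` and its fields are hypothetical names, not the project's actual types; the point is that equality is computed over search-relevant fields only, so housekeeping refreshes never mark a note dirty.

```rust
// Hypothetical subset of note fields that feed note documents and search
// filters; housekeeping fields (e.g. last_seen_at) are excluded on purpose
// so refreshing them never triggers regeneration.
#[derive(Clone, PartialEq, Debug)]
struct NoteSemantics {
    body: String,
    note_type: Option<String>,
    position_path: Option<String>,
    position_line: Option<i64>,
    resolvable: bool,
    resolved: bool,
}

// Dirty only when a search-relevant field differs.
fn changed_semantics(old: &NoteSemantics, new: &NoteSemantics) -> bool {
    old != new
}

// Test fixture helper: a DiffNote whose body/resolution we can vary.
fn sample(body: &str, resolved: bool) -> NoteSemantics {
    NoteSemantics {
        body: body.to_string(),
        note_type: Some("DiffNote".to_string()),
        position_path: Some("src/main.rs".to_string()),
        position_line: Some(42),
        resolvable: true,
        resolved,
    }
}

fn main() {
    let old = sample("please rename this", false);
    let resolved = sample("please rename this", true);
    println!("re-synced unchanged: {}", changed_semantics(&old, &old.clone()));
    println!("resolution flipped: {}", changed_semantics(&old, &resolved));
}
```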
4. **Expand filters to support real analysis windows and resolution state**

Rationale: reviewer profiling usually needs bounded windows and both resolved/unresolved views. The current `unresolved: bool` is too narrow and one-sided. Add `--until` and tri-state resolution filtering for better analytical power.

```diff
@@ pub struct NoteListFilters<'a> {
 pub since: Option<&'a str>,
+pub until: Option<&'a str>,
@@
-pub unresolved: bool,
+pub resolution: &'a str, // "any" (default) | "unresolved" | "resolved"
@@
-pub author: Option<&'a str>,
+pub author: Option<&'a str>, // case-insensitive match
@@
-/// Filter by time (7d, 2w, 1m, or YYYY-MM-DD)
+/// Filter by start time (7d, 2w, 1m, or YYYY-MM-DD)
 pub since: Option<String>,
+/// Filter by end time (7d, 2w, 1m, or YYYY-MM-DD)
+#[arg(long, help_heading = "Filters")]
+pub until: Option<String>,
@@
-/// Only show unresolved review comments
-pub unresolved: bool,
+/// Resolution filter: any, unresolved, resolved
+#[arg(long, value_parser = ["any", "unresolved", "resolved"], default_value = "any", help_heading = "Filters")]
+pub resolution: String,
```

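One way to wire the tri-state flag into query construction is to map it to an optional predicate fragment. A sketch; the column names (`n.resolvable`, `n.resolved`) are assumptions about the schema, and `"any"` contributes no WHERE clause at all.

```rust
// Map the tri-state --resolution flag to an optional SQL predicate fragment.
// Invalid values surface as a user-facing error instead of silently matching.
fn resolution_predicate(resolution: &str) -> Result<Option<&'static str>, String> {
    match resolution {
        "any" => Ok(None),
        "unresolved" => Ok(Some("n.resolvable = 1 AND n.resolved = 0")),
        "resolved" => Ok(Some("n.resolvable = 1 AND n.resolved = 1")),
        other => Err(format!("invalid --resolution value: {other}")),
    }
}

fn main() {
    for value in ["any", "unresolved", "resolved", "bogus"] {
        println!("{value}: {:?}", resolution_predicate(value));
    }
}
```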
5. **Broaden the index strategy to match actual query shapes, not just author queries**

Rationale: `idx_notes_user_created` helps one path, but common usage also includes project+time scans and unresolved filters. Add two more partial composites for high-selectivity paths.

```diff
@@ ### Work Chunk 1E: Composite Query Index
 CREATE INDEX IF NOT EXISTS idx_notes_user_created
 ON notes(project_id, author_username, created_at DESC, id DESC)
 WHERE is_system = 0;
+
+CREATE INDEX IF NOT EXISTS idx_notes_project_created
+ON notes(project_id, created_at DESC, id DESC)
+WHERE is_system = 0;
+
+CREATE INDEX IF NOT EXISTS idx_notes_unresolved_project_created
+ON notes(project_id, created_at DESC, id DESC)
+WHERE is_system = 0 AND resolvable = 1 AND resolved = 0;
@@
+#[test]
+fn test_notes_query_plan_uses_project_created_index_for_default_listing() { ... }
+#[test]
+fn test_notes_query_plan_uses_unresolved_index_when_resolution_unresolved() { ... }
```

6. **Improve the per-note document payload with a structured metadata header + minimal thread context**

Rationale: isolated single-note docs can lose meaning. A small structured header plus lightweight context (parent + one preceding note excerpt) improves semantic retrieval quality substantially without re-bundling full threads.

```diff
@@ ### Work Chunk 2C: Note Document Extractor
-// 6. Format content:
-// [[Note]] {note_type or "Comment"} on {parent_type_prefix}: {parent_title}
-// Project: {path_with_namespace}
-// URL: {url}
-// Author: @{author}
-// Date: {format_date(created_at)}
-// Labels: {labels_json}
-// File: {position_new_path}:{position_new_line} (if DiffNote)
-//
-// --- Body ---
-//
-// {body}
+// 6. Format content with machine-readable header:
+// [[Note]]
+// source_type: note
+// note_gitlab_id: {gitlab_id}
+// project: {path_with_namespace}
+// parent_type: {Issue|MergeRequest}
+// parent_iid: {iid}
+// note_type: {DiffNote|DiscussionNote|Comment}
+// author: @{author}
+// created_at: {iso8601}
+// resolved: {true|false}
+// path: {position_new_path}:{position_new_line}
+// url: {url}
+//
+// --- Context ---
+// parent_title: {title}
+// previous_note_excerpt: {optional, max 200 chars}
+//
+// --- Body ---
+// {body}
```

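The template above can be sketched as a small formatter. `NoteDocInput` and its fields are hypothetical illustrative names, not the project's real extractor types; the detail worth showing is that optional lines (file position, previous-note excerpt) are skipped rather than emitted empty, keeping the header machine-parseable.

```rust
// Hypothetical input shape for one note document; names are illustrative.
struct NoteDocInput<'a> {
    gitlab_id: i64,
    project: &'a str,
    parent_type: &'a str, // "Issue" | "MergeRequest"
    parent_iid: i64,
    note_type: &'a str,
    author: &'a str,
    created_at: &'a str, // ISO 8601
    resolved: bool,
    path: Option<(&'a str, u32)>,
    parent_title: &'a str,
    previous_note_excerpt: Option<&'a str>,
    body: &'a str,
}

fn format_note_document(n: &NoteDocInput) -> String {
    let mut out = String::from("[[Note]]\nsource_type: note\n");
    out.push_str(&format!("note_gitlab_id: {}\n", n.gitlab_id));
    out.push_str(&format!("project: {}\n", n.project));
    out.push_str(&format!("parent_type: {}\n", n.parent_type));
    out.push_str(&format!("parent_iid: {}\n", n.parent_iid));
    out.push_str(&format!("note_type: {}\n", n.note_type));
    out.push_str(&format!("author: @{}\n", n.author));
    out.push_str(&format!("created_at: {}\n", n.created_at));
    out.push_str(&format!("resolved: {}\n", n.resolved));
    if let Some((path, line)) = n.path {
        // Only DiffNotes carry a position; skip the line entirely otherwise.
        out.push_str(&format!("path: {path}:{line}\n"));
    }
    out.push_str("\n--- Context ---\n");
    out.push_str(&format!("parent_title: {}\n", n.parent_title));
    if let Some(excerpt) = n.previous_note_excerpt {
        // Truncate on a char boundary, max 200 chars as the item suggests.
        let excerpt: String = excerpt.chars().take(200).collect();
        out.push_str(&format!("previous_note_excerpt: {excerpt}\n"));
    }
    out.push_str("\n--- Body ---\n");
    out.push_str(n.body);
    out
}

// Fixed sample used by main and the assertions below.
fn sample_doc() -> String {
    format_note_document(&NoteDocInput {
        gitlab_id: 987,
        project: "group/app",
        parent_type: "MergeRequest",
        parent_iid: 42,
        note_type: "DiffNote",
        author: "alice",
        created_at: "2024-05-01T12:00:00Z",
        resolved: false,
        path: Some(("src/lib.rs", 10)),
        parent_title: "Refactor parser",
        previous_note_excerpt: None,
        body: "This allocation is avoidable.",
    })
}

fn main() {
    println!("{}", sample_doc());
}
```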
7. **Add first-class export modes for downstream profiling pipelines**

Rationale: this makes the feature much more useful immediately (LLM prompts, notebook analysis, external scripts) without adding a profiling command. It stays within your non-goals and increases adoption.

```diff
@@ pub struct NotesArgs {
+    /// Output format
+    #[arg(long, value_parser = ["table", "json", "jsonl", "csv"], default_value = "table", help_heading = "Output")]
+    pub format: String,
@@
-    if robot_mode {
-        print_list_notes_json(...)
-    } else {
-        print_list_notes(&result);
-    }
+    match (robot_mode, args.format.as_str()) {
+        (true, _) | (_, "json") => print_list_notes_json(...),
+        (_, "jsonl") => print_list_notes_jsonl(&result),
+        (_, "csv") => print_list_notes_csv(&result),
+        _ => print_list_notes(&result),
+    }
@@ ### Work Chunk 1C: Human & Robot Output Formatting
+Add `print_list_notes_csv()` and `print_list_notes_jsonl()`:
+- CSV columns mirror `NoteListRowJson` field names
+- JSONL emits one note object per line for streaming pipelines
```

8. **Strengthen verification with idempotence + migration data-preservation checks**

Rationale: this feature touches ingestion, migrations, indexing, and regeneration. Add explicit idempotence/perf checks so regressions surface early.

```diff
@@ ## Verification Checklist
 cargo test
 cargo clippy --all-targets -- -D warnings
 cargo fmt --check
+cargo test test_note_ingestion_idempotent_across_two_syncs
+cargo test test_note_document_count_stable_after_second_generate_docs_full
@@
+lore sync
+lore generate-docs --full
+lore -J stats > /tmp/stats1.json
+lore generate-docs --full
+lore -J stats > /tmp/stats2.json
+# assert note doc count unchanged and dirty queue drains to zero
```

If you want, I can turn this into a fully rewritten PRD v2 draft with these changes merged in-place and renumbered work chunks end-to-end.

docs/prd-per-note-search.feedback-3.md (new file)
@@ -0,0 +1,162 @@

These are the highest-impact revisions I’d make. They avoid everything in your `## Rejected Recommendations` list.

1. Add immediate note-document deletion propagation (don’t wait for `generate-docs --full`)

Why: right now, deleted notes can leave stale `source_type='note'` documents until a full rebuild. That creates incorrect search/reporting results and weakens trust in the dataset.

```diff
@@ Phase 0: Stable Note Identity
+### Work Chunk 0B: Immediate Deletion Propagation
+
+When sweep deletes stale notes, propagate deletion to documents in the same transaction.
+Do not rely on eventual cleanup via `generate-docs --full`.
+
+#### Tests to Write First
+#[test]
+fn test_issue_note_sweep_deletes_note_documents_immediately() { ... }
+#[test]
+fn test_mr_note_sweep_deletes_note_documents_immediately() { ... }
+
+#### Implementation
+Use `DELETE ... RETURNING id, is_system` in note sweep functions.
+For returned non-system note ids:
+1) `DELETE FROM documents WHERE source_type='note' AND source_id=?`
+2) `DELETE FROM dirty_sources WHERE source_type='note' AND source_id=?`
```

2. Add one-time upgrade backfill for existing notes (migration 024)

Why: existing DBs will otherwise only get note-documents for changed/new notes. Historical notes remain invisible unless users manually run a full rebuild.

```diff
@@ Phase 2: Per-Note Documents
+### Work Chunk 2H: Backfill Existing Notes After Upgrade (Migration 024)
+
+Create migration `024_note_dirty_backfill.sql`:
+INSERT INTO dirty_sources (source_type, source_id, queued_at)
+SELECT 'note', n.id, unixepoch('now') * 1000
+FROM notes n
+LEFT JOIN documents d
+  ON d.source_type='note' AND d.source_id=n.id
+WHERE n.is_system=0 AND d.id IS NULL
+ON CONFLICT(source_type, source_id) DO NOTHING;
+
+Add a migration test asserting idempotence and expected queue size.
```

3. Fix `--since/--until` semantics and validation

Why: reusing `parse_since` for `until` creates ambiguous windows and off-by-boundary behavior; your own example `--since 90d --until 180d` is chronologically reversed.

```diff
@@ Work Chunk 1A: Data Types & Query Layer
-since: parse_since(since_str) then n.created_at >= ?
-until: parse_since(until_str) then n.created_at <= ?
+since: parse_since_start_bound(since_str) then n.created_at >= ?
+until: parse_until_end_bound(until_str) then n.created_at <= ?
+Validate since <= until; otherwise return a clear user error.
+
+#### Tests to Write First
+#[test] fn test_query_notes_invalid_time_window_rejected() { ... }
+#[test] fn test_query_notes_until_date_is_end_of_day_inclusive() { ... }
```

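A std-only sketch of the anchored-window idea: both bounds resolve against one `now` captured once, and an inverted window is rejected up front. This covers only the relative forms (`7d`, `2w`, `1m`); calendar dates and the proposed `parse_since_start_bound`/`parse_until_end_bound` helpers are out of scope here, and treating `1m` as 30 days is an assumption.

```rust
// Relative windows only ("7d", "2w", "1m"); "1m" approximated as 30 days.
fn parse_relative_days(s: &str) -> Result<i64, String> {
    if s.len() < 2 {
        return Err(format!("bad window: {s}"));
    }
    let (num, unit) = s.split_at(s.len() - 1);
    let n: i64 = num.parse().map_err(|_| format!("bad window: {s}"))?;
    match unit {
        "d" => Ok(n),
        "w" => Ok(n * 7),
        "m" => Ok(n * 30),
        _ => Err(format!("bad unit in window: {s}")),
    }
}

const DAY_MS: i64 = 86_400_000;

// Both bounds share one anchor, so a window can never invert mid-command.
// A larger day offset is further in the past, hence since's offset >= until's.
fn window_bounds(now_ms: i64, since: &str, until: &str) -> Result<(i64, i64), String> {
    let s = parse_relative_days(since)?;
    let u = parse_relative_days(until)?;
    if s < u {
        return Err(format!("--since {since} is more recent than --until {until}"));
    }
    Ok((now_ms - s * DAY_MS, now_ms - u * DAY_MS))
}

fn main() {
    let now_ms = 1_700_000_000_000;
    println!("{:?}", window_bounds(now_ms, "180d", "90d")); // valid window
    println!("{:?}", window_bounds(now_ms, "90d", "180d")); // rejected: reversed
}
```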
4. Separate semantic-change detection from housekeeping updates

Why: the currently proposed `WHERE` includes `updated_at`, which will cause unnecessary dirty churn. You want `last_seen_at` to always refresh, but regeneration only when searchable semantics changed.

```diff
@@ Work Chunk 0A: Upsert/Sweep for Issue Discussion Notes
-OR notes.updated_at IS NOT excluded.updated_at
+-- updated_at-only changes should not mark semantic dirty
+
+Perform two-step logic:
+1) Upsert always updates persistence/housekeeping fields (`updated_at`, `last_seen_at`).
+2) `changed_semantics` is computed only from fields used by note documents/search filters
+   (body, note_type, resolved flags, paths, author, parent linkage).
+
+#### Tests to Write First
+#[test]
+fn test_issue_note_upsert_updated_at_only_does_not_mark_semantic_change() { ... }
```

5. Make indexes align with actual query collation and join strategy

Why: `author` uses `COLLATE NOCASE`; without a collation-aware index, SQLite can skip index use. Also, IID filters via scalar subqueries are harder for the planner than direct join predicates.

```diff
@@ Work Chunk 1E: Composite Query Index
-CREATE INDEX ... ON notes(project_id, author_username, created_at DESC, id DESC) WHERE is_system = 0;
+CREATE INDEX ... ON notes(project_id, author_username COLLATE NOCASE, created_at DESC, id DESC) WHERE is_system = 0;
+
+CREATE INDEX IF NOT EXISTS idx_discussions_issue_id ON discussions(issue_id);
+CREATE INDEX IF NOT EXISTS idx_discussions_mr_id ON discussions(merge_request_id);
```

```diff
@@ Work Chunk 1A: query_notes()
-d.issue_id = (SELECT id FROM issues WHERE iid = ? AND project_id = ?)
+i.iid = ? AND i.project_id = ?
-d.merge_request_id = (SELECT id FROM merge_requests WHERE iid = ? AND project_id = ?)
+m.iid = ? AND m.project_id = ?
```

6. Replace manual CSV escaping with the `csv` crate

Why: manual RFC 4180 escaping is fragile (quotes/newlines/multi-byte edge cases). This is exactly where a mature library reduces long-term bug risk.

```diff
@@ Work Chunk 1C: Human & Robot Output Formatting
-Uses a minimal CSV writer (no external dependency — the format is simple enough for manual escaping).
+Uses `csv::Writer` for RFC 4180-compliant escaping and stable output across edge cases.
+
+#### Tests to Write First
+#[test] fn test_csv_output_multiline_and_quotes_roundtrip() { ... }
```

7. Add a `--contains` lexical body filter to `lore notes`

Why: a useful middle ground between metadata filtering and semantic search; great for reviewer-pattern mining without requiring FTS query syntax.

```diff
@@ Work Chunk 1B: CLI Arguments & Command Wiring
+/// Filter by case-insensitive substring in note body
+#[arg(long, help_heading = "Filters")]
+pub contains: Option<String>,
```

```diff
@@ Work Chunk 1A: NoteListFilters
+    pub contains: Option<&'a str>,
@@ query_notes dynamic filters
+    if let Some(c) = contains {
+        where_clauses.push("n.body LIKE ? COLLATE NOCASE");
+        params.push(format!("%{}%", escape_like(c)));
+    }
```

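The `escape_like` helper referenced above is an assumed name; a sketch of what it has to do: LIKE treats `%` and `_` as wildcards, so user input must be escaped, and the query must also declare the escape character (e.g. `n.body LIKE ? ESCAPE '\'`), because SQLite's LIKE has no default escape character and escaping without that clause has no effect.

```rust
// Escape LIKE metacharacters (%, _) and the escape character itself.
// Pair the result with an ESCAPE '\' clause in the SQL.
fn escape_like(input: &str) -> String {
    let mut out = String::with_capacity(input.len());
    for c in input.chars() {
        if matches!(c, '%' | '_' | '\\') {
            out.push('\\');
        }
        out.push(c);
    }
    out
}

fn main() {
    println!("{}", escape_like("50%_done\\path"));
}
```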
8. Reduce note-document embedding noise by slimming the metadata header

Why: the current verbose key-value header repeats low-signal tokens and consumes embedding budget. Keep context, but bias tokens toward actual review text.

```diff
@@ Work Chunk 2C: Note Document Extractor
-Build content with structured metadata header:
-[[Note]]
-source_type: note
-note_gitlab_id: ...
-project: ...
-...
---- Body ---
-{body}
+Build content with compact, high-signal layout:
+[[Note]]
+@{author} on {Issue#|MR!}{iid} in {project_path}
+path: {path:line} (only when available)
+state: {resolved|unresolved} (only when resolvable)
+
+{body}
+
+Keep detailed metadata in structured document columns/labels/paths/url,
+not repeated in verbose text.
```

9. Add explicit performance regression checks for the new hot paths

Why: this feature increases document volume ~4x; you should pin acceptable query behavior now so future changes don’t silently degrade it.

```diff
@@ Verification Checklist
+Performance/plan checks:
+1) `EXPLAIN QUERY PLAN` for:
+   - author+since query
+   - project+date query
+   - for-mr / for-issue query
+2) Seed a 50k-note synthetic fixture and assert:
+   - `lore notes --author ... --limit 100` stays under agreed local threshold
+   - `lore search --type note ...` remains deterministic and completes successfully
```

If you want, I can also provide a fully merged “iteration 3” PRD text with these edits applied end-to-end so you can drop it in directly.

docs/prd-per-note-search.feedback-4.md (new file)
@@ -0,0 +1,187 @@

1. **Canonical note identity for documents: use `notes.gitlab_id` as `source_id`**

Why this is better: the current plan still couples document identity to local row IDs. Even with upsert+sweep, local IDs are a storage artifact and can be reused in edge cases. Using GitLab note IDs as canonical document IDs makes regeneration, backfill, and deletion propagation more stable and portable.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 0: Stable Note Identity
-Phase 2 depends on `notes.id` as the `source_id` for note documents.
+Phase 2 uses `notes.gitlab_id` as the `source_id` for note documents.
+`notes.id` remains an internal relational key only.

@@ Work Chunk 0A
 pub struct NoteUpsertOutcome {
     pub local_note_id: i64,
+    pub document_source_id: i64, // notes.gitlab_id
     pub changed_semantics: bool,
 }

@@ Work Chunk 2D
-if !note.is_system && outcome.changed_semantics {
-    dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?;
+if !note.is_system && outcome.changed_semantics {
+    dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.document_source_id)?;
 }

@@ Work Chunk 2E
-SELECT 'note', n.id, ?1
+SELECT 'note', n.gitlab_id, ?1

@@ Work Chunk 2H
-ON d.source_type = 'note' AND d.source_id = n.id
+ON d.source_type = 'note' AND d.source_id = n.gitlab_id
```

2. **Prevent false deletions on partial/incomplete syncs**

Why this is better: sweep-based deletion is correct only when a discussion’s notes were fully fetched. If a page fails mid-fetch, current logic can incorrectly delete valid notes. Add an explicit “fetch complete” guard before sweep.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 0
+### Work Chunk 0C: Sweep Safety Guard (Partial Fetch Protection)
+
+Only run stale-note sweep when note pagination completed successfully for that discussion.
+If fetch is partial/interrupted, skip sweep and keep prior notes intact.
+
+#### Tests to Write First
+#[test]
+fn test_partial_fetch_does_not_sweep_notes() { /* ... */ }
+
+#[test]
+fn test_complete_fetch_runs_sweep_notes() { /* ... */ }
+
+#### Implementation
+if discussion_fetch_complete {
+    sweep_stale_issue_notes(...)?;
+} else {
+    tracing::warn!("Skipping stale sweep for discussion {} due to partial fetch", discussion_gitlab_id);
+}
```

3. **Make deletion propagation set-based (not a per-note loop)**

Why this is better: the current per-note DELETE loop is O(N) statements and gets slow on large threads. A temp-table/CTE set-based delete is faster, simpler to reason about, and remains atomic.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 0B Implementation
-for note_id in stale_note_ids {
-    conn.execute("DELETE FROM documents WHERE source_type = 'note' AND source_id = ?", [note_id])?;
-    conn.execute("DELETE FROM dirty_sources WHERE source_type = 'note' AND source_id = ?", [note_id])?;
-}
+CREATE TEMP TABLE _stale_note_source_ids(source_id INTEGER PRIMARY KEY) WITHOUT ROWID;
+INSERT INTO _stale_note_source_ids
+SELECT gitlab_id
+FROM notes
+WHERE discussion_id = ? AND last_seen_at < ? AND is_system = 0;
+
+DELETE FROM notes
+WHERE discussion_id = ? AND last_seen_at < ?;
+
+DELETE FROM documents
+WHERE source_type = 'note'
+  AND source_id IN (SELECT source_id FROM _stale_note_source_ids);
+
+DELETE FROM dirty_sources
+WHERE source_type = 'note'
+  AND source_id IN (SELECT source_id FROM _stale_note_source_ids);
+
+DROP TABLE _stale_note_source_ids;
```

4. **Fix project-scoping and time-window semantics in `lore notes`**

Why this is better: the plan currently has a contradiction: clap `requires = "project"` blocks use of `defaultProject`, while the query layer says default fallback is allowed. Also, `since/until` parsing should use one shared “now” to avoid subtle drift and inverted windows.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 1B NotesArgs
-#[arg(long = "for-issue", ..., requires = "project")]
+#[arg(long = "for-issue", ...)]
 pub for_issue: Option<i64>,

-#[arg(long = "for-mr", ..., requires = "project")]
+#[arg(long = "for-mr", ...)]
 pub for_mr: Option<i64>,

@@ Work Chunk 1A Query Notes
-- `since`: `parse_since(since_str)` then `n.created_at >= ?`
-- `until`: `parse_since(until_str)` then `n.created_at <= ?`
+- Parse `since` and `until` with a single anchored `now_ms` captured once per command.
+- If the user supplies `YYYY-MM-DD` for `--until`, interpret it as end-of-day (23:59:59.999 UTC).
+- Validate `since <= until` after both parse with the same anchor.
```

5. **Add an analytics mode (not a profile command): `lore notes --aggregate`**

Why this is better: this directly supports the stated use case (review patterns) without introducing the rejected “profile report” command. It keeps scope narrow and reuses existing filters.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Phase 1
+### Work Chunk 1F: Aggregation Mode for Notes Listing
+
+Add optional aggregation on top of `lore notes`:
+- `--aggregate author|note_type|path|resolution`
+- `--top N` (default 20)
+
+Behavior:
+- Reuses all existing filters (`--since`, `--project`, `--for-mr`, etc.)
+- Returns grouped counts (+ percentage of filtered corpus)
+- Works in table/json/jsonl/csv
+
+Non-goal alignment:
+- This is not a narrative “reviewer profile” command.
+- It is a query primitive for downstream analysis.
```

6. **Prevent note backfill from starving other document regeneration**

Why this is better: after migration/backfill, note dirty entries can dominate the queue and delay issue/MR/discussion updates. Add source-type fairness in regenerator scheduling.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 2D
+#### Scheduling Revision
+Process dirty sources with weighted fairness instead of strict FIFO:
+- issue: 3
+- merge_request: 3
+- discussion: 2
+- note: 1
+
+Implementation sketch:
+- fetch next batch by source_type buckets
+- interleave according to weights
+- preserve retry semantics per source
+
+#### Tests to Write First
+#[test]
+fn test_note_backfill_does_not_starve_issue_and_mr_regeneration() { /* ... */ }
```

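The interleaving sketch above can be made concrete with a small weighted round-robin. The weights and the `VecDeque`-backed queues are illustrative stand-ins for batched reads from `dirty_sources`; the property being demonstrated is that a large note backlog cannot starve the higher-weight buckets.

```rust
use std::collections::VecDeque;

// Weighted round-robin over per-source-type dirty queues: each cycle drains
// up to `weight` items from every bucket before returning to the first one.
fn interleave(buckets: &mut [(usize, VecDeque<i64>)]) -> Vec<i64> {
    let mut out = Vec::new();
    loop {
        let mut progressed = false;
        for (weight, queue) in buckets.iter_mut() {
            for _ in 0..*weight {
                if let Some(id) = queue.pop_front() {
                    out.push(id);
                    progressed = true;
                }
            }
        }
        if !progressed {
            break; // all buckets drained
        }
    }
    out
}

fn main() {
    // issue weight 3 (ids 1..), note weight 1 (ids 100..)
    let mut buckets = vec![
        (3, VecDeque::from(vec![1, 2, 3, 4])),
        (1, VecDeque::from(vec![100, 101, 102])),
    ];
    println!("{:?}", interleave(&mut buckets));
}
```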
7. **Harden migration 023: remove invalid SQL assertions and move integrity checks to tests**

Why this is better: `RAISE(ABORT, ...)` in a standalone `SELECT` is not valid SQLite usage outside triggers/check expressions. Keep migration SQL minimal/portable and enforce invariants in migration tests.

```diff
--- a/PRD.md
+++ b/PRD.md
@@ Work Chunk 2A Migration SQL
--- Step 10: Integrity verification
-SELECT CASE
-  WHEN ... THEN RAISE(ABORT, '...')
-END;
+-- Step 10 removed from SQL migration.
+-- Integrity verification is enforced in migration tests:
+-- 1) pre/post row-count equality
+-- 2) `PRAGMA foreign_key_check` is empty
+-- 3) documents_fts row count matches documents row count after rebuild

@@ Work Chunk 2A Tests
+#[test]
+fn test_migration_023_integrity_checks_pass() {
+    // pre/post counts, foreign_key_check empty, fts parity
+}
```

These 7 revisions improve correctness under failure, reduce churn risk, improve large-sync performance, and make the feature materially more useful for reviewer-analysis workflows without reintroducing any rejected recommendations.

docs/prd-per-note-search.feedback-5.md (new file)
@@ -0,0 +1,190 @@

Here are the highest-impact revisions I’d make. None of these repeat anything in your `## Rejected Recommendations`.

1. **Add immutable reviewer identity (`author_id`) as a first-class key**

Why this improves the plan: the PRD’s core use case is year-scale reviewer profiling. Usernames are mutable in GitLab, so username-only filtering will fragment one reviewer into multiple identities over time. Adding `author_id` closes that correctness hole and makes historical analysis reliable.

```diff
@@ Problem Statement
-1. **Query individual notes by author** — the `--author` filter on `lore search` only matches the first note's author per discussion thread
+1. **Query individual notes by reviewer identity** — support both mutable username and immutable GitLab `author_id` for stable longitudinal analysis

@@ Phase 0: Stable Note Identity
+### Work Chunk 0D: Immutable Author Identity Capture
+**Files:** `migrations/025_notes_author_id.sql`, `src/ingestion/discussions.rs`, `src/ingestion/mr_discussions.rs`, `src/cli/commands/list.rs`
+
+#### Implementation
+- Add nullable `notes.author_id INTEGER` and backfill from future syncs.
+- Populate `author_id` from the GitLab note payload (`note.author.id`) on both issue and MR note ingestion paths.
+- Add `--author-id <int>` filter to `lore notes`.
+- Keep `--author` for ergonomics; when both are provided, require both to match.
+
+#### Indexing
+- Add `idx_notes_author_id_created ON notes(project_id, author_id, created_at DESC, id DESC) WHERE is_system = 0;`
+
+#### Tests
+- `test_query_notes_filter_author_id_survives_username_change`
+- `test_query_notes_author_and_author_id_intersection`
```

2. **Strengthen partial-fetch safety from a boolean to an explicit fetch state contract**

Why this improves the plan: `fetch_complete: bool` is easy to misuse and fragile under retries/crashes. A run-scoped state model makes sweep correctness auditable and prevents accidental deletions when ingestion aborts midway.

```diff
@@ Phase 0: Stable Note Identity
-### Work Chunk 0C: Sweep Safety Guard (Partial Fetch Protection)
+### Work Chunk 0C: Sweep Safety Guard with Run-Scoped Fetch State

@@ Implementation
-Add a `fetch_complete` parameter to the discussion ingestion functions. Only run the stale-note sweep when the fetch completed successfully:
+Add a run-scoped fetch state:
+- `FetchState::Complete`
+- `FetchState::Partial`
+- `FetchState::Failed`
+
+Only run sweep on `FetchState::Complete`.
+Persist `run_seen_at` once per sync run and pass unchanged through all discussion/note upserts.
+Require `run_seen_at` monotonicity per discussion before sweep (skip and warn otherwise).

@@ Tests to Write First
+#[test]
+fn test_failed_fetch_never_sweeps_even_after_partial_upserts() { ... }
+#[test]
+fn test_non_monotonic_run_seen_at_skips_sweep() { ... }
+#[test]
+fn test_retry_after_failed_fetch_then_complete_sweeps_correctly() { ... }
```

3. **Add DB-level cleanup triggers for note-document referential integrity**

Why this improves the plan: Work Chunk 0B handles the sweep path, but not every possible delete path. DB triggers give defense in depth, so stale note documents cannot survive even if a future code path deletes notes differently.

```diff
@@ Work Chunk 0B: Immediate Deletion Propagation
-Update both sweep functions to propagate deletion to documents and dirty_sources using set-based SQL
+Keep set-based SQL in the sweep functions, and add DB-level cleanup triggers as a safety net.

@@ Work Chunk 2A: Schema Migration (023)
+-- Cleanup trigger: deleting a non-system note must delete its note document + dirty queue row
+CREATE TRIGGER notes_ad_cleanup AFTER DELETE ON notes
+WHEN old.is_system = 0
+BEGIN
+  DELETE FROM documents
+  WHERE source_type = 'note' AND source_id = old.id;
+  DELETE FROM dirty_sources
+  WHERE source_type = 'note' AND source_id = old.id;
+END;
+
+-- Cleanup trigger: if a note flips to system, remove its document artifacts
+CREATE TRIGGER notes_au_system_cleanup AFTER UPDATE OF is_system ON notes
+WHEN old.is_system = 0 AND new.is_system = 1
+BEGIN
+  DELETE FROM documents
+  WHERE source_type = 'note' AND source_id = new.id;
+  DELETE FROM dirty_sources
+  WHERE source_type = 'note' AND source_id = new.id;
+END;
```
4. **Eliminate N+1 extraction cost with parent metadata caching in regeneration**

Why this improves the plan: backfilling ~8k notes with per-note parent/label lookups creates avoidable query amplification. Batch caching turns repeated joins into one-time lookups per parent entity and materially reduces rebuild time.

```diff
@@ Phase 2: Per-Note Documents
+### Work Chunk 2I: Batch Parent Metadata Cache for Note Regeneration
+**Files:** `src/documents/regenerator.rs`, `src/documents/extractor.rs`
+
+#### Implementation
+- Add a `NoteExtractionContext` cache keyed by `(noteable_type, parent_id)` containing:
+  - parent iid/title/url
+  - parent labels
+  - project path
+- In batch regeneration, prefetch parent metadata for the note IDs in the current chunk.
+- Use cached metadata in `extract_note_document()` to avoid repeated parent/label queries.
+
+#### Tests
+- `test_note_regeneration_uses_parent_cache_consistently`
+- `test_note_regeneration_cache_hit_preserves_hash_determinism`
```
5. **Add embedding dedup cache keyed by semantic text hash**

Why this improves the plan: note docs will contain repeated short comments (“LGTM”, “nit: …”). Current doc-level hashing includes metadata, so identical semantic comments still re-embed many times. A semantic embedding hash cache cuts cost and speeds up full rebuild/backfill without changing search behavior.

```diff
@@ Phase 2: Per-Note Documents
+### Work Chunk 2J: Semantic Embedding Dedup for Notes
+**Files:** `migrations/026_embedding_cache.sql`, embedding pipeline module(s), `src/documents/extractor.rs`
+
+#### Implementation
+- Compute `embedding_text` for notes as: normalized note body + compact stable context (`parent_type`, `path`, `resolution`), excluding volatile fields.
+- Compute `embedding_hash = sha256(embedding_text)`.
+- Before embedding generation, look up an existing vector by `(model, embedding_hash)`.
+- Reuse the cached vector when present; only call the embedding model on misses.
+
+#### Tests
+- `test_identical_note_bodies_reuse_embedding_vector`
+- `test_embedding_hash_changes_when_semantic_context_changes`
```
6. **Add deterministic review-signal tags as derived labels**

Why this improves the plan: this makes output immediately more useful for reviewer-pattern analysis without adding a profile command (which is explicitly out of scope). It increases the practical value of both `lore notes` and `lore search --type note` with low complexity.

```diff
@@ Non-Goals
-- Adding a "reviewer profile" report command (that's a downstream use case built on this infrastructure)
+- Adding a "reviewer profile" report command (downstream), while allowing low-level derived signal tags as indexing primitives

@@ Phase 2: Per-Note Documents
+### Work Chunk 2K: Derived Review Signal Labels
+**Files:** `src/documents/extractor.rs`
+
+#### Implementation
+- Derive deterministic labels from note text + metadata:
+  - `signal:nit`
+  - `signal:blocking`
+  - `signal:security`
+  - `signal:performance`
+  - `signal:testing`
+- Attach via the existing `document_labels` flow for note documents.
+- No new CLI mode required; existing label filters can consume these labels.
+
+#### Tests
+- `test_note_document_derives_signal_labels_nit`
+- `test_note_document_derives_signal_labels_security`
+- `test_signal_label_derivation_is_deterministic`
```
7. **Add high-precision note targeting filters (`--note-id`, `--gitlab-note-id`, `--discussion-id`)**

Why this improves the plan: debugging, incident response, and reproducibility all benefit from exact addressing. This is especially useful when validating sync correctness and cross-checking a specific note/document lifecycle.

```diff
@@ Work Chunk 1B: CLI Arguments & Command Wiring
 pub struct NotesArgs {
+    /// Filter by local note row id
+    #[arg(long = "note-id", help_heading = "Filters")]
+    pub note_id: Option<i64>,
+
+    /// Filter by GitLab note id
+    #[arg(long = "gitlab-note-id", help_heading = "Filters")]
+    pub gitlab_note_id: Option<i64>,
+
+    /// Filter by local discussion id
+    #[arg(long = "discussion-id", help_heading = "Filters")]
+    pub discussion_id: Option<i64>,
 }

@@ Work Chunk 1A: Filter struct
 pub struct NoteListFilters<'a> {
+    pub note_id: Option<i64>,
+    pub gitlab_note_id: Option<i64>,
+    pub discussion_id: Option<i64>,
 }

@@ Tests to Write First
+#[test]
+fn test_query_notes_filter_note_id_exact() { ... }
+#[test]
+fn test_query_notes_filter_gitlab_note_id_exact() { ... }
+#[test]
+fn test_query_notes_filter_discussion_id_exact() { ... }
```

If you want, I can produce a single consolidated “iteration 5” PRD diff that merges these into your exact section ordering and updates the dependency graph/migration numbering end-to-end.
**New file:** `docs/prd-per-note-search.feedback-6.md` (131 lines)
1. **Make immutable identity usable now (`--author-id`)**

Why: The plan captures `author_id` but intentionally defers using it, so the core longitudinal-analysis problem is only half-fixed.

```diff
@@ Phase 1: `lore notes` Command / Work Chunk 1A
 pub struct NoteListFilters<'a> {
+    pub author_id: Option<i64>, // immutable identity filter
@@
-    pub author: Option<&'a str>, // case-insensitive match via COLLATE NOCASE
+    pub author: Option<&'a str>, // display-name filter
+    // If both author and author_id are provided, apply both (AND) for precision.
 }
@@
 Filter mappings:
+ - `author_id`: `n.author_id = ?` (exact immutable identity)
 - `author`: strip `@` prefix, `n.author_username = ? COLLATE NOCASE`

@@ Phase 1 / Work Chunk 1B (CLI)
+    /// Filter by immutable author id
+    #[arg(long = "author-id", help_heading = "Filters")]
+    pub author_id: Option<i64>,

@@ Phase 2 / Work Chunk 2F
+Add `--author-id` support to `lore search` filtering for note documents.

@@ Phase 1 / Work Chunk 1E
+CREATE INDEX IF NOT EXISTS idx_notes_project_author_id_created
+ON notes(project_id, author_id, created_at DESC, id DESC)
+WHERE is_system = 0 AND author_id IS NOT NULL;
```
2. **Fix document staleness on username changes**

Why: The current plan says username changes are “not semantic,” but note documents include the username in content/title, so documents go stale and inconsistent.

```diff
@@ Work Chunk 0D: Immutable Author Identity Capture
-Assert: changed_semantics = false (username change is not a semantic change for documents)
+Assert: changed_semantics = true (username affects note document content/title)

@@ Work Chunk 0A: semantic-change detection
-old_body != body || old_note_type != note_type || ...
+old_body != body || old_note_type != note_type || ...
+  || old_author_username != author_username

@@ Work Chunk 2C: Note Document Extractor header
 author: @{author}
+author_id: {author_id}
```
3. **Replace `last_seen_at` sweep marker with monotonic `sync_run_id`**

Why: Timestamp markers are vulnerable to clock skew and concurrent runs; run IDs are deterministic and safer.

```diff
@@ Phase 0: Stable Note Identity
+### Work Chunk 0E: Monotonic Run Marker
+Add a `sync_runs` table and `notes.last_seen_run_id`.
+Ingest assigns one run_id per sync transaction.
+Upsert sets `last_seen_run_id = current_run_id`.
+The sweep condition becomes `last_seen_run_id < current_run_id` (when fetch_complete=true).

@@ Work Chunk 0C
-fetch_complete + last_seen_at-based sweep
+fetch_complete + run_id-based sweep
```
4. **Materialize the stale-note set once during sweep**

Why: The current set-based SQL still re-runs the stale subquery 3 times; materializing it once improves performance and guarantees an identical deletion set.

```diff
@@ Work Chunk 0B: Immediate Deletion Propagation
-DELETE FROM documents ... IN (SELECT id FROM notes WHERE ...);
-DELETE FROM dirty_sources ... IN (SELECT id FROM notes WHERE ...);
-DELETE FROM notes WHERE ...;
+CREATE TEMP TABLE _stale_note_ids AS
+  SELECT id, is_system FROM notes WHERE discussion_id = ? AND last_seen_run_id < ?;
+DELETE FROM documents
+  WHERE source_type = 'note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system = 0);
+DELETE FROM dirty_sources
+  WHERE source_type = 'note' AND source_id IN (SELECT id FROM _stale_note_ids WHERE is_system = 0);
+DELETE FROM notes WHERE id IN (SELECT id FROM _stale_note_ids);
+DROP TABLE _stale_note_ids;
```
5. **Move historical note backfill out of migration into a resumable runtime job**

Why: A data-heavy migration can block startup and is harder to resume/recover on large DBs.

```diff
@@ Work Chunk 2H
-Backfill Existing Notes After Upgrade (Migration 024)
+Backfill Existing Notes After Upgrade (Resumable Runtime Backfill)
@@
-Files: `migrations/024_note_dirty_backfill.sql`, `src/core/db.rs`
+Files: `src/documents/backfill.rs`, `src/cli/commands/generate_docs.rs`
@@
-INSERT INTO dirty_sources ... SELECT ... FROM notes ...
+Introduce a batched backfill API:
+`enqueue_missing_note_documents(batch_size: usize) -> BackfillProgress`
+invoked from `generate-docs`/`sync` until complete, resumable across runs.
```
6. **Add a streaming path for large `jsonl`/`csv` note exports**

Why: The current `query_notes` materializes the full result set in memory; streaming improves scalability and latency.

```diff
@@ Work Chunk 1A
+Add `query_notes_stream(conn, filters, row_handler)` for forward-only row iteration.

@@ Work Chunk 1C
-print_list_notes_jsonl(&result)
-print_list_notes_csv(&result)
+print_list_notes_jsonl_stream(config, filters)
+print_list_notes_csv_stream(config, filters)
+(table/json keep the counted, buffered path)
```
7. **Add an index for path-centric note queries**

Why: `--path` + project/date queries are a stated hot path and are not fully covered by the currently proposed indexes.

```diff
@@ Work Chunk 1E: Composite Query Index
+CREATE INDEX IF NOT EXISTS idx_notes_project_path_created
+ON notes(project_id, position_new_path, created_at DESC, id DESC)
+WHERE is_system = 0 AND position_new_path IS NOT NULL;
```
8. **Add property/invariant tests (not only examples)**

Why: This feature touches ingestion identity, sweeping, deletion propagation, and document regeneration; randomized invariants will catch subtle regressions.

```diff
@@ Verification Checklist
+Add property tests (proptest):
+- stable local IDs across randomized re-sync orderings
+- no orphan `documents(source_type='note')` after randomized deletions/sweeps
+- partial-fetch runs never reduce note count
+- repeated full rebuild converges (fixed-point idempotence)
```

These revisions keep your existing direction, avoid all rejected items, and materially improve correctness, scale behavior, and long-term maintainability.
**New file:** `docs/prd-per-note-search.md` (2518 lines; diff suppressed because it is too large)
## Overview

Robot mode optimizes the `lore` CLI for AI agent consumption with compact JSON output, structured errors with machine-actionable recovery steps, meaningful exit codes, response timing metadata, field selection for token efficiency, and TTY auto-detection.

## Activation

```bash
# Explicit flag
lore --robot issues -n 5

# JSON shorthand
lore -J issues -n 5

# Environment variable
LORE_ROBOT=1 lore issues

# Auto-detection (when stdout is not a TTY)
lore issues | jq .
```

## Global Flags
| Flag | Description |
|------|-------------|
| `--robot` | Force JSON output, structured errors |
| `-J` / `--json` | Shorthand for `--robot` |
| `--quiet` | Suppress progress/spinners (implied by `--robot`) |
| `--fields <list>` | Select output fields for list commands |
## Response Envelope

All commands return a consistent JSON envelope to stdout:

```json
{"ok":true,"data":{...},"meta":{"elapsed_ms":42}}
```

Key properties:

- **Compact JSON**: Single-line output (no pretty-printing) for efficient parsing
- **Uniform envelope**: Every command wraps its data in `{"ok":true,"data":{...},"meta":{...}}`
- **Timing metadata**: `meta.elapsed_ms` is present on every response (wall-clock milliseconds)
## Error Output Format

Errors are JSON on stderr with structured fields for programmatic handling:

```json
{
  "error": {
    "code": "CONFIG_NOT_FOUND",
    "message": "Config file not found at ~/.config/lore/config.json. Run \"lore init\" first.",
    "suggestion": "Run 'lore init' to set up your GitLab connection.",
    "actions": ["lore init"]
  }
}
```

| Field | Type | Description |
|-------|------|-------------|
| `code` | string | Machine-readable error code (e.g., `CONFIG_NOT_FOUND`) |
| `message` | string | Human-readable error description |
| `suggestion` | string? | Recovery guidance (omitted when not applicable) |
| `actions` | string[]? | Executable shell commands for recovery (omitted when empty) |

### Error Actions by Code

| Error Code | Actions |
|------------|---------|
| `CONFIG_NOT_FOUND` | `["lore init"]` |
| `CONFIG_INVALID` | `["lore init --force"]` |
| `GITLAB_AUTH_FAILED` | `["export GITLAB_TOKEN=glpat-xxx", "lore auth"]` |
| `TOKEN_NOT_SET` | `["export GITLAB_TOKEN=glpat-xxx"]` |
| `OLLAMA_UNAVAILABLE` | `["ollama serve"]` |
| `OLLAMA_MODEL_NOT_FOUND` | `["ollama pull nomic-embed-text"]` |
| `DB_LOCKED` | `["lore ingest --force"]` |
| `EMBEDDING_FAILED` | `["lore embed --retry-failed"]` |
| `MIGRATION_FAILED` | `["lore migrate"]` |
| `GITLAB_NETWORK_ERROR` | `["lore doctor"]` |
## Exit Codes

| Code | ErrorCode | Meaning |
|------|-----------|---------|
| 0 | -- | Success |
| 1 | `INTERNAL_ERROR` | Unknown/internal error |
| 2 | -- | Usage error (invalid flags or arguments) |
| 3 | `CONFIG_INVALID` | Config file malformed |
| 4 | `TOKEN_NOT_SET` | GitLab token not configured |
| 5 | `GITLAB_AUTH_FAILED` | Authentication failed |
| 6 | `GITLAB_NOT_FOUND` | Resource not found |
| 7 | `GITLAB_RATE_LIMITED` | Rate limited |
| 8 | `GITLAB_NETWORK_ERROR` | Network/connection error |
| 9 | `DB_LOCKED` | Database locked by another process |
| 10 | `DB_ERROR` | Database error |
| 11 | `MIGRATION_FAILED` | Migration failed |
| 12 | `IO_ERROR` | File I/O error |
| 13 | `TRANSFORM_ERROR` | Data transformation error |
| 14 | `OLLAMA_UNAVAILABLE` | Ollama not running |
| 15 | `OLLAMA_MODEL_NOT_FOUND` | Ollama model not installed |
| 16 | `EMBEDDING_FAILED` | Embedding generation failed |
| 17 | `NOT_FOUND` | Entity does not exist locally |
| 18 | `AMBIGUOUS` | Multiple projects match (use `-p`) |
| 19 | -- | Health check failed |
| 20 | `CONFIG_NOT_FOUND` | Config file missing |
## Field Selection

The `--fields` flag on the `issues` and `mrs` list commands controls which fields appear in each item of the response array:

```bash
# Preset: ~60% fewer tokens
lore -J issues --fields minimal

# Custom field list
lore -J mrs --fields iid,title,state,draft,target_branch
```

### Presets

| Preset | Expands to |
|--------|------------|
| `minimal` | `iid`, `title`, `state`, `updated_at_iso` |

### Available Fields

**Issues**: `iid`, `title`, `state`, `author_username`, `labels`, `assignees`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `status_name`, `status_category`, `status_color`, `status_icon_name`, `status_synced_at_iso`

**MRs**: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`

Field selection applies only to list output, not to show (single-entity) output, which returns full detail.
## Command Response Schemas

Every command in `lore robot-docs` includes a `response_schema` field describing the shape of its JSON response. This enables agents to understand response structures without trial and error.

```bash
# Get schema for a specific command
lore robot-docs | jq '.data.commands.issues.response_schema'

# Get all schemas
lore robot-docs | jq '[.data.commands | to_entries[] | select(.value.response_schema) | {(.key): .value.response_schema}] | add'
```
## Clap Error Handling

Parse errors from the argument parser emit structured JSON to stderr with semantic error codes:

| Code | Meaning |
|------|---------|
| `UNKNOWN_COMMAND` | Unrecognized subcommand (includes fuzzy suggestion) |
| `UNKNOWN_FLAG` | Unrecognized command-line flag |
| `MISSING_REQUIRED` | Required argument not provided |
| `INVALID_VALUE` | Invalid value for argument |
| `TOO_MANY_VALUES` | Too many values provided |
| `TOO_FEW_VALUES` | Too few values provided |
| `ARGUMENT_CONFLICT` | Conflicting arguments |
| `MISSING_COMMAND` | No subcommand provided |
| `HELP_REQUESTED` | Help or version flag used |
| `PARSE_ERROR` | General parse error |

Unknown commands include a fuzzy suggestion when a close match exists:

```json
{"error":{"code":"UNKNOWN_COMMAND","message":"...","suggestion":"Did you mean 'lore issues'? Run 'lore robot-docs' for all commands"}}
```
## Agent Self-Discovery

`lore robot-docs` provides a complete manifest for agent bootstrapping:

```bash
lore robot-docs          # Pretty-printed (human-readable)
lore --robot robot-docs  # Compact (for parsing)
```

The manifest includes:

- All commands with flags, examples, and response schemas
- Deprecated command aliases (e.g., `list issues` -> `issues`)
- Exit codes with meanings
- Clap error codes
- Suggested workflows (first setup, daily sync, search, pre-flight)
- Activation methods (flags, env vars, TTY auto-detection)
**New file:** `docs/user-journeys.md` (541 lines)
# Lore CLI User Journeys
|
||||||
|
|
||||||
|
## Purpose
|
||||||
|
|
||||||
|
Map realistic workflows for both human users and AI agents to identify gaps in the command surface and optimization opportunities. Each journey starts with a **problem** and traces the commands needed to reach a **resolution**.
|
||||||
|
|
||||||
|
---
|
||||||
|
|
||||||
|
## Part 1: Human User Flows

### H1. Morning Standup Prep

**Problem:** "What happened since yesterday? I need to know what moved before standup."

**Flow:**

```
lore sync -q                       # Refresh data (quiet, no noise)
lore issues -s opened --since 1d   # Issues that changed overnight
lore mrs -s opened --since 1d      # MRs that moved
lore who @me                       # My current workload snapshot
```

**Gap identified:** No single "activity feed" command. User runs 3 queries to get what should be one view. No `--since 1d` shorthand for "since yesterday." No `@me` alias for the authenticated user.

---

### H2. Sprint Planning: What's Ready to Pick Up?

**Problem:** "We're planning the next sprint. What's open, unassigned, and actionable?"

**Flow:**

```
lore issues -s opened -p myproject   # All open issues
lore issues -s opened -l "ready"     # Issues labeled ready
lore issues -s opened --has-due      # Issues with deadlines approaching
lore count issues -p myproject       # How many total?
```

**Gap identified:** No way to filter by "unassigned" issues (missing `--no-assignee` flag). No way to sort by due date. No way to see priority/weight. Can't combine filters like "opened AND no assignee AND has due date."

---

### H3. Investigating a Production Incident

**Problem:** "Deploy broke prod. I need the full timeline of what changed around the deploy."

**Flow:**

```
lore sync -q                        # Get latest
lore timeline "deploy" --since 7d   # What happened around deploys
lore search "deploy" --type mr      # MRs mentioning deploy
lore mrs 456                        # Inspect the suspicious MR
lore who --overlap src/deploy/      # Who else touches deploy code
```

**Gap identified:** Timeline is keyword-based, not event-based. Can't filter by "MRs merged in the last 24 hours" directly. No way to see which MRs were merged between two dates (release diff). Would benefit from `lore mrs -s merged --since 1d`.

---

### H4. Preparing to Review Someone's MR

**Problem:** "I was assigned to review MR !789. I need context before diving in."

**Flow:**

```
lore mrs 789                               # Read the MR description + discussions
lore mrs 789 -o                            # Open in browser for the actual diff
lore who src/features/auth/                # Who are the experts in this area?
lore search "auth refactor" --type issue   # Related issues for background
lore timeline "authentication"             # History of auth changes
```

**Gap identified:** No way to see the file list touched by an MR from the CLI (data is stored in `mr_file_changes` but not surfaced). No way to link an MR back to its closing issue(s) from the MR detail view. The cross-reference data exists in `entity_references` but isn't shown in `mrs <iid>` output.

---

### H5. Onboarding to an Unfamiliar Code Area

**Problem:** "I'm new to the team and need to understand how the billing module works."

**Flow:**

```
lore search "billing" -n 20                # What exists about billing?
lore who src/billing/                      # Who knows billing best?
lore timeline "billing" --depth 2          # History of billing changes
lore mrs -s merged -l billing --since 6m   # Recent merged billing work
lore issues -s opened -l billing           # Outstanding billing issues
```

**Gap identified:** No way to get a "module overview" in one command. The search spans issues, MRs, and discussions but doesn't summarize by category. No way to see the most-discussed or most-referenced entities (high-signal items for understanding).

---

### H6. Finding the Right Reviewer for My PR

**Problem:** "I'm about to submit a PR touching auth and payments. Who should review?"

**Flow:**

```
lore who src/features/auth/       # Auth experts
lore who src/features/payments/   # Payment experts
lore who @candidate1              # Check candidate1's workload
lore who @candidate2              # Check candidate2's workload
```

**Gap identified:** No way to query multiple paths at once (`lore who src/auth/ src/payments/`). No way to find the intersection of expertise. No workload-aware recommendation ("who knows this AND has bandwidth"). Four separate commands for what should be one decision.
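Until a multi-path `who` exists, the intersection can be computed client-side. A minimal sketch, assuming `lore -J who <path>` emits JSON with an `experts` list of objects carrying a `username` field (the field names here are hypothetical, not the real schema):

```python
import json

# Hypothetical `lore -J who <path>` output shape -- real field names may differ.
auth_out = json.loads('{"experts": [{"username": "alice"}, {"username": "bob"}]}')
payments_out = json.loads('{"experts": [{"username": "bob"}, {"username": "carol"}]}')

def usernames(payload):
    """Collect the expert usernames from one `who` response."""
    return {e["username"] for e in payload["experts"]}

# Candidates who know both areas: the intersection of the two expert sets.
both = sorted(usernames(auth_out) & usernames(payments_out))
print(both)  # -> ['bob']
```

A workload check on each name in `both` would then narrow the list to reviewers with bandwidth.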
---

### H7. Understanding Why a Feature Was Built This Way

**Problem:** "This code is weird. Why was it implemented like this? What was the original discussion?"

**Flow:**

```
lore search "feature-name rationale"     # Search for decision context
lore timeline "feature-name" --depth 2   # Full history with cross-refs
lore issues 234                          # Read the original issue
lore mrs 567                             # Read the implementation MR
```

**Gap identified:** No way to search within a specific issue's or MR's discussion notes. The search covers documents (titles + descriptions) but per-note search isn't available yet (PRD exists). No way to navigate "issue 234 was closed by MR 567" without manually knowing both IDs.

---

### H8. Checking Team Workload Before Assigning Work

**Problem:** "I need to assign this urgent bug. Who has the least on their plate?"

**Flow:**

```
lore who @alice   # Alice's workload
lore who @bob     # Bob's workload
lore who @carol   # Carol's workload
lore who @dave    # Dave's workload
```

**Gap identified:** No team-level workload view. Must query each person individually. No way to list "all assignees and their open issue counts." No concept of a team roster. Would benefit from `lore who --team` or `lore workload`.

---

### H9. Preparing Release Notes

**Problem:** "We're cutting a release. I need to summarize what's in this version."

**Flow:**

```
lore mrs -s merged --since 2w -p myproject      # MRs merged since last release
lore issues -s closed --since 2w -p myproject   # Issues closed since last release
lore mrs -s merged -l feature --since 2w        # Feature MRs specifically
lore mrs -s merged -l bugfix --since 2w         # Bugfix MRs
```

**Gap identified:** No way to filter by milestone (for version-based releases): `issues` has `-m` for milestone but `mrs` does not. No changelog generation. No "what closed between tag A and tag B." No grouping by label for release note categories.
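Until label grouping lands, release-note sections can be assembled client-side. A minimal sketch, assuming `lore -J mrs -s merged --since 2w` emits a JSON array with `iid`, `title`, and `labels` fields (all field names are assumptions):

```python
import json
from collections import defaultdict

# Hypothetical `lore -J mrs -s merged --since 2w` output -- field names are assumed.
merged = json.loads("""[
  {"iid": 456, "title": "Add dark mode", "labels": ["feature"]},
  {"iid": 457, "title": "Fix session leak", "labels": ["bugfix"]},
  {"iid": 458, "title": "Tune cache TTL", "labels": ["feature", "performance"]}
]""")

# Group merged MRs under each of their labels to form release-note sections.
sections = defaultdict(list)
for mr in merged:
    for label in mr["labels"] or ["unlabelled"]:
        sections[label].append(f'!{mr["iid"]} {mr["title"]}')

for label in sorted(sections):
    print(f"## {label}")
    for line in sections[label]:
        print(f"- {line}")
```

An MR with several labels appears under each matching section, which is usually what release notes want.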
---

### H10. Finding and Closing Stale Issues

**Problem:** "Our backlog is bloated. Which issues haven't been touched in months?"

**Flow:**

```
lore issues -s opened --sort updated --asc -n 50   # Oldest-updated first
# Then manually inspect each one...
lore issues 42                                     # Is this still relevant?
```

**Gap identified:** No `--before` or `--updated-before` filter (only `--since` exists). Can sort ascending but can't filter "not updated in 90 days." No staleness indicator. No bulk operations concept.

---

### H11. Understanding a Bug's Full History

**Problem:** "Bug #321 keeps getting reopened. I need to understand its entire lifecycle."

**Flow:**

```
lore issues 321                            # Read the issue
lore timeline "bug-keyword" -p myproject   # Try to find timeline events
# But timeline is keyword-based, not entity-based...
```

**Gap identified:** No way to get a timeline for a specific entity by IID. `lore timeline` requires a keyword query, not an entity reference. Would benefit from `lore timeline --issue 321` or `lore timeline --mr 456` to get the event history of a specific entity directly.

---

### H12. Identifying Who to Ask About Failing Tests

**Problem:** "CI tests are failing in `src/lib/parser.rs`. Who last touched this?"

**Flow:**

```
lore who src/lib/parser.rs                  # Expert lookup
lore who --overlap src/lib/parser.rs        # Who else has touched it
lore search "parser" --type mr --since 2w   # Recent MRs touching parser
```

**Gap identified:** Expert mode uses DiffNote analysis (code review comments), not actual file change tracking. The `mr_file_changes` table has the real data but `who` doesn't use it for attribution. Could be much more accurate with file-change-based expertise.

---

### H13. Tracking a Feature Across Multiple MRs

**Problem:** "The 'dark mode' feature spans 5 MRs. I need to see them all together."

**Flow:**

```
lore mrs -l dark-mode                 # MRs with the label
lore issues -l dark-mode              # Related issues
lore timeline "dark mode" --depth 2   # Cross-referenced events
```

**Gap identified:** Works reasonably well with labels as the grouping mechanism. But if the team didn't label consistently, there's no way to discover related MRs by content similarity. No "related items" view that combines issues + MRs + discussions for a topic.

---

### H14. Checking if a Similar Fix Was Already Attempted

**Problem:** "Before I implement this fix, was something similar tried before?"

**Flow:**

```
lore search "memory leak connection pool"        # Semantic search
lore search "connection pool" --type mr -s all   # Wait, no state filter on search
lore mrs -s closed -l bugfix                     # Closed bugfix MRs (coarse)
lore timeline "connection pool"                  # Historical context
```

**Gap identified:** Search doesn't have a `--state` filter. Can't search only closed/merged items. The semantic search is powerful but can't be combined with entity state. Would benefit from `--state merged` on search to find past attempts.

---

### H15. Reviewing Discussions That Need My Attention

**Problem:** "Which discussion threads am I involved in that are still unresolved?"

**Flow:**

```
lore who --active               # All active unresolved discussions
lore who --active --since 30d   # Wider window
# But can't filter to "discussions I'm in"...
```

**Gap identified:** `--active` shows all unresolved discussions, not filtered by participant. No way to say "show me discussions where @me participated." No notification/mention tracking. No "my unresolved threads" view.

---
## Part 2: AI Agent Flows

### A1. Context Gathering Before Code Modification

**Problem:** Agent is about to modify `src/features/auth/session.rs` and needs full context.

**Flow:**

```
lore -J health                                       # Pre-flight check
lore -J who src/features/auth/                       # Who knows this area
lore -J search "auth session" -n 10                  # Related issues/MRs
lore -J mrs -s merged --since 3m -l auth             # Recent auth changes
lore -J who --overlap src/features/auth/session.rs   # Concurrent work risk
```

**Gap identified:** No way to check "are there open MRs touching this file right now?" The overlap mode shows historical touches, not active branches. An agent needs to know about in-flight changes to avoid conflicts.

---

### A2. Auto-Triaging an Incoming Issue

**Problem:** Agent receives a new issue and needs to categorize it, find related work, and suggest assignees.

**Flow:**

```
lore -J issues 999                               # Read the new issue
lore -J search "$(extract_keywords)" --explain   # Find similar past issues
lore -J who src/affected/path/                   # Suggest experts as assignees
lore -J issues -s opened -l same-label           # Check for duplicates
```

**Gap identified:** No way to get just the description text for programmatic keyword extraction. `issues <iid>` returns full detail including discussions. Agent must parse the full response to extract the description for a secondary search. Would benefit from `--fields description` on detail view. No duplicate detection built in.

---

### A3. Generating Sprint Status Report

**Problem:** Agent needs to produce a weekly status report for the team.

**Flow:**

```
lore -J issues -s closed --since 1w --fields minimal   # Completed work
lore -J issues -s opened --status "In progress"        # In-flight work
lore -J mrs -s merged --since 1w --fields minimal      # Merged PRs
lore -J mrs -s opened -D --fields minimal              # Open non-draft MRs
lore -J count issues                                   # Totals
lore -J count mrs                                      # MR totals
lore -J who --active --since 1w                        # Discussions needing attention
```

**Gap identified:** Seven separate queries for one report. No `lore summary` or `lore report` command. No way to get "issues transitioned from X to Y this week" (state change history exists in events but isn't queryable). No velocity metric (issues closed per week trend).

---

### A4. Finding Relevant Prior Art Before Implementing

**Problem:** Agent is implementing a caching layer and wants to find if similar patterns exist in the codebase's GitLab history.

**Flow:**

```
lore -J search "caching" --mode hybrid -n 20 --explain
lore -J search "cache invalidation" --mode hybrid -n 10
lore -J search "redis" --mode lexical --type discussion   # Exact term in discussions
lore -J timeline "cache" --since 1y                       # Wait, max is 1y? Let's try 12m
```

**Gap identified:** No way to search discussion notes individually (per-note search). Discussions are aggregated into documents, so individual note-level matches are lost. The `--explain` flag helps but doesn't show which specific note matched. No `--since 1y` or `--since 12m` duration format.

---

### A5. Building Context for PR Description

**Problem:** Agent wrote code and needs to generate a PR description that references relevant issues.

**Flow:**

```
lore -J search "feature description keywords" --type issue
lore -J issues -s opened -l feature-label --fields iid,title,web_url
# Cross-reference: which issues does this MR close?
# No command for this -- must manually scan search results
```

**Gap identified:** No way to query the `entity_references` table directly. Agent can't ask "which issues reference MR !456" or "which issues contain 'closes #123' in their text." The data exists but isn't exposed as a query surface. Would benefit from `lore refs --mr 456` or `lore refs --issue 123`.

---

### A6. Identifying Affected Experts for Review Assignment

**Problem:** Agent needs to automatically assign reviewers based on the files changed in an MR.

**Flow:**

```
lore -J mrs 456                           # Get MR details
# Parse file paths from response... but file changes aren't in the output
lore -J who src/path/from/mr/             # Query each path
lore -J who src/another/path/             # One at a time...
lore -J who @candidate --fields minimal   # Check workload
```

**Gap identified:** MR detail view (`mrs <iid>`) doesn't include the file change list from `mr_file_changes`. Agent can't programmatically extract which files an MR touches. Must fall back to GitLab API or guess from description. The `who` command doesn't accept multiple paths. No "auto-reviewer" suggestion combining expertise + availability.

---

### A7. Incident Investigation and Timeline Reconstruction

**Problem:** Agent needs to reconstruct what happened during an outage for a postmortem.

**Flow:**

```
lore -J timeline "outage" --since 3d --depth 2 --expand-mentions
lore -J search "error 500" --since 3d
lore -J mrs -s merged --since 3d -p production-service
lore -J issues --status "In progress" -p production-service
```

**Gap identified:** Timeline is keyword-seeded, which means if the outage wasn't described with that exact term, seeds may miss it. No way to seed a timeline from an entity ID (e.g., "start from issue #321 and expand outward"). No severity/priority filter. No way to correlate with merge times.

---

### A8. Cross-Project Impact Assessment

**Problem:** Agent needs to understand how a breaking API change in project A affects projects B and C.

**Flow:**

```
lore -J search "api-endpoint-name" -p project-a
lore -J search "api-endpoint-name" -p project-b
lore -J search "api-endpoint-name" -p project-c
# Or without project filter to search everywhere:
lore -J search "api-endpoint-name" -n 50
lore -J timeline "api-endpoint-name" --depth 2
```

**Gap identified:** Cross-project references in `entity_references` are tracked but the timeline shows unresolved references for entities not synced locally. No way to see a cross-project dependency map. Search works across projects but doesn't group results by project.

---

### A9. Automated Stale Issue Recommendations

**Problem:** Agent runs weekly to identify issues that should be closed or re-prioritized.

**Flow:**

```
lore -J issues -s opened --sort updated --asc -n 100   # Oldest first
# For each issue, check:
lore -J issues <iid>                                   # Read details
lore -J search "<issue title keywords>"                # Any recent activity?
```

**Gap identified:** No `--updated-before` filter, so agent must fetch all and filter client-side. No way to detect "issue has no assignee AND no activity in 90 days." The 100-issue limit means pagination is needed for large backlogs, but there's no cursor/offset pagination -- only `--limit`. Agent must do N+1 queries to inspect each candidate.
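Pending an `--updated-before` filter, the stale check has to happen client-side. A minimal sketch, assuming `lore -J issues` emits ISO-8601 `updated_at` timestamps and an `assignees` array (both field names are assumptions):

```python
import json
from datetime import datetime, timedelta, timezone

# Hypothetical `lore -J issues -s opened --sort updated --asc` output --
# the `updated_at` and `assignees` field names/formats are assumed.
issues = json.loads("""[
  {"iid": 42, "title": "Old crash", "updated_at": "2024-01-05T10:00:00Z", "assignees": []},
  {"iid": 77, "title": "Fresh bug", "updated_at": "2025-01-02T09:30:00Z", "assignees": ["bob"]}
]""")

now = datetime(2025, 1, 10, tzinfo=timezone.utc)  # fixed "today" for the example
cutoff = now - timedelta(days=90)

def is_stale(issue):
    """Stale = no assignee and not updated within the cutoff window."""
    updated = datetime.fromisoformat(issue["updated_at"].replace("Z", "+00:00"))
    return not issue["assignees"] and updated < cutoff

stale = [i["iid"] for i in issues if is_stale(i)]
print(stale)  # -> [42]
```

This is exactly the fetch-all-then-filter workaround the gap describes; a server-side filter would avoid shipping the whole backlog to the agent.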
---

### A10. Code Review Preparation (File-Level Context)

**Problem:** Agent is reviewing MR !789 and needs to understand the history of each changed file.

**Flow:**

```
lore -J mrs 789                           # Get MR details
# Can't get file list from output...
# Fall back to search by MR title keywords
lore -J search "feature-from-mr" --type mr
lore -J who src/guessed/path/             # Expertise for each file
lore -J who --overlap src/guessed/path/   # Concurrent changes
```

**Gap identified:** Same as A6 -- `mr_file_changes` data isn't exposed. Agent is blind to the actual files in the MR unless it parses the description or uses the GitLab API directly. This is the single biggest gap for automated code review workflows.

---

### A11. Building a Knowledge Graph of Entity Relationships

**Problem:** Agent wants to map how issues, MRs, and discussions are connected for a feature.

**Flow:**

```
lore -J search "feature-name" -n 30
lore -J timeline "feature-name" --depth 2 --max-entities 100
# Timeline shows expanded entities and cross-refs, but...
# No way to query entity_references directly
# No way to get "all entities that reference issue #123"
```

**Gap identified:** The `entity_references` table (closes, related, mentioned) is used internally by timeline but isn't queryable as a standalone command. Agent can't ask "what closes issue #123?" or "what does MR !456 reference?" No graph export. Would enable powerful dependency mapping.

---

### A12. Release Readiness Assessment

**Problem:** Agent needs to verify all issues in milestone "v2.0" are closed and MRs are merged.

**Flow:**

```
lore -J issues -m "v2.0" -s opened   # Any open issues in milestone?
lore -J issues -m "v2.0" -s closed   # Closed issues
# MRs don't have milestone filter...
lore -J mrs -s opened -l "v2.0"      # Try label as proxy
lore -J who --active -p myproject    # Unresolved discussions
```

**Gap identified:** MRs don't have a `--milestone` filter (issues do). No way to check "all MRs linked to issues in milestone v2.0" -- would require joining `entity_references` with issue milestone. No release checklist concept. No way to verify "every issue in this milestone has a closing MR."

---

### A13. Answering "What Changed?" Between Two Points

**Problem:** Agent needs to diff project state between two dates for a stakeholder report.

**Flow:**

```
lore -J issues -s closed --since 2w --fields minimal   # Recently closed
lore -J issues -s opened --since 2w --fields minimal   # Recently opened
lore -J mrs -s merged --since 2w --fields minimal      # Recently merged
# But no way to get "issues that CHANGED STATE" in a window
# An issue opened 3 months ago but closed yesterday won't appear in --since 2w for issues -s opened
```

**Gap identified:** `--since` filters by `updated_at`, not by "state changed at." An issue closed yesterday but created 6 months ago would appear in `issues -s closed --since 1d` (because `updated_at` changed), but the semantics are subtle. No explicit "state transitions in time window" query. The `resource_state_events` table has this data but it's not exposed as a filter.
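A sketch of what the missing state-transition query could look like over `resource_state_events`, using stdlib sqlite3. The table name comes from the gap above, but the column names here are invented for illustration and will differ from the real schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed shape for resource_state_events -- the real schema may differ.
conn.executescript("""
CREATE TABLE resource_state_events (
  issue_iid  INTEGER,
  state      TEXT,    -- 'closed', 'reopened', ...
  created_at TEXT     -- ISO-8601 timestamp of the transition itself
);
INSERT INTO resource_state_events VALUES
  (321, 'closed',   '2025-01-09T14:00:00Z'),
  (200, 'reopened', '2024-11-01T08:00:00Z');
""")

# "Which issues transitioned to closed inside the window?" -- this filters on
# the transition time, not on updated_at, so creation date is irrelevant.
rows = conn.execute(
    "SELECT issue_iid FROM resource_state_events "
    "WHERE state = 'closed' AND created_at >= ?",
    ("2025-01-03T00:00:00Z",),
).fetchall()
print([r[0] for r in rows])  # -> [321]
```

Issue 321 matches even if it was opened months earlier, which is the semantic the `updated_at`-based `--since` cannot express.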
---

### A14. Meeting Prep: Summarize Recent Activity for a Stakeholder

**Problem:** Agent needs to prepare a 2-minute summary for a project sponsor meeting.

**Flow:**

```
lore -J count issues -p project   # Current totals
lore -J count mrs -p project      # MR totals
lore -J issues -s closed --since 1w -p project --fields minimal
lore -J mrs -s merged --since 1w -p project --fields minimal
lore -J issues -s opened --status "In progress" -p project
lore -J who --active -p project --since 1w
```

**Gap identified:** Six queries, same as A3. No summary/dashboard command. Agent must synthesize all responses. No trend data (is the open issue count growing or shrinking?). No "highlights" extraction.

---

### A15. Determining If Work Is Safe to Start (Conflict Detection)

**Problem:** Agent is about to start work on an issue and needs to check nobody else is already working on it.

**Flow:**

```
lore -J issues 123                                        # Read the issue
# Check assignees from response
lore -J mrs -s opened -A other-person                     # Are they working on related MRs?
lore -J who --overlap src/target/path/                    # Anyone actively touching these files?
lore -J search "issue-123-keywords" --type mr -s opened   # Wait, search has no --state
```

**Gap identified:** No way to check "is there an open MR that closes issue #123?" -- the `entity_references` data exists but isn't queryable. Search doesn't support `--state` filter. No "conflict detection" or "in-flight work" check. Agent must do multiple queries and manually correlate.

---
## Part 3: Gap Summary

### Critical Gaps (high impact, blocks common workflows)

| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|------------------------|
| 1 | **MR file changes not surfaced** | H4, A6, A10 | `lore mrs <iid> --files` or include in detail view |
| 2 | **Entity references not queryable** | H7, A5, A11, A15 | `lore refs --issue 123` / `lore refs --mr 456` |
| 3 | **Per-note search missing** | H7, A4 | `lore search --granularity note` (PRD exists) |
| 4 | **No entity-based timeline** | H11, A7 | `lore timeline --issue 321` / `lore timeline --mr 456` |
| 5 | **No @me / current-user alias** | H1, H15 | Resolve from auth token automatically |

### Important Gaps (significant friction, multiple workarounds needed)

| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|------------------------|
| 6 | **No activity feed / summary** | H1, A3, A14 | `lore activity --since 1d` or `lore summary` |
| 7 | **No multi-path who query** | H6, A6 | `lore who src/path1/ src/path2/` |
| 8 | **No --state filter on search** | H14, A15 | `lore search --state merged` |
| 9 | **MRs missing --milestone filter** | H9, A12 | `lore mrs -m "v2.0"` |
| 10 | **No --no-assignee / --unassigned** | H2 | `lore issues --no-assignee` |
| 11 | **No --updated-before filter** | H10, A9 | `lore issues --before 90d` or `--stale 90d` |
| 12 | **No team workload view** | H8 | `lore who --team` or `lore workload` |

### Nice-to-Have Gaps (would improve agent efficiency)

| # | Gap | Affected Flows | Suggested Command/Flag |
|---|-----|----------------|------------------------|
| 13 | **No pagination/offset** | A9 | `--offset 100` for large result sets |
| 14 | **No detail --fields on show** | A2 | `lore issues 999 --fields description` |
| 15 | **No cross-project grouping** | A8 | `lore search --group-by project` |
| 16 | **No trend/velocity metrics** | A3, A14 | `lore trends issues --period week` |
| 17 | **No --for-issue on mrs** | A12, A15 | `lore mrs --closes 123` (query entity_refs) |
| 18 | **1y/12m duration not supported** | A4 | Support `1y`, `12m`, `365d` in --since |
| 19 | **No discussion participant filter** | H15 | `lore who --active --participant @me` |
| 20 | **No sort by due date** | H2 | `lore issues --sort due` |
434
docs/who-command-design.feedback-1.md
Normal file
@@ -0,0 +1,434 @@
Below are the highest-leverage revisions I’d make to this plan. I’m focusing on correctness pitfalls, SQLite gotchas, query performance on 280K notes, and reducing “dynamic SQL + param juggling” complexity—without turning this into a new ingestion project.

Change 1 — Fix a hard SQLite bug in `--active` (GROUP_CONCAT DISTINCT + separator)

Why

SQLite does not allow `GROUP_CONCAT(DISTINCT x, sep)`. With DISTINCT, SQLite only permits a single argument (`GROUP_CONCAT(DISTINCT x)`). Your current query will error at runtime in many SQLite versions.

Revision

Use a subquery that selects distinct participants, then GROUP_CONCAT with your separator.

```diff
diff --git a/Plan.md b/Plan.md
@@ fn query_active(...)
-            (SELECT GROUP_CONCAT(DISTINCT n.author_username, X'1F')
-             FROM notes n
-             WHERE n.discussion_id = d.id
-               AND n.is_system = 0
-               AND n.author_username IS NOT NULL) AS participants
+            (SELECT GROUP_CONCAT(username, X'1F') FROM (
+               SELECT DISTINCT n.author_username AS username
+               FROM notes n
+               WHERE n.discussion_id = d.id
+                 AND n.is_system = 0
+                 AND n.author_username IS NOT NULL
+               ORDER BY username
+            )) AS participants
```
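The restriction and the distinct-then-concat workaround are easy to verify in isolation with Python's stdlib sqlite3. This demo uses the uncorrelated form of the same trick (de-duplicating with `DISTINCT` in a subquery, then aggregating outside), rather than the correlated scalar subquery from the diff:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE notes (discussion_id INTEGER, author_username TEXT, is_system INTEGER);
INSERT INTO notes VALUES (1, 'alice', 0), (1, 'bob', 0), (1, 'alice', 0), (2, 'carol', 0);
""")

# The two-argument DISTINCT form is rejected by SQLite (OperationalError on
# the versions I have seen); the try/except keeps the demo running either way.
try:
    conn.execute("SELECT GROUP_CONCAT(DISTINCT author_username, ';') FROM notes")
except sqlite3.OperationalError as e:
    print("rejected:", e)

# Workaround: de-duplicate first, then aggregate with the separator.
rows = conn.execute("""
    SELECT discussion_id, GROUP_CONCAT(author_username, ';') AS participants
    FROM (SELECT DISTINCT discussion_id, author_username FROM notes)
    GROUP BY discussion_id
""").fetchall()
print(rows)
```

The duplicate `alice` note collapses to a single entry per discussion while the custom separator is preserved.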
Change 2 — Replace “contains('.') => exact file match” with segment-aware path classification

Why

`path.contains('.')` misclassifies directories like:

- `.github/workflows/`
- `src/v1.2/auth/`

It also fails the “root file” case (`README.md`) because your mode discriminator only treats paths as paths if they contain `/`.

Revision

- Add explicit `--path` to force Expert mode (covers root files cleanly).
- Classify file-vs-dir by checking the last path segment for a dot, and whether the input ends with `/`.

```diff
diff --git a/Plan.md b/Plan.md
@@ pub struct WhoArgs {
-    /// Username or file path (path if contains /)
-    pub target: Option<String>,
+    /// Username or file path shorthand (ambiguous for root files like README.md)
+    pub target: Option<String>,
+
+    /// Force expert mode for a file/directory path (supports root files like README.md)
+    #[arg(long, help_heading = "Mode", conflicts_with_all = ["active", "overlap", "reviews"])]
+    pub path: Option<String>,
@@ fn resolve_mode<'a>(args: &'a WhoArgs) -> Result<WhoMode<'a>> {
-    if let Some(target) = &args.target {
+    if let Some(p) = &args.path {
+        return Ok(WhoMode::Expert { path: p });
+    }
+    if let Some(target) = &args.target {
         let clean = target.strip_prefix('@').unwrap_or(target);
         if args.reviews {
             return Ok(WhoMode::Reviews { username: clean });
         }
-        // Disambiguation: if target contains '/', it's a file path.
-        // GitLab usernames never contain '/'.
-        if target.contains('/') {
+        // Disambiguation:
+        // - treat as path if it contains '/'
+        // - otherwise treat as username (root files require --path)
+        if target.contains('/') {
             return Ok(WhoMode::Expert { path: target });
         }
         return Ok(WhoMode::Workload { username: clean });
     }
```

And update the path pattern logic used by Expert/Overlap:

```diff
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
-    // Normalize path for LIKE matching: add trailing % if no extension
-    let path_pattern = if path.contains('.') {
-        path.to_string() // Exact file match
-    } else {
-        let trimmed = path.trim_end_matches('/');
-        format!("{trimmed}/%")
-    };
+    // Normalize:
+    // - if ends_with('/') => directory prefix
+    // - else if last segment contains '.' => file exact match
+    // - else => directory prefix
+    let trimmed = path.trim_end_matches('/');
+    let last = trimmed.rsplit('/').next().unwrap_or(trimmed);
+    let is_file = !path.ends_with('/') && last.contains('.');
+    let path_pattern = if is_file { trimmed.to_string() } else { format!("{trimmed}/%") };
```
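For reference, the revised classification rule ported to a few lines of Python, which makes the edge cases easy to exercise (illustrative only, not part of the plan):

```python
def path_pattern(path: str) -> str:
    """Segment-aware LIKE pattern: exact match for files, prefix for directories."""
    trimmed = path.rstrip("/")
    last = trimmed.rsplit("/", 1)[-1]          # final path segment only
    is_file = not path.endswith("/") and "." in last
    return trimmed if is_file else trimmed + "/%"

# Dots in intermediate segments no longer force "file" classification,
# and root files are handled without needing a '/' in the input.
print(path_pattern(".github/workflows/"))   # -> .github/workflows/%
print(path_pattern("src/v1.2/auth/"))       # -> src/v1.2/auth/%
print(path_pattern("src/lib/parser.rs"))    # -> src/lib/parser.rs
print(path_pattern("README.md"))            # -> README.md
```

The trailing-`/` check dominates, so a directory named with a dot still gets the prefix pattern as long as the caller writes it with the slash.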
|
||||||
|
|
||||||
|
Change 3 — Stop building dynamic SQL strings for optional filters; always bind params

Why

Right now you're mixing:

- dynamic `project_clause` string fragments
- ad-hoc param vectors
- placeholder renumbering by branch

That's brittle and easy to regress (especially when you add more conditions later). SQLite/rusqlite can bind `Option<T>` to NULL, which enables a simple pattern:

```sql
AND (?3 IS NULL OR n.project_id = ?3)
```
Revision (representative; apply to all queries)

```diff
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
-    let project_clause = if project_id.is_some() {
-        "AND n.project_id = ?3"
-    } else {
-        ""
-    };
-
-    let sql = format!(
+    let sql = format!(
         "SELECT username, role, activity_count, last_active_at FROM (
@@
          FROM notes n
          WHERE n.position_new_path LIKE ?1
            AND n.is_system = 0
            AND n.author_username IS NOT NULL
            AND n.created_at >= ?2
-           {project_clause}
+           AND (?3 IS NULL OR n.project_id = ?3)
@@
          WHERE n.position_new_path LIKE ?1
            AND m.author_username IS NOT NULL
            AND m.updated_at >= ?2
-           {project_clause}
+           AND (?3 IS NULL OR n.project_id = ?3)
          GROUP BY m.author_username
-        )"
+        ) t"
     );
-
-    let mut params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
-    params.push(Box::new(path_pattern.clone()));
-    params.push(Box::new(since_ms));
-    if let Some(pid) = project_id {
-        params.push(Box::new(pid));
-    }
-    let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
+    let param_refs = rusqlite::params![path_pattern, since_ms, project_id];
```

Notes:

- Adds the required derived-table alias `t` (some SQLite configurations are stricter).
- Eliminates the dynamic param vector and placeholder gymnastics.
Change 4 — Filter “path touch” queries to DiffNotes and escape LIKE properly

Why

- Only DiffNotes reliably have `position_new_path`; including other note types can skew counts and harm performance.
- LIKE treats `%` and `_` as wildcards—rare in file paths, but not impossible (generated files, templates). Escaping is a low-cost robustness win.

Revision

Add `note_type = 'DiffNote'` and `LIKE ... ESCAPE '\'` plus a tiny escape helper.

```diff
diff --git a/Plan.md b/Plan.md
@@ fn query_expert(...)
-         FROM notes n
-         WHERE n.position_new_path LIKE ?1
+         FROM notes n
+         WHERE n.note_type = 'DiffNote'
+           AND n.position_new_path LIKE ?1 ESCAPE '\'
            AND n.is_system = 0
@@
diff --git a/Plan.md b/Plan.md
@@ Helper Functions
+fn escape_like(input: &str) -> String {
+    input.replace('\\', "\\\\").replace('%', "\\%").replace('_', "\\_")
+}
```

And when building patterns:

```diff
-    let path_pattern = if is_file { trimmed.to_string() } else { format!("{trimmed}/%") };
+    let base = escape_like(trimmed);
+    let path_pattern = if is_file { base } else { format!("{base}/%") };
```

Apply the same changes to `query_overlap` and any other `position_new_path LIKE ...`.
Change 5 — Use note timestamps for “touch since” semantics (Expert/Overlap author branch)

Why

In Expert/Overlap “author” branches you filter by `m.updated_at >= since`. That answers “MR updated recently” rather than “MR touched at this path recently”, which can surface stale ownership.

Revision

Filter by the note creation time (and use it for “last touch” where relevant). You can still compute author activity, but anchor it to note activity.

```diff
diff --git a/Plan.md b/Plan.md
@@ fn query_overlap(...)
-         WHERE n.position_new_path LIKE ?1
+         WHERE n.note_type = 'DiffNote'
+           AND n.position_new_path LIKE ?1 ESCAPE '\'
            AND m.state IN ('opened', 'merged')
            AND m.author_username IS NOT NULL
-           AND m.updated_at >= ?2
+           AND n.created_at >= ?2
            AND (?3 IS NULL OR m.project_id = ?3)
```

Same idea in Expert mode's “MR authors” branch.
Change 6 — Workload mode: apply --since consistently to unresolved discussions

Why

Workload's unresolved discussions ignore `since_ms`. That makes --since partially misleading and can dump very old threads.

Revision

Filter on `d.last_note_at` when `since_ms` is set.

```diff
diff --git a/Plan.md b/Plan.md
@@ fn query_workload(...)
-    let disc_sql = format!(
+    let disc_since = if since_ms.is_some() {
+        "AND d.last_note_at >= ?2"
+    } else { "" };
+    let disc_sql = format!(
         "SELECT d.noteable_type,
@@
          WHERE d.resolvable = 1 AND d.resolved = 0
            AND EXISTS (
@@
            )
            {disc_project_filter}
+           {disc_since}
          ORDER BY d.last_note_at DESC
          LIMIT {limit}"
     );
@@
-    // Rebuild params for discussion query (only username + optional project_id)
-    let mut disc_params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
-    disc_params.push(Box::new(username.to_string()));
-    if let Some(pid) = project_id {
-        disc_params.push(Box::new(pid));
-    }
+    // Params: username, since_ms, project_id (NULLs ok)
+    let disc_param_refs = rusqlite::params![username, since_ms, project_id];
```

(If you adopt Change 3 fully, this becomes very clean.)
Change 7 — Make Overlap results represent “both roles” instead of collapsing to one

Why

Collapsing to a single role loses valuable info (“they authored and reviewed”). Also your current “prefer author” rule is arbitrary for the “who else is touching this” question.

Revision

Track role counts separately and render as A, R, or A+R.

```diff
diff --git a/Plan.md b/Plan.md
@@ pub struct OverlapUser {
     pub username: String,
-    pub role: String,
-    pub touch_count: u32,
+    pub author_touch_count: u32,
+    pub review_touch_count: u32,
+    pub touch_count: u32,
     pub last_touch_at: i64,
     pub mr_iids: Vec<i64>,
 }
@@ fn query_overlap(...)
-    let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
+    let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
         username: username.clone(),
-        role: role.clone(),
+        author_touch_count: 0,
+        review_touch_count: 0,
         touch_count: 0,
         last_touch_at: 0,
         mr_iids: Vec::new(),
     });
     entry.touch_count += count;
+    if role == "author" { entry.author_touch_count += count; }
+    if role == "reviewer" { entry.review_touch_count += count; }
@@ human output
-    println!(
-        "  {:<16} {:<8} {:>7} {:<12} {}",
+    println!(
+        "  {:<16} {:<6} {:>7} {:<12} {}",
         ...
     );
@@
-        user.role,
+        format_roles(user.author_touch_count, user.review_touch_count),
```
Change 8 — Add an “Index Audit + optional migration” step (big perf win, low blast radius)

Why

With 280K notes, the path/timestamp queries will degrade quickly without indexes. This isn't “scope creep”; it's making the feature usable.

Revision (plan-level)

Add a non-breaking migration that only creates indexes if missing.

Optionally add a runtime check: if EXPLAIN QUERY PLAN indicates a full table scan on notes, print a dim warning in human mode.

```diff
diff --git a/Plan.md b/Plan.md
@@ Implementation Order
-| Step | What | Files |
+| Step | What | Files |
 | 1 | CLI skeleton: `WhoArgs` + `Commands::Who` + dispatch + stub | `cli/mod.rs`, `commands/mod.rs`, `main.rs` |
+| 1.5 | Index audit + add `CREATE INDEX IF NOT EXISTS` migration for who hot paths | `migrations/0xx_who_indexes.sql` |
@@
```

Suggested indexes (tune names to your conventions):

- notes(note_type, position_new_path, created_at)
- notes(discussion_id, is_system, author_username)
- discussions(resolvable, resolved, last_note_at, project_id)
- merge_requests(project_id, state, updated_at, author_username)
- issue_assignees(username, issue_id)

Even if SQLite can't perfectly index LIKE, these still help with join and timestamp filters.
Change 9 — Make robot JSON reproducible by echoing the effective query inputs

Why

Agent workflows benefit from a stable “query record”: what mode ran, what path/user, resolved project, effective since, limit.

Revision

Include an input object in JSON output.

```diff
diff --git a/Plan.md b/Plan.md
@@ struct WhoJsonData {
     mode: String,
+    input: serde_json::Value,
     #[serde(flatten)]
     result: serde_json::Value,
 }
@@ pub fn print_who_json(...)
-    let output = WhoJsonEnvelope {
+    let input = serde_json::json!({
+        "project": /* resolved or raw args.project */,
+        "since": /* resolved since ISO */,
+        "limit": /* args.limit */,
+    });
+    let output = WhoJsonEnvelope {
         ok: true,
         data: WhoJsonData {
             mode: mode.to_string(),
+            input,
             result: data,
         },
         meta: RobotMeta { elapsed_ms },
     };
```
Change 10 — Tighten clap constraints so invalid combinations never reach resolve_mode

Why

Right now conflicts are enforced manually (or not at all). Reject the invalid combos at the CLI layer:

- --active should conflict with target, --overlap, --reviews, --path
- --reviews should require a username (and should conflict with Expert path modes)

```diff
diff --git a/Plan.md b/Plan.md
@@ pub struct WhoArgs {
-    pub active: bool,
+    #[arg(long, help_heading = "Mode", conflicts_with_all = ["target", "overlap", "reviews", "path"])]
+    pub active: bool,
@@
-    pub overlap: Option<String>,
+    #[arg(long, help_heading = "Mode", conflicts_with_all = ["target", "active", "reviews", "path"])]
+    pub overlap: Option<String>,
@@
-    pub reviews: bool,
+    #[arg(long, help_heading = "Mode", requires = "target", conflicts_with_all = ["active", "overlap", "path"])]
+    pub reviews: bool,
```
Summary of what I'd definitely change

If you do nothing else, do these first:

1. Fix GROUP_CONCAT(DISTINCT ..., sep) in Active mode (runtime error).
2. Path classification: add --path, and stop using contains('.') globally.
3. Remove dynamic SQL + param vectors: always bind project_id as nullable and use (? IS NULL OR ...).
4. Filter to DiffNotes + LIKE escaping for correctness and fewer rows scanned.
5. Optional index migration: otherwise this will feel slow/non-deterministically slow depending on local DB state.

If you want, I can also provide a consolidated “v2 plan” as a single unified patch (one diff) rather than per-change snippets.
docs/who-command-design.feedback-2.md (new file, 303 lines)
Below are the highest-leverage revisions I'd make to iteration 1 to tighten correctness, performance, and “agent usefulness” without blowing up scope. For each change: (1) rationale, (2) a focused unified diff against the plan you pasted.

Change 1 — Make robot “input echo” actually resolved (project_id, project_path, since_ms/iso, mode)

Why

Your Design Principle #5 says the robot envelope should echo resolved inputs (“effective since, resolved project”), but the current input object echoes only raw CLI strings. Agents can't reliably reproduce or compare runs (e.g., fuzzy project resolution may map differently over time).

This is also a reliability improvement: “what ran” should be computed once and propagated, not recomputed in output.

Plan diff

```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-5. **Robot-first reproducibility.** Robot JSON output includes an `input` object echoing the resolved query parameters (effective since, resolved project, limit) so agents can trace exactly what ran.
+5. **Robot-first reproducibility.** Robot JSON output includes a `resolved_input` object (mode, since_ms + since_iso, resolved project_id + project_path, limit, db_path) so agents can trace exactly what ran.

@@
-/// Main entry point. Resolves mode from args and dispatches.
-pub fn run_who(config: &Config, args: &WhoArgs) -> Result<WhoResult> {
+/// Main entry point. Resolves mode + resolved inputs once, then dispatches.
+pub fn run_who(config: &Config, args: &WhoArgs) -> Result<WhoRun> {
     let db_path = get_db_path(config.storage.db_path.as_deref());
     let conn = create_connection(&db_path)?;

-    let project_id = args
+    let project_id = args
         .project
         .as_deref()
         .map(|p| resolve_project(&conn, p))
         .transpose()?;
+    let project_path = project_id
+        .map(|id| lookup_project_path(&conn, id))
+        .transpose()?;

     let mode = resolve_mode(args)?;

     match mode {
         WhoMode::Expert { path } => {
             let since_ms = resolve_since(args.since.as_deref(), "6m")?;
             let result = query_expert(&conn, path, project_id, since_ms, args.limit)?;
-            Ok(WhoResult::Expert(result))
+            Ok(WhoRun::new("expert", &db_path, project_id, project_path, since_ms, args.limit, WhoResult::Expert(result)))
         }
@@
     }
 }
+
+/// Wrapper that carries resolved inputs for reproducible output.
+pub struct WhoRun {
+    pub mode: String,
+    pub resolved_input: WhoResolvedInput,
+    pub result: WhoResult,
+}
+
+pub struct WhoResolvedInput {
+    pub db_path: String,
+    pub project_id: Option<i64>,
+    pub project_path: Option<String>,
+    pub since_ms: i64,
+    pub since_iso: String,
+    pub limit: usize,
+}
@@
-pub fn print_who_json(result: &WhoResult, args: &WhoArgs, elapsed_ms: u64) {
-    let (mode, data) = match result {
+pub fn print_who_json(run: &WhoRun, args: &WhoArgs, elapsed_ms: u64) {
+    let (mode, data) = match &run.result {
         WhoResult::Expert(r) => ("expert", expert_to_json(r)),
@@
-    let input = serde_json::json!({
+    let input = serde_json::json!({
         "target": args.target,
         "path": args.path,
         "project": args.project,
         "since": args.since,
         "limit": args.limit,
     });
+
+    let resolved_input = serde_json::json!({
+        "mode": run.mode,
+        "db_path": run.resolved_input.db_path,
+        "project_id": run.resolved_input.project_id,
+        "project_path": run.resolved_input.project_path,
+        "since_ms": run.resolved_input.since_ms,
+        "since_iso": run.resolved_input.since_iso,
+        "limit": run.resolved_input.limit,
+    });
@@
-        data: WhoJsonData {
-            mode: mode.to_string(),
-            input,
-            result: data,
-        },
+        data: WhoJsonData { mode: mode.to_string(), input, resolved_input, result: data },
         meta: RobotMeta { elapsed_ms },
     };
@@
 struct WhoJsonData {
     mode: String,
     input: serde_json::Value,
+    resolved_input: serde_json::Value,
     #[serde(flatten)]
     result: serde_json::Value,
 }
```
Change 2 — Remove dynamic SQL format!(..LIMIT {limit}) and parameterize LIMIT everywhere

Why

You explicitly prefer static SQL (`(?N IS NULL OR ...)`) to avoid subtle bugs, but Workload/Active use `format!` for LIMIT. Even though limit is typed, it's an inconsistency that complicates statement caching and encourages future string-assembly creep.

SQLite supports `LIMIT ?` with bound parameters; rusqlite can bind an i64.

Plan diff

```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-    let issues_sql = format!(
-        "SELECT ...
-         ORDER BY i.updated_at DESC
-         LIMIT {limit}"
-    );
-    let mut stmt = conn.prepare(&issues_sql)?;
+    let issues_sql =
+        "SELECT ...
+         ORDER BY i.updated_at DESC
+         LIMIT ?4";
+    let mut stmt = conn.prepare(issues_sql)?;
     let assigned_issues: Vec<WorkloadIssue> = stmt
-        .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+        .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
-    let authored_sql = format!(
-        "SELECT ...
-         ORDER BY m.updated_at DESC
-         LIMIT {limit}"
-    );
-    let mut stmt = conn.prepare(&authored_sql)?;
+    let authored_sql =
+        "SELECT ...
+         ORDER BY m.updated_at DESC
+         LIMIT ?4";
+    let mut stmt = conn.prepare(authored_sql)?;
@@
-        .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+        .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
-    let reviewing_sql = format!(
-        "SELECT ...
-         ORDER BY m.updated_at DESC
-         LIMIT {limit}"
-    );
-    let mut stmt = conn.prepare(&reviewing_sql)?;
+    let reviewing_sql =
+        "SELECT ...
+         ORDER BY m.updated_at DESC
+         LIMIT ?4";
+    let mut stmt = conn.prepare(reviewing_sql)?;
@@
-        .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+        .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
-    let disc_sql = format!(
-        "SELECT ...
-         ORDER BY d.last_note_at DESC
-         LIMIT {limit}"
-    );
-    let mut stmt = conn.prepare(&disc_sql)?;
+    let disc_sql =
+        "SELECT ...
+         ORDER BY d.last_note_at DESC
+         LIMIT ?4";
+    let mut stmt = conn.prepare(disc_sql)?;
@@
-        .query_map(rusqlite::params![username, project_id, since_ms], |row| {
+        .query_map(rusqlite::params![username, project_id, since_ms, limit as i64], |row| {
@@
-    let sql = format!(
-        "SELECT ...
-         ORDER BY d.last_note_at DESC
-         LIMIT {limit}"
-    );
-    let mut stmt = conn.prepare(&sql)?;
+    let sql =
+        "SELECT ...
+         ORDER BY d.last_note_at DESC
+         LIMIT ?3";
+    let mut stmt = conn.prepare(sql)?;
@@
-        .query_map(rusqlite::params![since_ms, project_id], |row| {
+        .query_map(rusqlite::params![since_ms, project_id, limit as i64], |row| {
```
Change 3 — Fix path matching for dotless files (LICENSE/Makefile) via “exact OR prefix” (no new flags)

Why

Your improved “dot only in last segment” heuristic still fails on dotless files (LICENSE, Makefile, Dockerfile), which are common, especially at repo root. Right now they'll be treated as directories (`LICENSE/%`) and silently return nothing.

Best minimal UX: if the user provides a path that's ambiguous (no trailing slash), match either exact file OR directory prefix.

Plan diff

```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-/// Build a LIKE pattern from a user-supplied path, with proper LIKE escaping.
-///
-/// Rules:
-/// - If the path ends with `/`, it's a directory prefix → `escaped_path%`
-/// - If the last path segment contains `.`, it's a file → exact match
-/// - Otherwise, it's a directory prefix → `escaped_path/%`
+/// Build an exact + prefix match from a user-supplied path, with proper LIKE escaping.
+///
+/// Rules:
+/// - If the path ends with `/`, treat as directory-only (prefix match)
+/// - Otherwise, treat as ambiguous: exact match OR directory prefix
+///   (fixes dotless files like LICENSE/Makefile without requiring new flags)
@@
-fn build_path_pattern(path: &str) -> String {
+struct PathMatch {
+    exact: String,
+    prefix: String,
+    dir_only: bool,
+}
+
+fn build_path_match(path: &str) -> PathMatch {
     let trimmed = path.trim_end_matches('/');
-    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
-    let is_file = !path.ends_with('/') && last_segment.contains('.');
     let escaped = escape_like(trimmed);
-
-    if is_file {
-        escaped
-    } else {
-        format!("{escaped}/%")
-    }
+    PathMatch {
+        exact: escaped.clone(),
+        prefix: format!("{escaped}/%"),
+        dir_only: path.ends_with('/'),
+    }
 }
@@
-    let path_pattern = build_path_pattern(path);
+    let pm = build_path_match(path);
@@
-           AND n.position_new_path LIKE ?1 ESCAPE '\\'
+           AND (
+               (?4 = 1 AND n.position_new_path LIKE ?2 ESCAPE '\\')
+               OR (?4 = 0 AND (n.position_new_path = ?1 OR n.position_new_path LIKE ?2 ESCAPE '\\'))
+           )
@@
-    let rows: Vec<(String, String, u32, i64)> = stmt
-        .query_map(rusqlite::params![path_pattern, since_ms, project_id], |row| {
+    let rows: Vec<(String, String, u32, i64)> = stmt
+        .query_map(rusqlite::params![pm.exact, pm.prefix, since_ms, i32::from(pm.dir_only), project_id], |row| {
             Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
         })?
```

(Apply the same pattern to Overlap mode.)
Change 4 — Consistently exclude system notes in all DiffNote-based branches (Expert/Overlap author branches currently don't)

Why

You filter `n.is_system = 0` for reviewer branches, but not in the author branches of Expert/Overlap. That can skew “author touch” via system-generated diff notes or bot activity.

Consistency here improves correctness and also enables more aggressive partial indexing.

Plan diff

```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-         WHERE n.note_type = 'DiffNote'
+         WHERE n.note_type = 'DiffNote'
            AND n.position_new_path LIKE ?1 ESCAPE '\\'
+           AND n.is_system = 0
            AND m.author_username IS NOT NULL
            AND n.created_at >= ?2
            AND (?3 IS NULL OR m.project_id = ?3)
@@
-         WHERE n.note_type = 'DiffNote'
+         WHERE n.note_type = 'DiffNote'
            AND n.position_new_path LIKE ?1 ESCAPE '\\'
+           AND n.is_system = 0
            AND m.state IN ('opened', 'merged')
            AND m.author_username IS NOT NULL
            AND n.created_at >= ?2
            AND (?3 IS NULL OR m.project_id = ?3)
```
Change 5 — Rework Migration 017 indexes to match real predicates + add one critical notes index for discussion participation

Why

(a) `idx_notes_diffnote_path_created` currently leads with note_type even though it's constant via the partial index. You want the leading columns to match your most selective predicates: position_new_path prefix + created_at range, with optional project_id.

(b) Active + Workload discussion participation repeatedly hits notes by (discussion_id, author_username); you only guarantee notes(discussion_id) is indexed. Adding a narrow partial composite index pays off immediately for both “participants” and “EXISTS user participated” checks.

(c) The discussions index should focus on (project_id, last_note_at) with a partial predicate; resolvable/resolved a_
docs/who-command-design.feedback-3.md (new file, 471 lines)
Below are the revisions I'd make to iteration 2 to improve correctness, determinism, query-plan quality, and multi-project usability without turning this into a bigger product.

I'm treating your plan as the “source of truth” and showing git-diff style patches against the plan text/code blocks you included.

Change 1 — Fix project scoping to hit the right index (DiffNote branches)

Why

Your hot-path index is:

idx_notes_diffnote_path_created ON notes(position_new_path, created_at, project_id) WHERE note_type='DiffNote' AND is_system=0

But in Expert/Overlap you sometimes scope by `m.project_id = ?3` (MR table), not `n.project_id = ?3` (notes table). That weakens the optimizer's ability to use the composite notes index (and can force broader joins before filtering).

Diff

```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Expert Mode @@
-           AND (?3 IS NULL OR m.project_id = ?3)
+           -- IMPORTANT: scope on notes.project_id to maximize use of
+           -- idx_notes_diffnote_path_created (notes is the selective table)
+           AND (?3 IS NULL OR n.project_id = ?3)

@@ Query: Overlap Mode @@
-           AND (?3 IS NULL OR m.project_id = ?3)
+           AND (?3 IS NULL OR n.project_id = ?3)

@@ Query: Overlap Mode (author branch) @@
-           AND (?3 IS NULL OR m.project_id = ?3)
+           AND (?3 IS NULL OR n.project_id = ?3)
```
Change 2 — Introduce a “prefix vs exact” path query to avoid LIKE when you don't need it

Why

For exact file paths (e.g. src/auth/login.rs), you currently do:

position_new_path LIKE ?1 ESCAPE '\' where ?1 has no wildcard

That's logically fine, but it's a worse signal to the planner than `=` and can degrade performance depending on collation/case settings.

This doesn't violate “static SQL” — you can pick between two static query strings.

Diff

```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Helper: Path Pattern Construction @@
-fn build_path_pattern(path: &str) -> String {
+struct PathQuery {
+    /// The parameter value to bind.
+    value: String,
+    /// If true: use LIKE value || '%'. If false: use '='.
+    is_prefix: bool,
+}
+
+fn build_path_query(path: &str) -> PathQuery {
     let trimmed = path.trim_end_matches('/');
     let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
     let is_file = !path.ends_with('/') && last_segment.contains('.');
     let escaped = escape_like(trimmed);

     if is_file {
-        escaped
+        PathQuery { value: escaped, is_prefix: false }
     } else {
-        format!("{escaped}/%")
+        PathQuery { value: format!("{escaped}/%"), is_prefix: true }
     }
 }
```

And then (example for DiffNote predicates):

```diff
@@ Query: Expert Mode @@
-    let path_pattern = build_path_pattern(path);
+    let pq = build_path_query(path);

-    let sql = " ... n.position_new_path LIKE ?1 ESCAPE '\\' ... ";
+    let sql_prefix = " ... n.position_new_path LIKE ?1 ESCAPE '\\' ... ";
+    let sql_exact = " ... n.position_new_path = ?1 ... ";

-    let mut stmt = conn.prepare(sql)?;
+    let mut stmt = if pq.is_prefix { conn.prepare_cached(sql_prefix)? }
+                   else { conn.prepare_cached(sql_exact)? };
     let rows = stmt.query_map(params![... pq.value ...], ...);
```
Change 3 — Push Expert aggregation into SQL (less Rust, fewer rows, SQL-level LIMIT)

Why

Right now Expert does:

1. UNION ALL
2. return per-role rows
3. HashMap merge
4. score compute
5. sort/truncate

You can do all of that in SQL deterministically, then LIMIT ?N actually works.
Diff
|
||||||
|
```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Expert Mode @@
-let sql = "SELECT username, role, activity_count, last_active_at FROM (
-  ...
-)";
+let sql = "
+  WITH activity AS (
+    SELECT
+      n.author_username AS username,
+      'reviewer' AS role,
+      COUNT(*) AS cnt,
+      MAX(n.created_at) AS last_active_at
+    FROM notes n
+    WHERE n.note_type = 'DiffNote'
+      AND n.is_system = 0
+      AND n.author_username IS NOT NULL
+      AND n.created_at >= ?2
+      AND (?3 IS NULL OR n.project_id = ?3)
+      AND (
+        (?4 = 1 AND n.position_new_path LIKE ?1 ESCAPE '\\') OR
+        (?4 = 0 AND n.position_new_path = ?1)
+      )
+    GROUP BY n.author_username
+
+    UNION ALL
+
+    SELECT
+      m.author_username AS username,
+      'author' AS role,
+      COUNT(DISTINCT m.id) AS cnt,
+      MAX(n.created_at) AS last_active_at
+    FROM merge_requests m
+    JOIN discussions d ON d.merge_request_id = m.id
+    JOIN notes n ON n.discussion_id = d.id
+    WHERE n.note_type = 'DiffNote'
+      AND n.is_system = 0
+      AND m.author_username IS NOT NULL
+      AND n.created_at >= ?2
+      AND (?3 IS NULL OR n.project_id = ?3)
+      AND (
+        (?4 = 1 AND n.position_new_path LIKE ?1 ESCAPE '\\') OR
+        (?4 = 0 AND n.position_new_path = ?1)
+      )
+    GROUP BY m.author_username
+  )
+  SELECT
+    username,
+    SUM(CASE WHEN role='reviewer' THEN cnt ELSE 0 END) AS review_count,
+    SUM(CASE WHEN role='author' THEN cnt ELSE 0 END) AS author_count,
+    MAX(last_active_at) AS last_active_at,
+    (SUM(CASE WHEN role='reviewer' THEN cnt ELSE 0 END) * 3.0) +
+    (SUM(CASE WHEN role='author' THEN cnt ELSE 0 END) * 2.0) AS score
+  FROM activity
+  GROUP BY username
+  ORDER BY score DESC, last_active_at DESC, username ASC
+  LIMIT ?5
+";

-// Aggregate by username: combine reviewer + author counts
-let mut user_map: HashMap<...> = HashMap::new();
-...
-experts.sort_by(...); experts.truncate(limit);
+// No Rust-side merge/sort needed; SQL already returns final rows.
```
Change 4 — Overlap output is ambiguous across projects: include stable MR refs (project_path!iid)

Why

mr_iids: Vec<i64> is ambiguous in a multi-project DB; `!123` only means something relative to a project.

Also: the MR IID dedup is currently Vec::contains() inside a loop (O(n²)). Use a HashSet.
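For illustration, the dedup pattern in plain Rust (a hypothetical `merge_refs` helper, not the plan's exact code):

```rust
use std::collections::HashSet;

// Merge two ref lists in O(n) via a HashSet, instead of Vec::contains()
// inside a loop (O(n^2)). Sorting afterwards keeps the output deterministic.
fn merge_refs(existing: Vec<String>, incoming: Vec<String>) -> Vec<String> {
    let mut set: HashSet<String> = existing.into_iter().collect();
    set.extend(incoming);
    let mut out: Vec<String> = set.into_iter().collect();
    out.sort();
    out
}

fn main() {
    let merged = merge_refs(
        vec!["grp/app!1".into(), "grp/app!2".into()],
        vec!["grp/app!2".into(), "grp/lib!3".into()],
    );
    assert_eq!(merged, vec!["grp/app!1", "grp/app!2", "grp/lib!3"]);
    println!("{merged:?}");
}
```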
Diff
```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ OverlapResult @@
 pub struct OverlapUser {
     pub username: String,
@@
-    pub mr_iids: Vec<i64>,
+    /// Stable MR references like "group/project!123"
+    pub mr_refs: Vec<String>,
 }

@@ Query: Overlap Mode (SQL) @@
-  GROUP_CONCAT(DISTINCT m.iid) AS mr_iids
+  GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
 FROM notes n
 JOIN discussions d ON n.discussion_id = d.id
 JOIN merge_requests m ON d.merge_request_id = m.id
+JOIN projects p ON m.project_id = p.id
@@
-  GROUP_CONCAT(DISTINCT m.iid) AS mr_iids
+  GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
 FROM merge_requests m
 JOIN discussions d ON d.merge_request_id = m.id
 JOIN notes n ON n.discussion_id = d.id
+JOIN projects p ON m.project_id = p.id

@@ Query: Overlap Mode (Rust merge) @@
-let mr_iids: Vec<i64> = mr_iids_csv ...
+let mr_refs: Vec<String> = mr_refs_csv
+    .as_deref()
+    .map(|csv| csv.split(',').map(|s| s.trim().to_string()).collect())
+    .unwrap_or_default();
@@
-// Merge MR IIDs, deduplicate
-for iid in &mr_iids {
-    if !entry.mr_iids.contains(iid) {
-        entry.mr_iids.push(*iid);
-    }
-}
+// Merge MR refs, deduplicate
+use std::collections::HashSet;
+let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
+for r in mr_refs { set.insert(r); }
+entry.mr_refs = set.into_iter().collect();
```
Change 5 — Active mode: avoid correlated subqueries by preselecting discussions, then aggregating notes once

Why

The Active query currently runs two correlated subqueries per discussion row:

- note_count
- participants

With LIMIT 20 it's not catastrophic, but it is still unnecessary work and creates "spiky" behavior if the planner chooses poorly.

Pattern to use:

- a CTE selects the limited set of discussions
- join notes once, aggregate with GROUP BY

Diff
```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query: Active Mode @@
-let sql =
-    "SELECT
-        d.noteable_type,
-        ...
-        (SELECT COUNT(*) FROM notes n
-         WHERE n.discussion_id = d.id AND n.is_system = 0) AS note_count,
-        (SELECT GROUP_CONCAT(username, X'1F') FROM (
-            SELECT DISTINCT n.author_username AS username
-            FROM notes n
-            WHERE n.discussion_id = d.id
-              AND n.is_system = 0
-              AND n.author_username IS NOT NULL
-            ORDER BY username
-        )) AS participants
-     FROM discussions d
-     ...
-     LIMIT ?3";
+let sql = "
+  WITH picked AS (
+    SELECT d.id, d.noteable_type, d.issue_id, d.merge_request_id, d.project_id, d.last_note_at
+    FROM discussions d
+    WHERE d.resolvable = 1 AND d.resolved = 0
+      AND d.last_note_at >= ?1
+      AND (?2 IS NULL OR d.project_id = ?2)
+    ORDER BY d.last_note_at DESC
+    LIMIT ?3
+  ),
+  note_agg AS (
+    SELECT
+      n.discussion_id,
+      COUNT(*) AS note_count,
+      GROUP_CONCAT(n.author_username, X'1F') AS participants
+    FROM (
+      SELECT DISTINCT discussion_id, author_username
+      FROM notes
+      WHERE is_system = 0 AND author_username IS NOT NULL
+    ) n
+    JOIN picked p ON p.id = n.discussion_id
+    GROUP BY n.discussion_id
+  )
+  SELECT
+    p.noteable_type,
+    COALESCE(i.iid, m.iid) AS entity_iid,
+    COALESCE(i.title, m.title) AS entity_title,
+    proj.path_with_namespace,
+    p.last_note_at,
+    COALESCE(na.note_count, 0) AS note_count,
+    COALESCE(na.participants, '') AS participants
+  FROM picked p
+  JOIN projects proj ON p.project_id = proj.id
+  LEFT JOIN issues i ON p.issue_id = i.id
+  LEFT JOIN merge_requests m ON p.merge_request_id = m.id
+  LEFT JOIN note_agg na ON na.discussion_id = p.id
+  ORDER BY p.last_note_at DESC
+";
```
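On the Rust side, the X'1F' (ASCII unit separator) concatenation splits back into usernames; a minimal sketch (helper name hypothetical, separator assumed not to occur in usernames):

```rust
// Split a GROUP_CONCAT(..., X'1F') result back into usernames. An empty
// string (no participants) must map to an empty Vec, not vec![""].
fn parse_participants(concat: &str) -> Vec<String> {
    if concat.is_empty() {
        return Vec::new();
    }
    concat.split('\u{1f}').map(str::to_string).collect()
}

fn main() {
    assert_eq!(parse_participants(""), Vec::<String>::new());
    assert_eq!(parse_participants("alice\u{1f}bob"), vec!["alice", "bob"]);
    println!("ok");
}
```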
Change 6 — Use prepare_cached() everywhere (cheap perf win, no scope creep)

Why

You already worked hard to keep the SQL static. Taking advantage of SQLite's statement caching completes the loop.

Diff
```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Query functions @@
-let mut stmt = conn.prepare(sql)?;
+let mut stmt = conn.prepare_cached(sql)?;
```

Apply in all query fns (query_workload, query_reviews, query_active, query_expert, query_overlap, lookup_project_path).
Change 7 — Human output: show project_path where ambiguity exists (Workload + Overlap)

Why

When output is not project-scoped, `#42` and `!100` aren't unique. The project paths are already in the query results — they're just not printed.

Diff
```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ print_workload_human @@
-println!(
-    "  {} {} {}",
+println!(
+    "  {} {} {} {}",
     style(format!("#{:<5}", item.iid)).cyan(),
     truncate_str(&item.title, 45),
     style(format_relative_time(item.updated_at)).dim(),
+    style(&item.project_path).dim(),
 );

@@ print_workload_human (MRs) @@
-println!(
-    "  {} {}{} {}",
+println!(
+    "  {} {}{} {} {}",
     style(format!("!{:<5}", mr.iid)).cyan(),
     truncate_str(&mr.title, 40),
     style(draft).dim(),
     style(format_relative_time(mr.updated_at)).dim(),
+    style(&mr.project_path).dim(),
 );

@@ print_overlap_human @@
-let mr_str = user.mr_iids.iter().take(5).map(|iid| format!("!{iid}")).collect::<Vec<_>>().join(", ");
+let mr_str = user.mr_refs.iter().take(5).cloned().collect::<Vec<_>>().join(", ");
```
Change 8 — Robot JSON: add stable IDs + "defaulted" flags for reproducibility

Why

You already added resolved_input — good. Two reproducibility gaps remain:

- Agents can't reliably "open" an entity without IDs (discussion_id, mr_id, issue_id).
- Agents can't tell whether `since` was user-provided or defaulted (important when replaying intent).

Diff
```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ WhoResolvedInput @@
 pub struct WhoResolvedInput {
@@
     pub since_ms: Option<i64>,
     pub since_iso: Option<String>,
+    pub since_was_default: bool,
     pub limit: usize,
 }

@@ run_who @@
-let since_ms = resolve_since(args.since.as_deref(), "6m")?;
+let since_was_default = args.since.is_none();
+let since_ms = resolve_since(args.since.as_deref(), "6m")?;
 Ok(WhoRun {
     resolved_input: WhoResolvedInput {
@@
         since_ms: Some(since_ms),
         since_iso: Some(ms_to_iso(since_ms)),
+        since_was_default,
         limit: args.limit,
     },

@@ print_who_json resolved_input @@
 let resolved_input = serde_json::json!({
@@
     "since_ms": run.resolved_input.since_ms,
     "since_iso": run.resolved_input.since_iso,
+    "since_was_default": run.resolved_input.since_was_default,
     "limit": run.resolved_input.limit,
 });
```

And for Active/Workload discussion items, add IDs in SQL and JSON:

```diff
@@ ActiveDiscussion @@
 pub struct ActiveDiscussion {
+    pub discussion_id: i64,
@@
 }

@@ query_active SELECT @@
-  SELECT
-    p.noteable_type,
+  SELECT
+    p.id AS discussion_id,
+    p.noteable_type,

@@ active_to_json @@
-"discussions": r.discussions.iter().map(|d| json!({
+"discussions": r.discussions.iter().map(|d| json!({
+    "discussion_id": d.discussion_id,
     ...
 }))
```
Change 9 — Make performance verification explicit: require EXPLAIN QUERY PLAN checks for each mode

Why

You're adding indexes specifically for these queries. The only way to be sure the planner is doing what you think is to lock in a short perf checklist (especially after schema drift or across SQLite versions).

Diff
```diff
--- a/who-command-design.md
+++ b/who-command-design.md
@@ Verification @@
 # Manual verification against real data
 cargo run --release -- who src/features/global-search/
@@
 cargo run --release -- who src/features/global-search/ -p typescript  # project scoped
+
+# Perf verification (required before merge):
+# Confirm idx_notes_diffnote_path_created is used for Expert/Overlap and
+# idx_discussions_unresolved_recent is used for Active.
+sqlite3 path/to/db.sqlite "
+  EXPLAIN QUERY PLAN
+  SELECT ... -- paste final Expert SQL with representative bindings
+";
```

(Keep it lightweight: one representative query per mode is enough.)
Net effect

- Correctness: project scoping hits the notes index; IDs are added for agent workflows.
- Performance: fewer rows materialized in Expert; statement caching everywhere; Active avoids correlated subqueries.
- UX: human output is no longer ambiguous across projects; Overlap MR references become actionable.
- Reproducibility: agents can distinguish defaults from explicit inputs and can dereference entities reliably.

If you want the single highest-ROI subset to implement first: Change 1 + Change 4 + Change 6 + Change 7. That's where the real operational value lands.
docs/who-command-design.feedback-4.md (new file, 3 lines)
Proposing code revisions for performance and determinism
docs/who-command-design.feedback-5.md (new file, 356 lines)
Below are the highest-leverage revisions I'd make. They're tightly scoped (no new tables/APIs), but fix a few real correctness issues and make the outputs more actionable.
1) Fix a correctness bug in PathQuery: don't escape for =, and make --path Makefile actually work

Why

- Bug: build_path_query() currently runs escape_like() even when is_prefix = false (exact match). That breaks exact matches for paths containing `_`, `%`, or `\`, because `=` does not treat those as metacharacters — the escaped string won't equal the stored path.
- UX mismatch: the plan says --path handles dotless root files (Makefile/LICENSE), but the current logic still treats them as directory prefixes (`Makefile/%`) → zero results.

Change

- Only escape for LIKE.
- Treat root paths (no `/`) passed via --path as exact matches by default (unless they end with `/`).
```diff
diff --git a/plan.md b/plan.md
@@
-/// Build a path query from a user-supplied path.
-///
-/// Rules:
-/// - If the path ends with `/`, it's a directory prefix -> `escaped_path%` (LIKE)
-/// - If the last path segment contains `.`, it's a file -> exact match (=)
-/// - Otherwise, it's a directory prefix -> `escaped_path/%` (LIKE)
+/// Build a path query from a user-supplied path.
+///
+/// Rules:
+/// - If the path ends with `/`, it's a directory prefix -> `escaped_path/%` (LIKE)
+/// - If the path is a root path (no `/`) and does NOT end with `/`, treat as exact (=)
+///   (this makes `--path Makefile` and `--path LICENSE` work as intended)
+/// - Else if the last path segment contains `.`, treat as exact (=)
+/// - Otherwise, treat as directory prefix -> `escaped_path/%` (LIKE)
@@
-fn build_path_query(path: &str) -> PathQuery {
+fn build_path_query(path: &str) -> PathQuery {
     let trimmed = path.trim_end_matches('/');
     let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
-    let is_file = !path.ends_with('/') && last_segment.contains('.');
-    let escaped = escape_like(trimmed);
+    let is_root = !trimmed.contains('/');
+    let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));

     if is_file {
         PathQuery {
-            value: escaped,
+            // IMPORTANT: do NOT escape for exact match (=)
+            value: trimmed.to_string(),
             is_prefix: false,
         }
     } else {
+        let escaped = escape_like(trimmed);
         PathQuery {
             value: format!("{escaped}/%"),
             is_prefix: true,
         }
     }
 }
@@
-/// **Known limitation:** Dotless root files (LICENSE, Makefile, Dockerfile)
-/// without a trailing `/` will be treated as directory prefixes. Use `--path`
-/// for these — the `--path` flag passes through to Expert mode directly,
-/// and the `build_path_query` output for "LICENSE" is a prefix `LICENSE/%`
-/// which will simply return zero results (a safe, obvious failure mode that the
-/// help text addresses).
+/// Note: Root file paths passed via `--path` (including dotless files like Makefile/LICENSE)
+/// are treated as exact matches unless they end with `/`.
```
Also update the --path help text to be explicit:
```diff
diff --git a/plan.md b/plan.md
@@
-    /// Force expert mode for a file/directory path (handles root files like
-    /// README.md, LICENSE, Makefile that lack a / and can't be auto-detected)
+    /// Force expert mode for a file/directory path.
+    /// Root files (README.md, LICENSE, Makefile) are treated as exact matches.
+    /// Use a trailing `/` to force directory-prefix matching.
```
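Put together, the revised rules can be sketched as a self-contained function (a sketch; the real escape_like is assumed to escape `\`, `%`, and `_` for use with `ESCAPE '\'`):

```rust
#[derive(Debug, PartialEq)]
struct PathQuery {
    value: String,
    is_prefix: bool, // true -> LIKE value ESCAPE '\'; false -> = value
}

/// Escape LIKE metacharacters; backslash first so we don't double-escape.
fn escape_like(s: &str) -> String {
    s.replace('\\', "\\\\").replace('%', "\\%").replace('_', "\\_")
}

fn build_path_query(path: &str) -> PathQuery {
    let trimmed = path.trim_end_matches('/');
    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_root = !trimmed.contains('/');
    // Root paths and dotted last segments are exact; a trailing '/' forces prefix.
    let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
    if is_file {
        // No escaping for '=': the stored path is compared literally.
        PathQuery { value: trimmed.to_string(), is_prefix: false }
    } else {
        PathQuery { value: format!("{}/%", escape_like(trimmed)), is_prefix: true }
    }
}

fn main() {
    assert_eq!(build_path_query("Makefile"),
               PathQuery { value: "Makefile".into(), is_prefix: false });
    assert_eq!(build_path_query("src/lib.rs"),
               PathQuery { value: "src/lib.rs".into(), is_prefix: false });
    assert_eq!(build_path_query("src/features/"),
               PathQuery { value: "src/features/%".into(), is_prefix: true });
    // '_' is escaped only on the LIKE (prefix) side.
    assert_eq!(build_path_query("my_dir/sub"),
               PathQuery { value: "my\\_dir/sub/%".into(), is_prefix: true });
    println!("ok");
}
```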
2) Fix Active mode: note_count is currently counting participants, and the CTE scans too broadly

Why

- In note_agg, you do SELECT DISTINCT discussion_id, author_username and then COUNT(*) AS note_count. That's a participant count, not a note count.
- The current note_agg also builds the DISTINCT set from all notes and only then joins to picked. That's avoidable work.

Change

Split into two aggregations scoped to picked:

- note_counts: counts non-system notes per picked discussion.
- participants: distinct usernames per picked discussion, then GROUP_CONCAT.
```diff
diff --git a/plan.md b/plan.md
@@
-  note_agg AS (
-    SELECT
-      n.discussion_id,
-      COUNT(*) AS note_count,
-      GROUP_CONCAT(n.author_username, X'1F') AS participants
-    FROM (
-      SELECT DISTINCT discussion_id, author_username
-      FROM notes
-      WHERE is_system = 0 AND author_username IS NOT NULL
-    ) n
-    JOIN picked p ON p.id = n.discussion_id
-    GROUP BY n.discussion_id
-  )
+  note_counts AS (
+    SELECT
+      n.discussion_id,
+      COUNT(*) AS note_count
+    FROM notes n
+    JOIN picked p ON p.id = n.discussion_id
+    WHERE n.is_system = 0
+    GROUP BY n.discussion_id
+  ),
+  participants AS (
+    SELECT
+      x.discussion_id,
+      GROUP_CONCAT(x.author_username, X'1F') AS participants
+    FROM (
+      SELECT DISTINCT n.discussion_id, n.author_username
+      FROM notes n
+      JOIN picked p ON p.id = n.discussion_id
+      WHERE n.is_system = 0 AND n.author_username IS NOT NULL
+    ) x
+    GROUP BY x.discussion_id
+  )
@@
-  LEFT JOIN note_agg na ON na.discussion_id = p.id
+  LEFT JOIN note_counts nc ON nc.discussion_id = p.id
+  LEFT JOIN participants pa ON pa.discussion_id = p.id
@@
-  COALESCE(na.note_count, 0) AS note_count,
-  COALESCE(na.participants, '') AS participants
+  COALESCE(nc.note_count, 0) AS note_count,
+  COALESCE(pa.participants, '') AS participants
```
Net effect: a correctness fix plus more predictable performance.

Add a test that would have failed before:
```diff
diff --git a/plan.md b/plan.md
@@
 #[test]
 fn test_active_query() {
@@
-    insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
+    insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/foo.rs", "needs work");
+    insert_diffnote(&conn, 2, 1, 1, "reviewer_b", "src/foo.rs", "follow-up");
@@
-    assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
+    assert_eq!(result.discussions[0].participants, vec!["reviewer_b"]);
+    assert_eq!(result.discussions[0].note_count, 2);
```
3) Index fix: idx_discussions_unresolved_recent won't help global --active ordering

Why

- The index is (project_id, last_note_at) with WHERE resolvable=1 AND resolved=0.
- When --active is not project-scoped (the common default), SQLite can't use (project_id, last_note_at) to satisfy ORDER BY last_note_at DESC efficiently, because project_id isn't constrained.
- That can turn into a scan+sort over a potentially large unresolved set.

Change

Keep the project-scoped index, but add a global ordering index (partial, so still small):
```diff
diff --git a/plan.md b/plan.md
@@
 CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent
   ON discussions(project_id, last_note_at)
   WHERE resolvable = 1 AND resolved = 0;
+
+-- Active (global): unresolved discussions by recency (no project scope).
+-- Supports ORDER BY last_note_at DESC LIMIT N when project_id is unconstrained.
+CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global
+  ON discussions(last_note_at)
+  WHERE resolvable = 1 AND resolved = 0;
```
4) Make Overlap "touches" coherent: count MRs for reviewers, not DiffNotes

Why

Overlap's question is "Who else has MRs touching my files?", but:

- the reviewer branch uses COUNT(*) (DiffNotes)
- the author branch uses COUNT(DISTINCT m.id) (MRs)

Those are different units; summing them into touch_count is misleading.

Change

Count distinct MRs on the reviewer branch too:
```diff
diff --git a/plan.md b/plan.md
@@
-  COUNT(*) AS touch_count,
+  COUNT(DISTINCT m.id) AS touch_count,
   MAX(n.created_at) AS last_touch_at,
   GROUP_CONCAT(DISTINCT (p.path_with_namespace || '!' || m.iid)) AS mr_refs
```
Also update the human output labeling:
```diff
diff --git a/plan.md b/plan.md
@@
-    style("Touches").bold(),
+    style("MRs").bold(),
```

(You still preserve "strength" via mr_refs and last_touch_at.)
5) Make outputs more actionable: add a canonical ref field (group/project!iid, group/project#iid)

Why

You already do this for Overlap (mr_refs). Doing the same for Workload and Active reduces friction for both humans and agents:

- humans can copy/paste a single token
- robots don't need to stitch together project_path + iid + prefix
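The ref convention itself is just GitLab's short-reference syntax; hypothetical helpers to pin it down:

```rust
// GitLab short references: '#' for issues, '!' for merge requests,
// prefixed with the full project path so refs stay unique across projects.
fn issue_ref(project_path: &str, iid: i64) -> String {
    format!("{project_path}#{iid}")
}

fn mr_ref(project_path: &str, iid: i64) -> String {
    format!("{project_path}!{iid}")
}

fn main() {
    assert_eq!(issue_ref("group/project", 42), "group/project#42");
    assert_eq!(mr_ref("group/project", 100), "group/project!100");
    println!("ok");
}
```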
Change (Workload structs + SQL)
```diff
diff --git a/plan.md b/plan.md
@@
 pub struct WorkloadIssue {
     pub iid: i64,
+    pub ref_: String,
     pub title: String,
     pub project_path: String,
     pub updated_at: i64,
 }
@@
 pub struct WorkloadMr {
     pub iid: i64,
+    pub ref_: String,
     pub title: String,
     pub draft: bool,
     pub project_path: String,
@@
-let issues_sql =
-    "SELECT i.iid, i.title, p.path_with_namespace, i.updated_at
+let issues_sql =
+    "SELECT i.iid,
+            (p.path_with_namespace || '#' || i.iid) AS ref,
+            i.title, p.path_with_namespace, i.updated_at
@@
-        iid: row.get(0)?,
-        title: row.get(1)?,
-        project_path: row.get(2)?,
-        updated_at: row.get(3)?,
+        iid: row.get(0)?,
+        ref_: row.get(1)?,
+        title: row.get(2)?,
+        project_path: row.get(3)?,
+        updated_at: row.get(4)?,
     })
@@
-let authored_sql =
-    "SELECT m.iid, m.title, m.draft, p.path_with_namespace, m.updated_at
+let authored_sql =
+    "SELECT m.iid,
+            (p.path_with_namespace || '!' || m.iid) AS ref,
+            m.title, m.draft, p.path_with_namespace, m.updated_at
@@
-        iid: row.get(0)?,
-        title: row.get(1)?,
-        draft: row.get::<_, i32>(2)? != 0,
-        project_path: row.get(3)?,
+        iid: row.get(0)?,
+        ref_: row.get(1)?,
+        title: row.get(2)?,
+        draft: row.get::<_, i32>(3)? != 0,
+        project_path: row.get(4)?,
         author_username: None,
-        updated_at: row.get(4)?,
+        updated_at: row.get(5)?,
     })
```
Then use ref_ in the human output and in the robot JSON.
6) Reviews mode: tolerate leading whitespace before the **prefix**

Why

Many people write `  **suggestion**: ...` with leading whitespace. The current LIKE '**%**%' misses that.

Change

Use ltrim(n.body) consistently:
```diff
diff --git a/plan.md b/plan.md
@@
-  AND n.body LIKE '**%**%'
+  AND ltrim(n.body) LIKE '**%**%'
@@
-  SUBSTR(n.body, 3, INSTR(SUBSTR(n.body, 3), '**') - 1) AS raw_prefix,
+  SUBSTR(ltrim(n.body), 3, INSTR(SUBSTR(ltrim(n.body), 3), '**') - 1) AS raw_prefix,
```
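The same tolerance, expressed as a plain-Rust sketch of the prefix extraction (hypothetical helper mirroring the SQL):

```rust
// Extract the conventional `**prefix**` marker from a review note body,
// tolerating leading whitespace (the Rust analogue of the ltrim() change).
fn extract_prefix(body: &str) -> Option<&str> {
    let rest = body.trim_start().strip_prefix("**")?;
    let end = rest.find("**")?;
    Some(&rest[..end])
}

fn main() {
    assert_eq!(extract_prefix("  **suggestion**: rename this"), Some("suggestion"));
    assert_eq!(extract_prefix("**nit**: typo"), Some("nit"));
    assert_eq!(extract_prefix("no marker here"), None);
    println!("ok");
}
```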
7) Add two small tests that catch the above regressions

Why

These are exactly the kind of issues that slip through without targeted tests.
```diff
diff --git a/plan.md b/plan.md
@@
 #[test]
 fn test_escape_like() {
@@
 }
+
+#[test]
+fn test_build_path_query_exact_does_not_escape() {
+    // '_' must not be escaped for '='
+    let pq = build_path_query("README_with_underscore.md");
+    assert_eq!(pq.value, "README_with_underscore.md");
+    assert!(!pq.is_prefix);
+}
+
+#[test]
+fn test_path_flag_dotless_root_file_is_exact() {
+    let pq = build_path_query("Makefile");
+    assert_eq!(pq.value, "Makefile");
+    assert!(!pq.is_prefix);
+}
```
Summary of net effect

- Correctness fixes: the exact-path escaping bug; the Active note_count bug.
- Perf fixes: a global --active index; avoiding broad note scans in Active.
- Usefulness upgrades: a coherent Overlap "touch" metric; canonical refs everywhere; more robust reviews-prefix matching.

If you want one extra "stretch" that still isn't scope creep: add an unscoped warning line to the human output when project_id == None (e.g., "Aggregated across projects; use -p to scope") for Expert/Overlap/Active. That's pure presentation, but it prevents misinterpretation in multi-project DBs.
docs/who-command-design.feedback-6.md (new file, 471 lines)
Proposed revisions (Iteration 6)

Below are the highest-leverage changes I'd make on top of your current Iteration 5 plan, with rationale and git-diff-style edits to the plan text/snippets.
1) Fix a real edge case: dotless non-root files (src/Dockerfile, infra/Makefile, etc.)

Why

Your current build_path_query() treats dotless last segments as directories (prefix match) unless the path is root. That misclassifies legitimate dotless files inside directories and silently produces `path/%` (zero hits or wrong hits).

Best minimal fix: keep the static-SQL approach, but add a DB existence probe (also static SQL) for path queries:

- If the user didn't force a directory (trailing `/`) and the exact path exists in DiffNotes, treat it as exact (=).
- Otherwise use the prefix LIKE 'dir/%'.

This avoids new CLI flags, avoids heuristic lists, and uses the existing partial index (idx_notes_diffnote_path_created) efficiently.

Diff
```diff
diff --git a/Plan.md b/Plan.md
@@
 struct PathQuery {
     /// The parameter value to bind.
     value: String,
     /// If true: use `LIKE value ESCAPE '\'`. If false: use `= value`.
     is_prefix: bool,
 }

-/// Build a path query from a user-supplied path.
+/// Build a path query from a user-supplied path, with a DB probe for dotless files.
@@
-fn build_path_query(path: &str) -> PathQuery {
+fn build_path_query(conn: &Connection, path: &str) -> Result<PathQuery> {
     let trimmed = path.trim_end_matches('/');
     let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
     let is_root = !trimmed.contains('/');
-    let is_file = !path.ends_with('/') && (is_root || last_segment.contains('.'));
+    let forced_dir = path.ends_with('/');
+    let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
+
+    // If it doesn't "look like a file" but the exact path exists in DiffNotes,
+    // treat as exact (handles src/Dockerfile, infra/Makefile, etc.).
+    let exact_exists = if !looks_like_file && !forced_dir {
+        conn.query_row(
+            "SELECT 1
+             FROM notes
+             WHERE note_type = 'DiffNote'
+               AND is_system = 0
+               AND position_new_path = ?1
+             LIMIT 1",
+            rusqlite::params![trimmed],
+            |_| Ok(()),
+        ).is_ok()
+    } else {
+        false
+    };
+
+    let is_file = looks_like_file || exact_exists;

     if is_file {
-        PathQuery {
+        Ok(PathQuery {
             value: trimmed.to_string(),
             is_prefix: false,
-        }
+        })
     } else {
         let escaped = escape_like(trimmed);
-        PathQuery {
+        Ok(PathQuery {
             value: format!("{escaped}/%"),
             is_prefix: true,
-        }
+        })
     }
 }
```
Also update callers:
|
||||||
|
|
||||||
|
diff
|
||||||
|
Copy code
|
||||||
|
@@
|
||||||
|
- let pq = build_path_query(path);
|
||||||
|
+ let pq = build_path_query(conn, path)?;
|
||||||
|
@@
|
||||||
|
- let pq = build_path_query(path);
|
||||||
|
+ let pq = build_path_query(conn, path)?;
|
||||||
|
|
||||||
|
|
||||||
|
And tests:
|
||||||
|
|
||||||
|
diff
|
||||||
|
Copy code
|
||||||
|
@@
|
||||||
|
- fn test_build_path_query() {
|
||||||
|
+ fn test_build_path_query() {
|
||||||
|
@@
|
||||||
|
- // Dotless root file -> exact match (root path without '/')
|
||||||
|
+ // Dotless root file -> exact match (root path without '/')
|
||||||
|
let pq = build_path_query("Makefile");
|
||||||
|
assert_eq!(pq.value, "Makefile");
|
||||||
|
assert!(!pq.is_prefix);
|
||||||
|
+
|
||||||
|
+ // Dotless file in subdir should become exact if DB contains it (probe)
|
||||||
|
+ // (set up: insert one DiffNote with position_new_path = "src/Dockerfile")
|
||||||
|
|
||||||
|
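The diff above leans on an `escape_like` helper that isn't shown. A minimal sketch of what such a helper has to do (the codebase's actual implementation may differ): escape `%`, `_`, and the escape character itself so a user-supplied path matches literally under `LIKE ?1 ESCAPE '\'`.

```rust
/// Escape SQLite LIKE wildcards so a user path is matched literally.
/// Assumes the query uses `ESCAPE '\'` (single backslash as escape char).
fn escape_like(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        if c == '\\' || c == '%' || c == '_' {
            out.push('\\'); // prefix wildcard / escape chars
        }
        out.push(c);
    }
    out
}

fn main() {
    // Ordinary paths pass through unchanged.
    assert_eq!(escape_like("src/auth"), "src/auth");
    // Wildcards get escaped so "a_b%c" cannot match "aXbYc".
    assert_eq!(escape_like("a_b%c"), "a\\_b\\%c");
    println!("ok");
}
```

Without this escaping, a path segment containing `_` (a single-character wildcard in `LIKE`) would silently over-match.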
2) Make “reviewer” semantics correct: exclude MR authors commenting on their own diffs

Why

Right now, Overlap (and the Expert reviewer branch) counts MR authors as “reviewers” if they leave DiffNotes on their own MR (clarifications / replies), inflating A+R and contaminating “who reviewed here” signals.

You already enforce this in `--reviews` mode (`m.author_username != ?1`). Apply the same principle consistently:

Reviewer branch: only count notes where `n.author_username != m.author_username` (when both are non-NULL).

Diff (Overlap reviewer branch)

```diff
@@
-    WHERE n.note_type = 'DiffNote'
+    WHERE n.note_type = 'DiffNote'
       AND n.position_new_path LIKE ?1 ESCAPE '\\'
       AND n.is_system = 0
       AND n.author_username IS NOT NULL
+      AND (m.author_username IS NULL OR n.author_username != m.author_username)
       AND n.created_at >= ?2
       AND (?3 IS NULL OR n.project_id = ?3)
```

Same change for `sql_exact`.
3) Expert mode scoring: align units + reduce single-MR “comment storms”

Why

Expert currently mixes units:

- reviewer side: DiffNote count
- author side: distinct MR count

That makes the score noisy and can crown “someone who wrote 30 comments on one MR” as top expert.

Fix: make both sides primarily MR-breadth:

- reviewer: `COUNT(DISTINCT m.id)` as `review_mr_count`
- author: `COUNT(DISTINCT m.id)` as `author_mr_count`

Optionally keep `review_note_count` as a secondary intensity signal (but not the main driver).

Diff (types + SQL)

```diff
@@
 pub struct Expert {
     pub username: String,
-    pub score: f64,
-    pub review_count: u32,
-    pub author_count: u32,
+    pub score: i64,
+    pub review_mr_count: u32,
+    pub review_note_count: u32,
+    pub author_mr_count: u32,
     pub last_active_ms: i64,
 }
```

Reviewer branch now joins to MR so it can count distinct MRs and exclude self-comments:

```diff
@@
-    SELECT
-      n.author_username AS username,
-      'reviewer' AS role,
-      COUNT(*) AS cnt,
-      MAX(n.created_at) AS last_active_at
-    FROM notes n
+    SELECT
+      n.author_username AS username,
+      'reviewer' AS role,
+      COUNT(DISTINCT m.id) AS mr_cnt,
+      COUNT(*) AS note_cnt,
+      MAX(n.created_at) AS last_active_at
+    FROM notes n
+    JOIN discussions d ON n.discussion_id = d.id
+    JOIN merge_requests m ON d.merge_request_id = m.id
     WHERE n.note_type = 'DiffNote'
       AND n.is_system = 0
       AND n.author_username IS NOT NULL
+      AND (m.author_username IS NULL OR n.author_username != m.author_username)
       AND n.position_new_path LIKE ?1 ESCAPE '\\'
       AND n.created_at >= ?2
       AND (?3 IS NULL OR n.project_id = ?3)
     GROUP BY n.author_username
```

Update author branch payload to match shape:

```diff
@@
     SELECT
       m.author_username AS username,
       'author' AS role,
-      COUNT(DISTINCT m.id) AS cnt,
+      COUNT(DISTINCT m.id) AS mr_cnt,
+      0 AS note_cnt,
       MAX(n.created_at) AS last_active_at
```

Aggregate:

```diff
@@
     SELECT
       username,
-      SUM(CASE WHEN role = 'reviewer' THEN cnt ELSE 0 END) AS review_count,
-      SUM(CASE WHEN role = 'author' THEN cnt ELSE 0 END) AS author_count,
+      SUM(CASE WHEN role = 'reviewer' THEN mr_cnt ELSE 0 END) AS review_mr_count,
+      SUM(CASE WHEN role = 'reviewer' THEN note_cnt ELSE 0 END) AS review_note_count,
+      SUM(CASE WHEN role = 'author' THEN mr_cnt ELSE 0 END) AS author_mr_count,
       MAX(last_active_at) AS last_active_at,
-      (SUM(CASE WHEN role = 'reviewer' THEN cnt ELSE 0 END) * 3.0) +
-      (SUM(CASE WHEN role = 'author' THEN cnt ELSE 0 END) * 2.0) AS score
+      (
+        (SUM(CASE WHEN role = 'reviewer' THEN mr_cnt ELSE 0 END) * 20) +
+        (SUM(CASE WHEN role = 'author' THEN mr_cnt ELSE 0 END) * 12) +
+        (SUM(CASE WHEN role = 'reviewer' THEN note_cnt ELSE 0 END) * 1)
+      ) AS score
```

Human header:

```diff
@@
-    style("Reviews").bold(),
-    style("Authored").bold(),
+    style("Reviewed(MRs)").bold(),
+    style("Notes").bold(),
+    style("Authored(MRs)").bold(),
```
4) Deterministic output: participants + MR refs + tie-breakers

Why

You’ve correctly focused on reproducibility (resolved_input), but you still have nondeterministic lists:

- participants: GROUP_CONCAT order is undefined → vector order changes run-to-run.
- mr_refs: you dedup via HashSet then iterate → undefined order.
- user sorting in overlap is missing stable tie-breakers.

This is a real “robot mode flake” source.

Diff (Active participants sort)

```diff
@@
-    let participants: Vec<String> = participants_csv
+    let mut participants: Vec<String> = participants_csv
         .as_deref()
         .filter(|s| !s.is_empty())
         .map(|csv| csv.split('\x1F').map(String::from).collect())
         .unwrap_or_default();
+    participants.sort(); // stable, deterministic
```

Diff (Overlap MR refs sort + stable user sort)

```diff
@@
-    users.sort_by(|a, b| b.touch_count.cmp(&a.touch_count));
+    users.sort_by(|a, b| {
+        b.touch_count.cmp(&a.touch_count)
+            .then_with(|| b.last_touch_at.cmp(&a.last_touch_at))
+            .then_with(|| a.username.cmp(&b.username))
+    });
@@
-    entry.mr_refs = set.into_iter().collect();
+    let mut v: Vec<String> = set.into_iter().collect();
+    v.sort();
+    entry.mr_refs = v;
```
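The tie-breaker chain above can be sketched in isolation. This is a minimal illustration with a hypothetical struct mirroring `OverlapUser`'s sort keys, not the design doc's actual type:

```rust
use std::cmp::Ordering;

// Hypothetical minimal struct carrying only the sort keys.
struct U {
    username: String,
    touch_count: u32,
    last_touch_at: i64,
}

// Primary: touch_count desc; then last_touch_at desc; then username asc.
// The final username key guarantees a total order, so output is deterministic
// even when counts and timestamps collide.
fn cmp_users(a: &U, b: &U) -> Ordering {
    b.touch_count
        .cmp(&a.touch_count)
        .then_with(|| b.last_touch_at.cmp(&a.last_touch_at))
        .then_with(|| a.username.cmp(&b.username))
}

fn main() {
    let mut users = vec![
        U { username: "bob".into(), touch_count: 3, last_touch_at: 10 },
        U { username: "alice".into(), touch_count: 3, last_touch_at: 10 },
        U { username: "carol".into(), touch_count: 5, last_touch_at: 1 },
    ];
    users.sort_by(cmp_users);
    let names: Vec<&str> = users.iter().map(|u| u.username.as_str()).collect();
    // carol wins on count; the alice/bob tie is broken alphabetically.
    assert_eq!(names, ["carol", "alice", "bob"]);
    println!("ok");
}
```

Without the username tie-breaker, `alice` and `bob` (identical counts and timestamps) could land in either order depending on the incoming row order.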
5) Make --limit actionable: surface truncation explicitly (human + robot)

Why

Agents (and humans) need to know if results were cut off so they can rerun with a bigger `-n`. Right now there’s no signal.

Minimal pattern: query `limit + 1`, set `truncated = true` if you got more than `limit` rows, then truncate.

Diff (result types)

```diff
@@
 pub struct ExpertResult {
     pub path_query: String,
     pub experts: Vec<Expert>,
+    pub truncated: bool,
 }
@@
 pub struct ActiveResult {
     pub discussions: Vec<ActiveDiscussion>,
     pub total_unresolved: u32,
+    pub truncated: bool,
 }
@@
 pub struct OverlapResult {
     pub path_query: String,
     pub users: Vec<OverlapUser>,
+    pub truncated: bool,
 }
```

Diff (query pattern example)

```diff
@@
-    let limit_i64 = limit as i64;
+    let limit_plus_one = (limit + 1) as i64;
@@
-    LIMIT ?4
+    LIMIT ?4
@@
-    rusqlite::params![pq.value, since_ms, project_id, limit_i64],
+    rusqlite::params![pq.value, since_ms, project_id, limit_plus_one],
@@
-    Ok(ExpertResult {
+    let truncated = experts.len() > limit;
+    let experts = experts.into_iter().take(limit).collect();
+    Ok(ExpertResult {
         path_query: path.to_string(),
         experts,
+        truncated,
     })
```

Human output hint:

```diff
@@
     if r.experts.is_empty() { ... }
+    if r.truncated {
+        println!("  {}", style("(showing first -n; rerun with a higher --limit)").dim());
+    }
```

Robot output field:

```diff
@@
 fn expert_to_json(r: &ExpertResult) -> serde_json::Value {
     serde_json::json!({
         "path_query": r.path_query,
+        "truncated": r.truncated,
         "experts": ...
     })
 }
```
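The LIMIT+1 pattern is generic and worth isolating. A sketch under the assumption that the caller has already queried with `LIMIT limit + 1` (the `take_with_truncation` name is illustrative, not from the doc):

```rust
// Given rows fetched with `LIMIT limit + 1`, return at most `limit` rows
// plus a flag saying whether anything was cut off.
fn take_with_truncation<T>(mut rows: Vec<T>, limit: usize) -> (Vec<T>, bool) {
    let truncated = rows.len() > limit;
    rows.truncate(limit); // no-op when rows.len() <= limit
    (rows, truncated)
}

fn main() {
    // Simulate a query that returned limit + 1 = 4 rows for limit = 3:
    let (rows, truncated) = take_with_truncation(vec![1, 2, 3, 4], 3);
    assert_eq!(rows, vec![1, 2, 3]);
    assert!(truncated);

    // Fewer rows than the limit: nothing was cut off.
    let (rows, truncated) = take_with_truncation(vec![1, 2], 3);
    assert_eq!(rows, vec![1, 2]);
    assert!(!truncated);
    println!("ok");
}
```

The extra row costs one index step in SQLite but turns silent truncation into an explicit, testable contract.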
6) Overlap merge hot loop: avoid repeated HashSet rebuild per row

Why

This line is expensive in a UNION result with many rows:

```rust
let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
```

It reallocates and rehashes every time.

Fix: use an accumulator holding a `HashSet` during the merge, and convert once at the end.

Diff (internal accumulator)

```diff
@@
-    let mut user_map: HashMap<String, OverlapUser> = HashMap::new();
+    struct OverlapAcc {
+        username: String,
+        author_touch_count: u32,
+        review_touch_count: u32,
+        touch_count: u32,
+        last_touch_at: i64,
+        mr_refs: HashSet<String>,
+    }
+    let mut user_map: HashMap<String, OverlapAcc> = HashMap::new();
@@
-    let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapUser {
+    let entry = user_map.entry(username.clone()).or_insert_with(|| OverlapAcc {
         username: username.clone(),
         author_touch_count: 0,
         review_touch_count: 0,
         touch_count: 0,
         last_touch_at: 0,
-        mr_refs: Vec::new(),
+        mr_refs: HashSet::new(),
     });
@@
-    let mut set: HashSet<String> = entry.mr_refs.drain(..).collect();
-    for r in mr_refs { set.insert(r); }
-    entry.mr_refs = set.into_iter().collect();
+    for r in mr_refs { entry.mr_refs.insert(r); }
@@
-    let mut users: Vec<OverlapUser> = user_map.into_values().collect();
+    let mut users: Vec<OverlapUser> = user_map.into_values().map(|a| {
+        let mut mr_refs: Vec<String> = a.mr_refs.into_iter().collect();
+        mr_refs.sort();
+        OverlapUser {
+            username: a.username,
+            author_touch_count: a.author_touch_count,
+            review_touch_count: a.review_touch_count,
+            touch_count: a.touch_count,
+            last_touch_at: a.last_touch_at,
+            mr_refs,
+        }
+    }).collect();
```
7) Tests to lock these behaviors

Add tests (high value):

- dotless subdir file uses DB probe → exact match
- self-review exclusion prevents MR author showing up as reviewer
- deterministic ordering for participants and mr_refs (sort)

Diff (test additions outline)

```diff
@@
     #[test]
+    fn test_build_path_query_dotless_subdir_file_uses_probe() {
+        let conn = setup_test_db();
+        insert_project(&conn, 1, "team/backend");
+        insert_mr(&conn, 1, 1, 100, "author_a", "opened");
+        insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+        insert_diffnote(&conn, 1, 1, 1, "reviewer_b", "src/Dockerfile", "note");
+
+        let pq = build_path_query(&conn, "src/Dockerfile").unwrap();
+        assert_eq!(pq.value, "src/Dockerfile");
+        assert!(!pq.is_prefix);
+    }
+
+    #[test]
+    fn test_overlap_excludes_self_review_notes() {
+        let conn = setup_test_db();
+        insert_project(&conn, 1, "team/backend");
+        insert_mr(&conn, 1, 1, 100, "author_a", "opened");
+        insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+        // author_a comments on their own MR diff
+        insert_diffnote(&conn, 1, 1, 1, "author_a", "src/auth/login.rs", "clarification");
+
+        let result = query_overlap(&conn, "src/auth/", None, 0, 20).unwrap();
+        let u = result.users.iter().find(|u| u.username == "author_a");
+        // should not be credited as reviewer touch
+        assert!(u.map(|x| x.review_touch_count).unwrap_or(0) == 0);
+    }
```
Net effect

- Correctness: fixes dotless subdir files + self-review pollution.
- Signal quality: Expert ranking becomes harder to game by comment volume.
- Robot reproducibility: deterministic ordering + explicit truncation.
- Performance: avoids rehash loops in overlap merges; the path probe uses indexed equality.

If you want one “single best” change: #1 (DB probe exact-match) is the most likely to prevent confusing “why is this empty?” behavior without adding any user-facing complexity.
docs/who-command-design.feedback-7.md (new file, 353 lines)
Below are the highest-leverage revisions I’d make to iteration 6 to improve correctness (multi-project edge cases), robot-mode reliability (bounded payloads + truncation), and signal quality, without changing the fundamental scope (still pure SQL over existing tables).

1) Make build_path_query project-aware and two-way probe (exact and prefix)

Why

Your DB probe currently answers: “does this exact file exist anywhere in DiffNotes?” That can misclassify in a project-scoped run:

- The path exists as a dotless file in project A → the probe returns true.
- The user runs `-p` project B, where the path is a directory (or a different shape) → you switch to exact, return empty, and miss valid prefix hits.

Also, you still have a minor heuristic fragility for dot directories when the user omits the trailing `/` (e.g., `.github/workflows`): the last segment has a dot → you treat it as a file unless the directory is forced.

Revision

- Thread `project_id` into `build_path_query(conn, path, project_id)`
- Probe exact first (scoped), then probe prefix (scoped)
- Only fall back to heuristics if both probes fail

This keeps “static SQL, no dynamic assembly,” and costs at most two indexed existence queries per invocation.

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
- fn build_path_query(conn: &Connection, path: &str) -> Result<PathQuery> {
+ fn build_path_query(conn: &Connection, path: &str, project_id: Option<i64>) -> Result<PathQuery> {
     let trimmed = path.trim_end_matches('/');
     let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
     let is_root = !trimmed.contains('/');
     let forced_dir = path.ends_with('/');
-    let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));
+    // Heuristic is now only a fallback; probes decide first.
+    let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));

-    let exact_exists = if !looks_like_file && !forced_dir {
-        conn.query_row(
-            "SELECT 1 FROM notes
-             WHERE note_type = 'DiffNote'
-               AND is_system = 0
-               AND position_new_path = ?1
-             LIMIT 1",
-            rusqlite::params![trimmed],
-            |_| Ok(()),
-        )
-        .is_ok()
-    } else {
-        false
-    };
+    // Probe 1: exact file exists (scoped)
+    let exact_exists = conn.query_row(
+        "SELECT 1 FROM notes
+         WHERE note_type = 'DiffNote'
+           AND is_system = 0
+           AND position_new_path = ?1
+           AND (?2 IS NULL OR project_id = ?2)
+         LIMIT 1",
+        rusqlite::params![trimmed, project_id],
+        |_| Ok(()),
+    ).is_ok();
+
+    // Probe 2: directory prefix exists (scoped)
+    let prefix_exists = if !forced_dir {
+        let escaped = escape_like(trimmed);
+        let pat = format!("{escaped}/%");
+        conn.query_row(
+            "SELECT 1 FROM notes
+             WHERE note_type = 'DiffNote'
+               AND is_system = 0
+               AND position_new_path LIKE ?1 ESCAPE '\\'
+               AND (?2 IS NULL OR project_id = ?2)
+             LIMIT 1",
+            rusqlite::params![pat, project_id],
+            |_| Ok(()),
+        ).is_ok()
+    } else { false };

-    let is_file = looks_like_file || exact_exists;
+    // Forced directory always wins; otherwise: exact > prefix > heuristic
+    let is_file = if forced_dir { false }
+        else if exact_exists { true }
+        else if prefix_exists { false }
+        else { looks_like_file };

     if is_file {
         Ok(PathQuery { value: trimmed.to_string(), is_prefix: false })
     } else {
         let escaped = escape_like(trimmed);
         Ok(PathQuery { value: format!("{escaped}/%"), is_prefix: true })
     }
 }
@@
-    let pq = build_path_query(conn, path)?;
+    let pq = build_path_query(conn, path, project_id)?;
```

Add test coverage for the multi-project misclassification case:

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
     #[test]
     fn test_build_path_query_dotless_subdir_file_uses_db_probe() {
@@
-        let pq = build_path_query(&conn, "src/Dockerfile").unwrap();
+        let pq = build_path_query(&conn, "src/Dockerfile", None).unwrap();
@@
-        let pq2 = build_path_query(&conn2, "src/Dockerfile").unwrap();
+        let pq2 = build_path_query(&conn2, "src/Dockerfile", None).unwrap();
     }
+
+    #[test]
+    fn test_build_path_query_probe_is_project_scoped() {
+        // Path exists as a dotless file in project 1; project 2 should not
+        // treat it as an exact file unless it exists there too.
+        let conn = setup_test_db();
+        insert_project(&conn, 1, "team/a");
+        insert_project(&conn, 2, "team/b");
+        insert_mr(&conn, 1, 1, 10, "author_a", "opened");
+        insert_discussion(&conn, 1, 1, Some(1), None, true, false);
+        insert_diffnote(&conn, 1, 1, 1, "rev", "infra/Makefile", "note");
+
+        let pq_scoped = build_path_query(&conn, "infra/Makefile", Some(2)).unwrap();
+        assert!(pq_scoped.is_prefix); // should fall back to prefix in project 2
+    }
```
2) Bound robot payload sizes for participants and mr_refs (with totals + truncation)

Why

mr_refs and participants can become unbounded arrays in robot mode, which is a real operational hazard:

- huge JSON → slow, noisy diffs, brittle downstream pipelines
- potential SQLite group_concat truncation becomes invisible (and you can’t distinguish “no refs” from “refs truncated”)

Revision

Introduce hard caps and explicit metadata:

- `participants_total`, `participants_truncated`
- `mr_refs_total`, `mr_refs_truncated`

This is not scope creep; it’s defensive output hygiene.

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
 pub struct ActiveDiscussion {
@@
     pub participants: Vec<String>,
+    pub participants_total: u32,
+    pub participants_truncated: bool,
 }
@@
 pub struct OverlapUser {
@@
     pub mr_refs: Vec<String>,
+    pub mr_refs_total: u32,
+    pub mr_refs_truncated: bool,
 }
```

Implementation sketch (Rust-side, deterministic):

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
 fn query_active(...) -> Result<ActiveResult> {
+    const MAX_PARTICIPANTS: usize = 50;
@@
-    participants.sort();
+    participants.sort();
+    let participants_total = participants.len() as u32;
+    let participants_truncated = participants.len() > MAX_PARTICIPANTS;
+    if participants_truncated {
+        participants.truncate(MAX_PARTICIPANTS);
+    }
@@
     Ok(ActiveDiscussion {
@@
         participants,
+        participants_total,
+        participants_truncated,
     })
@@
 fn query_overlap(...) -> Result<OverlapResult> {
+    const MAX_MR_REFS_PER_USER: usize = 50;
@@
     .map(|a| {
         let mut mr_refs: Vec<String> = a.mr_refs.into_iter().collect();
         mr_refs.sort();
+        let mr_refs_total = mr_refs.len() as u32;
+        let mr_refs_truncated = mr_refs.len() > MAX_MR_REFS_PER_USER;
+        if mr_refs_truncated {
+            mr_refs.truncate(MAX_MR_REFS_PER_USER);
+        }
         OverlapUser {
@@
             mr_refs,
+            mr_refs_total,
+            mr_refs_truncated,
         }
     })
```

Update robot JSON accordingly:

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
 fn active_to_json(r: &ActiveResult) -> serde_json::Value {
@@
             "participants": d.participants,
+            "participants_total": d.participants_total,
+            "participants_truncated": d.participants_truncated,
         }))
@@
 fn overlap_to_json(r: &OverlapResult) -> serde_json::Value {
@@
             "mr_refs": u.mr_refs,
+            "mr_refs_total": u.mr_refs_total,
+            "mr_refs_truncated": u.mr_refs_truncated,
         }))
```

Also update the robot-docs manifest schema snippet for `who.active.discussions[]` and `who.overlap.users[]`.
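The cap-plus-metadata steps above (sort, record total, truncate, flag) can be sketched as one helper. `cap_sorted` is a hypothetical name for illustration, not a function from the design doc:

```rust
// Sort for determinism, record the pre-cap total, cap the list, and report
// whether anything was dropped so agents can page or retry.
fn cap_sorted(mut items: Vec<String>, max: usize) -> (Vec<String>, u32, bool) {
    items.sort();
    let total = items.len() as u32;
    let truncated = items.len() > max;
    items.truncate(max);
    (items, total, truncated)
}

fn main() {
    let refs = vec!["!7".to_string(), "!3".to_string(), "!12".to_string()];
    let (capped, total, truncated) = cap_sorted(refs, 2);
    assert_eq!(total, 3);
    assert!(truncated);
    // Lexicographic order: "!12" < "!3" < "!7", then capped to two.
    assert_eq!(capped, vec!["!12".to_string(), "!3".to_string()]);
    println!("ok");
}
```

Note the lexicographic sort: MR refs like `!12` sort before `!3` as strings, which is deterministic but not numeric; a numeric key would be a separate design choice.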
3) Add truncation metadata to Workload sections (same LIMIT+1 pattern)

Why

Workload is the mode most likely to be consumed by agents, and right now it has silent truncation (each section is LIMIT N with no signal). Your plan already treats truncation as a first-class contract elsewhere; Workload should match.

Revision

For each workload query:

- request `LIMIT + 1`
- set `*_truncated` booleans
- trim to the requested limit

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
 pub struct WorkloadResult {
     pub username: String,
     pub assigned_issues: Vec<WorkloadIssue>,
     pub authored_mrs: Vec<WorkloadMr>,
     pub reviewing_mrs: Vec<WorkloadMr>,
     pub unresolved_discussions: Vec<WorkloadDiscussion>,
+    pub assigned_issues_truncated: bool,
+    pub authored_mrs_truncated: bool,
+    pub reviewing_mrs_truncated: bool,
+    pub unresolved_discussions_truncated: bool,
 }
```

And in JSON include the booleans (plus you already have summary.counts).

This is mechanically repetitive but extremely valuable for automation.
4) Rename “Last Active” → “Last Seen” for Expert/Overlap

Why

For “author” rows, the timestamp is derived from review activity on their MR (via `MAX(n.created_at)`), not necessarily that person’s direct action. Calling that “active” is semantically misleading. “Last seen” is accurate across both the reviewer and author branches.

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
 pub struct Expert {
@@
-    pub last_active_ms: i64,
+    pub last_seen_ms: i64,
 }
@@
 pub struct OverlapUser {
@@
-    pub last_touch_at: i64,
+    pub last_seen_at: i64,
@@
 fn print_expert_human(...) {
@@
-    style("Last Active").bold(),
+    style("Last Seen").bold(),
@@
-    style(format_relative_time(expert.last_active_ms)).dim(),
+    style(format_relative_time(expert.last_seen_ms)).dim(),
```

(Keep internal SQL aliases consistent: `last_seen_at` everywhere.)
5) Make MR state filtering consistent in Expert/Overlap reviewer branches

Why

You already restrict the Overlap author branch to opened|merged, but reviewer branches can include closed/unmerged noise. Consistency improves signal quality and can reduce scan churn.

Low-risk revision: apply the same state filter to reviewer branches (Expert + Overlap). You can keep “closed” excluded by default without adding new flags.

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
     WHERE n.note_type = 'DiffNote'
       AND n.is_system = 0
@@
-      AND n.created_at >= ?2
+      AND m.state IN ('opened','merged')
+      AND n.created_at >= ?2
```

This is a semantic choice; if you later want archaeology across closed/unmerged MRs, that belongs in a separate mode/flag, but I would not add it now.
6) Add a design principle for bounded outputs (aligns with robot-first reproducibility)

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
 10. **Truncation transparency.** Result types carry a `truncated: bool` flag...
+11. **Bounded payloads.** Robot JSON must never emit unbounded arrays (participants, refs).
+    Large list fields are capped with `*_total` + `*_truncated` so agents can page/retry.
```
Consolidated plan metadata bump (Iteration 7)

```diff
diff --git a/who-command-design.md b/who-command-design.md
@@
-iteration: 6
+iteration: 7
 updated: 2026-02-07
```
Net effect (what you get)

- Correct path classification under `-p` scoping (no cross-project probe leakage)
- Deterministic + bounded robot payloads (no giant JSON surprises)
- Uniform truncation contract across all modes (Workload no longer silently truncates)
- Clearer semantics (“Last Seen” avoids misinterpretation)
- Cleaner signals (reviewer branches ignore closed/unmerged by default)

If you want, I can also produce a second diff that updates the robot-docs schema block and the Verification EXPLAIN expectations to reflect the new probe queries and the state filter.
docs/who-command-design.feedback-8.md (new file, 552 lines)
Below are the highest-leverage revisions I’d make for iteration 8, staying within your MVP constraints (static SQL, no scope creep into new data sources), but tightening correctness, index-utilization predictability, debuggability, and output safety.

1) Fix the semantic bug in since_was_default (Workload mode) by introducing since_mode

Why this is better

Right now `since_was_default = args.since.is_none()` is misleading for Workload, because Workload has no default window (it’s “unbounded unless explicitly filtered”). In robot mode, this creates incorrect intent replay and ambiguity.

Replace the boolean with a tri-state:

`since_mode: "default" | "explicit" | "none"`

Keep `since_was_default` only if you want backward compatibility, but compute it as `since_mode == "default"`.
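The tri-state contract above could also be modeled as an enum rather than a raw string; a hedged sketch (the enum representation is an illustration, not the doc's chosen type, though the string values match its `"default" | "explicit" | "none"` contract):

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum SinceMode {
    Default,  // mode default window applied
    Explicit, // user provided --since
    None,     // no window at all (e.g. workload)
}

impl SinceMode {
    // Serialize to the wire contract used in resolved_input.
    fn as_str(self) -> &'static str {
        match self {
            SinceMode::Default => "default",
            SinceMode::Explicit => "explicit",
            SinceMode::None => "none",
        }
    }

    // Backward-compatible boolean, derived rather than stored.
    fn since_was_default(self) -> bool {
        self == SinceMode::Default
    }
}

fn main() {
    assert_eq!(SinceMode::Default.as_str(), "default");
    assert!(SinceMode::Default.since_was_default());
    // Workload's "no window" case no longer masquerades as a default window.
    assert!(!SinceMode::None.since_was_default());
    println!("ok");
}
```

Deriving `since_was_default` from the enum keeps the old field available without two sources of truth.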
Patch

```diff
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-5. **Robot-first reproducibility.** Robot JSON output includes both a raw `input` object (echoing CLI args) and a `resolved_input` object (computed `since_ms`, `since_iso`, `since_was_default`, resolved `project_id` + `project_path`, effective `mode`, `limit`) so agents can trace exactly what ran and reproduce it precisely.
+5. **Robot-first reproducibility.** Robot JSON output includes both a raw `input` object (echoing CLI args) and a `resolved_input` object (computed `since_ms`, `since_iso`, `since_mode`, resolved `project_id` + `project_path`, effective `mode`, `limit`) so agents can trace exactly what ran and reproduce it precisely.
@@
 pub struct WhoResolvedInput {
     pub mode: String,
     pub project_id: Option<i64>,
     pub project_path: Option<String>,
     pub since_ms: Option<i64>,
     pub since_iso: Option<String>,
-    pub since_was_default: bool,
+    /// "default" (mode default applied), "explicit" (user provided --since), "none" (no window)
+    pub since_mode: String,
     pub limit: usize,
 }
@@
-    let since_was_default = args.since.is_none();
+    // since_mode semantics:
+    // - expert/reviews/active/overlap: default window applies if args.since is None
+    // - workload: no default window; args.since None => "none"
+    let since_mode_for_defaulted = if args.since.is_some() { "explicit" } else { "default" };
+    let since_mode_for_workload = if args.since.is_some() { "explicit" } else { "none" };
@@
     WhoMode::Expert { path } => {
         let since_ms = resolve_since(args.since.as_deref(), "6m")?;
         let result = query_expert(&conn, path, project_id, since_ms, args.limit)?;
         Ok(WhoRun {
             resolved_input: WhoResolvedInput {
                 mode: "expert".to_string(),
                 project_id,
                 project_path,
                 since_ms: Some(since_ms),
                 since_iso: Some(ms_to_iso(since_ms)),
-                since_was_default,
+                since_mode: since_mode_for_defaulted.to_string(),
                 limit: args.limit,
             },
             result: WhoResult::Expert(result),
         })
     }
@@
     WhoMode::Workload { username } => {
         let since_ms = args
             .since
             .as_deref()
             .map(|s| resolve_since_required(s))
             .transpose()?;
         let result = query_workload(&conn, username, project_id, since_ms, args.limit)?;
         Ok(WhoRun {
             resolved_input: WhoResolvedInput {
```
|
mode: "workload".to_string(),
|
||||||
|
project_id,
|
||||||
|
project_path,
|
||||||
|
since_ms,
|
||||||
|
since_iso: since_ms.map(ms_to_iso),
|
||||||
|
- since_was_default,
|
||||||
|
+ since_mode: since_mode_for_workload.to_string(),
|
||||||
|
limit: args.limit,
|
||||||
|
},
|
||||||
|
result: WhoResult::Workload(result),
|
||||||
|
})
|
||||||
|
}
|
||||||
|
@@
|
||||||
|
fn print_who_json(run: &WhoRun, args: &WhoArgs, elapsed_ms: u64) {
|
||||||
|
@@
|
||||||
|
let resolved_input = serde_json::json!({
|
||||||
|
"mode": run.resolved_input.mode,
|
||||||
|
"project_id": run.resolved_input.project_id,
|
||||||
|
"project_path": run.resolved_input.project_path,
|
||||||
|
"since_ms": run.resolved_input.since_ms,
|
||||||
|
"since_iso": run.resolved_input.since_iso,
|
||||||
|
- "since_was_default": run.resolved_input.since_was_default,
|
||||||
|
+ "since_mode": run.resolved_input.since_mode,
|
||||||
|
"limit": run.resolved_input.limit,
|
||||||
|
});
|
||||||
|
}
|
||||||
|
|
||||||
|
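The tri-state boils down to one pure function of two facts: did the user pass `--since`, and does the mode have a default window. A minimal sketch (the function name and boolean flags are illustrative, not part of the patch):

```rust
// Sketch of the since_mode tri-state described above. Assumes the same rule
// as the patch: an explicit --since always wins; otherwise modes with a
// default window report "default" and workload reports "none".
fn since_mode(has_explicit_since: bool, mode_has_default_window: bool) -> &'static str {
    match (has_explicit_since, mode_has_default_window) {
        (true, _) => "explicit",
        (false, true) => "default",
        (false, false) => "none",
    }
}

fn main() {
    // expert/reviews/active/overlap have a default window; workload does not.
    assert_eq!(since_mode(false, true), "default");
    assert_eq!(since_mode(true, false), "explicit");
    assert_eq!(since_mode(false, false), "none");
    println!("ok");
}
```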
2) Stop using nullable-OR (`(? IS NULL OR col = ?)`) where it determines the “right” index (Active is the big one)

Why this is better

Your global vs project-scoped Active indexes are correct, but the nullable binding pattern undermines them: SQLite plans the statement at prepare time, before bindings are known, so it can’t assume whether `?2` is NULL. The result is a “good enough for both” plan, which is often the wrong one for `-p`.

Fix: keep SQL static, but use two static statements selected at runtime (like you already do for exact vs prefix path matching).

Patch
```diff
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-1. **Lean on existing infrastructure.** Use `(?N IS NULL OR ...)` nullable binding pattern (already used in `timeline_seed.rs`) instead of dynamic SQL string assembly.
+1. **Lean on existing infrastructure.** Prefer `(?N IS NULL OR ...)` nullable binding for optional filters **unless** it materially changes index choice. In those cases, select between **two static SQL strings** at runtime (no `format!()`), e.g. Active mode uses separate global vs project-scoped statements to ensure the intended index is used.
@@
 fn query_active(
     conn: &Connection,
     project_id: Option<i64>,
     since_ms: i64,
     limit: usize,
 ) -> Result<ActiveResult> {
     let limit_plus_one = (limit + 1) as i64;

-    // Total unresolved count
-    let total_sql =
-        "SELECT COUNT(*) FROM discussions d
-         WHERE d.resolvable = 1 AND d.resolved = 0
-           AND d.last_note_at >= ?1
-           AND (?2 IS NULL OR d.project_id = ?2)";
+    // Total unresolved count (two static variants to avoid nullable-OR planner ambiguity)
+    let total_sql_global =
+        "SELECT COUNT(*) FROM discussions d
+         WHERE d.resolvable = 1 AND d.resolved = 0
+           AND d.last_note_at >= ?1";
+    let total_sql_scoped =
+        "SELECT COUNT(*) FROM discussions d
+         WHERE d.resolvable = 1 AND d.resolved = 0
+           AND d.last_note_at >= ?1
+           AND d.project_id = ?2";

-    let total_unresolved: u32 =
-        conn.query_row(total_sql, rusqlite::params![since_ms, project_id], |row| row.get(0))?;
+    let total_unresolved: u32 = match project_id {
+        None => conn.query_row(total_sql_global, rusqlite::params![since_ms], |row| row.get(0))?,
+        Some(pid) => conn.query_row(total_sql_scoped, rusqlite::params![since_ms, pid], |row| row.get(0))?,
+    };

-    let sql = "
+    let sql_global = "
         WITH picked AS (
           SELECT d.id, d.noteable_type, d.issue_id, d.merge_request_id,
                  d.project_id, d.last_note_at
           FROM discussions d
           WHERE d.resolvable = 1 AND d.resolved = 0
             AND d.last_note_at >= ?1
-            AND (?2 IS NULL OR d.project_id = ?2)
           ORDER BY d.last_note_at DESC
-          LIMIT ?3
+          LIMIT ?2
         ),
@@
         ORDER BY p.last_note_at DESC
     ";

-    let mut stmt = conn.prepare_cached(sql)?;
-    let discussions: Vec<ActiveDiscussion> = stmt
-        .query_map(rusqlite::params![since_ms, project_id, limit_plus_one], |row| {
+    let sql_scoped = "
+        WITH picked AS (
+          SELECT d.id, d.noteable_type, d.issue_id, d.merge_request_id,
+                 d.project_id, d.last_note_at
+          FROM discussions d
+          WHERE d.resolvable = 1 AND d.resolved = 0
+            AND d.last_note_at >= ?1
+            AND d.project_id = ?2
+          ORDER BY d.last_note_at DESC
+          LIMIT ?3
+        ),
+        note_counts AS (
+          SELECT n.discussion_id, COUNT(*) AS note_count
+          FROM notes n
+          JOIN picked p ON p.id = n.discussion_id
+          WHERE n.is_system = 0
+          GROUP BY n.discussion_id
+        ),
+        participants AS (
+          SELECT x.discussion_id, GROUP_CONCAT(x.author_username, X'1F') AS participants
+          FROM (
+            SELECT DISTINCT n.discussion_id, n.author_username
+            FROM notes n
+            JOIN picked p ON p.id = n.discussion_id
+            WHERE n.is_system = 0 AND n.author_username IS NOT NULL
+          ) x
+          GROUP BY x.discussion_id
+        )
+        SELECT
+          p.id AS discussion_id,
+          p.noteable_type,
+          COALESCE(i.iid, m.iid) AS entity_iid,
+          COALESCE(i.title, m.title) AS entity_title,
+          proj.path_with_namespace,
+          p.last_note_at,
+          COALESCE(nc.note_count, 0) AS note_count,
+          COALESCE(pa.participants, '') AS participants
+        FROM picked p
+        JOIN projects proj ON p.project_id = proj.id
+        LEFT JOIN issues i ON p.issue_id = i.id
+        LEFT JOIN merge_requests m ON p.merge_request_id = m.id
+        LEFT JOIN note_counts nc ON nc.discussion_id = p.id
+        LEFT JOIN participants pa ON pa.discussion_id = p.id
+        ORDER BY p.last_note_at DESC
+    ";
+
+    let discussions: Vec<ActiveDiscussion> = match project_id {
+        None => {
+            let mut stmt = conn.prepare_cached(sql_global)?;
+            stmt.query_map(rusqlite::params![since_ms, limit_plus_one], |row| {
+                /* unchanged row mapping */
+            })?.collect::<std::result::Result<Vec<_>, _>>()?
+        }
+        Some(pid) => {
+            let mut stmt = conn.prepare_cached(sql_scoped)?;
+            stmt.query_map(rusqlite::params![since_ms, pid, limit_plus_one], |row| {
+                /* unchanged row mapping */
+            })?.collect::<std::result::Result<Vec<_>, _>>()?
+        }
+    };
```
Also update Verification to explicitly check both variants:

```diff
@@
 # Performance verification (required before merge):
@@
 sqlite3 path/to/db.sqlite "
   EXPLAIN QUERY PLAN
   SELECT d.id, d.last_note_at
   FROM discussions d
   WHERE d.resolvable = 1 AND d.resolved = 0
     AND d.last_note_at >= 0
   ORDER BY d.last_note_at DESC
   LIMIT 20;
 "
 # Expected: SEARCH discussions USING INDEX idx_discussions_unresolved_recent_global
+
+sqlite3 path/to/db.sqlite "
+  EXPLAIN QUERY PLAN
+  SELECT d.id, d.last_note_at
+  FROM discussions d
+  WHERE d.resolvable = 1 AND d.resolved = 0
+    AND d.project_id = 1
+    AND d.last_note_at >= 0
+  ORDER BY d.last_note_at DESC
+  LIMIT 20;
+"
+# Expected: SEARCH discussions USING INDEX idx_discussions_unresolved_recent
```
3) Add repo-path normalization (eliminate trivial “no results” footguns)

Why this is better

People paste:

- `./src/foo/`
- `/src/foo/`
- `src\foo\bar.rs` (Windows)

These currently lead to silent misses.

Normalize only user input (not DB content):

- trim whitespace
- strip leading `./` and `/`
- convert `\` → `/` when present
- collapse repeated `//`

Patch
```diff
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
 fn resolve_mode<'a>(args: &'a WhoArgs) -> Result<WhoMode<'a>> {
@@
-    if let Some(p) = &args.path {
-        return Ok(WhoMode::Expert { path: p });
+    if let Some(p) = &args.path {
+        let norm = normalize_repo_path(p);
+        return Ok(WhoMode::Expert { path: Box::leak(norm.into_boxed_str()) });
     }
@@
-    if let Some(path) = &args.overlap {
-        return Ok(WhoMode::Overlap { path });
+    if let Some(path) = &args.overlap {
+        let norm = normalize_repo_path(path);
+        return Ok(WhoMode::Overlap { path: Box::leak(norm.into_boxed_str()) });
     }
@@
-    if target.contains('/') {
-        return Ok(WhoMode::Expert { path: target });
+    if target.contains('/') {
+        let norm = normalize_repo_path(target);
+        return Ok(WhoMode::Expert { path: Box::leak(norm.into_boxed_str()) });
     }
@@
 }
+
+/// Normalize user-supplied repo paths to match stored DiffNote paths.
+/// - trims whitespace
+/// - strips leading "./" and "/" (repo-relative)
+/// - converts '\' to '/' (Windows paste)
+/// - collapses repeated slashes
+fn normalize_repo_path(input: &str) -> String {
+    let mut s = input.trim().to_string();
+    if s.contains('\\') && !s.contains('/') {
+        s = s.replace('\\', "/");
+    }
+    while s.starts_with("./") {
+        s = s.trim_start_matches("./").to_string();
+    }
+    while s.starts_with('/') {
+        s = s.trim_start_matches('/').to_string();
+    }
+    while s.contains("//") {
+        s = s.replace("//", "/");
+    }
+    s
+}
```
(Add a small test block for normalization; even 2–3 asserts catch regressions.)
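Following that suggestion, a self-contained sketch: the `normalize_repo_path` body from the patch above plus the handful of asserts. The inputs mirror the pasted-path examples earlier; the expected outputs assume trailing-slash handling stays in `build_path_query`:

```rust
// Self-contained test sketch for normalize_repo_path (body copied from the patch).
fn normalize_repo_path(input: &str) -> String {
    let mut s = input.trim().to_string();
    if s.contains('\\') && !s.contains('/') {
        s = s.replace('\\', "/");
    }
    while s.starts_with("./") {
        s = s.trim_start_matches("./").to_string();
    }
    while s.starts_with('/') {
        s = s.trim_start_matches('/').to_string();
    }
    while s.contains("//") {
        s = s.replace("//", "/");
    }
    s
}

fn main() {
    // Leading "./" stripped; the trailing '/' is left for build_path_query.
    assert_eq!(normalize_repo_path("./src/foo/"), "src/foo/");
    // Leading '/' stripped, doubled slashes collapsed.
    assert_eq!(normalize_repo_path("/src//foo"), "src/foo");
    // Windows-style paste converted.
    assert_eq!(normalize_repo_path(r"src\foo\bar.rs"), "src/foo/bar.rs");
    println!("ok");
}
```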
4) Make path matching observable: include path_match (exact vs prefix) in results/JSON

Why this is better

You’ve made path classification smarter (heuristics + two-way probe). That’s great, but without visibility you’ll get “why did it treat this as a directory?” confusion. Exposing match metadata is low cost and hugely helps debugging.

Patch
```diff
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-struct PathQuery {
-    /// The parameter value to bind.
-    value: String,
-    /// If true: use `LIKE value ESCAPE '\'`. If false: use `= value`.
-    is_prefix: bool,
-}
+struct PathQuery {
+    /// User input after normalization (no trailing slash stripping yet).
+    input: String,
+    /// Trimmed path without trailing '/' used for exact/prefix construction.
+    normalized: String,
+    /// The SQL parameter bound to the statement (`foo/bar` or `foo/bar/%`).
+    sql_value: String,
+    /// If true: use `LIKE sql_value ESCAPE '\'`. If false: use `= normalized`.
+    is_prefix: bool,
+}
@@
-    let trimmed = path.trim_end_matches('/');
+    let input = normalize_repo_path(path);
+    let trimmed = input.trim_end_matches('/').to_string();
@@
-    Ok(PathQuery {
-        value: trimmed.to_string(),
-        is_prefix: false,
-    })
+    Ok(PathQuery { input, normalized: trimmed.clone(), sql_value: trimmed, is_prefix: false })
 } else {
-    Ok(PathQuery {
-        value: format!("{escaped}/%"),
-        is_prefix: true,
-    })
+    Ok(PathQuery { input, normalized: trimmed.clone(), sql_value: format!("{escaped}/%"), is_prefix: true })
 }
@@
 pub struct ExpertResult {
     pub path_query: String,
+    pub path_match: String, // "exact" or "prefix"
     pub experts: Vec<Expert>,
     pub truncated: bool,
 }
@@
 pub struct OverlapResult {
     pub path_query: String,
+    pub path_match: String, // "exact" or "prefix"
     pub users: Vec<OverlapUser>,
     pub truncated: bool,
 }
@@
 fn query_expert(...) -> Result<ExpertResult> {
     let pq = build_path_query(conn, path, project_id)?;
@@
     Ok(ExpertResult {
         path_query: path.to_string(),
+        path_match: if pq.is_prefix { "prefix".to_string() } else { "exact".to_string() },
         experts,
         truncated,
     })
 }
@@
 fn query_overlap(...) -> Result<OverlapResult> {
     let pq = build_path_query(conn, path, project_id)?;
@@
     Ok(OverlapResult {
         path_query: path.to_string(),
+        path_match: if pq.is_prefix { "prefix".to_string() } else { "exact".to_string() },
         users,
         truncated,
     })
 }
@@
 fn expert_to_json(r: &ExpertResult) -> serde_json::Value {
     serde_json::json!({
         "path_query": r.path_query,
+        "path_match": r.path_match,
         "truncated": r.truncated,
         "experts": ...
     })
 }
@@
 fn overlap_to_json(r: &OverlapResult) -> serde_json::Value {
     serde_json::json!({
         "path_query": r.path_query,
+        "path_match": r.path_match,
         "truncated": r.truncated,
         "users": ...
     })
 }
```
Human output can add a single dim hint line:

`(matching exact file)` or `(matching directory prefix)`
5) Put a hard upper bound on --limit at the CLI boundary

Why this is better

You already bounded nested arrays (participants, mr_refs), but top-level lists are still user-unbounded. A single `--limit 50000` can:

- generate huge JSON payloads
- blow up downstream agent pipelines
- create slow queries / memory spikes

Clamp it before execution. A max of 500 is usually plenty; even 200 is fine.

Patch
```diff
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
 pub struct WhoArgs {
@@
-    /// Maximum results per section
-    #[arg(short = 'n', long = "limit", default_value = "20", help_heading = "Output")]
+    /// Maximum results per section (bounded for output safety)
+    #[arg(
+        short = 'n',
+        long = "limit",
+        default_value = "20",
+        // range-check as u16, then widen so the parsed type matches the field;
+        // `.map` needs `use clap::builder::TypedValueParser as _;` in scope
+        value_parser = clap::value_parser!(u16).range(1..=500).map(|v| v as usize),
+        help_heading = "Output"
+    )]
     pub limit: usize,
 }
@@
-11. **Bounded payloads.** Robot JSON must never emit unbounded arrays ...
+11. **Bounded payloads.** Robot JSON must never emit unbounded arrays ...
+    Top-level result set size is also bounded via `--limit` (1..=500) to prevent runaway payloads.
```
6) Clarify Active “unresolved count” semantics (window vs total)

Why this is better

total_unresolved currently means “unresolved within the time window”. The human header prints “Active Discussions (X unresolved)”, which can easily be misread as “total unresolved overall”.

Small rename avoids confusion, no new behavior.

Patch
```diff
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
 pub struct ActiveResult {
     pub discussions: Vec<ActiveDiscussion>,
-    pub total_unresolved: u32,
+    pub total_unresolved_in_window: u32,
     pub truncated: bool,
 }
@@
-    println!(
-        "{}",
-        style(format!(
-            "Active Discussions ({} unresolved)",
-            r.total_unresolved
-        ))
-        .bold()
-    );
+    println!("{}", style(format!(
+        "Active Discussions ({} unresolved in window)",
+        r.total_unresolved_in_window
+    )).bold());
```
(If you later want a global total, add a second count query, but I’d keep MVP lean.)
7) Tighten statement cache behavior: avoid preparing both SQL variants when not needed

Why this is better

You already use `prepare_cached()`, but as you add more “two static variants” (exact/prefix; scoped/unscoped), it’s easy to accidentally prepare multiple statements per invocation.

Codify: select variant first, then prepare exactly one.

This is mostly plan hygiene (it helps future you keep performance predictable).

Patch (plan-level emphasis)
```diff
diff --git a/who-command-design.md b/who-command-design.md
--- a/who-command-design.md
+++ b/who-command-design.md
@@
-1. **Lean on existing infrastructure.** ...
+1. **Lean on existing infrastructure.** ...
+   When multiple static SQL variants exist (exact/prefix; scoped/unscoped), always:
+   (a) resolve which variant applies, then (b) `prepare_cached()` exactly one statement.
```
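A hedged sketch of the rule, with illustrative stand-in SQL (not the real statements): resolve `(sql, params)` first, so the caller touches `prepare_cached()` exactly once with the chosen string.

```rust
// Illustrative statics standing in for the real global/scoped statements.
const TOTAL_SQL_GLOBAL: &str =
    "SELECT COUNT(*) FROM discussions WHERE resolvable = 1 AND resolved = 0 AND last_note_at >= ?1";
const TOTAL_SQL_SCOPED: &str =
    "SELECT COUNT(*) FROM discussions WHERE resolvable = 1 AND resolved = 0 AND last_note_at >= ?1 AND project_id = ?2";

/// Resolve which variant applies (and its bindings) before touching the
/// connection; the caller then prepares exactly one statement.
fn pick_total_variant(since_ms: i64, project_id: Option<i64>) -> (&'static str, Vec<i64>) {
    match project_id {
        None => (TOTAL_SQL_GLOBAL, vec![since_ms]),
        Some(pid) => (TOTAL_SQL_SCOPED, vec![since_ms, pid]),
    }
}

fn main() {
    let (sql, params) = pick_total_variant(1_700_000_000_000, Some(42));
    assert!(sql.contains("project_id = ?2"));
    assert_eq!(params, vec![1_700_000_000_000, 42]);

    let (sql, params) = pick_total_variant(0, None);
    assert!(!sql.contains("project_id"));
    assert_eq!(params, vec![0]);
    println!("ok");
}
```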
Net effect (what you gain)

- Correct robot semantics (`since_mode`) without breaking your static-SQL/agent-first contract.
- Guaranteed intended index usage for Active global vs scoped queries (the nullable-OR planner pitfall is real).
- Fewer “why no results?” surprises via path normalization.
- Better debugging (path-match introspection) with essentially no runtime cost.
- Output safety even when users/agents misconfigure `--limit`.
- Less ambiguous UX around “unresolved” counts.

If you want a single “most important” change to ship before iteration 8 locks: #2 (Active query variants) and #1 (since semantics) are the two that prevent the most painful, hard-to-diagnose failures.
3251
docs/who-command-design.md
Normal file
File diff suppressed because it is too large

844
gitlore-sync-explorer.html
Normal file
@@ -0,0 +1,844 @@
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Gitlore Sync Pipeline Explorer</title>
<style>
:root {
  --bg: #0d1117;
  --bg-secondary: #161b22;
  --bg-tertiary: #1c2129;
  --border: #30363d;
  --text: #c9d1d9;
  --text-dim: #8b949e;
  --text-bright: #f0f6fc;
  --cyan: #58a6ff;
  --green: #3fb950;
  --amber: #d29922;
  --red: #f85149;
  --purple: #bc8cff;
  --pink: #f778ba;
  --cyan-dim: rgba(88,166,255,0.15);
  --green-dim: rgba(63,185,80,0.15);
  --amber-dim: rgba(210,153,34,0.15);
  --red-dim: rgba(248,81,73,0.15);
  --purple-dim: rgba(188,140,255,0.15);
}
* { margin: 0; padding: 0; box-sizing: border-box; }
body {
  font-family: 'SF Mono', 'Cascadia Code', 'Fira Code', 'JetBrains Mono', monospace;
  background: var(--bg); color: var(--text);
  display: flex; height: 100vh; overflow: hidden;
}
.sidebar {
  width: 220px; min-width: 220px; background: var(--bg-secondary);
  border-right: 1px solid var(--border); display: flex; flex-direction: column; padding: 16px 0;
}
.sidebar-title {
  font-size: 11px; font-weight: 700; text-transform: uppercase;
  letter-spacing: 1.2px; color: var(--text-dim); padding: 0 16px 12px;
}
.logo {
  padding: 0 16px 20px; font-size: 15px; font-weight: 700; color: var(--cyan);
  display: flex; align-items: center; gap: 8px;
}
.logo svg { width: 20px; height: 20px; }
.nav-item {
  padding: 10px 16px; cursor: pointer; font-size: 13px; color: var(--text-dim);
  transition: all 0.15s; border-left: 3px solid transparent;
  display: flex; align-items: center; gap: 10px;
}
.nav-item:hover { background: var(--bg-tertiary); color: var(--text); }
.nav-item.active { background: var(--cyan-dim); color: var(--cyan); border-left-color: var(--cyan); }
.nav-dot { width: 8px; height: 8px; border-radius: 50%; flex-shrink: 0; }
.main { flex: 1; display: flex; flex-direction: column; overflow: hidden; }
.header {
  padding: 16px 24px; border-bottom: 1px solid var(--border);
  display: flex; align-items: center; justify-content: space-between;
}
.header h1 { font-size: 16px; font-weight: 600; color: var(--text-bright); }
.header-badge {
  font-size: 11px; padding: 3px 10px; border-radius: 12px;
  background: var(--cyan-dim); color: var(--cyan);
}
.canvas-wrapper { flex: 1; overflow: auto; position: relative; }
.canvas { padding: 32px; min-height: 100%; }
.flow-container { display: none; }
.flow-container.active { display: block; }
.phase { margin-bottom: 32px; }
.phase-header { display: flex; align-items: center; gap: 12px; margin-bottom: 16px; }
.phase-number {
  width: 28px; height: 28px; border-radius: 50%; display: flex; align-items: center;
  justify-content: center; font-size: 13px; font-weight: 700; flex-shrink: 0;
}
.phase-title { font-size: 14px; font-weight: 600; color: var(--text-bright); }
.phase-subtitle { font-size: 11px; color: var(--text-dim); margin-left: 4px; font-weight: 400; }
.flow-row {
  display: flex; align-items: stretch; gap: 0; flex-wrap: wrap;
  margin-left: 14px; padding-left: 26px; border-left: 2px solid var(--border);
}
.flow-row:last-child { border-left-color: transparent; }
.node {
  position: relative; padding: 12px 16px; border-radius: 8px;
  border: 1px solid var(--border); background: var(--bg-secondary);
  font-size: 12px; cursor: pointer; transition: all 0.2s;
  min-width: 180px; max-width: 260px; margin: 4px 0;
}
.node:hover {
  border-color: var(--cyan); transform: translateY(-1px);
  box-shadow: 0 4px 12px rgba(0,0,0,0.3);
}
.node.selected {
  border-color: var(--cyan);
  box-shadow: 0 0 0 1px var(--cyan), 0 4px 16px rgba(88,166,255,0.15);
}
.node-title { font-weight: 600; font-size: 12px; margin-bottom: 4px; color: var(--text-bright); }
.node-desc { font-size: 11px; color: var(--text-dim); line-height: 1.5; }
.node.api { border-left: 3px solid var(--cyan); }
.node.transform { border-left: 3px solid var(--purple); }
.node.db { border-left: 3px solid var(--green); }
.node.decision { border-left: 3px solid var(--amber); }
.node.error { border-left: 3px solid var(--red); }
.node.queue { border-left: 3px solid var(--pink); }
.arrow {
  display: flex; align-items: center; padding: 0 6px;
  color: var(--text-dim); font-size: 16px; flex-shrink: 0;
}
.arrow-down {
  display: flex; justify-content: center; padding: 4px 0;
  color: var(--text-dim); font-size: 16px; margin-left: 14px;
  padding-left: 26px; border-left: 2px solid var(--border);
}
.branch-container {
  margin-left: 14px; padding-left: 26px;
  border-left: 2px solid var(--border); padding-bottom: 8px;
}
.branch-row { display: flex; gap: 12px; margin: 8px 0; flex-wrap: wrap; }
.branch-label {
  font-size: 11px; font-weight: 600; margin: 8px 0 4px;
  display: flex; align-items: center; gap: 6px;
}
.branch-label.success { color: var(--green); }
.branch-label.error { color: var(--red); }
.branch-label.retry { color: var(--amber); }
.diff-badge {
  display: inline-block; font-size: 10px; padding: 2px 6px;
  border-radius: 4px; margin-top: 6px; font-weight: 600;
}
.diff-badge.changed { background: var(--amber-dim); color: var(--amber); }
.diff-badge.same { background: var(--green-dim); color: var(--green); }
.detail-panel {
  position: fixed; right: 0; top: 0; bottom: 0; width: 380px;
  background: var(--bg-secondary); border-left: 1px solid var(--border);
  transform: translateX(100%); transition: transform 0.25s ease;
  z-index: 100; display: flex; flex-direction: column; overflow: hidden;
}
.detail-panel.open { transform: translateX(0); }
.detail-header {
  padding: 16px 20px; border-bottom: 1px solid var(--border);
  display: flex; align-items: center; justify-content: space-between;
}
.detail-header h2 { font-size: 14px; font-weight: 600; color: var(--text-bright); }
.detail-close {
  cursor: pointer; color: var(--text-dim); font-size: 18px;
  background: none; border: none; padding: 4px 8px; border-radius: 4px;
}
.detail-close:hover { background: var(--bg-tertiary); color: var(--text); }
.detail-body { flex: 1; overflow-y: auto; padding: 20px; }
.detail-section { margin-bottom: 20px; }
.detail-section h3 {
  font-size: 11px; text-transform: uppercase; letter-spacing: 0.8px;
  color: var(--text-dim); margin-bottom: 8px;
}
.detail-section p { font-size: 12px; line-height: 1.7; color: var(--text); }
.sql-block {
  background: var(--bg); border: 1px solid var(--border); border-radius: 6px;
  padding: 12px; font-size: 11px; line-height: 1.6; color: var(--green);
  overflow-x: auto; white-space: pre; margin-top: 8px;
}
.detail-tag {
  display: inline-block; font-size: 10px; padding: 2px 8px;
  border-radius: 10px; margin: 2px 4px 2px 0;
}
.detail-tag.file { background: var(--purple-dim); color: var(--purple); }
.detail-tag.type-api { background: var(--cyan-dim); color: var(--cyan); }
.detail-tag.type-db { background: var(--green-dim); color: var(--green); }
.detail-tag.type-transform { background: var(--purple-dim); color: var(--purple); }
.detail-tag.type-decision { background: var(--amber-dim); color: var(--amber); }
.detail-tag.type-error { background: var(--red-dim); color: var(--red); }
.detail-tag.type-queue { background: rgba(247,120,186,0.15); color: var(--pink); }
.watermark-panel { border-top: 1px solid var(--border); background: var(--bg-secondary); }
.watermark-toggle {
  padding: 10px 24px; cursor: pointer; font-size: 12px; color: var(--text-dim);
  display: flex; align-items: center; gap: 8px; user-select: none;
}
.watermark-toggle:hover { color: var(--text); }
.watermark-toggle .chevron { transition: transform 0.2s; font-size: 10px; }
.watermark-toggle .chevron.open { transform: rotate(180deg); }
.watermark-content { display: none; padding: 0 24px 16px; max-height: 260px; overflow-y: auto; }
.watermark-content.open { display: block; }
.wm-table { width: 100%; border-collapse: collapse; font-size: 11px; }
.wm-table th {
  text-align: left; padding: 6px 12px; color: var(--text-dim); font-weight: 600;
  border-bottom: 1px solid var(--border); font-size: 10px;
  text-transform: uppercase; letter-spacing: 0.5px;
}
.wm-table td { padding: 6px 12px; border-bottom: 1px solid var(--border); color: var(--text); }
.wm-table td:first-child { color: var(--cyan); font-weight: 600; }
.wm-table td:nth-child(2) { color: var(--green); }
.overview-pipeline { display: flex; gap: 0; align-items: stretch; margin: 24px 0; flex-wrap: wrap; }
.overview-stage {
  flex: 1; min-width: 200px; background: var(--bg-secondary);
  border: 1px solid var(--border); border-radius: 10px; padding: 20px;
  cursor: pointer; transition: all 0.2s;
}
.overview-stage:hover {
  border-color: var(--cyan); transform: translateY(-2px);
  box-shadow: 0 6px 20px rgba(0,0,0,0.3);
}
.overview-arrow { display: flex; align-items: center; padding: 0 8px; font-size: 20px; color: var(--text-dim); }
.stage-num { font-size: 10px; font-weight: 700; text-transform: uppercase; letter-spacing: 1px; margin-bottom: 8px; }
.stage-title { font-size: 15px; font-weight: 700; color: var(--text-bright); margin-bottom: 6px; }
.stage-desc { font-size: 11px; color: var(--text-dim); line-height: 1.6; }
.stage-detail {
  margin-top: 12px; padding-top: 12px; border-top: 1px solid var(--border);
  font-size: 11px; color: var(--text-dim); line-height: 1.6;
}
.stage-detail code {
  color: var(--amber); background: var(--amber-dim); padding: 1px 5px;
  border-radius: 3px; font-size: 10px;
}
.info-box {
  background: var(--bg-tertiary); border: 1px solid var(--border);
  border-radius: 8px; padding: 16px; margin: 16px 0; font-size: 12px; line-height: 1.7;
}
.info-box-title { font-weight: 600; color: var(--cyan); margin-bottom: 6px; display: flex; align-items: center; gap: 6px; }
.info-box ul { margin-left: 16px; color: var(--text-dim); }
|
||||||
|
.info-box li { margin: 4px 0; }
|
||||||
|
.info-box code {
|
||||||
|
color: var(--amber); background: var(--amber-dim);
|
||||||
|
padding: 1px 5px; border-radius: 3px; font-size: 11px;
|
||||||
|
}
|
||||||
|
.legend {
|
||||||
|
display: flex; gap: 16px; flex-wrap: wrap; margin-bottom: 24px;
|
||||||
|
padding: 12px 16px; background: var(--bg-secondary);
|
||||||
|
border: 1px solid var(--border); border-radius: 8px;
|
||||||
|
}
|
||||||
|
.legend-item { display: flex; align-items: center; gap: 6px; font-size: 11px; color: var(--text-dim); }
|
||||||
|
.legend-color { width: 12px; height: 3px; border-radius: 2px; }
|
||||||
|
::-webkit-scrollbar { width: 8px; height: 8px; }
|
||||||
|
::-webkit-scrollbar-track { background: transparent; }
|
||||||
|
::-webkit-scrollbar-thumb { background: var(--border); border-radius: 4px; }
|
||||||
|
::-webkit-scrollbar-thumb:hover { background: var(--text-dim); }
|
||||||
|
</style>
|
||||||
|
</head>
|
||||||
|
<body>
|
||||||
|
|
<div class="sidebar">
<div class="logo">
<svg viewBox="0 0 20 20" fill="none" stroke="currentColor" stroke-width="1.5">
<circle cx="10" cy="10" r="8"/><path d="M10 6v4l3 2"/>
</svg>
lore sync
</div>
<div class="sidebar-title">Entity Flows</div>
<div class="nav-item active" data-view="overview" onclick="switchView('overview')">
<div class="nav-dot" style="background:var(--cyan)"></div>Full Sync Overview
</div>
<div class="nav-item" data-view="issues" onclick="switchView('issues')">
<div class="nav-dot" style="background:var(--green)"></div>Issues
</div>
<div class="nav-item" data-view="mrs" onclick="switchView('mrs')">
<div class="nav-dot" style="background:var(--purple)"></div>Merge Requests
</div>
<div class="nav-item" data-view="docs" onclick="switchView('docs')">
<div class="nav-dot" style="background:var(--amber)"></div>Documents
</div>
<div class="nav-item" data-view="embed" onclick="switchView('embed')">
<div class="nav-dot" style="background:var(--pink)"></div>Embeddings
</div>
</div>

<div class="main">
<div class="header">
<h1 id="view-title">Full Sync Overview</h1>
<span class="header-badge" id="view-badge">4 stages</span>
</div>

<div class="canvas-wrapper"><div class="canvas">

<!-- OVERVIEW -->
<div class="flow-container active" id="view-overview">
<div class="overview-pipeline">
<div class="overview-stage" onclick="switchView('issues')">
<div class="stage-num" style="color:var(--green)">Stage 1</div>
<div class="stage-title">Ingest Issues</div>
<div class="stage-desc">Fetch issues + discussions + resource events from GitLab API</div>
<div class="stage-detail">Cursor-based incremental sync.<br>Sequential discussion fetch.<br>Queue-based resource events.</div>
</div>
<div class="overview-arrow">→</div>
<div class="overview-stage" onclick="switchView('mrs')">
<div class="stage-num" style="color:var(--purple)">Stage 2</div>
<div class="stage-title">Ingest MRs</div>
<div class="stage-desc">Fetch merge requests + discussions + resource events</div>
<div class="stage-detail">Page-based incremental sync.<br>Parallel prefetch discussions.<br>Queue-based resource events.</div>
</div>
<div class="overview-arrow">→</div>
<div class="overview-stage" onclick="switchView('docs')">
<div class="stage-num" style="color:var(--amber)">Stage 3</div>
<div class="stage-title">Generate Docs</div>
<div class="stage-desc">Regenerate searchable documents for changed entities</div>
<div class="stage-detail">Driven by <code>dirty_sources</code> table.<br>Triple-hash skip optimization.<br>FTS5 index auto-updated.</div>
</div>
<div class="overview-arrow">→</div>
<div class="overview-stage" onclick="switchView('embed')">
<div class="stage-num" style="color:var(--pink)">Stage 4</div>
<div class="stage-title">Embed</div>
<div class="stage-desc">Generate vector embeddings via Ollama for semantic search</div>
<div class="stage-detail">Hash-based change detection.<br>Chunked, batched API calls.<br><b>Non-fatal</b> — graceful if Ollama down.</div>
</div>
</div>
<div class="info-box">
<div class="info-box-title">Concurrency Model</div>
<ul>
<li>Stages 1 & 2 process <b>projects concurrently</b> via <code>buffer_unordered(primary_concurrency)</code></li>
<li>Each project gets its own <b>SQLite connection</b>; rate limiter is <b>shared</b></li>
<li>Discussions: <b>sequential</b> (issues) or <b>batched parallel prefetch</b> (MRs)</li>
<li>Resource events use a <b>persistent job queue</b> with atomic claim + exponential backoff</li>
</ul>
</div>
<div class="info-box">
<div class="info-box-title">Sync Flags</div>
<ul>
<li><code>--full</code> — Resets all cursors & watermarks, forces complete re-fetch</li>
<li><code>--no-docs</code> — Skips Stage 3 (document generation)</li>
<li><code>--no-embed</code> — Skips Stage 4 (embedding generation)</li>
<li><code>--force</code> — Overrides stale single-flight lock</li>
<li><code>--project &lt;path&gt;</code> — Sync only one project (fuzzy matching)</li>
</ul>
</div>
<div class="info-box">
<div class="info-box-title">Single-Flight Lock</div>
<ul>
<li>Table-based lock (<code>AppLock</code>) prevents concurrent syncs</li>
<li>Heartbeat keeps the lock alive; stale locks auto-detected</li>
<li>Use <code>--force</code> to override a stale lock</li>
</ul>
</div>
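The lock rule above can be sketched as a small predicate. This is a minimal illustration, not the actual `AppLock` implementation: the field name `heartbeatMs` and the 5-minute staleness threshold are assumptions.

```javascript
// Sketch of the single-flight lock rule: a lock is stale once its heartbeat
// stops being refreshed, and only --force may take over a stale lock.
// Names and threshold are assumptions, not from the real AppLock schema.
const STALE_AFTER_MS = 5 * 60 * 1000;

function isLockStale(lockRow, nowMs) {
  return nowMs - lockRow.heartbeatMs > STALE_AFTER_MS;
}

function mayAcquire(lockRow, nowMs, force) {
  if (!lockRow) return true;                      // no current holder
  if (isLockStale(lockRow, nowMs)) return force;  // stale: only --force takes over
  return false;                                   // live lock: refuse to run
}
```

Note that `--force` only overrides a stale lock; a lock with a live heartbeat is never taken over.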
</div>

<!-- ISSUES -->
<div class="flow-container" id="view-issues">
<div class="legend">
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API Call</div>
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Transform</div>
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error Path</div>
<div class="legend-item"><div class="legend-color" style="background:var(--pink)"></div>Queue</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
<div class="phase-title">Fetch Issues <span class="phase-subtitle">Cursor-Based Incremental Sync</span></div>
</div>
<div class="flow-row">
<div class="node api" data-detail="issue-api-call"><div class="node-title">GitLab API Call</div><div class="node-desc">paginate_issues() with<br>updated_after = cursor - rewind</div></div>
<div class="arrow">→</div>
<div class="node decision" data-detail="issue-cursor-filter"><div class="node-title">Cursor Filter</div><div class="node-desc">updated_at > cursor_ts<br>OR tie_breaker check</div></div>
<div class="arrow">→</div>
<div class="node transform" data-detail="issue-transform"><div class="node-title">transform_issue()</div><div class="node-desc">GitLab API shape →<br>local DB row shape</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="issue-transaction"><div class="node-title">Transaction</div><div class="node-desc">store_payload → upsert →<br>mark_dirty → relink</div></div>
</div>
<div class="arrow-down">↓</div>
<div class="flow-row">
<div class="node db" data-detail="issue-cursor-update"><div class="node-title">Update Cursor</div><div class="node-desc">Every 100 issues + final<br>sync_cursors table</div></div>
</div>
</div>
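The cursor filter above combines a timestamp comparison with a tie-breaker id for entities sharing the same timestamp. A sketch of the predicate, with illustrative field names (`updatedAt`, `tieBreakerId`) that are not the actual `sync_cursors` columns:

```javascript
// Sketch of the incremental-sync cursor filter: accept an entity if it is
// strictly newer than the cursor, or equally new but past the tie-breaker id.
// Field names are illustrative assumptions.
function passesCursor(entity, cursor) {
  if (entity.updatedAt > cursor.updatedAt) return true;
  return entity.updatedAt === cursor.updatedAt && entity.id > cursor.tieBreakerId;
}
```

The tie-breaker matters because the fetch rewinds slightly behind the cursor, so entities at exactly the cursor timestamp can reappear and must not be double-counted.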
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--green-dim);color:var(--green)">2</div>
<div class="phase-title">Discussion Sync <span class="phase-subtitle">Sequential, Watermark-Based</span></div>
</div>
<div class="flow-row">
<div class="node db" data-detail="issue-disc-query"><div class="node-title">Query Stale Issues</div><div class="node-desc">updated_at > COALESCE(<br>discussions_synced_for_<br>updated_at, 0)</div></div>
<div class="arrow">→</div>
<div class="node api" data-detail="issue-disc-fetch"><div class="node-title">Paginate Discussions</div><div class="node-desc">Sequential per issue<br>paginate_issue_discussions()</div></div>
<div class="arrow">→</div>
<div class="node transform" data-detail="issue-disc-transform"><div class="node-title">Transform</div><div class="node-desc">transform_discussion()<br>transform_notes()</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="issue-disc-write"><div class="node-title">Write Discussion</div><div class="node-desc">store_payload → upsert<br>DELETE notes → INSERT notes</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">✓ On Success (all pages fetched)</div>
<div class="branch-row">
<div class="node db" data-detail="issue-disc-stale"><div class="node-title">Remove Stale</div><div class="node-desc">DELETE discussions not<br>seen in this fetch</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="issue-disc-watermark"><div class="node-title">Advance Watermark</div><div class="node-desc">discussions_synced_for_<br>updated_at = updated_at</div></div>
</div>
<div class="branch-label error">✗ On Pagination Error</div>
<div class="branch-row">
<div class="node error" data-detail="issue-disc-fail"><div class="node-title">Skip Stale Removal</div><div class="node-desc">Watermark NOT advanced<br>Will retry next sync</div></div>
</div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:rgba(247,120,186,0.15);color:var(--pink)">3</div>
<div class="phase-title">Resource Events <span class="phase-subtitle">Queue-Based, Concurrent Fetch</span></div>
</div>
<div class="flow-row">
<div class="node queue" data-detail="re-cleanup"><div class="node-title">Cleanup Obsolete</div><div class="node-desc">DELETE jobs where entity<br>watermark is current</div></div>
<div class="arrow">→</div>
<div class="node queue" data-detail="re-enqueue"><div class="node-title">Enqueue Jobs</div><div class="node-desc">INSERT for entities where<br>updated_at > watermark</div></div>
<div class="arrow">→</div>
<div class="node queue" data-detail="re-claim"><div class="node-title">Claim Jobs</div><div class="node-desc">Atomic UPDATE...RETURNING<br>with lock acquisition</div></div>
<div class="arrow">→</div>
<div class="node api" data-detail="re-fetch"><div class="node-title">Fetch Events</div><div class="node-desc">3 concurrent: state +<br>label + milestone</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">✓ On Success</div>
<div class="branch-row">
<div class="node db" data-detail="re-store"><div class="node-title">Store Events</div><div class="node-desc">Transaction: upsert all<br>3 event types</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="re-complete"><div class="node-title">Complete + Watermark</div><div class="node-desc">DELETE job row<br>Advance watermark</div></div>
</div>
<div class="branch-label error">✗ Permanent Error (404 / 403)</div>
<div class="branch-row">
<div class="node error" data-detail="re-permanent"><div class="node-title">Skip Permanently</div><div class="node-desc">complete_job + advance<br>watermark (coalesced)</div></div>
</div>
<div class="branch-label retry">↻ Transient Error</div>
<div class="branch-row">
<div class="node error" data-detail="re-transient"><div class="node-title">Backoff Retry</div><div class="node-desc">fail_job: 30s x 2^(n-1)<br>capped at 480s</div></div>
</div>
</div>
</div>
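The transient-error schedule shown in the Backoff Retry node (30s x 2^(n-1), capped at 480s) works out to 30, 60, 120, 240, 480, 480, … seconds. As a one-line sketch (the helper name is ours, not from the codebase):

```javascript
// Backoff schedule from the diagram: 30s doubled per prior attempt, capped at 480s.
function backoffSecs(attempt) {
  return Math.min(30 * 2 ** (attempt - 1), 480);
}
```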
</div>

<!-- MERGE REQUESTS -->
<div class="flow-container" id="view-mrs">
<div class="legend">
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API Call</div>
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Transform</div>
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Diff from Issues</div>
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error Path</div>
<div class="legend-item"><div class="legend-color" style="background:var(--pink)"></div>Queue</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
<div class="phase-title">Fetch MRs <span class="phase-subtitle">Page-Based Incremental Sync</span></div>
</div>
<div class="flow-row">
<div class="node api" data-detail="mr-api-call"><div class="node-title">GitLab API Call</div><div class="node-desc">fetch_merge_requests_page()<br>with cursor rewind</div><div class="diff-badge changed">Page-based, not streaming</div></div>
<div class="arrow">→</div>
<div class="node decision" data-detail="mr-cursor-filter"><div class="node-title">Cursor Filter</div><div class="node-desc">Same logic as issues:<br>timestamp + tie-breaker</div><div class="diff-badge same">Same as issues</div></div>
<div class="arrow">→</div>
<div class="node transform" data-detail="mr-transform"><div class="node-title">transform_merge_request()</div><div class="node-desc">Maps API shape →<br>local DB row</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="mr-transaction"><div class="node-title">Transaction</div><div class="node-desc">store → upsert → dirty →<br>labels + assignees + reviewers</div><div class="diff-badge changed">3 junction tables (not 2)</div></div>
</div>
<div class="arrow-down">↓</div>
<div class="flow-row">
<div class="node db" data-detail="mr-cursor-update"><div class="node-title">Update Cursor</div><div class="node-desc">Per page (not every 100)</div><div class="diff-badge changed">Per page boundary</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--green-dim);color:var(--green)">2</div>
<div class="phase-title">MR Discussion Sync <span class="phase-subtitle">Parallel Prefetch + Serial Write</span></div>
</div>
<div class="info-box" style="margin-left:40px;margin-bottom:16px;">
<div class="info-box-title">Key Differences from Issue Discussions</div>
<ul>
<li><b>Parallel prefetch</b> — fetches all discussions for a batch concurrently via <code>join_all()</code></li>
<li><b>Upsert pattern</b> — notes use INSERT...ON CONFLICT (not delete-all + re-insert)</li>
<li><b>Sweep stale</b> — uses <code>last_seen_at</code> timestamp comparison (not set difference)</li>
<li><b>Sync health tracking</b> — records <code>discussions_sync_attempts</code> and <code>last_error</code></li>
</ul>
</div>
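The `last_seen_at` sweep listed above replaces a set-difference with a timestamp cut-off: every row touched during a run gets its `last_seen_at` refreshed, so anything older than the run marker was not seen and is stale. A sketch with illustrative names (`lastSeenAt`, `runSeenAt`):

```javascript
// Sketch of the last_seen_at sweep: keep rows refreshed during this run,
// drop the rest. Field names are illustrative assumptions.
function sweepStale(rows, runSeenAt) {
  return rows.filter(row => row.lastSeenAt >= runSeenAt);
}
```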
<div class="flow-row">
<div class="node db" data-detail="mr-disc-query"><div class="node-title">Query Stale MRs</div><div class="node-desc">updated_at > COALESCE(<br>discussions_synced_for_<br>updated_at, 0)</div><div class="diff-badge same">Same watermark logic</div></div>
<div class="arrow">→</div>
<div class="node decision" data-detail="mr-disc-batch"><div class="node-title">Batch by Concurrency</div><div class="node-desc">dependent_concurrency<br>MRs per batch</div><div class="diff-badge changed">Batched processing</div></div>
</div>
<div class="arrow-down">↓</div>
<div class="flow-row">
<div class="node api" data-detail="mr-disc-prefetch"><div class="node-title">Parallel Prefetch</div><div class="node-desc">join_all() fetches all<br>discussions for batch</div><div class="diff-badge changed">Parallel (not sequential)</div></div>
<div class="arrow">→</div>
<div class="node transform" data-detail="mr-disc-transform"><div class="node-title">Transform In-Memory</div><div class="node-desc">transform_mr_discussion()<br>+ diff position notes</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="mr-disc-write"><div class="node-title">Serial Write</div><div class="node-desc">upsert discussion<br>upsert notes (ON CONFLICT)</div><div class="diff-badge changed">Upsert, not delete+insert</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">✓ On Full Success</div>
<div class="branch-row">
<div class="node db" data-detail="mr-disc-sweep"><div class="node-title">Sweep Stale</div><div class="node-desc">DELETE WHERE last_seen_at<br>< run_seen_at (disc + notes)</div><div class="diff-badge changed">last_seen_at sweep</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="mr-disc-watermark"><div class="node-title">Advance Watermark</div><div class="node-desc">discussions_synced_for_<br>updated_at = updated_at</div></div>
</div>
<div class="branch-label error">✗ On Failure</div>
<div class="branch-row">
<div class="node error" data-detail="mr-disc-fail"><div class="node-title">Record Sync Health</div><div class="node-desc">Watermark NOT advanced<br>Tracks attempts + last_error</div><div class="diff-badge changed">Health tracking</div></div>
</div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:rgba(247,120,186,0.15);color:var(--pink)">3</div>
<div class="phase-title">Resource Events <span class="phase-subtitle">Same as Issues</span></div>
</div>
<div class="info-box" style="margin-left:40px">
<div class="info-box-title">Identical to Issue Resource Events</div>
<ul>
<li>Same queue-based approach: cleanup → enqueue → claim → fetch → store/fail</li>
<li>Same watermark column: <code>resource_events_synced_for_updated_at</code></li>
<li>Same error handling: 404/403 coalesced to empty, transient errors get backoff</li>
<li>entity_type = <code>"merge_request"</code> instead of <code>"issue"</code></li>
</ul>
</div>
</div>
</div>

<!-- DOCUMENTS -->
<div class="flow-container" id="view-docs">
<div class="legend">
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>Trigger</div>
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Extract</div>
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">1</div>
<div class="phase-title">Dirty Source Queue <span class="phase-subtitle">Populated During Ingestion</span></div>
</div>
<div class="flow-row">
<div class="node api" data-detail="doc-trigger"><div class="node-title">mark_dirty_tx()</div><div class="node-desc">Called during every issue/<br>MR/discussion upsert</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="doc-dirty-table"><div class="node-title">dirty_sources Table</div><div class="node-desc">INSERT (source_type, source_id)<br>ON CONFLICT reset backoff</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--amber-dim);color:var(--amber)">2</div>
<div class="phase-title">Drain Loop <span class="phase-subtitle">Batch 500, Respects Backoff</span></div>
</div>
<div class="flow-row">
<div class="node db" data-detail="doc-drain"><div class="node-title">Get Dirty Sources</div><div class="node-desc">Batch 500, ORDER BY<br>attempt_count, queued_at</div></div>
<div class="arrow">→</div>
<div class="node decision" data-detail="doc-dispatch"><div class="node-title">Dispatch by Type</div><div class="node-desc">issue / mr / discussion<br>→ extract function</div></div>
<div class="arrow">→</div>
<div class="node decision" data-detail="doc-deleted-check"><div class="node-title">Source Exists?</div><div class="node-desc">If deleted: remove doc row<br>(cascade cleans FTS + embeds)</div></div>
</div>
<div class="arrow-down">↓</div>
<div class="flow-row">
<div class="node transform" data-detail="doc-extract"><div class="node-title">Extract Content</div><div class="node-desc">Structured text:<br>header + metadata + body</div></div>
<div class="arrow">→</div>
<div class="node decision" data-detail="doc-triple-hash"><div class="node-title">Triple-Hash Check</div><div class="node-desc">content_hash + labels_hash<br>+ paths_hash all match?</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="doc-write"><div class="node-title">SAVEPOINT Write</div><div class="node-desc">Atomic: document row +<br>labels + paths</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">✓ On Success</div>
<div class="branch-row">
<div class="node db" data-detail="doc-clear"><div class="node-title">clear_dirty()</div><div class="node-desc">Remove from dirty_sources</div></div>
</div>
<div class="branch-label error">✗ On Error</div>
<div class="branch-row">
<div class="node error" data-detail="doc-error"><div class="node-title">record_dirty_error()</div><div class="node-desc">Increment attempt_count<br>Exponential backoff</div></div>
</div>
<div class="branch-label" style="color:var(--purple)">≡ Triple-Hash Match (skip)</div>
<div class="branch-row">
<div class="node db" data-detail="doc-skip"><div class="node-title">Skip Write</div><div class="node-desc">All 3 hashes match →<br>no WAL churn, clear dirty</div></div>
</div>
</div>
</div>
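The triple-hash check above reduces to: rewrite only when any of the three stored hashes differs from the freshly extracted ones. A sketch, with hash field names as assumptions:

```javascript
// Sketch of the triple-hash skip: a document write (and the WAL churn it
// causes) is needed only if content, labels, or paths changed.
// Field names are illustrative, not the actual schema.
function needsRewrite(existing, fresh) {
  if (!existing) return true; // no document row yet
  return existing.contentHash !== fresh.contentHash ||
         existing.labelsHash !== fresh.labelsHash ||
         existing.pathsHash !== fresh.pathsHash;
}
```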
<div class="info-box">
<div class="info-box-title">Full Mode (<code>--full</code>)</div>
<ul>
<li>Seeds <b>ALL</b> entities into <code>dirty_sources</code> via keyset pagination</li>
<li>Triple-hash optimization prevents redundant writes even in full mode</li>
<li>Runs FTS <code>OPTIMIZE</code> after drain completes</li>
</ul>
</div>
</div>

<!-- EMBEDDINGS -->
<div class="flow-container" id="view-embed">
<div class="legend">
<div class="legend-item"><div class="legend-color" style="background:var(--cyan)"></div>API (Ollama)</div>
<div class="legend-item"><div class="legend-color" style="background:var(--purple)"></div>Processing</div>
<div class="legend-item"><div class="legend-color" style="background:var(--green)"></div>Database</div>
<div class="legend-item"><div class="legend-color" style="background:var(--amber)"></div>Decision</div>
<div class="legend-item"><div class="legend-color" style="background:var(--red)"></div>Error</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--amber-dim);color:var(--amber)">1</div>
<div class="phase-title">Change Detection <span class="phase-subtitle">Hash + Config Drift</span></div>
</div>
<div class="flow-row">
<div class="node decision" data-detail="embed-detect"><div class="node-title">find_pending_documents()</div><div class="node-desc">No metadata row? OR<br>document_hash mismatch? OR<br>config drift?</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="embed-paginate"><div class="node-title">Keyset Pagination</div><div class="node-desc">500 documents per page<br>ordered by doc ID</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--purple-dim);color:var(--purple)">2</div>
<div class="phase-title">Chunking <span class="phase-subtitle">Split + Overflow Guard</span></div>
</div>
<div class="flow-row">
<div class="node transform" data-detail="embed-chunk"><div class="node-title">split_into_chunks()</div><div class="node-desc">Split by paragraph boundaries<br>with configurable overlap</div></div>
<div class="arrow">→</div>
<div class="node decision" data-detail="embed-overflow"><div class="node-title">Overflow Guard</div><div class="node-desc">Too many chunks?<br>Skip to prevent rowid collision</div></div>
<div class="arrow">→</div>
<div class="node transform" data-detail="embed-work"><div class="node-title">Build ChunkWork</div><div class="node-desc">Assign encoded chunk IDs<br>per document</div></div>
</div>
</div>
<div class="phase">
<div class="phase-header">
<div class="phase-number" style="background:var(--cyan-dim);color:var(--cyan)">3</div>
<div class="phase-title">Ollama Embedding <span class="phase-subtitle">Batched API Calls</span></div>
</div>
<div class="flow-row">
<div class="node api" data-detail="embed-batch"><div class="node-title">Batch Embed</div><div class="node-desc">32 chunks per Ollama<br>API call</div></div>
<div class="arrow">→</div>
<div class="node db" data-detail="embed-store"><div class="node-title">Store Vectors</div><div class="node-desc">sqlite-vec embeddings table<br>+ embedding_metadata</div></div>
</div>
<div class="branch-container">
<div class="branch-label success">✓ On Success</div>
<div class="branch-row">
<div class="node db" data-detail="embed-success"><div class="node-title">SAVEPOINT Commit</div><div class="node-desc">Atomic per page:<br>clear old + write new</div></div>
</div>
<div class="branch-label retry">↻ Context-Length Error</div>
<div class="branch-row">
<div class="node error" data-detail="embed-ctx-error"><div class="node-title">Retry Individually</div><div class="node-desc">Re-embed each chunk solo<br>to isolate oversized one</div></div>
</div>
<div class="branch-label error">✗ Other Error</div>
<div class="branch-row">
<div class="node error" data-detail="embed-other-error"><div class="node-title">Record Error</div><div class="node-desc">Store in embedding_metadata<br>for retry next run</div></div>
</div>
</div>
</div>
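The batched embedding step above (32 chunks per API call) amounts to slicing the chunk list into fixed-size groups before calling out to Ollama. A sketch; the helper name is ours, only the batch size of 32 comes from the diagram:

```javascript
// Group chunk texts into fixed-size batches; each batch becomes one
// embedding API call. Batch size of 32 is taken from the diagram.
function toBatches(chunks, size = 32) {
  const batches = [];
  for (let i = 0; i < chunks.length; i += size) {
    batches.push(chunks.slice(i, i + size));
  }
  return batches;
}
```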
|
||||||
|
<div class="info-box">
|
||||||
|
<div class="info-box-title">Full Mode (<code>--full</code>)</div>
|
||||||
|
<ul>
|
||||||
|
<li>DELETEs all <code>embedding_metadata</code> and <code>embeddings</code> rows first</li>
|
||||||
|
<li>Every document re-processed from scratch</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
<div class="info-box">
|
||||||
|
<div class="info-box-title">Non-Fatal in Sync</div>
|
||||||
|
<ul>
|
||||||
|
<li>Stage 4 failures (Ollama down, model missing) are <b>graceful</b></li>
|
||||||
|
<li>Sync completes successfully; embeddings just won't be updated</li>
|
||||||
|
<li>Semantic search degrades to FTS-only mode</li>
|
||||||
|
</ul>
|
||||||
|
</div>
|
||||||
|
</div>
|
||||||
|
|
||||||
|
</div></div>

<!-- Watermark Panel -->
<div class="watermark-panel">
  <div class="watermark-toggle" onclick="toggleWatermarks()">
    <span class="chevron" id="wm-chevron">▲</span>
    Watermark &amp; Cursor Reference
  </div>
  <div class="watermark-content" id="wm-content">
    <table class="wm-table">
      <thead><tr><th>Table</th><th>Column(s)</th><th>Purpose</th></tr></thead>
      <tbody>
        <tr><td>sync_cursors</td><td>updated_at_cursor + tie_breaker_id</td><td>Incremental fetch: "last entity we saw" per project+type</td></tr>
        <tr><td>issues</td><td>discussions_synced_for_updated_at</td><td>Per-issue discussion watermark</td></tr>
        <tr><td>issues</td><td>resource_events_synced_for_updated_at</td><td>Per-issue resource event watermark</td></tr>
        <tr><td>merge_requests</td><td>discussions_synced_for_updated_at</td><td>Per-MR discussion watermark</td></tr>
        <tr><td>merge_requests</td><td>resource_events_synced_for_updated_at</td><td>Per-MR resource event watermark</td></tr>
        <tr><td>dirty_sources</td><td>queued_at + next_attempt_at</td><td>Document regeneration queue with backoff</td></tr>
        <tr><td>embedding_metadata</td><td>document_hash + chunk_max_bytes + model + dims</td><td>Embedding staleness detection</td></tr>
        <tr><td>pending_dependent_fetches</td><td>locked_at + next_retry_at + attempts</td><td>Resource event job queue with backoff</td></tr>
      </tbody>
    </table>
  </div>
</div>
</div>

<!-- Detail Panel -->
<div class="detail-panel" id="detail-panel">
  <div class="detail-header">
    <h2 id="detail-title">Node Details</h2>
    <button class="detail-close" onclick="closeDetail()">×</button>
  </div>
  <div class="detail-body" id="detail-body"></div>
</div>

<script>
const viewTitles = {
  overview: 'Full Sync Overview', issues: 'Issue Ingestion Flow',
  mrs: 'Merge Request Ingestion Flow', docs: 'Document Generation Flow',
  embed: 'Embedding Generation Flow',
};
const viewBadges = {
  overview: '4 stages', issues: '3 phases', mrs: '3 phases',
  docs: '2 phases', embed: '3 phases',
};

function switchView(view) {
  document.querySelectorAll('.flow-container').forEach(function(el) { el.classList.remove('active'); });
  document.getElementById('view-' + view).classList.add('active');
  document.querySelectorAll('.nav-item').forEach(function(el) {
    el.classList.toggle('active', el.dataset.view === view);
  });
  document.getElementById('view-title').textContent = viewTitles[view];
  document.getElementById('view-badge').textContent = viewBadges[view];
  closeDetail();
}

function toggleWatermarks() {
  document.getElementById('wm-content').classList.toggle('open');
  document.getElementById('wm-chevron').classList.toggle('open');
}

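// Illustrative sketch only (not used by this page): the two-part cursor
// comparison described in 'issue-cursor-filter' below — compare updated_at
// first, then fall back to gitlab_id as a tie-breaker when timestamps are
// equal. The function name and shape are hypothetical, not the actual Rust
// implementation in src/ingestion/issues.rs.
function passesCursorFilter(item, cursorTs, tieBreakerId) {
  if (item.updated_at > cursorTs) return true;        // strictly newer than cursor
  if (item.updated_at === cursorTs
      && item.gitlab_id > tieBreakerId) return true;  // same timestamp, higher ID
  return false;                                       // already processed: skip
}
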
var details = {
'issue-api-call': { title: 'GitLab API: Paginate Issues', type: 'api', file: 'src/ingestion/issues.rs:51-140', desc: 'Streams issues from the GitLab API using cursor-based incremental sync. The API is called with updated_after set to the last known cursor minus a configurable rewind window (to handle clock skew between GitLab and the local database).', sql: 'GET /api/v4/projects/{id}/issues\n ?updated_after={cursor - rewind_seconds}\n &order_by=updated_at&sort=asc\n &per_page=100' },
'issue-cursor-filter': { title: 'Cursor Filter (Dedup)', type: 'decision', file: 'src/ingestion/issues.rs:95-110', desc: 'Because of the cursor rewind, some issues will be re-fetched that we already have. The cursor filter skips these using a two-part comparison: primary on updated_at timestamp, with gitlab_id as a tie-breaker when timestamps are equal.', sql: '// Pseudocode:\nif issue.updated_at > cursor_ts:\n ACCEPT // newer than cursor\nelif issue.updated_at == cursor_ts\n AND issue.gitlab_id > tie_breaker_id:\n ACCEPT // same timestamp, higher ID\nelse:\n SKIP // already processed' },
'issue-transform': { title: 'Transform Issue', type: 'transform', file: 'src/gitlab/transformers/issue.rs', desc: 'Maps the GitLab API response shape to the local database row shape. Parses ISO 8601 timestamps to milliseconds-since-epoch, extracts label names, assignee usernames, milestone info, and due dates.' },
'issue-transaction': { title: 'Issue Write Transaction', type: 'db', file: 'src/ingestion/issues.rs:190-220', desc: 'All operations for a single issue are wrapped in one SQLite transaction for atomicity. If any step fails, the entire issue write is rolled back.', sql: 'BEGIN;\n-- 1. Store raw JSON payload (compressed, deduped)\nINSERT INTO payloads ...;\n-- 2. Upsert issue row\nINSERT INTO issues ... ON CONFLICT(gitlab_id)\n DO UPDATE SET ...;\n-- 3. Mark dirty for document regen\nINSERT INTO dirty_sources ...;\n-- 4. Relink labels\nDELETE FROM issue_labels WHERE issue_id = ?;\nINSERT INTO labels ... ON CONFLICT DO UPDATE;\nINSERT INTO issue_labels ...;\n-- 5. Relink assignees\nDELETE FROM issue_assignees WHERE issue_id = ?;\nINSERT INTO issue_assignees ...;\nCOMMIT;' },
'issue-cursor-update': { title: 'Update Sync Cursor', type: 'db', file: 'src/ingestion/issues.rs:130-140', desc: 'The sync cursor is updated every 100 issues (for crash recovery) and once at the end of the stream. If the process crashes mid-sync, it resumes from at most 100 issues back.', sql: 'INSERT INTO sync_cursors\n (project_id, resource_type,\n updated_at_cursor, tie_breaker_id)\nVALUES (?1, \'issues\', ?2, ?3)\nON CONFLICT(project_id, resource_type)\n DO UPDATE SET\n updated_at_cursor = ?2,\n tie_breaker_id = ?3;' },
'issue-disc-query': { title: 'Query Issues Needing Discussion Sync', type: 'db', file: 'src/ingestion/issues.rs:450-471', desc: 'Finds all issues in this project whose updated_at timestamp exceeds their per-row discussion watermark. Issues that have not changed since their last discussion sync are skipped entirely.', sql: 'SELECT id, iid, updated_at\nFROM issues\nWHERE project_id = ?1\n AND updated_at > COALESCE(\n discussions_synced_for_updated_at, 0\n );' },
'issue-disc-fetch': { title: 'Paginate Issue Discussions', type: 'api', file: 'src/ingestion/discussions.rs:73-205', desc: 'Discussions are fetched sequentially per issue (rusqlite Connection is not Send, so async parallelism is not possible here). Each issue\'s discussions are streamed page by page from the GitLab API.', sql: 'GET /api/v4/projects/{id}/issues/{iid}\n /discussions?per_page=100' },
'issue-disc-transform': { title: 'Transform Discussion + Notes', type: 'transform', file: 'src/gitlab/transformers/discussion.rs', desc: 'Transforms the raw GitLab discussion payload into normalized rows. Sets NoteableRef::Issue. Computes resolvable/resolved status, first_note_at/last_note_at timestamps, and per-note position indices.' },
'issue-disc-write': { title: 'Write Discussion (Full Refresh)', type: 'db', file: 'src/ingestion/discussions.rs:140-180', desc: 'Issue discussions use a full-refresh pattern: all existing notes for a discussion are deleted and re-inserted. This is simpler than upsert but means partial failures lose the previous state.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO discussions ... ON CONFLICT DO UPDATE;\nINSERT INTO dirty_sources ...;\n-- Full refresh: delete all then re-insert\nDELETE FROM notes WHERE discussion_id = ?;\nINSERT INTO notes VALUES (...);\nCOMMIT;' },
'issue-disc-stale': { title: 'Remove Stale Discussions', type: 'db', file: 'src/ingestion/discussions.rs:185-195', desc: 'After successfully fetching ALL discussion pages for an issue, any discussions in the DB that were not seen in this fetch are deleted. Uses a temp table for >500 IDs to avoid SQLite\'s 999-variable limit.', sql: '-- For small sets (<= 500):\nDELETE FROM discussions\nWHERE issue_id = ?\n AND gitlab_id NOT IN (...);\n\n-- For large sets (> 500):\nCREATE TEMP TABLE seen_ids(id TEXT);\nINSERT INTO seen_ids ...;\nDELETE FROM discussions\nWHERE issue_id = ?\n AND gitlab_id NOT IN\n (SELECT id FROM seen_ids);\nDROP TABLE seen_ids;' },
'issue-disc-watermark': { title: 'Advance Discussion Watermark', type: 'db', file: 'src/ingestion/discussions.rs:198', desc: 'Sets the per-issue watermark to the issue\'s current updated_at, signaling that discussions are now synced for this version of the issue.', sql: 'UPDATE issues\nSET discussions_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
'issue-disc-fail': { title: 'Pagination Error Handling', type: 'error', file: 'src/ingestion/discussions.rs:182', desc: 'If pagination fails mid-stream, stale discussion removal is skipped (we don\'t know the full set) and the watermark is NOT advanced. The issue will be retried on the next sync run.' },
're-cleanup': { title: 'Cleanup Obsolete Jobs', type: 'queue', file: 'src/ingestion/orchestrator.rs:490-520', desc: 'Before enqueuing new jobs, delete any existing jobs for entities whose watermark is already current. These are leftover from a previous run.', sql: 'DELETE FROM pending_dependent_fetches\nWHERE project_id = ?\n AND job_type = \'resource_events\'\n AND entity_local_id IN (\n SELECT id FROM issues\n WHERE project_id = ?\n AND updated_at <= COALESCE(\n resource_events_synced_for_updated_at, 0\n )\n );' },
're-enqueue': { title: 'Enqueue Resource Event Jobs', type: 'queue', file: 'src/ingestion/orchestrator.rs:525-555', desc: 'For each entity whose updated_at exceeds its resource event watermark, insert a job into the queue. Uses INSERT OR IGNORE for idempotency.', sql: 'INSERT OR IGNORE INTO pending_dependent_fetches\n (project_id, entity_type, entity_iid,\n entity_local_id, job_type, enqueued_at)\nSELECT project_id, \'issue\', iid, id,\n \'resource_events\', ?now\nFROM issues\nWHERE project_id = ?\n AND updated_at > COALESCE(\n resource_events_synced_for_updated_at, 0\n );' },
're-claim': { title: 'Claim Jobs (Atomic Lock)', type: 'queue', file: 'src/core/dependent_queue.rs', desc: 'Atomically claims a batch of unlocked jobs whose backoff period has elapsed. Uses UPDATE...RETURNING for lock acquisition in a single statement.', sql: 'UPDATE pending_dependent_fetches\nSET locked_at = ?now\nWHERE rowid IN (\n SELECT rowid\n FROM pending_dependent_fetches\n WHERE project_id = ?\n AND job_type = \'resource_events\'\n AND locked_at IS NULL\n AND (next_retry_at IS NULL\n OR next_retry_at <= ?now)\n ORDER BY enqueued_at ASC\n LIMIT ?batch_size\n)\nRETURNING *;' },
're-fetch': { title: 'Fetch 3 Event Types Concurrently', type: 'api', file: 'src/gitlab/client.rs:732-771', desc: 'Uses tokio::join! (not try_join!) to fetch state, label, and milestone events concurrently. Permanent errors (404, 403) are coalesced to empty vecs via coalesce_inaccessible().', sql: 'tokio::join!(\n fetch_issue_state_events(proj, iid),\n fetch_issue_label_events(proj, iid),\n fetch_issue_milestone_events(proj, iid),\n)\n// Each: coalesce_inaccessible()\n// 404/403 -> Ok(vec![])\n// Other errors -> propagated' },
're-store': { title: 'Store Resource Events', type: 'db', file: 'src/ingestion/orchestrator.rs:620-640', desc: 'All three event types are upserted in a single transaction.', sql: 'BEGIN;\nINSERT INTO resource_state_events ...\n ON CONFLICT DO UPDATE;\nINSERT INTO resource_label_events ...\n ON CONFLICT DO UPDATE;\nINSERT INTO resource_milestone_events ...\n ON CONFLICT DO UPDATE;\nCOMMIT;' },
're-complete': { title: 'Complete Job + Advance Watermark', type: 'db', file: 'src/ingestion/orchestrator.rs:645-660', desc: 'After successful storage, the job row is deleted and the entity\'s watermark is advanced.', sql: 'DELETE FROM pending_dependent_fetches\n WHERE rowid = ?;\n\nUPDATE issues\nSET resource_events_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
're-permanent': { title: 'Permanent Error: Skip Entity', type: 'error', file: 'src/ingestion/orchestrator.rs:665-680', desc: '404 (endpoint doesn\'t exist) and 403 (insufficient permissions) are permanent. The job is completed and watermark advanced, so this entity is permanently skipped until next updated on GitLab.' },
're-transient': { title: 'Transient Error: Exponential Backoff', type: 'error', file: 'src/core/dependent_queue.rs', desc: 'Network errors, 500s, rate limits get exponential backoff. Formula: 30s * 2^(attempts-1), capped at 480s (8 minutes).', sql: 'UPDATE pending_dependent_fetches\nSET locked_at = NULL,\n attempts = attempts + 1,\n next_retry_at = ?now\n + 30000 * pow(2, attempts),\n -- capped at 480000ms (8 min)\n last_error = ?error_msg\nWHERE rowid = ?;' },
'mr-api-call': { title: 'GitLab API: Fetch MR Pages', type: 'api', file: 'src/ingestion/merge_requests.rs:51-151', desc: 'Unlike issues which stream, MRs use explicit page-based pagination via fetch_merge_requests_page(). Each page returns items plus a next_page indicator.', sql: 'GET /api/v4/projects/{id}/merge_requests\n ?updated_after={cursor - rewind}\n &order_by=updated_at&sort=asc\n &per_page=100&page={n}' },
'mr-cursor-filter': { title: 'Cursor Filter', type: 'decision', file: 'src/ingestion/merge_requests.rs:90-105', desc: 'Identical logic to issues: timestamp comparison with gitlab_id tie-breaker.' },
'mr-transform': { title: 'Transform Merge Request', type: 'transform', file: 'src/gitlab/transformers/mr.rs', desc: 'Maps GitLab MR response to local row. Handles draft detection (prefers draft field, falls back to work_in_progress), detailed_merge_status, merge_user resolution, and reviewer extraction.' },
'mr-transaction': { title: 'MR Write Transaction', type: 'db', file: 'src/ingestion/merge_requests.rs:170-210', desc: 'Same pattern as issues but with THREE junction tables: labels, assignees, AND reviewers.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO merge_requests ...\n ON CONFLICT DO UPDATE;\nINSERT INTO dirty_sources ...;\n-- 3 junction tables:\nDELETE FROM mr_labels WHERE mr_id = ?;\nINSERT INTO mr_labels ...;\nDELETE FROM mr_assignees WHERE mr_id = ?;\nINSERT INTO mr_assignees ...;\nDELETE FROM mr_reviewers WHERE mr_id = ?;\nINSERT INTO mr_reviewers ...;\nCOMMIT;' },
'mr-cursor-update': { title: 'Update Cursor Per Page', type: 'db', file: 'src/ingestion/merge_requests.rs:140-150', desc: 'Unlike issues (every 100 items), MR cursor is updated at each page boundary for better crash recovery.' },
'mr-disc-query': { title: 'Query MRs Needing Discussion Sync', type: 'db', file: 'src/ingestion/merge_requests.rs:430-451', desc: 'Same watermark pattern as issues. Runs AFTER MR ingestion to avoid memory growth.', sql: 'SELECT id, iid, updated_at\nFROM merge_requests\nWHERE project_id = ?1\n AND updated_at > COALESCE(\n discussions_synced_for_updated_at, 0\n );' },
'mr-disc-batch': { title: 'Batch by Concurrency', type: 'decision', file: 'src/ingestion/orchestrator.rs:420-465', desc: 'MRs are processed in batches sized by dependent_concurrency. Each batch first prefetches all discussions in parallel, then writes serially.' },
'mr-disc-prefetch': { title: 'Parallel Prefetch', type: 'api', file: 'src/ingestion/mr_discussions.rs:66-120', desc: 'All MRs in the batch have their discussions fetched concurrently via join_all(). Each MR\'s discussions are fetched in one call, transformed in memory, and returned as PrefetchedMrDiscussions.', sql: '// For each MR in batch, concurrently:\nGET /api/v4/projects/{id}/merge_requests\n /{iid}/discussions?per_page=100\n\n// All fetched + transformed in memory\n// before any DB writes happen' },
'mr-disc-transform': { title: 'Transform MR Discussions', type: 'transform', file: 'src/ingestion/mr_discussions.rs:125-160', desc: 'Uses transform_mr_discussion() which additionally handles DiffNote positions (file paths, line ranges, SHA triplets).' },
'mr-disc-write': { title: 'Serial Write (Upsert Pattern)', type: 'db', file: 'src/ingestion/mr_discussions.rs:165-220', desc: 'Unlike issue discussions (delete-all + re-insert), MR discussions use INSERT...ON CONFLICT DO UPDATE for both discussions and notes. Safer for partial failures.', sql: 'BEGIN;\nINSERT INTO payloads ...;\nINSERT INTO discussions ...\n ON CONFLICT DO UPDATE\n SET ..., last_seen_at = ?run_ts;\nINSERT INTO dirty_sources ...;\n-- Upsert notes (not delete+insert):\nINSERT INTO notes ...\n ON CONFLICT DO UPDATE\n SET ..., last_seen_at = ?run_ts;\nCOMMIT;' },
'mr-disc-sweep': { title: 'Sweep Stale (last_seen_at)', type: 'db', file: 'src/ingestion/mr_discussions.rs:225-245', desc: 'Staleness detected via last_seen_at timestamps. Both discussions AND notes are swept independently.', sql: '-- Sweep stale discussions:\nDELETE FROM discussions\nWHERE merge_request_id = ?\n AND last_seen_at < ?run_seen_at;\n\n-- Sweep stale notes:\nDELETE FROM notes\nWHERE discussion_id IN (\n SELECT id FROM discussions\n WHERE merge_request_id = ?\n) AND last_seen_at < ?run_seen_at;' },
'mr-disc-watermark': { title: 'Advance MR Discussion Watermark', type: 'db', file: 'src/ingestion/mr_discussions.rs:248', desc: 'Same as issues: stamps the per-MR watermark.', sql: 'UPDATE merge_requests\nSET discussions_synced_for_updated_at\n = updated_at\nWHERE id = ?;' },
'mr-disc-fail': { title: 'Failure: Sync Health Tracking', type: 'error', file: 'src/ingestion/mr_discussions.rs:252-260', desc: 'Unlike issues, MR discussion failures are tracked: discussions_sync_attempts is incremented and discussions_sync_last_error is recorded. Watermark is NOT advanced.' },
'doc-trigger': { title: 'mark_dirty_tx()', type: 'api', file: 'src/ingestion/dirty_tracker.rs', desc: 'Called during every upsert in ingestion. Inserts into dirty_sources, or on conflict resets backoff. This bridges ingestion (stages 1-2) and document generation (stage 3).', sql: 'INSERT INTO dirty_sources\n (source_type, source_id, queued_at)\nVALUES (?1, ?2, ?now)\nON CONFLICT(source_type, source_id)\n DO UPDATE SET\n queued_at = ?now,\n attempt_count = 0,\n next_attempt_at = NULL,\n last_error = NULL;' },
'doc-dirty-table': { title: 'dirty_sources Table', type: 'db', file: 'src/ingestion/dirty_tracker.rs', desc: 'Persistent queue of entities needing document regeneration. Supports exponential backoff for failed extractions.' },
'doc-drain': { title: 'Get Dirty Sources (Batched)', type: 'db', file: 'src/documents/regenerator.rs:35-45', desc: 'Fetches up to 500 dirty entries per batch, prioritizing fewer attempts. Respects exponential backoff.', sql: 'SELECT source_type, source_id\nFROM dirty_sources\nWHERE next_attempt_at IS NULL\n OR next_attempt_at <= ?now\nORDER BY attempt_count ASC,\n queued_at ASC\nLIMIT 500;' },
'doc-dispatch': { title: 'Dispatch by Source Type', type: 'decision', file: 'src/documents/extractor.rs', desc: 'Routes to the appropriate extraction function: "issue" -> extract_issue_document(), "merge_request" -> extract_mr_document(), "discussion" -> extract_discussion_document().' },
'doc-deleted-check': { title: 'Source Exists Check', type: 'decision', file: 'src/documents/regenerator.rs:48-55', desc: 'If the source entity was deleted, the extractor returns None. The regenerator deletes the document row. FK cascades clean up FTS and embeddings.' },
'doc-extract': { title: 'Extract Structured Content', type: 'transform', file: 'src/documents/extractor.rs', desc: 'Builds searchable text:\n[[Issue]] #42: Title\nProject: group/repo\nURL: ...\nLabels: [bug, urgent]\nState: opened\n\n--- Description ---\n...\n\nDiscussions inherit parent labels and extract DiffNote file paths.' },
'doc-triple-hash': { title: 'Triple-Hash Write Optimization', type: 'decision', file: 'src/documents/regenerator.rs:55-62', desc: 'Checks content_hash + labels_hash + paths_hash against existing document. If ALL three match, write is completely skipped. Critical for --full mode performance.' },
'doc-write': { title: 'SAVEPOINT Atomic Write', type: 'db', file: 'src/documents/regenerator.rs:58-65', desc: 'Document, labels, and paths written inside a SAVEPOINT for atomicity.', sql: 'SAVEPOINT doc_write;\nINSERT INTO documents ...\n ON CONFLICT DO UPDATE SET\n content = ?, content_hash = ?,\n labels_hash = ?, paths_hash = ?;\nDELETE FROM document_labels\n WHERE doc_id = ?;\nINSERT INTO document_labels ...;\nDELETE FROM document_paths\n WHERE doc_id = ?;\nINSERT INTO document_paths ...;\nRELEASE doc_write;' },
'doc-clear': { title: 'Clear Dirty Entry', type: 'db', file: 'src/ingestion/dirty_tracker.rs', desc: 'On success, the dirty_sources row is deleted.', sql: 'DELETE FROM dirty_sources\nWHERE source_type = ?\n AND source_id = ?;' },
'doc-error': { title: 'Record Error + Backoff', type: 'error', file: 'src/ingestion/dirty_tracker.rs', desc: 'Increments attempt_count, sets next_attempt_at with exponential backoff. Entry stays for retry.', sql: 'UPDATE dirty_sources\nSET attempt_count = attempt_count + 1,\n next_attempt_at = ?now\n + compute_backoff(attempt_count),\n last_error = ?error_msg\nWHERE source_type = ?\n AND source_id = ?;' },
'doc-skip': { title: 'Skip Write (Hash Match)', type: 'db', file: 'src/documents/regenerator.rs:57', desc: 'When all three hashes match, the document has not actually changed. Common when updated_at changes but content/labels/paths remain the same. Dirty entry is cleared without writes.' },
'embed-detect': { title: 'Change Detection', type: 'decision', file: 'src/embedding/change_detector.rs', desc: 'Document needs re-embedding if: (1) No embedding_metadata row, (2) document_hash mismatch, (3) Config drift in chunk_max_bytes, model, or dims.', sql: 'SELECT d.id, d.content, d.content_hash\nFROM documents d\nLEFT JOIN embedding_metadata em\n ON em.document_id = d.id\nWHERE em.document_id IS NULL\n OR em.document_hash != d.content_hash\n OR em.chunk_max_bytes != ?config\n OR em.model != ?model\n OR em.dims != ?dims;' },
'embed-paginate': { title: 'Keyset Pagination', type: 'db', file: 'src/embedding/pipeline.rs:80-100', desc: '500 documents per page using keyset pagination. Each page wrapped in a SAVEPOINT.' },
'embed-chunk': { title: 'Split Into Chunks', type: 'transform', file: 'src/embedding/chunking.rs', desc: 'Splits content at paragraph boundaries with configurable max size and overlap.' },
'embed-overflow': { title: 'Overflow Guard', type: 'decision', file: 'src/embedding/pipeline.rs:110-120', desc: 'If a document produces too many chunks, it is skipped to prevent rowid collisions in the encoded chunk ID scheme.' },
'embed-work': { title: 'Build ChunkWork Items', type: 'transform', file: 'src/embedding/pipeline.rs:125-140', desc: 'Each chunk gets an encoded ID (document_id * 1000000 + chunk_index) for the sqlite-vec primary key.' },
'embed-batch': { title: 'Batch Embed via Ollama', type: 'api', file: 'src/embedding/pipeline.rs:150-200', desc: 'Sends 32 chunks per Ollama API call. Model default: nomic-embed-text.', sql: 'POST http://localhost:11434/api/embed\n{\n "model": "nomic-embed-text",\n "input": ["chunk1...", "chunk2...", ...]\n}' },
'embed-store': { title: 'Store Vectors', type: 'db', file: 'src/embedding/pipeline.rs:205-230', desc: 'Vectors stored in sqlite-vec virtual table. Metadata in embedding_metadata. Old embeddings cleared on first successful chunk.', sql: '-- Clear old embeddings:\nDELETE FROM embeddings\n WHERE rowid / 1000000 = ?doc_id;\n\n-- Insert new vector:\nINSERT INTO embeddings(rowid, embedding)\nVALUES (?chunk_id, ?vector_blob);\n\n-- Update metadata:\nINSERT INTO embedding_metadata ...\n ON CONFLICT DO UPDATE SET\n document_hash = ?,\n chunk_max_bytes = ?,\n model = ?, dims = ?;' },
'embed-success': { title: 'SAVEPOINT Commit', type: 'db', file: 'src/embedding/pipeline.rs:240-250', desc: 'Each page of 500 documents wrapped in a SAVEPOINT. Completed pages survive crashes.' },
'embed-ctx-error': { title: 'Context-Length Retry', type: 'error', file: 'src/embedding/pipeline.rs:260-280', desc: 'If Ollama returns context-length error for a batch, each chunk is retried individually to isolate the oversized one.' },
'embed-other-error': { title: 'Record Error for Retry', type: 'error', file: 'src/embedding/pipeline.rs:285-295', desc: 'Network/model errors recorded in embedding_metadata. Document detected as pending again on next run.' },
};
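
// Illustrative sketch only (not used by this page): the transient-error
// backoff described in 're-transient' above — 30s * 2^(attempts-1), capped
// at 480s. The function name is hypothetical; the real logic lives in
// src/core/dependent_queue.rs on the Rust side.
function computeBackoffMs(attempts) {
  var base = 30000;                                // 30s in ms
  var delay = base * Math.pow(2, attempts - 1);    // exponential growth
  return Math.min(delay, 480000);                  // cap at 8 minutes
}
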

function escapeHtml(str) {
  // Serialize through a detached element so &, <, and > come back
  // entity-encoded; returning textContent would hand back the input
  // unchanged and escape nothing.
  var div = document.createElement('div');
  div.appendChild(document.createTextNode(str));
  return div.innerHTML;
}

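// Illustrative sketch only: the encoded chunk ID scheme from 'embed-work'
// and 'embed-store' below — rowid = document_id * 1000000 + chunk_index,
// reversed with integer division/modulo. Function names are hypothetical.
function encodeChunkId(documentId, chunkIndex) {
  return documentId * 1000000 + chunkIndex;
}
function decodeChunkId(chunkId) {
  return {
    documentId: Math.floor(chunkId / 1000000),  // matches "rowid / 1000000 = ?doc_id"
    chunkIndex: chunkId % 1000000,
  };
}
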
function buildDetailContent(d) {
  var container = document.createDocumentFragment();

  // Tags section
  var tagSection = document.createElement('div');
  tagSection.className = 'detail-section';
  var typeTag = document.createElement('span');
  typeTag.className = 'detail-tag type-' + d.type;
  typeTag.textContent = d.type.toUpperCase();
  tagSection.appendChild(typeTag);
  if (d.file) {
    var fileTag = document.createElement('span');
    fileTag.className = 'detail-tag file';
    fileTag.textContent = d.file;
    tagSection.appendChild(fileTag);
  }
  container.appendChild(tagSection);

  // Description
  var descSection = document.createElement('div');
  descSection.className = 'detail-section';
  var descH3 = document.createElement('h3');
  descH3.textContent = 'Description';
  descSection.appendChild(descH3);
  var descP = document.createElement('p');
  descP.textContent = d.desc;
  descSection.appendChild(descP);
  container.appendChild(descSection);

  // SQL
  if (d.sql) {
    var sqlSection = document.createElement('div');
    sqlSection.className = 'detail-section';
    var sqlH3 = document.createElement('h3');
    sqlH3.textContent = 'Key Query / Code';
    sqlSection.appendChild(sqlH3);
    var sqlBlock = document.createElement('div');
    sqlBlock.className = 'sql-block';
    sqlBlock.textContent = d.sql;
    sqlSection.appendChild(sqlBlock);
    container.appendChild(sqlSection);
  }

  return container;
}

function showDetail(key) {
  var d = details[key];
  if (!d) return;
  var panel = document.getElementById('detail-panel');
  document.getElementById('detail-title').textContent = d.title;
  var body = document.getElementById('detail-body');
  while (body.firstChild) body.removeChild(body.firstChild);
  body.appendChild(buildDetailContent(d));
  document.querySelectorAll('.node.selected').forEach(function(n) { n.classList.remove('selected'); });
  var clicked = document.querySelector('[data-detail="' + key + '"]');
  if (clicked) clicked.classList.add('selected');
  panel.classList.add('open');
}

function closeDetail() {
  document.getElementById('detail-panel').classList.remove('open');
  document.querySelectorAll('.node.selected').forEach(function(n) { n.classList.remove('selected'); });
}

document.addEventListener('click', function(e) {
  var node = e.target.closest('.node[data-detail]');
  if (node) { showDetail(node.dataset.detail); return; }
  if (!e.target.closest('.detail-panel') && !e.target.closest('.node')) closeDetail();
});
document.addEventListener('keydown', function(e) { if (e.key === 'Escape') closeDetail(); });
</script>
</body>
</html>

84 migrations/007_documents.sql (new file)
@@ -0,0 +1,84 @@
-- Migration 007: Documents, Document Labels, Document Paths, Dirty Sources, Pending Discussion Fetches
|
||||||
|
-- Schema version: 7
|
||||||
|
-- Adds CP3 document storage and queue tables for search pipeline
|
||||||
|
|
||||||
|
-- Unified searchable documents (derived from issues/MRs/discussions)
|
||||||
|
CREATE TABLE documents (
|
||||||
|
id INTEGER PRIMARY KEY,
|
||||||
|
source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion')),
|
||||||
|
source_id INTEGER NOT NULL, -- local DB id in the source table
|
||||||
|
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||||
|
author_username TEXT, -- for discussions: first note author
|
||||||
|
label_names TEXT, -- JSON array (display/debug only)
|
||||||
|
created_at INTEGER, -- ms epoch UTC
|
||||||
|
updated_at INTEGER, -- ms epoch UTC
|
||||||
|
url TEXT,
|
||||||
|
title TEXT, -- null for discussions
|
||||||
|
content_text TEXT NOT NULL, -- canonical text for embedding/search
|
||||||
|
content_hash TEXT NOT NULL, -- SHA-256 for change detection
|
||||||
|
labels_hash TEXT NOT NULL DEFAULT '', -- SHA-256 over sorted labels (write optimization)
|
||||||
|
    paths_hash TEXT NOT NULL DEFAULT '', -- SHA-256 over sorted paths (write optimization)
    is_truncated INTEGER NOT NULL DEFAULT 0,
    truncated_reason TEXT CHECK (
        truncated_reason IN (
            'token_limit_middle_drop','single_note_oversized','first_last_oversized',
            'hard_cap_oversized'
        )
        OR truncated_reason IS NULL
    ),
    UNIQUE(source_type, source_id)
);

CREATE INDEX idx_documents_project_updated ON documents(project_id, updated_at);
CREATE INDEX idx_documents_author ON documents(author_username);
CREATE INDEX idx_documents_source ON documents(source_type, source_id);
CREATE INDEX idx_documents_hash ON documents(content_hash);

-- Fast label filtering (indexed exact-match)
CREATE TABLE document_labels (
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    label_name TEXT NOT NULL,
    PRIMARY KEY(document_id, label_name)
) WITHOUT ROWID;
CREATE INDEX idx_document_labels_label ON document_labels(label_name);

-- Fast path filtering (DiffNote file paths)
CREATE TABLE document_paths (
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    path TEXT NOT NULL,
    PRIMARY KEY(document_id, path)
) WITHOUT ROWID;
CREATE INDEX idx_document_paths_path ON document_paths(path);

-- Queue for incremental document regeneration (with retry tracking)
-- Uses next_attempt_at for index-friendly backoff queries
CREATE TABLE dirty_sources (
    source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion')),
    source_id INTEGER NOT NULL,
    queued_at INTEGER NOT NULL, -- ms epoch UTC
    attempt_count INTEGER NOT NULL DEFAULT 0,
    last_attempt_at INTEGER,
    last_error TEXT,
    next_attempt_at INTEGER, -- ms epoch UTC; NULL means ready immediately
    PRIMARY KEY(source_type, source_id)
);
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);

-- Resumable queue for dependent discussion fetching
-- Uses next_attempt_at for index-friendly backoff queries
CREATE TABLE pending_discussion_fetches (
    project_id INTEGER NOT NULL REFERENCES projects(id),
    noteable_type TEXT NOT NULL, -- 'Issue' | 'MergeRequest'
    noteable_iid INTEGER NOT NULL,
    queued_at INTEGER NOT NULL, -- ms epoch UTC
    attempt_count INTEGER NOT NULL DEFAULT 0,
    last_attempt_at INTEGER,
    last_error TEXT,
    next_attempt_at INTEGER, -- ms epoch UTC; NULL means ready immediately
    PRIMARY KEY(project_id, noteable_type, noteable_iid)
);
CREATE INDEX idx_pending_discussions_next_attempt ON pending_discussion_fetches(next_attempt_at);

-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (7, strftime('%s', 'now') * 1000, 'Documents, labels, paths, dirty sources, pending discussion fetches');
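The `next_attempt_at` convention in the queue tables above (NULL means ready immediately, otherwise a backoff deadline) keeps the ready-row query answerable from the single-column index. A minimal sketch of how a worker might select ready rows, using an in-memory SQLite database and illustrative data (the worker code itself is an assumption, not part of the migrations):

```python
import sqlite3
import time

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dirty_sources (
    source_type TEXT NOT NULL,
    source_id INTEGER NOT NULL,
    queued_at INTEGER NOT NULL,
    attempt_count INTEGER NOT NULL DEFAULT 0,
    next_attempt_at INTEGER,  -- NULL = ready immediately
    PRIMARY KEY(source_type, source_id)
);
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);
""")

now_ms = int(time.time() * 1000)
# One fresh row (ready now) and one row backing off for another 60s.
conn.execute("INSERT INTO dirty_sources VALUES ('issue', 1, ?, 0, NULL)", (now_ms,))
conn.execute("INSERT INTO dirty_sources VALUES ('issue', 2, ?, 3, ?)",
             (now_ms, now_ms + 60_000))

# NULL next_attempt_at means ready; otherwise the backoff deadline must have passed.
ready = conn.execute(
    "SELECT source_type, source_id FROM dirty_sources "
    "WHERE next_attempt_at IS NULL OR next_attempt_at <= ? "
    "ORDER BY queued_at LIMIT 10",
    (now_ms,),
).fetchall()
```

Here `ready` contains only `('issue', 1)`: the second row is excluded until its backoff deadline passes.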
migrations/008_fts5.sql (new file, 42 lines)
@@ -0,0 +1,42 @@
-- Migration 008: FTS5 Full-Text Search Index
-- Schema version: 8
-- Adds full-text search on documents table with sync triggers

-- Full-text search with porter stemmer and prefix indexes for type-ahead
CREATE VIRTUAL TABLE documents_fts USING fts5(
    title,
    content_text,
    content='documents',
    content_rowid='id',
    tokenize='porter unicode61',
    prefix='2 3 4'
);

-- Keep FTS in sync via triggers.
-- IMPORTANT: COALESCE(title, '') ensures the FTS5 external-content table never
-- receives NULL values, which can cause inconsistencies with delete operations.
-- FTS5 delete requires exact match of original values; NULL != NULL in SQL,
-- so a NULL title on insert would make the delete trigger fail silently.
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
    INSERT INTO documents_fts(rowid, title, content_text)
    VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;

CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
    VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
END;

-- Only rebuild FTS when searchable text actually changes (not metadata-only updates)
CREATE TRIGGER documents_au AFTER UPDATE ON documents
WHEN old.title IS NOT new.title OR old.content_text != new.content_text
BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
    VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
    INSERT INTO documents_fts(rowid, title, content_text)
    VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;

-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (8, strftime('%s', 'now') * 1000, 'FTS5 full-text search index with sync triggers');
migrations/009_embeddings.sql (new file, 54 lines)
@@ -0,0 +1,54 @@
-- Migration 009: Embeddings (Gate B)
-- Schema version: 9
-- Adds sqlite-vec vector storage and embedding metadata for semantic search
-- Requires sqlite-vec extension to be loaded before applying

-- NOTE: sqlite-vec vec0 virtual tables cannot participate in FK cascades.
-- We must use an explicit trigger to delete orphan embeddings when documents
-- are deleted. See documents_embeddings_ad trigger below.

-- sqlite-vec virtual table for vector search
-- Storage rule: embeddings.rowid = document_id * 1000 + chunk_index
-- This encodes (document_id, chunk_index) into a single integer rowid.
-- Supports up to 1000 chunks per document (32M chars at 32k/chunk).
CREATE VIRTUAL TABLE embeddings USING vec0(
    embedding float[768]
);

-- Embedding provenance + change detection (one row per chunk)
-- NOTE: Two hash columns serve different purposes:
--   document_hash: SHA-256 of full documents.content_text (staleness detection)
--   chunk_hash: SHA-256 of this individual chunk's text (debug/provenance)
-- Pending detection uses document_hash (not chunk_hash) because staleness is
-- a document-level condition: if the document changed, ALL chunks need re-embedding.
CREATE TABLE embedding_metadata (
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    chunk_index INTEGER NOT NULL DEFAULT 0, -- 0-indexed position within document
    model TEXT NOT NULL, -- 'nomic-embed-text'
    dims INTEGER NOT NULL, -- 768
    document_hash TEXT NOT NULL, -- SHA-256 of full documents.content_text (staleness)
    chunk_hash TEXT NOT NULL, -- SHA-256 of this chunk's text (provenance)
    created_at INTEGER NOT NULL, -- ms epoch UTC
    last_error TEXT, -- error message from last failed attempt
    attempt_count INTEGER NOT NULL DEFAULT 0,
    last_attempt_at INTEGER, -- ms epoch UTC
    PRIMARY KEY(document_id, chunk_index)
);

CREATE INDEX idx_embedding_metadata_errors
    ON embedding_metadata(last_error) WHERE last_error IS NOT NULL;
CREATE INDEX idx_embedding_metadata_doc ON embedding_metadata(document_id);

-- CRITICAL: Delete ALL chunk embeddings when a document is deleted.
-- vec0 virtual tables don't support FK ON DELETE CASCADE, so we need this trigger.
-- embedding_metadata has ON DELETE CASCADE, so only vec0 needs explicit cleanup.
-- Range: [document_id * 1000, document_id * 1000 + 999]
CREATE TRIGGER documents_embeddings_ad AFTER DELETE ON documents BEGIN
    DELETE FROM embeddings
    WHERE rowid >= old.id * 1000
      AND rowid < (old.id + 1) * 1000;
END;

-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (9, strftime('%s', 'now') * 1000, 'Embeddings vec0 table, metadata, orphan cleanup trigger');
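The rowid storage rule in migration 009 is a plain base-1000 encoding, which is what makes the cleanup trigger's half-open range delete exact. A small sketch of the arithmetic (function names are illustrative, not from the codebase):

```python
MAX_CHUNKS = 1000  # must match the migration's rowid rule

def embedding_rowid(document_id: int, chunk_index: int) -> int:
    """Encode (document_id, chunk_index) into a single vec0 rowid."""
    if not 0 <= chunk_index < MAX_CHUNKS:
        raise ValueError("chunk_index out of range for the encoding")
    return document_id * MAX_CHUNKS + chunk_index

def decode_rowid(rowid: int) -> tuple[int, int]:
    """Recover (document_id, chunk_index) from a rowid."""
    return divmod(rowid, MAX_CHUNKS)

# The trigger's range [id*1000, (id+1)*1000) covers exactly one document's chunks.
assert embedding_rowid(42, 0) == 42_000
assert embedding_rowid(42, 999) == 42_999
assert decode_rowid(42_999) == (42, 999)
assert decode_rowid(43_000) == (43, 0)  # first chunk of the NEXT document
```

The half-open upper bound `rowid < (old.id + 1) * 1000` is what keeps document 43's chunk 0 (rowid 43000) out of document 42's delete range.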
migrations/010_chunk_config.sql (new file, 14 lines)
@@ -0,0 +1,14 @@
-- Migration 010: Chunk config tracking + adaptive dedup support
-- Schema version: 10

ALTER TABLE embedding_metadata ADD COLUMN chunk_max_bytes INTEGER;
ALTER TABLE embedding_metadata ADD COLUMN chunk_count INTEGER;

-- Partial index: accelerates drift detection and adaptive dedup queries on sentinel rows
CREATE INDEX idx_embedding_metadata_sentinel
    ON embedding_metadata(document_id, chunk_index)
    WHERE chunk_index = 0;

INSERT INTO schema_version (version, applied_at, description)
VALUES (10, strftime('%s', 'now') * 1000,
        'Add chunk_max_bytes and chunk_count to embedding_metadata');
migrations/011_resource_events.sql (new file, 128 lines)
@@ -0,0 +1,128 @@
-- Migration 011: Resource event tables, entity references, and dependent fetch queue
-- Powers temporal queries (timeline, file-history, trace) via GitLab Resource Events APIs.

-- State change events (opened/closed/reopened/merged/locked)
CREATE TABLE resource_state_events (
    id INTEGER PRIMARY KEY,
    gitlab_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
    merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
    state TEXT NOT NULL,
    actor_gitlab_id INTEGER,
    actor_username TEXT,
    created_at INTEGER NOT NULL, -- ms epoch UTC
    source_commit TEXT,
    source_merge_request_iid INTEGER, -- iid from source_merge_request ref
    CHECK (
        (issue_id IS NOT NULL AND merge_request_id IS NULL) OR
        (issue_id IS NULL AND merge_request_id IS NOT NULL)
    )
);

CREATE UNIQUE INDEX uq_state_events_gitlab ON resource_state_events(gitlab_id, project_id);
CREATE INDEX idx_state_events_issue ON resource_state_events(issue_id) WHERE issue_id IS NOT NULL;
CREATE INDEX idx_state_events_mr ON resource_state_events(merge_request_id) WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_state_events_created ON resource_state_events(created_at);

-- Label change events (add/remove)
CREATE TABLE resource_label_events (
    id INTEGER PRIMARY KEY,
    gitlab_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
    merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
    action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
    label_name TEXT NOT NULL,
    actor_gitlab_id INTEGER,
    actor_username TEXT,
    created_at INTEGER NOT NULL, -- ms epoch UTC
    CHECK (
        (issue_id IS NOT NULL AND merge_request_id IS NULL) OR
        (issue_id IS NULL AND merge_request_id IS NOT NULL)
    )
);

CREATE UNIQUE INDEX uq_label_events_gitlab ON resource_label_events(gitlab_id, project_id);
CREATE INDEX idx_label_events_issue ON resource_label_events(issue_id) WHERE issue_id IS NOT NULL;
CREATE INDEX idx_label_events_mr ON resource_label_events(merge_request_id) WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_label_events_created ON resource_label_events(created_at);

-- Milestone change events (add/remove)
CREATE TABLE resource_milestone_events (
    id INTEGER PRIMARY KEY,
    gitlab_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
    merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
    action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
    milestone_title TEXT NOT NULL,
    milestone_id INTEGER,
    actor_gitlab_id INTEGER,
    actor_username TEXT,
    created_at INTEGER NOT NULL, -- ms epoch UTC
    CHECK (
        (issue_id IS NOT NULL AND merge_request_id IS NULL) OR
        (issue_id IS NULL AND merge_request_id IS NOT NULL)
    )
);

CREATE UNIQUE INDEX uq_milestone_events_gitlab ON resource_milestone_events(gitlab_id, project_id);
CREATE INDEX idx_milestone_events_issue ON resource_milestone_events(issue_id) WHERE issue_id IS NOT NULL;
CREATE INDEX idx_milestone_events_mr ON resource_milestone_events(merge_request_id) WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_milestone_events_created ON resource_milestone_events(created_at);

-- Cross-reference table (Gate 2): source/target entity pairs
CREATE TABLE entity_references (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    source_entity_type TEXT NOT NULL CHECK (source_entity_type IN ('issue', 'merge_request')),
    source_entity_id INTEGER NOT NULL, -- local DB id
    target_entity_type TEXT NOT NULL CHECK (target_entity_type IN ('issue', 'merge_request')),
    target_entity_id INTEGER, -- local DB id (NULL if unresolved)
    target_project_path TEXT, -- for unresolved cross-project refs
    target_entity_iid INTEGER, -- for unresolved refs
    reference_type TEXT NOT NULL CHECK (reference_type IN ('closes', 'mentioned', 'related')),
    source_method TEXT NOT NULL CHECK (source_method IN ('api', 'note_parse', 'description_parse')),
    created_at INTEGER NOT NULL -- ms epoch UTC
);

CREATE UNIQUE INDEX uq_entity_refs ON entity_references(
    project_id,
    source_entity_type,
    source_entity_id,
    target_entity_type,
    COALESCE(target_entity_id, -1),
    COALESCE(target_project_path, ''),
    COALESCE(target_entity_iid, -1),
    reference_type,
    source_method
);

CREATE INDEX idx_entity_refs_source ON entity_references(source_entity_type, source_entity_id);
CREATE INDEX idx_entity_refs_target ON entity_references(target_entity_id) WHERE target_entity_id IS NOT NULL;
CREATE INDEX idx_entity_refs_unresolved ON entity_references(target_project_path, target_entity_iid) WHERE target_entity_id IS NULL;

-- Generic dependent fetch queue (resource_events, mr_closes_issues, mr_diffs)
CREATE TABLE pending_dependent_fetches (
    id INTEGER PRIMARY KEY,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    entity_type TEXT NOT NULL CHECK (entity_type IN ('issue', 'merge_request')),
    entity_iid INTEGER NOT NULL,
    entity_local_id INTEGER NOT NULL,
    job_type TEXT NOT NULL CHECK (job_type IN ('resource_events', 'mr_closes_issues', 'mr_diffs')),
    payload_json TEXT, -- optional extra data for the job
    enqueued_at INTEGER NOT NULL, -- ms epoch UTC
    locked_at INTEGER, -- ms epoch UTC (NULL = available)
    attempts INTEGER NOT NULL DEFAULT 0,
    next_retry_at INTEGER, -- ms epoch UTC (NULL = no backoff)
    last_error TEXT
);

CREATE UNIQUE INDEX uq_pending_fetches ON pending_dependent_fetches(project_id, entity_type, entity_iid, job_type);
CREATE INDEX idx_pending_fetches_claimable ON pending_dependent_fetches(job_type, locked_at) WHERE locked_at IS NULL;
CREATE INDEX idx_pending_fetches_retryable ON pending_dependent_fetches(next_retry_at) WHERE locked_at IS NULL AND next_retry_at IS NOT NULL;

-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (11, strftime('%s', 'now') * 1000, 'Resource events, entity references, and dependent fetch queue');
migrations/012_nullable_label_milestone.sql (new file, 65 lines)
@@ -0,0 +1,65 @@
-- Migration 012: Make label_name and milestone_title nullable
-- GitLab returns null for these when the referenced label/milestone has been deleted.

-- Recreate resource_label_events with nullable label_name
CREATE TABLE resource_label_events_new (
    id INTEGER PRIMARY KEY,
    gitlab_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
    merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
    action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
    label_name TEXT,
    actor_gitlab_id INTEGER,
    actor_username TEXT,
    created_at INTEGER NOT NULL,
    CHECK (
        (issue_id IS NOT NULL AND merge_request_id IS NULL) OR
        (issue_id IS NULL AND merge_request_id IS NOT NULL)
    )
);

INSERT INTO resource_label_events_new
SELECT * FROM resource_label_events;

DROP TABLE resource_label_events;
ALTER TABLE resource_label_events_new RENAME TO resource_label_events;

CREATE UNIQUE INDEX uq_label_events_gitlab ON resource_label_events(gitlab_id, project_id);
CREATE INDEX idx_label_events_issue ON resource_label_events(issue_id) WHERE issue_id IS NOT NULL;
CREATE INDEX idx_label_events_mr ON resource_label_events(merge_request_id) WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_label_events_created ON resource_label_events(created_at);

-- Recreate resource_milestone_events with nullable milestone_title
CREATE TABLE resource_milestone_events_new (
    id INTEGER PRIMARY KEY,
    gitlab_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    issue_id INTEGER REFERENCES issues(id) ON DELETE CASCADE,
    merge_request_id INTEGER REFERENCES merge_requests(id) ON DELETE CASCADE,
    action TEXT NOT NULL CHECK (action IN ('add', 'remove')),
    milestone_title TEXT,
    milestone_id INTEGER,
    actor_gitlab_id INTEGER,
    actor_username TEXT,
    created_at INTEGER NOT NULL,
    CHECK (
        (issue_id IS NOT NULL AND merge_request_id IS NULL) OR
        (issue_id IS NULL AND merge_request_id IS NOT NULL)
    )
);

INSERT INTO resource_milestone_events_new
SELECT * FROM resource_milestone_events;

DROP TABLE resource_milestone_events;
ALTER TABLE resource_milestone_events_new RENAME TO resource_milestone_events;

CREATE UNIQUE INDEX uq_milestone_events_gitlab ON resource_milestone_events(gitlab_id, project_id);
CREATE INDEX idx_milestone_events_issue ON resource_milestone_events(issue_id) WHERE issue_id IS NOT NULL;
CREATE INDEX idx_milestone_events_mr ON resource_milestone_events(merge_request_id) WHERE merge_request_id IS NOT NULL;
CREATE INDEX idx_milestone_events_created ON resource_milestone_events(created_at);

-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (12, strftime('%s', 'now') * 1000, 'Make label_name and milestone_title nullable for deleted labels/milestones');
migrations/013_resource_event_watermarks.sql (new file, 10 lines)
@@ -0,0 +1,10 @@
-- Migration 013: Add resource event sync watermarks
-- Mirrors the discussions_synced_for_updated_at pattern so that only entities
-- whose updated_at exceeds the last resource event sync get re-enqueued.

ALTER TABLE issues ADD COLUMN resource_events_synced_for_updated_at INTEGER;
ALTER TABLE merge_requests ADD COLUMN resource_events_synced_for_updated_at INTEGER;

-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (13, strftime('%s', 'now') * 1000, 'Add resource event sync watermarks to issues and merge_requests');
migrations/014_sync_runs_enrichment.sql (new file, 12 lines)
@@ -0,0 +1,12 @@
-- Migration 014: sync_runs enrichment for observability
-- Adds correlation ID and aggregate counts for queryable sync history

ALTER TABLE sync_runs ADD COLUMN run_id TEXT;
ALTER TABLE sync_runs ADD COLUMN total_items_processed INTEGER DEFAULT 0;
ALTER TABLE sync_runs ADD COLUMN total_errors INTEGER DEFAULT 0;

-- Index for correlation queries (find run by run_id from logs)
CREATE INDEX IF NOT EXISTS idx_sync_runs_run_id ON sync_runs(run_id);

INSERT INTO schema_version (version, applied_at, description)
VALUES (14, strftime('%s', 'now') * 1000, 'Sync runs enrichment for observability');
migrations/015_commit_shas_and_closes_watermark.sql (new file, 17 lines)
@@ -0,0 +1,17 @@
-- Migration 015: Add commit SHAs to merge_requests, closes_issues watermark,
-- and missing label_name index on resource_label_events.

-- Commit SHAs link MRs to actual git history (needed for Gate 4: file-history, Gate 5: trace)
ALTER TABLE merge_requests ADD COLUMN merge_commit_sha TEXT;
ALTER TABLE merge_requests ADD COLUMN squash_commit_sha TEXT;

-- Watermark for closes_issues sync (same pattern as resource_events_synced_for_updated_at)
-- Prevents re-fetching closes_issues for MRs that haven't changed since last sync
ALTER TABLE merge_requests ADD COLUMN closes_issues_synced_for_updated_at INTEGER;

-- Missing index from original spec: enables efficient label-name filtering in timeline queries
CREATE INDEX IF NOT EXISTS idx_label_events_label ON resource_label_events(label_name);

-- Update schema version
INSERT INTO schema_version (version, applied_at, description)
VALUES (15, strftime('%s', 'now') * 1000, 'Add commit SHAs, closes_issues watermark, and label event index');
migrations/016_mr_file_changes.sql (new file, 20 lines)
@@ -0,0 +1,20 @@
-- Migration 016: MR file changes table
-- Powers file-history and trace commands (Gates 4-5)

CREATE TABLE mr_file_changes (
    id INTEGER PRIMARY KEY,
    merge_request_id INTEGER NOT NULL REFERENCES merge_requests(id) ON DELETE CASCADE,
    project_id INTEGER NOT NULL REFERENCES projects(id) ON DELETE CASCADE,
    old_path TEXT,
    new_path TEXT NOT NULL,
    change_type TEXT NOT NULL CHECK (change_type IN ('added', 'modified', 'renamed', 'deleted')),
    UNIQUE(merge_request_id, new_path)
);

CREATE INDEX idx_mfc_project_path ON mr_file_changes(project_id, new_path);
CREATE INDEX idx_mfc_project_old_path ON mr_file_changes(project_id, old_path) WHERE old_path IS NOT NULL;
CREATE INDEX idx_mfc_mr ON mr_file_changes(merge_request_id);
CREATE INDEX idx_mfc_renamed ON mr_file_changes(project_id, change_type) WHERE change_type = 'renamed';

INSERT INTO schema_version (version, applied_at, description)
VALUES (16, strftime('%s', 'now') * 1000, 'MR file changes table');
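The old_path/new_path pair plus the partial index on `change_type = 'renamed'` is what lets a file-history query follow renames backward from a file's current name. A hedged sketch of that walk (the table is reproduced from migration 016 without FKs; the data and the `history_paths` helper are illustrative, not from the codebase):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mr_file_changes (
    id INTEGER PRIMARY KEY,
    merge_request_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL,
    old_path TEXT,
    new_path TEXT NOT NULL,
    change_type TEXT NOT NULL
);
CREATE INDEX idx_mfc_renamed ON mr_file_changes(project_id, change_type)
    WHERE change_type = 'renamed';
""")
rows = [
    (1, 10, 1, None,       "src/a.py", "added"),
    (2, 11, 1, "src/a.py", "src/b.py", "renamed"),
    (3, 12, 1, None,       "src/b.py", "modified"),
]
conn.executemany("INSERT INTO mr_file_changes VALUES (?,?,?,?,?,?)", rows)

def history_paths(conn, project_id, path):
    """Walk rename links backward, collecting every name the file has had."""
    names = [path]
    while True:
        row = conn.execute(
            "SELECT old_path FROM mr_file_changes "
            "WHERE project_id = ? AND new_path = ? AND change_type = 'renamed'",
            (project_id, names[-1]),
        ).fetchone()
        # Stop at the original name, or if a cycle would repeat a name.
        if row is None or row[0] is None or row[0] in names:
            break
        names.append(row[0])
    return names

paths = history_paths(conn, 1, "src/b.py")
```

Here `paths` is `['src/b.py', 'src/a.py']`: the walk finds the rename row, hops to the old name, and stops when no earlier rename exists.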
migrations/017_who_indexes.sql (new file, 28 lines)
@@ -0,0 +1,28 @@
-- Migration 017: Composite indexes for `who` query paths

-- Expert/Overlap: DiffNote path prefix + timestamp filter.
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_path_created
    ON notes(position_new_path, created_at, project_id)
    WHERE note_type = 'DiffNote' AND is_system = 0;

-- Active/Workload: discussion participation lookups.
CREATE INDEX IF NOT EXISTS idx_notes_discussion_author
    ON notes(discussion_id, author_username)
    WHERE is_system = 0;

-- Active (project-scoped): unresolved discussions by recency, scoped by project.
CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent
    ON discussions(project_id, last_note_at)
    WHERE resolvable = 1 AND resolved = 0;

-- Active (global): unresolved discussions by recency (no project scope).
CREATE INDEX IF NOT EXISTS idx_discussions_unresolved_recent_global
    ON discussions(last_note_at)
    WHERE resolvable = 1 AND resolved = 0;

-- Workload: issue assignees by username.
CREATE INDEX IF NOT EXISTS idx_issue_assignees_username
    ON issue_assignees(username, issue_id);

INSERT INTO schema_version (version, applied_at, description)
VALUES (17, strftime('%s', 'now') * 1000, 'Composite indexes for who query paths');
migrations/018_fix_assignees_composite_index.sql (new file, 10 lines)
@@ -0,0 +1,10 @@
-- Migration 018: Fix composite index on issue_assignees
-- Migration 005 created idx_issue_assignees_username(username) as single-column.
-- Migration 017 attempted to recreate it as (username, issue_id) but IF NOT EXISTS
-- silently skipped it. Drop and recreate with the correct composite columns.

DROP INDEX IF EXISTS idx_issue_assignees_username;
CREATE INDEX idx_issue_assignees_username ON issue_assignees(username, issue_id);

INSERT INTO schema_version (version, applied_at, description)
VALUES (18, strftime('%s', 'now') * 1000, 'Fix composite index on issue_assignees');
migrations/019_list_performance.sql (new file, 16 lines)
@@ -0,0 +1,16 @@
-- Standalone updated_at DESC indexes for ORDER BY without temp B-tree sort.
-- The existing composite indexes (project_id, updated_at) only help when
-- filtering by project first.
CREATE INDEX IF NOT EXISTS idx_issues_updated_at_desc
    ON issues(updated_at DESC);

CREATE INDEX IF NOT EXISTS idx_mrs_updated_at_desc
    ON merge_requests(updated_at DESC);

-- Covering index for correlated subquery: unresolved discussion count per issue.
-- MRs already have idx_discussions_mr_resolved (migration 006).
CREATE INDEX IF NOT EXISTS idx_discussions_issue_resolved
    ON discussions(issue_id, resolvable, resolved);

INSERT INTO schema_version (version, applied_at, description)
VALUES (19, strftime('%s', 'now') * 1000, 'List performance indexes');
migrations/020_mr_diffs_watermark.sql (new file, 7 lines)
@@ -0,0 +1,7 @@
-- Migration 020: Watermark column for MR diffs sync
-- Tracks which MRs have had their file changes fetched, same pattern as closes_issues_synced_for_updated_at

ALTER TABLE merge_requests ADD COLUMN diffs_synced_for_updated_at INTEGER;

INSERT INTO schema_version (version, applied_at, description)
VALUES (20, strftime('%s', 'now') * 1000, 'MR diffs sync watermark');
migrations/021_work_item_status.sql (new file, 9 lines)
@@ -0,0 +1,9 @@
ALTER TABLE issues ADD COLUMN status_name TEXT;
ALTER TABLE issues ADD COLUMN status_category TEXT;
ALTER TABLE issues ADD COLUMN status_color TEXT;
ALTER TABLE issues ADD COLUMN status_icon_name TEXT;
ALTER TABLE issues ADD COLUMN status_synced_at INTEGER;
CREATE INDEX IF NOT EXISTS idx_issues_project_status_name ON issues(project_id, status_name);

INSERT INTO schema_version (version, applied_at, description)
VALUES (21, strftime('%s', 'now') * 1000, 'Work item status columns for issues');
migrations/023_issue_detail_fields.sql (new file, 5 lines)
@@ -0,0 +1,5 @@
ALTER TABLE issues ADD COLUMN closed_at TEXT;
ALTER TABLE issues ADD COLUMN confidential INTEGER NOT NULL DEFAULT 0;

INSERT INTO schema_version (version, applied_at, description)
VALUES (23, strftime('%s', 'now') * 1000, 'Add closed_at and confidential to issues');
phase-a-review.html (new file, 1260 lines)
File diff suppressed because it is too large.