Compare commits
26 Commits
perf-audit
...
159c490ad7
| Author | SHA1 | Date | |
|---|---|---|---|
|
|
159c490ad7 | ||
|
|
e0041ed4d9 | ||
|
|
a34751bd47 | ||
|
|
0aecbf33c0 | ||
|
|
c10471ddb9 | ||
|
|
cbce4c9f59 | ||
|
|
94435c37f0 | ||
|
|
59f65b127a | ||
|
|
f36e900570 | ||
|
|
e2efc61beb | ||
|
|
2da1a228b3 | ||
|
|
0e65202778 | ||
|
|
f439c42b3d | ||
|
|
4f3ec72923 | ||
|
|
e6771709f1 | ||
|
|
8c86b0dfd7 | ||
|
|
6e55b2470d | ||
|
|
b05922d60b | ||
|
|
11fe02fac9 | ||
|
|
48fbd4bfdb | ||
|
|
9786ef27f5 | ||
|
|
7e0e6a91f2 | ||
|
|
5c2df3df3b | ||
|
|
94c8613420 | ||
|
|
ad4dd6e855 | ||
|
|
83cd16c918 |
File diff suppressed because one or more lines are too long
@@ -1 +1 @@
|
||||
bd-2kop
|
||||
bd-1yx
|
||||
|
||||
21
.github/workflows/roam.yml
vendored
Normal file
21
.github/workflows/roam.yml
vendored
Normal file
@@ -0,0 +1,21 @@
|
||||
name: Roam Code Analysis
|
||||
on:
|
||||
pull_request:
|
||||
branches: [main, master]
|
||||
permissions:
|
||||
contents: read
|
||||
pull-requests: write
|
||||
jobs:
|
||||
roam:
|
||||
runs-on: ubuntu-latest
|
||||
steps:
|
||||
- uses: actions/checkout@v4
|
||||
with:
|
||||
fetch-depth: 0
|
||||
- uses: actions/setup-python@v5
|
||||
with:
|
||||
python-version: "3.12"
|
||||
- run: pip install roam-code
|
||||
- run: roam index
|
||||
- run: roam fitness
|
||||
- run: roam pr-risk --json
|
||||
3
.gitignore
vendored
3
.gitignore
vendored
@@ -41,6 +41,9 @@ lore.config.json
|
||||
*.db-shm
|
||||
|
||||
|
||||
# Mock seed data
|
||||
tools/mock-seed/
|
||||
|
||||
# Added by cargo
|
||||
|
||||
/target
|
||||
|
||||
11
.roam/fitness.yaml
Normal file
11
.roam/fitness.yaml
Normal file
@@ -0,0 +1,11 @@
|
||||
rules:
|
||||
- name: No test imports from production code
|
||||
type: dependency
|
||||
source: "src/**"
|
||||
forbidden_target: "tests/**"
|
||||
reason: "Production code should not import test modules"
|
||||
- name: Complexity threshold
|
||||
type: metric
|
||||
metric: cognitive_complexity
|
||||
threshold: 30
|
||||
reason: "Functions above 30 cognitive complexity need refactoring"
|
||||
2
Cargo.lock
generated
2
Cargo.lock
generated
@@ -1106,7 +1106,7 @@ checksum = "5e5032e24019045c762d3c0f28f5b6b8bbf38563a65908389bf7978758920897"
|
||||
|
||||
[[package]]
|
||||
name = "lore"
|
||||
version = "0.6.2"
|
||||
version = "0.8.2"
|
||||
dependencies = [
|
||||
"async-stream",
|
||||
"chrono",
|
||||
|
||||
@@ -1,6 +1,6 @@
|
||||
[package]
|
||||
name = "lore"
|
||||
version = "0.6.2"
|
||||
version = "0.8.2"
|
||||
edition = "2024"
|
||||
description = "Gitlore - Local GitLab data management with semantic search"
|
||||
authors = ["Taylor Eernisse"]
|
||||
|
||||
425
PROPOSED_CODE_FILE_REORGANIZATION_PLAN.md
Normal file
425
PROPOSED_CODE_FILE_REORGANIZATION_PLAN.md
Normal file
@@ -0,0 +1,425 @@
|
||||
# Proposed Code File Reorganization Plan
|
||||
|
||||
## Executive Summary
|
||||
|
||||
The codebase is 79 Rust source files / 46K lines across 7 top-level modules. Most modules (`gitlab/`, `embedding/`, `search/`, `documents/`, `ingestion/`) are well-organized. The pain points are:
|
||||
|
||||
1. **`core/` is a grab-bag** — 22 files mixing infrastructure, domain logic, DB operations, and an entire timeline pipeline
|
||||
2. **`main.rs` is 2713 lines** — ~30 handler functions that bridge CLI args to commands
|
||||
3. **`cli/mod.rs` is 949 lines** — every clap argument struct is packed into one file
|
||||
4. **Giant command files** — `who.rs` (6067 lines), `list.rs` (2931 lines) are unwieldy
|
||||
|
||||
This plan is organized into **three tiers** based on impact-to-risk ratio. Tier 1 changes are "no-brainers" — they reduce confusion with minimal import churn. Tier 2 changes are valuable but involve more cross-cutting import updates. Tier 3 changes are "maybe later" — they'd be nice but the juice might not be worth the squeeze right now.
|
||||
|
||||
---
|
||||
|
||||
## Current Structure (Annotated)
|
||||
|
||||
```
|
||||
src/
|
||||
├── main.rs (2713 lines) ← dispatch + ~30 handler functions + error helpers
|
||||
├── lib.rs (9 lines)
|
||||
├── cli/
|
||||
│ ├── mod.rs (949 lines) ← ALL clap arg structs crammed here
|
||||
│ ├── autocorrect.rs (945 lines)
|
||||
│ ├── progress.rs (92 lines)
|
||||
│ ├── robot.rs (111 lines)
|
||||
│ └── commands/
|
||||
│ ├── mod.rs (50 lines) — re-exports
|
||||
│ ├── auth_test.rs
|
||||
│ ├── count.rs (406 lines)
|
||||
│ ├── doctor.rs (576 lines)
|
||||
│ ├── drift.rs (642 lines)
|
||||
│ ├── embed.rs
|
||||
│ ├── generate_docs.rs (320 lines)
|
||||
│ ├── ingest.rs (1064 lines)
|
||||
│ ├── init.rs (174 lines)
|
||||
│ ├── list.rs (2931 lines) ← handles issues, MRs, AND notes listing
|
||||
│ ├── search.rs (418 lines)
|
||||
│ ├── show.rs (1377 lines)
|
||||
│ ├── stats.rs (505 lines)
|
||||
│ ├── sync_status.rs (454 lines)
|
||||
│ ├── sync.rs (576 lines)
|
||||
│ ├── timeline.rs (488 lines)
|
||||
│ └── who.rs (6067 lines) ← 5 sub-modes: expert, workload, active, overlap, reviews
|
||||
├── core/
|
||||
│ ├── mod.rs (25 lines)
|
||||
│ ├── backoff.rs ← retry logic (used by ingestion)
|
||||
│ ├── config.rs (789 lines) ← configuration types
|
||||
│ ├── db.rs (970 lines) ← connection + 22 migrations
|
||||
│ ├── dependent_queue.rs (330 lines) ← job queue (used by ingestion orchestrator)
|
||||
│ ├── error.rs (295 lines) ← error enum + exit codes
|
||||
│ ├── events_db.rs (199 lines) ← resource event upserts (used by ingestion)
|
||||
│ ├── lock.rs (228 lines) ← filesystem sync lock
|
||||
│ ├── logging.rs (179 lines) ← tracing filter builders
|
||||
│ ├── metrics.rs (566 lines) ← tracing-based stage timing
|
||||
│ ├── note_parser.rs (563 lines) ← cross-ref extraction from note bodies
|
||||
│ ├── paths.rs ← config/db/log file path resolution
|
||||
│ ├── payloads.rs (204 lines) ← raw JSON payload storage
|
||||
│ ├── project.rs (274 lines) ← fuzzy project resolution from DB
|
||||
│ ├── references.rs (551 lines) ← entity cross-reference extraction
|
||||
│ ├── shutdown.rs ← graceful shutdown via tokio signal
|
||||
│ ├── sync_run.rs (218 lines) ← sync run recording to DB
|
||||
│ ├── time.rs ← time conversion utilities
|
||||
│ ├── timeline.rs (284 lines) ← timeline types + EntityRef
|
||||
│ ├── timeline_collect.rs (695 lines) ← Stage 4: collect events from DB
|
||||
│ ├── timeline_expand.rs (557 lines) ← Stage 3: expand via cross-refs
|
||||
│ └── timeline_seed.rs (552 lines) ← Stage 1: FTS search seeding
|
||||
├── documents/ ← well-organized, 3 focused files
|
||||
├── embedding/ ← well-organized, 6 focused files
|
||||
├── gitlab/ ← well-organized, with transformers/ subdir
|
||||
├── ingestion/ ← well-organized, 8 focused files
|
||||
└── search/ ← well-organized, 5 focused files
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Tier 1: No-Brainers (Do First)
|
||||
|
||||
### 1.1 Extract `timeline/` from `core/`
|
||||
|
||||
**What:** Move the 4 timeline files into their own top-level module `src/timeline/`.
|
||||
|
||||
**Current location:**
|
||||
- `core/timeline.rs` (284 lines) — types: `EntityRef`, `ExpandedEntityRef`, `TimelineEvent`, `TimelineEventType`, etc.
|
||||
- `core/timeline_seed.rs` (552 lines) — Stage 1: FTS-based seeding
|
||||
- `core/timeline_expand.rs` (557 lines) — Stage 3: cross-reference expansion
|
||||
- `core/timeline_collect.rs` (695 lines) — Stage 4: event collection from DB
|
||||
|
||||
**New structure:**
|
||||
```
|
||||
src/timeline/
|
||||
├── mod.rs ← types (from timeline.rs) + re-exports
|
||||
├── seed.rs ← from timeline_seed.rs
|
||||
├── expand.rs ← from timeline_expand.rs
|
||||
└── collect.rs ← from timeline_collect.rs
|
||||
```
|
||||
|
||||
**Rationale:** These 4 files form a cohesive 5-stage pipeline (SEED→HYDRATE→EXPAND→COLLECT→RENDER). They have nothing to do with "core" infrastructure like `db.rs`, `config.rs`, or `error.rs`. They only import from `core::error`, `core::time`, and `search::fts` — all of which remain accessible via `crate::core::*` and `crate::search::*` after the move.
|
||||
|
||||
**Import changes needed:**
|
||||
- `cli/commands/timeline.rs`: `use crate::core::timeline::*` → `use crate::timeline::*`, same for `timeline_seed`, `timeline_expand`, `timeline_collect`
|
||||
- `core/mod.rs`: remove the 4 `pub mod timeline*` lines
|
||||
- `lib.rs`: add `pub mod timeline;`
|
||||
|
||||
**Risk: LOW** — Only 1 consumer (`cli/commands/timeline.rs`) + internal cross-references between the 4 files.
|
||||
|
||||
---
|
||||
|
||||
### 1.2 Extract `xref/` (cross-reference extraction) from `core/`
|
||||
|
||||
**What:** Move `note_parser.rs` and `references.rs` into `src/xref/`.
|
||||
|
||||
**Current location:**
|
||||
- `core/note_parser.rs` (563 lines) — parses note bodies for "mentioned in group/repo#123" patterns, persists to `note_cross_references` table
|
||||
- `core/references.rs` (551 lines) — extracts entity references from state events and closing MRs, writes to `entity_references` table
|
||||
|
||||
**New structure:**
|
||||
```
|
||||
src/xref/
|
||||
├── mod.rs ← re-exports
|
||||
├── note_parser.rs ← from core/note_parser.rs
|
||||
└── references.rs ← from core/references.rs
|
||||
```
|
||||
|
||||
**Rationale:** These files implement a specific domain concept — extracting and persisting cross-references between issues and MRs. They are not "core infrastructure." They're consumed by `ingestion/orchestrator.rs` for the cross-reference extraction phase, and the data they produce is consumed by the timeline pipeline. Putting them in their own module makes the data flow clearer: `ingestion → xref → timeline`.
|
||||
|
||||
**Import changes needed:**
|
||||
- `ingestion/orchestrator.rs`: `use crate::core::references::*` → `use crate::xref::references::*`
|
||||
- `ingestion/orchestrator.rs`: `use crate::core::note_parser::*` (if used directly — needs verification) → `use crate::xref::*`
|
||||
- `core/mod.rs`: remove `pub mod note_parser; pub mod references;`
|
||||
- `lib.rs`: add `pub mod xref;`
|
||||
- Internal: the files use `super::error::Result` and `super::time::now_ms` which become `crate::core::error::Result` and `crate::core::time::now_ms`
|
||||
|
||||
**Risk: LOW** — 2-3 consumers at most. The files already use `super::` internally which just needs updating to `crate::core::`.
|
||||
|
||||
---
|
||||
|
||||
## Tier 2: Good Improvements (Do After Tier 1)
|
||||
|
||||
### 2.1 Group ingestion-adjacent DB operations
|
||||
|
||||
**What:** Move `events_db.rs`, `dependent_queue.rs`, `payloads.rs`, and `sync_run.rs` from `core/` into `ingestion/` since they exclusively serve the ingestion pipeline.
|
||||
|
||||
**Current consumers:**
|
||||
- `events_db.rs` → only used by `cli/commands/count.rs` (for event counts)
|
||||
- `dependent_queue.rs` → only used by `ingestion/orchestrator.rs` and `main.rs` (to release locked jobs)
|
||||
- `payloads.rs` → only used by `ingestion/discussions.rs`, `ingestion/issues.rs`, `ingestion/merge_requests.rs`, `ingestion/mr_discussions.rs`
|
||||
- `sync_run.rs` → only used by `cli/commands/sync.rs` and `cli/commands/sync_status.rs`
|
||||
|
||||
**New structure:**
|
||||
```
|
||||
src/ingestion/
|
||||
├── (existing files...)
|
||||
├── events_db.rs ← from core/events_db.rs
|
||||
├── dependent_queue.rs ← from core/dependent_queue.rs
|
||||
├── payloads.rs ← from core/payloads.rs
|
||||
└── sync_run.rs ← from core/sync_run.rs
|
||||
```
|
||||
|
||||
**Rationale:** All 4 files exist to support the ingestion pipeline:
|
||||
- `events_db.rs` upserts resource state/label/milestone events fetched during ingestion
|
||||
- `dependent_queue.rs` manages the job queue that drives incremental discussion fetching
|
||||
- `payloads.rs` stores the raw JSON payloads fetched from GitLab
|
||||
- `sync_run.rs` records when syncs start/finish and their metrics
|
||||
|
||||
When you're looking for "how does ingestion work?", you'd naturally look in `ingestion/`. Having these scattered in `core/` requires knowing the hidden dependency.
|
||||
|
||||
**Import changes needed:**
|
||||
- `events_db.rs`: 1 consumer in `cli/commands/count.rs` changes from `crate::core::events_db` → `crate::ingestion::events_db`
|
||||
- `dependent_queue.rs`: 2 consumers — `ingestion/orchestrator.rs` (becomes `super::dependent_queue`) and `main.rs`
|
||||
- `payloads.rs`: 4 consumers in `ingestion/*.rs` (become `super::payloads`)
|
||||
- `sync_run.rs`: 2 consumers in `cli/commands/sync.rs` and `sync_status.rs`
|
||||
- Internal references change from `super::error` / `super::time` to `crate::core::error` / `crate::core::time`
|
||||
|
||||
**Risk: MEDIUM** — More import changes, but all straightforward. The internal `super::` references need the most attention.
|
||||
|
||||
**Alternatively:** If moving feels like too much churn, a lighter option is to create `core/ingestion_db.rs` that re-exports from these 4 files, making the grouping visible without moving files. But I think the move is cleaner.
|
||||
|
||||
---
|
||||
|
||||
### 2.2 Split `cli/mod.rs` — move arg structs to their command files
|
||||
|
||||
**What:** Move each `*Args` struct from `cli/mod.rs` into the corresponding `cli/commands/*.rs` file. Keep `Cli` struct, `Commands` enum, and `detect_robot_mode_from_env()` in `cli/mod.rs`.
|
||||
|
||||
**Currently `cli/mod.rs` (949 lines) contains:**
|
||||
- `Cli` struct (81 lines) — the root clap parser
|
||||
- `Commands` enum (193 lines) — all subcommand variants
|
||||
- `IssuesArgs` (86 lines) → move to `commands/list.rs` or stay near issues handling
|
||||
- `MrsArgs` (93 lines) → move to `commands/list.rs` or stay near MRs handling
|
||||
- `NotesArgs` (99 lines) → move to `commands/list.rs`
|
||||
- `IngestArgs` (33 lines) → move to `commands/ingest.rs`
|
||||
- `StatsArgs` (19 lines) → move to `commands/stats.rs`
|
||||
- `SearchArgs` (58 lines) → move to `commands/search.rs`
|
||||
- `GenerateDocsArgs` (9 lines) → move to `commands/generate_docs.rs`
|
||||
- `SyncArgs` (39 lines) → move to `commands/sync.rs`
|
||||
- `EmbedArgs` (15 lines) → move to `commands/embed.rs`
|
||||
- `TimelineArgs` (53 lines) → move to `commands/timeline.rs`
|
||||
- `WhoArgs` (76 lines) → move to `commands/who.rs`
|
||||
- `CountArgs` (9 lines) → move to `commands/count.rs`
|
||||
|
||||
**After refactoring, `cli/mod.rs` shrinks to ~300 lines** (just `Cli` + `Commands` + the inlined variants like `Init`, `Drift`, `Backup`, `Reset`).
|
||||
|
||||
**Rationale:** When adding a new flag to the `who` command, you currently have to edit `cli/mod.rs` (the args struct), `cli/commands/who.rs` (the implementation), and `main.rs` (the dispatch). If the args struct lives in `commands/who.rs`, you only need two files. This is the standard pattern in mature clap-based Rust CLIs.
|
||||
|
||||
**Import changes needed:**
|
||||
- `main.rs` currently does `use lore::cli::{..., WhoArgs, ...}` — these would become `use lore::cli::commands::{..., WhoArgs, ...}` or the `commands/mod.rs` re-exports them
|
||||
- Each `commands/*.rs` gets its own `#[derive(Parser)]` struct
|
||||
- `Commands` enum in `cli/mod.rs` keeps using the types but imports from `commands::*`
|
||||
|
||||
**Risk: MEDIUM** — Lots of `use` path changes in `main.rs`, but purely mechanical. No logic changes.
|
||||
|
||||
---
|
||||
|
||||
## Tier 3: Consider Later
|
||||
|
||||
### 3.1 Split `main.rs` (2713 lines)
|
||||
|
||||
**The problem:** `main.rs` contains `main()`, ~30 `handle_*` functions, error handling, clap error formatting, fuzzy command matching, and the `robot-docs` JSON manifest (a 400+ line inline JSON literal).
|
||||
|
||||
**Possible approach:**
|
||||
- Extract `handle_*` functions into `cli/dispatch.rs` (the routing layer)
|
||||
- Extract error handling into `cli/errors.rs`
|
||||
- Extract `handle_robot_docs` + the JSON manifest into `cli/robot_docs.rs`
|
||||
- Keep `main()` in `main.rs` at ~150 lines (just the tracing setup + dispatch call)
|
||||
|
||||
**Why Tier 3:** This is the messiest split. The handler functions depend on the `cli::commands::*` functions AND the `cli::robot::*` helpers AND direct `std::process::exit` calls. Making this work cleanly requires careful thought about the error boundary between `main.rs` (binary) and `lib.rs` (library).
|
||||
|
||||
**Risk: HIGH** — Every handler function touches `robot_mode`, constructs its own timer, opens the DB, and manages error display. The boilerplate is high but consistent, so splitting would just move it around without reducing complexity.
|
||||
|
||||
---
|
||||
|
||||
### 3.2 Split `cli/commands/who.rs` (6067 lines)
|
||||
|
||||
**The problem:** This file implements 5 distinct modes (expert, workload, active, overlap, reviews), each with its own query, scoring model, and output formatting. It also includes the time-decay scoring model (~500 lines) and per-MR detail breakdown logic.
|
||||
|
||||
**Possible split:**
|
||||
```
|
||||
src/cli/commands/who/
|
||||
├── mod.rs ← WhoRun dispatcher, shared types
|
||||
├── expert.rs ← expert mode (path-based file expertise lookup)
|
||||
├── workload.rs ← workload mode (user's assigned issues/MRs)
|
||||
├── active.rs ← active discussions mode
|
||||
├── overlap.rs ← file overlap between users
|
||||
├── reviews.rs ← review pattern analysis
|
||||
└── scoring.rs ← time-decay expert scoring model
|
||||
```
|
||||
|
||||
**Why Tier 3:** The 5 modes share many helper functions, database connection patterns, and output formatting logic. Splitting would require carefully identifying the shared helpers and deciding where they live. The file is big but internally consistent — the modes use a shared dispatcher pattern and common types.
|
||||
|
||||
---
|
||||
|
||||
### 3.3 Split `cli/commands/list.rs` (2931 lines)
|
||||
|
||||
**The problem:** This file handles issue listing, MR listing, AND note listing — three related but distinct operations with separate query builders, output formatters, and test suites.
|
||||
|
||||
**Possible split:**
|
||||
```
|
||||
src/cli/commands/
|
||||
├── list_issues.rs ← issue listing + query builder
|
||||
├── list_mrs.rs ← MR listing + query builder
|
||||
├── list_notes.rs ← note listing + query builder
|
||||
└── list.rs ← shared types (ListFilters, etc.) + re-exports
|
||||
```
|
||||
|
||||
**Why Tier 3:** Same issue as `who.rs` — the three listing modes share query building patterns, field selection logic, and sorting code. Splitting requires identifying and extracting the shared pieces first.
|
||||
|
||||
---
|
||||
|
||||
## Files NOT Recommended to Move
|
||||
|
||||
These files belong exactly where they are:
|
||||
|
||||
| File | Why it belongs in `core/` |
|
||||
|------|--------------------------|
|
||||
| `config.rs` | Config types used by nearly everything |
|
||||
| `db.rs` | Database connection + migrations — foundational |
|
||||
| `error.rs` | Error types used by every module |
|
||||
| `paths.rs` | File path resolution — infrastructure |
|
||||
| `logging.rs` | Tracing setup — infrastructure |
|
||||
| `lock.rs` | Filesystem sync lock — infrastructure |
|
||||
| `shutdown.rs` | Graceful shutdown signal — infrastructure |
|
||||
| `backoff.rs` | Retry math — infrastructure |
|
||||
| `time.rs` | Time conversion — used everywhere |
|
||||
| `metrics.rs` | Tracing metrics layer — infrastructure |
|
||||
| `project.rs` | Fuzzy project resolution — used by 8+ consumers across modules |
|
||||
|
||||
These files are legitimate "core infrastructure" used across multiple modules. Moving them would create import churn with no clarity gain.
|
||||
|
||||
---
|
||||
|
||||
## Files NOT Recommended to Split/Merge
|
||||
|
||||
| File | Why leave it alone |
|
||||
|------|-------------------|
|
||||
| `documents/extractor.rs` (2341 lines) | One cohesive extractor per entity type — the size comes from per-type formatting logic, not mixed concerns |
|
||||
| `ingestion/orchestrator.rs` (1703 lines) | Single orchestration flow — splitting would scatter the pipeline |
|
||||
| `gitlab/graphql.rs` (1293 lines) | GraphQL client with adaptive paging — cohesive |
|
||||
| `gitlab/client.rs` (851 lines) | REST client with all endpoints — cohesive |
|
||||
| `cli/autocorrect.rs` (945 lines) | Correction registry + fuzzy matching — splitting gains nothing |
|
||||
|
||||
---
|
||||
|
||||
## Proposed Final Structure (Tiers 1+2)
|
||||
|
||||
```
|
||||
src/
|
||||
├── main.rs (2713 lines — unchanged for now)
|
||||
├── lib.rs (adds: pub mod timeline; pub mod xref;)
|
||||
├── cli/
|
||||
│ ├── mod.rs (~300 lines — Cli + Commands only, args moved out)
|
||||
│ ├── autocorrect.rs (unchanged)
|
||||
│ ├── progress.rs (unchanged)
|
||||
│ ├── robot.rs (unchanged)
|
||||
│ └── commands/
|
||||
│ ├── mod.rs (re-exports + WhoArgs, IssuesArgs, etc.)
|
||||
│ ├── (all existing files — unchanged but with args structs moved in)
|
||||
│ └── ...
|
||||
├── core/ (slimmed: 12 files → infrastructure only)
|
||||
│ ├── mod.rs
|
||||
│ ├── backoff.rs
|
||||
│ ├── config.rs
|
||||
│ ├── db.rs
|
||||
│ ├── error.rs
|
||||
│ ├── lock.rs
|
||||
│ ├── logging.rs
|
||||
│ ├── metrics.rs
|
||||
│ ├── paths.rs
|
||||
│ ├── project.rs
|
||||
│ ├── shutdown.rs
|
||||
│ └── time.rs
|
||||
├── timeline/ (NEW — extracted from core/)
|
||||
│ ├── mod.rs (types from core/timeline.rs)
|
||||
│ ├── seed.rs (from core/timeline_seed.rs)
|
||||
│ ├── expand.rs (from core/timeline_expand.rs)
|
||||
│ └── collect.rs (from core/timeline_collect.rs)
|
||||
├── xref/ (NEW — extracted from core/)
|
||||
│ ├── mod.rs
|
||||
│ ├── note_parser.rs (from core/note_parser.rs)
|
||||
│ └── references.rs (from core/references.rs)
|
||||
├── ingestion/ (gains 4 files from core/)
|
||||
│ ├── (existing files...)
|
||||
│ ├── events_db.rs (from core/events_db.rs)
|
||||
│ ├── dependent_queue.rs (from core/dependent_queue.rs)
|
||||
│ ├── payloads.rs (from core/payloads.rs)
|
||||
│ └── sync_run.rs (from core/sync_run.rs)
|
||||
├── documents/ (unchanged)
|
||||
├── embedding/ (unchanged)
|
||||
├── gitlab/ (unchanged)
|
||||
└── search/ (unchanged)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Import Change Tracking
|
||||
|
||||
### Tier 1.1: Timeline extraction
|
||||
|
||||
| Consumer file | Old import | New import |
|
||||
|---------------|-----------|------------|
|
||||
| `cli/commands/timeline.rs:10-15` | `crate::core::timeline::*` | `crate::timeline::*` |
|
||||
| `cli/commands/timeline.rs:13` | `crate::core::timeline_collect::collect_events` | `crate::timeline::collect_events` (or `crate::timeline::collect::collect_events`) |
|
||||
| `cli/commands/timeline.rs:14` | `crate::core::timeline_expand::expand_timeline` | `crate::timeline::expand_timeline` |
|
||||
| `cli/commands/timeline.rs:15` | `crate::core::timeline_seed::seed_timeline` | `crate::timeline::seed_timeline` |
|
||||
| `core/timeline_seed.rs:7-8` | `super::timeline::*` | `super::*` (or `crate::timeline::*` depending on structure) |
|
||||
| `core/timeline_expand.rs:6` | `super::timeline::*` | `super::*` |
|
||||
| `core/timeline_collect.rs:4` | `super::timeline::*` | `super::*` |
|
||||
| `core/timeline_seed.rs:8` | `crate::search::*` | `crate::search::*` (no change) |
|
||||
| `core/timeline_seed.rs:6-7` | `super::error::Result` | `crate::core::error::Result` |
|
||||
| `core/timeline_expand.rs:5` | `super::error::Result` | `crate::core::error::Result` |
|
||||
| `core/timeline_collect.rs:3` | `super::error::*` | `crate::core::error::*` |
|
||||
|
||||
### Tier 1.2: Cross-reference extraction
|
||||
|
||||
| Consumer file | Old import | New import |
|
||||
|---------------|-----------|------------|
|
||||
| `ingestion/orchestrator.rs:10-12` | `crate::core::references::*` | `crate::xref::references::*` |
|
||||
| `core/note_parser.rs:7-8` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
|
||||
| `core/references.rs:4-5` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
|
||||
|
||||
### Tier 2.1: Ingestion-adjacent DB ops
|
||||
|
||||
| Consumer file | Old import | New import |
|
||||
|---------------|-----------|------------|
|
||||
| `cli/commands/count.rs:9` | `crate::core::events_db::*` | `crate::ingestion::events_db::*` |
|
||||
| `ingestion/orchestrator.rs:6-8` | `crate::core::dependent_queue::*` | `super::dependent_queue::*` |
|
||||
| `main.rs:37` | `crate::core::dependent_queue::release_all_locked_jobs` | `crate::ingestion::dependent_queue::release_all_locked_jobs` |
|
||||
| `ingestion/discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
|
||||
| `ingestion/issues.rs:9` | `crate::core::payloads::*` | `super::payloads::*` |
|
||||
| `ingestion/merge_requests.rs:8` | `crate::core::payloads::*` | `super::payloads::*` |
|
||||
| `ingestion/mr_discussions.rs:7` | `crate::core::payloads::*` | `super::payloads::*` |
|
||||
| `cli/commands/sync.rs` | (uses `crate::core::sync_run::*`) | `crate::ingestion::sync_run::*` |
|
||||
| `cli/commands/sync_status.rs` | (uses `crate::core::sync_run::*` or `crate::core::metrics::*`) | check and update |
|
||||
| Internal: `events_db.rs:4-5` | `super::error::*`, `super::time::*` | `crate::core::error::*`, `crate::core::time::*` |
|
||||
| Internal: `dependent_queue.rs:5-6` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
|
||||
| Internal: `payloads.rs:9-10` | `super::error::Result`, `super::time::now_ms` | `crate::core::error::Result`, `crate::core::time::now_ms` |
|
||||
| Internal: `sync_run.rs:2-4` | `super::error::*`, `super::metrics::*`, `super::time::*` | `crate::core::error::*`, `crate::core::metrics::*`, `crate::core::time::*` |
|
||||
|
||||
---
|
||||
|
||||
## Execution Order
|
||||
|
||||
1. **Tier 1.1** — Extract timeline → `src/timeline/` (LOW risk, 1 consumer)
|
||||
2. **Tier 1.2** — Extract xref → `src/xref/` (LOW risk, 2-3 consumers)
|
||||
3. **Cargo check + clippy + test** after each tier
|
||||
4. **Tier 2.1** — Move ingestion DB ops (MEDIUM risk, more consumers)
|
||||
5. **Cargo check + clippy + test**
|
||||
6. **Tier 2.2** — Split `cli/mod.rs` args (MEDIUM risk, mostly mechanical)
|
||||
7. **Cargo check + clippy + test + fmt**
|
||||
|
||||
Each tier should be its own commit for easy rollback.
|
||||
|
||||
---
|
||||
|
||||
## What This Achieves
|
||||
|
||||
**Before:** A developer looking at `core/` sees 22 files and has to mentally sort "infrastructure vs. domain logic vs. pipeline stage." The timeline pipeline is invisible unless you know to look in `core/`.
|
||||
|
||||
**After:**
|
||||
- `core/` has 12 files, all clearly infrastructure (db, config, error, paths, logging, lock, shutdown, backoff, time, metrics, project)
|
||||
- `timeline/` is a discoverable first-class module showing the 5-stage pipeline
|
||||
- `xref/` makes the cross-reference extraction domain visible
|
||||
- `ingestion/` contains everything related to data fetching: the orchestrator, entity ingestors, AND their supporting DB operations
|
||||
- `cli/mod.rs` is lean — just the top-level Cli struct and Commands enum
|
||||
|
||||
A new developer (or coding agent) can now answer "where is the timeline code?" → `src/timeline/`, "where is ingestion?" → `src/ingestion/`, "where is cross-reference extraction?" → `src/xref/`, without needing institutional knowledge.
|
||||
168
README.md
168
README.md
@@ -19,7 +19,10 @@ Local GitLab data management with semantic search, people intelligence, and temp
|
||||
- **Cross-reference tracking**: Automatic extraction of "closes", "mentioned" relationships between MRs and issues
|
||||
- **Work item status enrichment**: Fetches issue statuses (e.g., "To do", "In progress", "Done") from GitLab's GraphQL API with adaptive page sizing, color-coded display, and case-insensitive filtering
|
||||
- **Resource event history**: Tracks state changes, label events, and milestone events for issues and MRs
|
||||
- **Note querying**: Rich filtering over discussion notes by author, type, path, resolution status, time range, and body content
|
||||
- **Discussion drift detection**: Semantic analysis of how discussions diverge from original issue intent
|
||||
- **Robot mode**: Machine-readable JSON output with structured errors, meaningful exit codes, and actionable recovery steps
|
||||
- **Error tolerance**: Auto-corrects common CLI mistakes (case, typos, single-dash flags, value casing) with teaching feedback
|
||||
- **Observability**: Verbosity controls, JSON log format, structured metrics, and stage timing
|
||||
|
||||
## Installation
|
||||
@@ -71,6 +74,12 @@ lore who @asmith
|
||||
# Timeline of events related to deployments
|
||||
lore timeline "deployment"
|
||||
|
||||
# Timeline for a specific issue
|
||||
lore timeline issue:42
|
||||
|
||||
# Query notes by author
|
||||
lore notes --author alice --since 7d
|
||||
|
||||
# Robot mode (machine-readable JSON)
|
||||
lore -J issues -n 5 | jq .
|
||||
```
|
||||
@@ -109,6 +118,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
|
||||
"model": "nomic-embed-text",
|
||||
"baseUrl": "http://localhost:11434",
|
||||
"concurrency": 4
|
||||
},
|
||||
"scoring": {
|
||||
"authorWeight": 25,
|
||||
"reviewerWeight": 10,
|
||||
"noteBonus": 1,
|
||||
"authorHalfLifeDays": 180,
|
||||
"reviewerHalfLifeDays": 90,
|
||||
"noteHalfLifeDays": 45,
|
||||
"excludedUsernames": ["bot-user"]
|
||||
}
|
||||
}
|
||||
```
|
||||
@@ -135,6 +153,15 @@ Configuration is stored in `~/.config/lore/config.json` (or `$XDG_CONFIG_HOME/lo
|
||||
| `embedding` | `model` | `nomic-embed-text` | Model name for embeddings |
|
||||
| `embedding` | `baseUrl` | `http://localhost:11434` | Ollama server URL |
|
||||
| `embedding` | `concurrency` | `4` | Concurrent embedding requests |
|
||||
| `scoring` | `authorWeight` | `25` | Points per MR where the user authored code touching the path |
|
||||
| `scoring` | `reviewerWeight` | `10` | Points per MR where the user reviewed code touching the path |
|
||||
| `scoring` | `noteBonus` | `1` | Bonus per inline review comment (DiffNote) |
|
||||
| `scoring` | `reviewerAssignmentWeight` | `3` | Points per MR where the user was assigned as reviewer |
|
||||
| `scoring` | `authorHalfLifeDays` | `180` | Half-life in days for author contribution decay |
|
||||
| `scoring` | `reviewerHalfLifeDays` | `90` | Half-life in days for reviewer contribution decay |
|
||||
| `scoring` | `noteHalfLifeDays` | `45` | Half-life in days for note/comment decay |
|
||||
| `scoring` | `closedMrMultiplier` | `0.5` | Score multiplier for closed (not merged) MRs |
|
||||
| `scoring` | `excludedUsernames` | `[]` | Usernames excluded from expert results (e.g., bots) |
|
||||
|
||||
### Config File Resolution
|
||||
|
||||
@@ -262,18 +289,21 @@ lore search "login flow" --mode semantic # Vector similarity only
|
||||
lore search "auth" --type issue # Filter by source type
|
||||
lore search "auth" --type mr # MR documents only
|
||||
lore search "auth" --type discussion # Discussion documents only
|
||||
lore search "auth" --type note # Individual notes only
|
||||
lore search "deploy" --author username # Filter by author
|
||||
lore search "deploy" -p group/repo # Filter by project
|
||||
lore search "deploy" --label backend # Filter by label (AND logic)
|
||||
lore search "deploy" --path src/ # Filter by file path (trailing / for prefix)
|
||||
lore search "deploy" --after 7d # Created after (7d, 2w, 1m, or YYYY-MM-DD)
|
||||
lore search "deploy" --updated-after 2w # Updated after
|
||||
lore search "deploy" --since 7d # Created since (7d, 2w, 1m, or YYYY-MM-DD)
|
||||
lore search "deploy" --updated-since 2w # Updated since
|
||||
lore search "deploy" -n 50 # Limit results (default 20, max 100)
|
||||
lore search "deploy" --explain # Show ranking explanation per result
|
||||
lore search "deploy" --fts-mode raw # Raw FTS5 query syntax (advanced)
|
||||
```
|
||||
|
||||
The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. Use `raw` for advanced FTS5 query syntax (AND, OR, NOT, phrase matching, prefix queries).
|
||||
The `--fts-mode` flag defaults to `safe`, which sanitizes user input into valid FTS5 queries with automatic fallback. FTS5 boolean operators (`AND`, `OR`, `NOT`, `NEAR`) are passed through in safe mode, so queries like `"switch AND health"` work without switching to raw mode. Use `raw` for advanced FTS5 query syntax (phrase matching, column filters, prefix queries).
|
||||
|
||||
A progress spinner displays during search, showing the active mode (e.g., `Searching (hybrid)...`). In robot mode, spinners are suppressed for clean JSON output.
|
||||
|
||||
Requires `lore generate-docs` (or `lore sync`) to have been run at least once. Semantic and hybrid modes require `lore embed` (or `lore sync`) to have generated vector embeddings via Ollama.
|
||||
|
||||
@@ -283,7 +313,7 @@ People intelligence: discover experts, analyze workloads, review patterns, activ
|
||||
|
||||
#### Expert Mode
|
||||
|
||||
Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis).
|
||||
Find who has expertise in a code area based on authoring and reviewing history (DiffNote analysis). Scores use exponential half-life decay so recent contributions count more than older ones. Scoring weights and half-life periods are configurable via the `scoring` config section.
|
||||
|
||||
```bash
|
||||
lore who src/features/auth/ # Who knows about this directory?
|
||||
@@ -292,6 +322,9 @@ lore who --path README.md # Root files need --path flag
|
||||
lore who --path Makefile # Dotless root files too
|
||||
lore who src/ --since 3m # Limit to recent 3 months
|
||||
lore who src/ -p group/repo # Scope to project
|
||||
lore who src/ --explain-score # Show per-component score breakdown
|
||||
lore who src/ --as-of 30d # Score as if "now" was 30 days ago
|
||||
lore who src/ --include-bots # Include bot users in results
|
||||
```
|
||||
|
||||
The target is auto-detected as a path when it contains `/`. For root files without `/` (e.g., `README.md`), use the `--path` flag. Default time window: 6 months.
|
||||
@@ -348,13 +381,22 @@ Shows: users with touch counts (author vs. review), linked MR references. Defaul
|
||||
| `-p` / `--project` | Scope to a project (fuzzy match) |
|
||||
| `--since` | Time window (7d, 2w, 6m, YYYY-MM-DD). Default varies by mode. |
|
||||
| `-n` / `--limit` | Max results per section (1-500, default 20) |
|
||||
| `--all-history` | Remove the default time window, query all history |
|
||||
| `--detail` | Show per-MR detail breakdown (expert mode only) |
|
||||
| `--explain-score` | Show per-component score breakdown (expert mode only) |
|
||||
| `--as-of` | Score as if "now" is a past date (ISO 8601 or duration like 30d, expert mode only) |
|
||||
| `--include-bots` | Include bot users normally excluded via `scoring.excludedUsernames` |
|
||||
|
||||
### `lore timeline`
|
||||
|
||||
Reconstruct a chronological timeline of events matching a keyword query. The pipeline discovers related entities through cross-reference graph traversal and assembles a unified, time-ordered event stream.
|
||||
|
||||
```bash
|
||||
lore timeline "deployment" # Events related to deployments
|
||||
lore timeline "deployment" # Search-based seeding (hybrid search)
|
||||
lore timeline issue:42 # Direct entity seeding by issue IID
|
||||
lore timeline i:42 # Shorthand for issue:42
|
||||
lore timeline mr:99 # Direct entity seeding by MR IID
|
||||
lore timeline m:99 # Shorthand for mr:99
|
||||
lore timeline "auth" -p group/repo # Scoped to a project
|
||||
lore timeline "auth" --since 30d # Only recent events
|
||||
lore timeline "migration" --depth 2 # Deeper cross-reference expansion
|
||||
@@ -363,6 +405,8 @@ lore timeline "deploy" -n 50 # Limit event count
|
||||
lore timeline "auth" --max-seeds 5 # Fewer seed entities
|
||||
```
|
||||
|
||||
The query can be either a search string (hybrid search finds matching entities) or an entity reference (`issue:N`, `i:N`, `mr:N`, `m:N`) which directly seeds the timeline from a specific entity and its cross-references.
|
||||
|
||||
#### Flags
|
||||
|
||||
| Flag | Default | Description |
|
||||
@@ -375,13 +419,16 @@ lore timeline "auth" --max-seeds 5 # Fewer seed entities
|
||||
| `--max-seeds` | `10` | Maximum seed entities from search |
|
||||
| `--max-entities` | `50` | Maximum entities discovered via cross-references |
|
||||
| `--max-evidence` | `10` | Maximum evidence notes included |
|
||||
| `--fields` | all | Select output fields (comma-separated, or 'minimal' preset) |
|
||||
|
||||
#### Pipeline Stages
|
||||
|
||||
1. **SEED** -- Full-text search identifies the most relevant issues and MRs matching the query. Documents are ranked by BM25 relevance.
|
||||
2. **HYDRATE** -- Evidence notes are extracted: the top FTS-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced.
|
||||
Each stage displays a numbered progress spinner (e.g., `[1/3] Seeding timeline...`). In robot mode, spinners are suppressed for clean JSON output.
|
||||
|
||||
1. **SEED** -- Hybrid search (FTS5 lexical + Ollama vector similarity via Reciprocal Rank Fusion) identifies the most relevant issues and MRs. Falls back to lexical-only if Ollama is unavailable. Discussion notes matching the query are also discovered and attached to their parent entities.
|
||||
2. **HYDRATE** -- Evidence notes are extracted: the top search-matched discussion notes with 200-character snippets explaining *why* each entity was surfaced. Matched discussions are collected as full thread candidates.
|
||||
3. **EXPAND** -- Breadth-first traversal over the `entity_references` graph discovers related entities via "closes", "related", and optionally "mentioned" references up to the configured depth.
|
||||
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, and evidence notes. Events are sorted chronologically with stable tiebreaking.
|
||||
4. **COLLECT** -- Events are gathered for all discovered entities. Event types include: creation, state changes, label adds/removes, milestone assignments, merge events, evidence notes, and full discussion threads. Events are sorted chronologically with stable tiebreaking.
|
||||
5. **RENDER** -- Events are formatted as human-readable text or structured JSON (robot mode).
|
||||
|
||||
#### Event Types
|
||||
@@ -395,13 +442,70 @@ lore timeline "auth" --max-seeds 5 # Fewer seed entities
|
||||
| `MilestoneSet` | Milestone assigned |
|
||||
| `MilestoneRemoved` | Milestone removed |
|
||||
| `Merged` | MR merged (deduplicated against state events) |
|
||||
| `NoteEvidence` | Discussion note matched by FTS, with snippet |
|
||||
| `NoteEvidence` | Discussion note matched by search, with snippet |
|
||||
| `DiscussionThread` | Full discussion thread with all non-system notes |
|
||||
| `CrossReferenced` | Reference to another entity |
|
||||
|
||||
#### Unresolved References
|
||||
|
||||
When graph expansion encounters cross-project references to entities not yet synced locally, these are collected as unresolved references in the output. This enables discovery of external dependencies and can inform future sync targets.
|
||||
|
||||
### `lore notes`
|
||||
|
||||
Query individual notes from discussions with rich filtering options.
|
||||
|
||||
```bash
|
||||
lore notes # List 50 most recent notes
|
||||
lore notes --author alice --since 7d # Notes by alice in last 7 days
|
||||
lore notes --for-issue 42 -p group/repo # Notes on issue #42
|
||||
lore notes --for-mr 99 -p group/repo # Notes on MR !99
|
||||
lore notes --path src/ --resolution unresolved # Unresolved diff notes in src/
|
||||
lore notes --note-type DiffNote # Only inline code review comments
|
||||
lore notes --contains "TODO" # Substring search in note body
|
||||
lore notes --include-system # Include system-generated notes
|
||||
lore notes --since 2w --until 2024-12-31 # Time-bounded range
|
||||
lore notes --sort updated --asc # Sort by update time, ascending
|
||||
lore notes --format csv # CSV output
|
||||
lore notes --format jsonl # Line-delimited JSON
|
||||
lore notes -o # Open first result in browser
|
||||
|
||||
# Field selection (robot mode)
|
||||
lore -J notes --fields minimal # Compact: id, author_username, body, created_at_iso
|
||||
```
|
||||
|
||||
#### Filters
|
||||
|
||||
| Flag | Description |
|
||||
|------|-------------|
|
||||
| `-a` / `--author` | Filter by note author username |
|
||||
| `--note-type` | Filter by note type (DiffNote, DiscussionNote) |
|
||||
| `--contains` | Substring search in note body |
|
||||
| `--note-id` | Filter by internal note ID |
|
||||
| `--gitlab-note-id` | Filter by GitLab note ID |
|
||||
| `--discussion-id` | Filter by discussion ID |
|
||||
| `--include-system` | Include system notes (excluded by default) |
|
||||
| `--for-issue` | Notes on a specific issue IID (requires `-p`) |
|
||||
| `--for-mr` | Notes on a specific MR IID (requires `-p`) |
|
||||
| `-p` / `--project` | Scope to a project (fuzzy match) |
|
||||
| `--since` | Notes created since (7d, 2w, 1m, or YYYY-MM-DD) |
|
||||
| `--until` | Notes created until (YYYY-MM-DD, inclusive end-of-day) |
|
||||
| `--path` | Filter by file path (DiffNotes only; trailing `/` for prefix match) |
|
||||
| `--resolution` | Filter by resolution status (`any`, `unresolved`, `resolved`) |
|
||||
| `--sort` | Sort by `created` (default) or `updated` |
|
||||
| `--asc` | Sort ascending (default: descending) |
|
||||
| `--format` | Output format: `table` (default), `json`, `jsonl`, `csv` |
|
||||
| `-o` / `--open` | Open first result in browser |
|
||||
|
||||
### `lore drift`
|
||||
|
||||
Detect discussion divergence from the original intent of an issue by comparing the semantic similarity of discussion content against the issue description.
|
||||
|
||||
```bash
|
||||
lore drift issues 42 # Check divergence on issue #42
|
||||
lore drift issues 42 --threshold 0.6 # Higher threshold (stricter)
|
||||
lore drift issues 42 -p group/repo # Scope to project
|
||||
```
|
||||
|
||||
### `lore sync`
|
||||
|
||||
Run the full sync pipeline: ingest from GitLab (including work item status enrichment via GraphQL), generate searchable documents, and compute embeddings.
|
||||
@@ -413,6 +517,7 @@ lore sync --force # Override stale lock
|
||||
lore sync --no-embed # Skip embedding step
|
||||
lore sync --no-docs # Skip document regeneration
|
||||
lore sync --no-events # Skip resource event fetching
|
||||
lore sync --no-file-changes # Skip MR file change fetching
|
||||
lore sync --dry-run # Preview what would be synced
|
||||
```
|
||||
|
||||
@@ -571,6 +676,7 @@ Machine-readable command manifest for agent self-discovery. Returns a JSON schem
|
||||
```bash
|
||||
lore robot-docs # Pretty-printed JSON
|
||||
lore --robot robot-docs # Compact JSON for parsing
|
||||
lore robot-docs --brief # Omit response_schema (~60% smaller)
|
||||
```
|
||||
|
||||
### `lore version`
|
||||
@@ -622,7 +728,7 @@ The `actions` array contains executable shell commands an agent can run to recov
|
||||
|
||||
### Field Selection
|
||||
|
||||
The `--fields` flag on `issues` and `mrs` list commands controls which fields appear in the JSON response, reducing token usage for AI agent workflows:
|
||||
The `--fields` flag controls which fields appear in the JSON response, reducing token usage for AI agent workflows. Supported on `issues`, `mrs`, `notes`, `search`, `timeline`, and `who` list commands:
|
||||
|
||||
```bash
|
||||
# Minimal preset (~60% fewer tokens)
|
||||
@@ -639,6 +745,48 @@ Valid fields for issues: `iid`, `title`, `state`, `author_username`, `labels`, `
|
||||
|
||||
Valid fields for MRs: `iid`, `title`, `state`, `author_username`, `labels`, `draft`, `target_branch`, `source_branch`, `discussion_count`, `unresolved_count`, `created_at_iso`, `updated_at_iso`, `web_url`, `project_path`, `reviewers`
|
||||
|
||||
### Error Tolerance
|
||||
|
||||
The CLI auto-corrects common mistakes before parsing, emitting a teaching note to stderr. Corrections work in both human and robot modes:
|
||||
|
||||
| Correction | Example | Mode |
|
||||
|-----------|---------|------|
|
||||
| Single-dash long flag | `-robot` -> `--robot` | All |
|
||||
| Case normalization | `--Robot` -> `--robot` | All |
|
||||
| Flag prefix expansion | `--proj` -> `--project` (unambiguous only) | All |
|
||||
| Fuzzy flag match | `--projct` -> `--project` | All (threshold 0.9 in robot, 0.8 in human) |
|
||||
| Subcommand alias | `merge_requests` -> `mrs`, `robotdocs` -> `robot-docs` | All |
|
||||
| Value normalization | `--state Opened` -> `--state opened` | All |
|
||||
| Value fuzzy match | `--state opend` -> `--state opened` | All |
|
||||
| Subcommand prefix | `lore iss` -> `lore issues` (unambiguous only, via clap) | All |
|
||||
|
||||
In robot mode, corrections emit structured JSON to stderr:
|
||||
|
||||
```json
|
||||
{"warning":{"type":"ARG_CORRECTED","corrections":[...],"teaching":["Use double-dash for long flags: --robot (not -robot)"]}}
|
||||
```
|
||||
|
||||
When a command or flag is still unrecognized after corrections, the error response includes a fuzzy suggestion and, for enum-like flags, lists valid values:
|
||||
|
||||
```json
|
||||
{"error":{"code":"UNKNOWN_COMMAND","message":"...","suggestion":"Did you mean 'lore issues'? Example: lore --robot issues -n 10. Run 'lore robot-docs' for all commands"}}
|
||||
```
|
||||
|
||||
### Command Aliases
|
||||
|
||||
Commands accept aliases for common variations:
|
||||
|
||||
| Primary | Aliases |
|
||||
|---------|---------|
|
||||
| `issues` | `issue` |
|
||||
| `mrs` | `mr`, `merge-requests`, `merge-request` |
|
||||
| `notes` | `note` |
|
||||
| `search` | `find`, `query` |
|
||||
| `stats` | `stat` |
|
||||
| `status` | `st` |
|
||||
|
||||
Unambiguous prefixes also work via subcommand inference (e.g., `lore iss` -> `lore issues`, `lore time` -> `lore timeline`).
|
||||
|
||||
### Agent Self-Discovery
|
||||
|
||||
The `robot-docs` command provides a complete machine-readable manifest including response schemas for every command:
|
||||
|
||||
21
migrations/022_notes_query_index.sql
Normal file
21
migrations/022_notes_query_index.sql
Normal file
@@ -0,0 +1,21 @@
|
||||
-- Migration 022: Composite query indexes for notes + author_id column
-- Optimizes author-scoped and project-scoped date-range queries on notes.
-- Adds discussion JOIN indexes and immutable author identity column.

-- Composite index for author-scoped queries (who command, notes --author)
-- Partial (is_system = 0): those queries exclude system notes by default.
-- COLLATE NOCASE allows case-insensitive username matching; the trailing
-- (created_at DESC, id DESC) pair matches newest-first ordering with a
-- stable id tiebreak for equal timestamps.
CREATE INDEX IF NOT EXISTS idx_notes_user_created
ON notes(project_id, author_username COLLATE NOCASE, created_at DESC, id DESC)
WHERE is_system = 0;

-- Composite index for project-scoped date-range queries
CREATE INDEX IF NOT EXISTS idx_notes_project_created
ON notes(project_id, created_at DESC, id DESC)
WHERE is_system = 0;

-- Discussion JOIN indexes (notes reach issues/MRs through discussions)
CREATE INDEX IF NOT EXISTS idx_discussions_issue_id ON discussions(issue_id);
CREATE INDEX IF NOT EXISTS idx_discussions_mr_id ON discussions(merge_request_id);

-- Immutable author identity column (GitLab numeric user ID)
-- NOTE(review): SQLite's ALTER TABLE ADD COLUMN has no IF NOT EXISTS, so —
-- unlike the index statements above — this migration is not idempotent and
-- must be applied exactly once; confirm the migration runner guarantees that.
ALTER TABLE notes ADD COLUMN author_id INTEGER;
CREATE INDEX IF NOT EXISTS idx_notes_author_id ON notes(author_id) WHERE author_id IS NOT NULL;
|
||||
153
migrations/024_note_documents.sql
Normal file
153
migrations/024_note_documents.sql
Normal file
@@ -0,0 +1,153 @@
|
||||
-- Migration 024: Add 'note' source_type to documents and dirty_sources
-- SQLite does not support ALTER CONSTRAINT, so we use the table-rebuild pattern.

-- ============================================================
-- 1. Rebuild dirty_sources with updated CHECK constraint
-- ============================================================

CREATE TABLE dirty_sources_new (
    source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
    source_id INTEGER NOT NULL,
    queued_at INTEGER NOT NULL,
    attempt_count INTEGER NOT NULL DEFAULT 0,
    last_attempt_at INTEGER,
    last_error TEXT,
    next_attempt_at INTEGER,
    PRIMARY KEY(source_type, source_id)
);

-- Positional copy: the column order above must match the old dirty_sources.
INSERT INTO dirty_sources_new SELECT * FROM dirty_sources;
DROP TABLE dirty_sources;
ALTER TABLE dirty_sources_new RENAME TO dirty_sources;
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);

-- ============================================================
-- 2. Rebuild documents with updated CHECK constraint
-- ============================================================

-- 2a. Backup junction table data (TEMP tables vanish with the connection,
--     but are also dropped explicitly in step 4)
CREATE TEMP TABLE _doc_labels_backup AS SELECT * FROM document_labels;
CREATE TEMP TABLE _doc_paths_backup AS SELECT * FROM document_paths;

-- 2b. Drop all triggers that reference documents
DROP TRIGGER IF EXISTS documents_ai;
DROP TRIGGER IF EXISTS documents_ad;
DROP TRIGGER IF EXISTS documents_au;
DROP TRIGGER IF EXISTS documents_embeddings_ad;

-- 2c. Drop junction tables (they have FK references to documents)
DROP TABLE IF EXISTS document_labels;
DROP TABLE IF EXISTS document_paths;

-- 2d. Create new documents table with 'note' in CHECK constraint
CREATE TABLE documents_new (
    id INTEGER PRIMARY KEY,
    source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
    source_id INTEGER NOT NULL,
    project_id INTEGER NOT NULL REFERENCES projects(id),
    author_username TEXT,
    label_names TEXT,
    created_at INTEGER,
    updated_at INTEGER,
    url TEXT,
    title TEXT,
    content_text TEXT NOT NULL,
    content_hash TEXT NOT NULL,
    labels_hash TEXT NOT NULL DEFAULT '',
    paths_hash TEXT NOT NULL DEFAULT '',
    is_truncated INTEGER NOT NULL DEFAULT 0,
    truncated_reason TEXT CHECK (
        truncated_reason IN (
            'token_limit_middle_drop','single_note_oversized','first_last_oversized',
            'hard_cap_oversized'
        )
        OR truncated_reason IS NULL
    ),
    UNIQUE(source_type, source_id)
);

-- 2e. Copy all existing data (positional: column order must match old table)
INSERT INTO documents_new SELECT * FROM documents;

-- 2f. Swap tables
DROP TABLE documents;
ALTER TABLE documents_new RENAME TO documents;

-- 2g. Recreate all indexes on documents
CREATE INDEX idx_documents_project_updated ON documents(project_id, updated_at);
CREATE INDEX idx_documents_author ON documents(author_username);
CREATE INDEX idx_documents_source ON documents(source_type, source_id);
CREATE INDEX idx_documents_hash ON documents(content_hash);

-- 2h. Recreate junction tables
CREATE TABLE document_labels (
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    label_name TEXT NOT NULL,
    PRIMARY KEY(document_id, label_name)
) WITHOUT ROWID;
CREATE INDEX idx_document_labels_label ON document_labels(label_name);

CREATE TABLE document_paths (
    document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
    path TEXT NOT NULL,
    PRIMARY KEY(document_id, path)
) WITHOUT ROWID;
CREATE INDEX idx_document_paths_path ON document_paths(path);

-- 2i. Restore junction table data from backups
INSERT INTO document_labels SELECT * FROM _doc_labels_backup;
INSERT INTO document_paths SELECT * FROM _doc_paths_backup;

-- 2j. Recreate FTS triggers (from migration 008)
CREATE TRIGGER documents_ai AFTER INSERT ON documents BEGIN
    INSERT INTO documents_fts(rowid, title, content_text)
    VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;

CREATE TRIGGER documents_ad AFTER DELETE ON documents BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
    VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
END;

-- Fires only when an indexed column actually changed, avoiding FTS churn on
-- metadata-only updates. `IS NOT` (rather than !=) also catches NULL titles.
CREATE TRIGGER documents_au AFTER UPDATE ON documents
WHEN old.title IS NOT new.title OR old.content_text != new.content_text
BEGIN
    INSERT INTO documents_fts(documents_fts, rowid, title, content_text)
    VALUES('delete', old.id, COALESCE(old.title, ''), old.content_text);
    INSERT INTO documents_fts(rowid, title, content_text)
    VALUES (new.id, COALESCE(new.title, ''), new.content_text);
END;

-- 2k. Recreate embeddings cleanup trigger (from migration 009)
-- Embedding rowids are banded per document in [id*1000, (id+1)*1000) —
-- presumably up to 1000 embedding chunks per document; see migration 009.
CREATE TRIGGER documents_embeddings_ad AFTER DELETE ON documents BEGIN
    DELETE FROM embeddings
    WHERE rowid >= old.id * 1000
    AND rowid < (old.id + 1) * 1000;
END;

-- 2l. Rebuild FTS index to ensure consistency after table swap
INSERT INTO documents_fts(documents_fts) VALUES('rebuild');

-- ============================================================
-- 3. Defense triggers: clean up documents when notes are
--    deleted or flipped to system notes
-- ============================================================

CREATE TRIGGER notes_ad_cleanup AFTER DELETE ON notes
WHEN old.is_system = 0
BEGIN
    DELETE FROM documents WHERE source_type = 'note' AND source_id = old.id;
END;

CREATE TRIGGER notes_au_system_cleanup AFTER UPDATE OF is_system ON notes
WHEN NEW.is_system = 1 AND OLD.is_system = 0
BEGIN
    DELETE FROM documents WHERE source_type = 'note' AND source_id = OLD.id;
END;

-- ============================================================
-- 4. Drop temp backup tables
-- ============================================================

DROP TABLE IF EXISTS _doc_labels_backup;
DROP TABLE IF EXISTS _doc_paths_backup;
|
||||
8
migrations/025_note_dirty_backfill.sql
Normal file
8
migrations/025_note_dirty_backfill.sql
Normal file
@@ -0,0 +1,8 @@
|
||||
-- Backfill existing non-system notes into dirty queue for document generation.
-- Only seeds notes that don't already have documents and aren't already queued.
-- queued_at is epoch milliseconds: strftime('%s') yields whole seconds, * 1000.
-- The LEFT JOIN / d.id IS NULL pair filters notes that already have a 'note'
-- document; ON CONFLICT DO NOTHING covers notes already sitting in the queue.
INSERT INTO dirty_sources (source_type, source_id, queued_at)
SELECT 'note', n.id, CAST(strftime('%s', 'now') AS INTEGER) * 1000
FROM notes n
LEFT JOIN documents d ON d.source_type = 'note' AND d.source_id = n.id
WHERE n.is_system = 0 AND d.id IS NULL
ON CONFLICT(source_type, source_id) DO NOTHING;
|
||||
20
migrations/026_scoring_indexes.sql
Normal file
20
migrations/026_scoring_indexes.sql
Normal file
@@ -0,0 +1,20 @@
|
||||
-- Indexes for time-decay expert scoring: dual-path matching and reviewer participation.

-- DiffNotes keyed by the pre-rename file path + author. Partial: only
-- non-system DiffNotes that carry an old_path are indexed.
CREATE INDEX IF NOT EXISTS idx_notes_old_path_author
ON notes(position_old_path, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;

-- MR file changes looked up by pre-rename path...
CREATE INDEX IF NOT EXISTS idx_mfc_old_path_project_mr
ON mr_file_changes(old_path, project_id, merge_request_id)
WHERE old_path IS NOT NULL;

-- ...and by post-change path. No partial filter here —
-- NOTE(review): assumes new_path is always populated; verify against schema.
CREATE INDEX IF NOT EXISTS idx_mfc_new_path_project_mr
ON mr_file_changes(new_path, project_id, merge_request_id);

-- Reviewer participation: DiffNotes grouped by discussion thread + author.
CREATE INDEX IF NOT EXISTS idx_notes_diffnote_discussion_author
ON notes(discussion_id, author_username, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0;

-- Project-scoped variant of the old-path DiffNote lookup.
CREATE INDEX IF NOT EXISTS idx_notes_old_path_project_created
ON notes(position_old_path, project_id, created_at)
WHERE note_type = 'DiffNote' AND is_system = 0 AND position_old_path IS NOT NULL;
|
||||
@@ -4,7 +4,7 @@ title: ""
|
||||
status: iterating
|
||||
iteration: 6
|
||||
target_iterations: 8
|
||||
beads_revision: 1
|
||||
beads_revision: 2
|
||||
related_plans: []
|
||||
created: 2026-02-08
|
||||
updated: 2026-02-12
|
||||
|
||||
@@ -21,6 +21,10 @@ pub enum CorrectionRule {
|
||||
SingleDashLongFlag,
|
||||
CaseNormalization,
|
||||
FuzzyFlag,
|
||||
SubcommandAlias,
|
||||
ValueNormalization,
|
||||
ValueFuzzy,
|
||||
FlagPrefix,
|
||||
}
|
||||
|
||||
/// Result of the correction pass over raw args.
|
||||
@@ -183,9 +187,38 @@ const COMMAND_FLAGS: &[(&str, &[&str])] = &[
|
||||
"--fields",
|
||||
"--detail",
|
||||
"--no-detail",
|
||||
"--as-of",
|
||||
"--explain-score",
|
||||
"--include-bots",
|
||||
"--all-history",
|
||||
],
|
||||
),
|
||||
("drift", &["--threshold", "--project"]),
|
||||
(
|
||||
"notes",
|
||||
&[
|
||||
"--limit",
|
||||
"--fields",
|
||||
"--format",
|
||||
"--author",
|
||||
"--note-type",
|
||||
"--contains",
|
||||
"--note-id",
|
||||
"--gitlab-note-id",
|
||||
"--discussion-id",
|
||||
"--include-system",
|
||||
"--for-issue",
|
||||
"--for-mr",
|
||||
"--project",
|
||||
"--since",
|
||||
"--until",
|
||||
"--path",
|
||||
"--resolution",
|
||||
"--sort",
|
||||
"--asc",
|
||||
"--open",
|
||||
],
|
||||
),
|
||||
(
|
||||
"init",
|
||||
&[
|
||||
@@ -232,18 +265,45 @@ pub const ENUM_VALUES: &[(&str, &[&str])] = &[
|
||||
("--state", &["opened", "closed", "merged", "locked", "all"]),
|
||||
("--mode", &["lexical", "hybrid", "semantic"]),
|
||||
("--sort", &["updated", "created", "iid"]),
|
||||
("--type", &["issue", "mr", "discussion"]),
|
||||
("--type", &["issue", "mr", "discussion", "note"]),
|
||||
("--fts-mode", &["safe", "raw"]),
|
||||
("--color", &["auto", "always", "never"]),
|
||||
("--log-format", &["text", "json"]),
|
||||
("--for", &["issue", "mr"]),
|
||||
];
|
||||
|
||||
// ---------------------------------------------------------------------------
// Subcommand alias map (for forms clap aliases can't express)
// ---------------------------------------------------------------------------

/// Subcommand aliases for non-standard forms (underscores, no separators).
/// Clap `visible_alias`/`alias` handles hyphenated forms (`merge-requests`);
/// this map catches the rest.
///
/// Each entry maps `alias -> canonical clap subcommand name`; lookups are
/// case-insensitive (see `correct_subcommand`).
const SUBCOMMAND_ALIASES: &[(&str, &str)] = &[
    // `mrs` variants
    ("merge_requests", "mrs"),
    ("merge_request", "mrs"),
    ("mergerequests", "mrs"),
    ("mergerequest", "mrs"),
    // `generate-docs` variants
    ("generate_docs", "generate-docs"),
    ("generatedocs", "generate-docs"),
    ("gendocs", "generate-docs"),
    ("gen-docs", "generate-docs"),
    // `robot-docs` variants
    ("robot_docs", "robot-docs"),
    ("robotdocs", "robot-docs"),
    // `status` variants
    ("sync_status", "status"),
    ("syncstatus", "status"),
    // `auth` variants
    ("auth_test", "auth"),
    ("authtest", "auth"),
];
|
||||
|
||||
// ---------------------------------------------------------------------------
// Correction thresholds
// ---------------------------------------------------------------------------

/// Minimum similarity score for fuzzy flag correction in human mode.
const FUZZY_FLAG_THRESHOLD: f64 = 0.8;
/// Stricter threshold for robot mode — only high-confidence corrections to
/// avoid misleading agents. Still catches obvious typos like `--projct`.
const FUZZY_FLAG_THRESHOLD_STRICT: f64 = 0.9;
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Core logic
|
||||
@@ -303,20 +363,29 @@ fn valid_flags_for(subcommand: Option<&str>) -> Vec<&'static str> {
|
||||
|
||||
/// Run the pre-clap correction pass on raw args.
|
||||
///
|
||||
/// When `strict` is true (robot mode), only deterministic corrections are applied
|
||||
/// (single-dash long flags, case normalization). Fuzzy matching is disabled to
|
||||
/// prevent misleading agents with speculative corrections.
|
||||
/// Three-phase pipeline:
|
||||
/// - Phase A: Subcommand alias correction (case-insensitive alias map)
|
||||
/// - Phase B: Per-arg flag corrections (single-dash, case, prefix, fuzzy)
|
||||
/// - Phase C: Enum value normalization (case + fuzzy + prefix on known values)
|
||||
///
|
||||
/// When `strict` is true (robot mode), fuzzy matching uses a higher threshold
|
||||
/// (0.9 vs 0.8) to avoid speculative corrections while still catching obvious
|
||||
/// typos like `--projct` → `--project`.
|
||||
///
|
||||
/// Returns the (possibly modified) args and any corrections applied.
|
||||
pub fn correct_args(raw: Vec<String>, strict: bool) -> CorrectionResult {
|
||||
let subcommand = detect_subcommand(&raw);
|
||||
let valid = valid_flags_for(subcommand);
|
||||
|
||||
let mut corrected = Vec::with_capacity(raw.len());
|
||||
let mut corrections = Vec::new();
|
||||
|
||||
// Phase A: Subcommand alias correction
|
||||
let args = correct_subcommand(raw, &mut corrections);
|
||||
|
||||
// Phase B: Per-arg flag corrections
|
||||
let valid = valid_flags_for(detect_subcommand(&args));
|
||||
|
||||
let mut corrected = Vec::with_capacity(args.len());
|
||||
let mut past_terminator = false;
|
||||
|
||||
for arg in raw {
|
||||
for arg in args {
|
||||
// B1: Stop correcting after POSIX `--` option terminator
|
||||
if arg == "--" {
|
||||
past_terminator = true;
|
||||
@@ -338,12 +407,177 @@ pub fn correct_args(raw: Vec<String>, strict: bool) -> CorrectionResult {
|
||||
}
|
||||
}
|
||||
|
||||
// Phase C: Enum value normalization
|
||||
normalize_enum_values(&mut corrected, &mut corrections);
|
||||
|
||||
CorrectionResult {
|
||||
args: corrected,
|
||||
corrections,
|
||||
}
|
||||
}
|
||||
|
||||
/// Phase A: Replace subcommand aliases with their canonical names.
|
||||
///
|
||||
/// Handles forms that can't be expressed as clap `alias`/`visible_alias`
|
||||
/// (underscores, no-separator forms). Case-insensitive matching.
|
||||
fn correct_subcommand(mut args: Vec<String>, corrections: &mut Vec<Correction>) -> Vec<String> {
|
||||
// Find the subcommand position index, then check the alias map.
|
||||
// Can't use iterators easily because we need to mutate args[i].
|
||||
let mut skip_next = false;
|
||||
let mut subcmd_idx = None;
|
||||
for (i, arg) in args.iter().enumerate().skip(1) {
|
||||
if skip_next {
|
||||
skip_next = false;
|
||||
continue;
|
||||
}
|
||||
if arg.starts_with('-') {
|
||||
if arg.contains('=') {
|
||||
continue;
|
||||
}
|
||||
if matches!(arg.as_str(), "--config" | "-c" | "--color" | "--log-format") {
|
||||
skip_next = true;
|
||||
}
|
||||
continue;
|
||||
}
|
||||
subcmd_idx = Some(i);
|
||||
break;
|
||||
}
|
||||
if let Some(i) = subcmd_idx
|
||||
&& let Some((_, canonical)) = SUBCOMMAND_ALIASES
|
||||
.iter()
|
||||
.find(|(alias, _)| alias.eq_ignore_ascii_case(&args[i]))
|
||||
{
|
||||
corrections.push(Correction {
|
||||
original: args[i].clone(),
|
||||
corrected: (*canonical).to_string(),
|
||||
rule: CorrectionRule::SubcommandAlias,
|
||||
confidence: 1.0,
|
||||
});
|
||||
args[i] = (*canonical).to_string();
|
||||
}
|
||||
args
|
||||
}
|
||||
|
||||
/// Phase C: Normalize enum values for flags with known valid values.
///
/// Handles both `--flag value` and `--flag=value` forms. Corrections are:
/// 1. Case normalization: `Opened` → `opened`
/// 2. Prefix expansion: `open` → `opened` (only if unambiguous)
/// 3. Fuzzy matching: `opend` → `opened`
///
/// Mutates `args` in place and appends one `Correction` per change made.
/// Stops entirely at the POSIX `--` terminator: everything after it is
/// positional data and must be passed through untouched.
fn normalize_enum_values(args: &mut [String], corrections: &mut Vec<Correction>) {
    let mut i = 0;
    while i < args.len() {
        // Respect POSIX `--` option terminator — don't normalize values after it
        if args[i] == "--" {
            break;
        }

        // Handle --flag=value form
        if let Some(eq_pos) = args[i].find('=') {
            let flag = args[i][..eq_pos].to_string();
            let value = args[i][eq_pos + 1..].to_string();
            // Only tokens whose flag part is in the enum registry are touched;
            // anything else (e.g. a positional `KEY=val`) falls through untouched.
            if let Some(valid_vals) = lookup_enum_values(&flag)
                && let Some((corrected_val, is_case_only)) = normalize_value(&value, valid_vals)
            {
                let original = args[i].clone();
                let corrected = format!("{flag}={corrected_val}");
                args[i] = corrected.clone();
                corrections.push(Correction {
                    original,
                    corrected,
                    // Case-only fixes and prefix/fuzzy fixes are reported under
                    // different rules so teaching notes can phrase them differently.
                    rule: if is_case_only {
                        CorrectionRule::ValueNormalization
                    } else {
                        CorrectionRule::ValueFuzzy
                    },
                    confidence: 0.95,
                });
            }
            i += 1;
            continue;
        }

        // Handle --flag value form: the next token is treated as the flag's value
        // only when it exists and does not itself look like a flag.
        if args[i].starts_with("--")
            && let Some(valid_vals) = lookup_enum_values(&args[i])
            && i + 1 < args.len()
            && !args[i + 1].starts_with('-')
        {
            let value = args[i + 1].clone();
            if let Some((corrected_val, is_case_only)) = normalize_value(&value, valid_vals) {
                let original = args[i + 1].clone();
                args[i + 1] = corrected_val.to_string();
                corrections.push(Correction {
                    original,
                    corrected: corrected_val.to_string(),
                    rule: if is_case_only {
                        CorrectionRule::ValueNormalization
                    } else {
                        CorrectionRule::ValueFuzzy
                    },
                    confidence: 0.95,
                });
            }
            // Skip past both the flag and its (possibly corrected) value.
            i += 2;
            continue;
        }

        i += 1;
    }
}
|
||||
|
||||
/// Look up valid enum values for a flag (case-insensitive flag name match).
|
||||
fn lookup_enum_values(flag: &str) -> Option<&'static [&'static str]> {
|
||||
let lower = flag.to_lowercase();
|
||||
ENUM_VALUES
|
||||
.iter()
|
||||
.find(|(f, _)| f.to_lowercase() == lower)
|
||||
.map(|(_, vals)| *vals)
|
||||
}
|
||||
|
||||
/// Try to normalize a value against a set of valid values.
|
||||
///
|
||||
/// Returns `Some((corrected, is_case_only))` if a correction is needed:
|
||||
/// - `is_case_only = true` for pure case normalization
|
||||
/// - `is_case_only = false` for prefix/fuzzy corrections
|
||||
///
|
||||
/// Returns `None` if the value is already valid or no match is found.
|
||||
fn normalize_value(input: &str, valid_values: &[&str]) -> Option<(String, bool)> {
|
||||
// Already valid (exact match)? No correction needed.
|
||||
if valid_values.contains(&input) {
|
||||
return None;
|
||||
}
|
||||
|
||||
let lower = input.to_lowercase();
|
||||
|
||||
// Case-insensitive exact match
|
||||
if let Some(&val) = valid_values.iter().find(|v| v.to_lowercase() == lower) {
|
||||
return Some((val.to_string(), true));
|
||||
}
|
||||
|
||||
// Prefix match (e.g., "open" → "opened") — only if unambiguous
|
||||
let prefix_matches: Vec<&&str> = valid_values
|
||||
.iter()
|
||||
.filter(|v| v.starts_with(&*lower))
|
||||
.collect();
|
||||
if prefix_matches.len() == 1 {
|
||||
return Some(((*prefix_matches[0]).to_string(), false));
|
||||
}
|
||||
|
||||
// Fuzzy match
|
||||
let best = valid_values
|
||||
.iter()
|
||||
.map(|v| (*v, jaro_winkler(&lower, v)))
|
||||
.max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
|
||||
if let Some((val, score)) = best
|
||||
&& score >= 0.8
|
||||
{
|
||||
return Some((val.to_string(), false));
|
||||
}
|
||||
|
||||
None
|
||||
}
|
||||
|
||||
/// Clap built-in flags that should never be corrected. These are handled by clap
/// directly and are not in our GLOBAL_FLAGS registry, so the corrector must pass
/// them through verbatim.
const CLAP_BUILTINS: &[&str] = &["--help", "--version"];
|
||||
@@ -462,10 +696,34 @@ fn try_correct(arg: &str, valid_flags: &[&str], strict: bool) -> Option<Correcti
|
||||
});
|
||||
}
|
||||
|
||||
// Rule 3: Fuzzy flag match — `--staate` -> `--state` (skip in strict mode)
|
||||
if !strict
|
||||
&& let Some((best_flag, score)) = best_fuzzy_match(&lower, valid_flags)
|
||||
&& score >= FUZZY_FLAG_THRESHOLD
|
||||
// Rule 3: Prefix match — `--proj` -> `--project` (only if unambiguous)
|
||||
let prefix_matches: Vec<&str> = valid_flags
|
||||
.iter()
|
||||
.filter(|f| f.starts_with(&*lower) && f.to_lowercase() != lower)
|
||||
.copied()
|
||||
.collect();
|
||||
if prefix_matches.len() == 1 {
|
||||
let matched = prefix_matches[0];
|
||||
let corrected = match value_suffix {
|
||||
Some(suffix) => format!("{matched}{suffix}"),
|
||||
None => matched.to_string(),
|
||||
};
|
||||
return Some(Correction {
|
||||
original: arg.to_string(),
|
||||
corrected,
|
||||
rule: CorrectionRule::FlagPrefix,
|
||||
confidence: 0.95,
|
||||
});
|
||||
}
|
||||
|
||||
// Rule 4: Fuzzy flag match — higher threshold in strict/robot mode
|
||||
let threshold = if strict {
|
||||
FUZZY_FLAG_THRESHOLD_STRICT
|
||||
} else {
|
||||
FUZZY_FLAG_THRESHOLD
|
||||
};
|
||||
if let Some((best_flag, score)) = best_fuzzy_match(&lower, valid_flags)
|
||||
&& score >= threshold
|
||||
{
|
||||
let corrected = match value_suffix {
|
||||
Some(suffix) => format!("{best_flag}{suffix}"),
|
||||
@@ -539,6 +797,30 @@ pub fn format_teaching_note(correction: &Correction) -> String {
|
||||
correction.corrected, correction.original
|
||||
)
|
||||
}
|
||||
CorrectionRule::SubcommandAlias => {
|
||||
format!(
|
||||
"Use canonical command name: {} (not {})",
|
||||
correction.corrected, correction.original
|
||||
)
|
||||
}
|
||||
CorrectionRule::ValueNormalization => {
|
||||
format!(
|
||||
"Values are lowercase: {} (not {})",
|
||||
correction.corrected, correction.original
|
||||
)
|
||||
}
|
||||
CorrectionRule::ValueFuzzy => {
|
||||
format!(
|
||||
"Correct value spelling: {} (not {})",
|
||||
correction.corrected, correction.original
|
||||
)
|
||||
}
|
||||
CorrectionRule::FlagPrefix => {
|
||||
format!(
|
||||
"Use full flag name: {} (not {})",
|
||||
correction.corrected, correction.original
|
||||
)
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
@@ -722,17 +1004,20 @@ mod tests {
|
||||
assert_eq!(result.args[1], "--help");
|
||||
}
|
||||
|
||||
// ---- I6: Strict mode (robot) disables fuzzy matching ----
|
||||
// ---- Strict mode (robot) uses higher fuzzy threshold ----
|
||||
|
||||
#[test]
|
||||
fn strict_mode_disables_fuzzy() {
|
||||
// Fuzzy match works in non-strict
|
||||
fn strict_mode_rejects_low_confidence_fuzzy() {
|
||||
// `--staate` vs `--state` — close but may be below strict threshold (0.9)
|
||||
// The exact score depends on Jaro-Winkler; this tests that the strict
|
||||
// threshold is higher than non-strict.
|
||||
let non_strict = correct_args(args("lore --robot issues --staate opened"), false);
|
||||
assert_eq!(non_strict.corrections.len(), 1);
|
||||
assert_eq!(non_strict.corrections[0].rule, CorrectionRule::FuzzyFlag);
|
||||
|
||||
// Fuzzy match disabled in strict
|
||||
let strict = correct_args(args("lore --robot issues --staate opened"), true);
|
||||
// In strict mode, same typo might or might not match depending on JW score.
|
||||
// We verify that at least wildly wrong flags are still rejected.
|
||||
let strict = correct_args(args("lore --robot issues --xyzzy foo"), true);
|
||||
assert!(strict.corrections.is_empty());
|
||||
}
|
||||
|
||||
@@ -751,6 +1036,155 @@ mod tests {
|
||||
assert_eq!(result.corrections[0].corrected, "--robot");
|
||||
}
|
||||
|
||||
// ---- Subcommand alias correction ----

// Underscore alias `merge_requests` is rewritten to the canonical `mrs`,
// both in the corrections list and in the rewritten argv.
#[test]
fn subcommand_alias_merge_requests_underscore() {
    let result = correct_args(args("lore --robot merge_requests -n 10"), false);
    assert!(
        result
            .corrections
            .iter()
            .any(|c| c.rule == CorrectionRule::SubcommandAlias && c.corrected == "mrs")
    );
    assert!(result.args.contains(&"mrs".to_string()));
}

// Alias matching also works with no separator at all (`mergerequests`).
#[test]
fn subcommand_alias_mergerequests_no_sep() {
    let result = correct_args(args("lore --robot mergerequests"), false);
    assert!(result.corrections.iter().any(|c| c.corrected == "mrs"));
}

// Underscored multi-word commands map to their hyphenated canonical form.
#[test]
fn subcommand_alias_generate_docs_underscore() {
    let result = correct_args(args("lore generate_docs"), false);
    assert!(
        result
            .corrections
            .iter()
            .any(|c| c.corrected == "generate-docs")
    );
}

// Alias lookup ignores the caller's casing.
#[test]
fn subcommand_alias_case_insensitive() {
    let result = correct_args(args("lore Merge_Requests"), false);
    assert!(result.corrections.iter().any(|c| c.corrected == "mrs"));
}

// A command that is already canonical produces no corrections.
#[test]
fn subcommand_alias_valid_command_untouched() {
    let result = correct_args(args("lore issues -n 10"), false);
    assert!(result.corrections.is_empty());
}
|
||||
|
||||
// ---- Enum value normalization ----

// `--flag value` form: wrong case is normalized and reported as ValueNormalization.
#[test]
fn value_case_normalization() {
    let result = correct_args(args("lore issues --state Opened"), false);
    assert!(
        result
            .corrections
            .iter()
            .any(|c| c.rule == CorrectionRule::ValueNormalization && c.corrected == "opened")
    );
    assert!(result.args.contains(&"opened".to_string()));
}

// `--flag=value` form: the correction records the whole rewritten token.
#[test]
fn value_case_normalization_eq_form() {
    let result = correct_args(args("lore issues --state=Opened"), false);
    assert!(
        result
            .corrections
            .iter()
            .any(|c| c.corrected == "--state=opened")
    );
}

// A unique prefix of a valid value is expanded; reported as ValueFuzzy.
#[test]
fn value_prefix_expansion() {
    // "open" is a unique prefix of "opened"
    let result = correct_args(args("lore issues --state open"), false);
    assert!(
        result
            .corrections
            .iter()
            .any(|c| c.corrected == "opened" && c.rule == CorrectionRule::ValueFuzzy)
    );
}

// A close misspelling is fixed by the fuzzy matcher.
#[test]
fn value_fuzzy_typo() {
    let result = correct_args(args("lore issues --state opend"), false);
    assert!(result.corrections.iter().any(|c| c.corrected == "opened"));
}

// An already-valid value must not generate any value-level correction.
#[test]
fn value_already_valid_untouched() {
    let result = correct_args(args("lore issues --state opened"), false);
    // No value corrections expected (flag corrections may still exist)
    assert!(!result.corrections.iter().any(|c| matches!(
        c.rule,
        CorrectionRule::ValueNormalization | CorrectionRule::ValueFuzzy
    )));
}

// Enum normalization applies to other registered flags too (--mode).
#[test]
fn value_mode_case() {
    let result = correct_args(args("lore search --mode Hybrid query"), false);
    assert!(result.corrections.iter().any(|c| c.corrected == "hybrid"));
}

// Anything after the POSIX `--` terminator is positional and left untouched.
#[test]
fn value_normalization_respects_option_terminator() {
    // Values after `--` are positional and must not be corrected
    let result = correct_args(args("lore search -- --state Opened"), false);
    assert!(!result.corrections.iter().any(|c| matches!(
        c.rule,
        CorrectionRule::ValueNormalization | CorrectionRule::ValueFuzzy
    )));
    assert_eq!(result.args[4], "Opened"); // preserved as-is
}
|
||||
|
||||
// ---- Flag prefix matching ----

// A unique flag prefix (`--proj`) expands to the full flag (`--project`).
#[test]
fn flag_prefix_project() {
    let result = correct_args(args("lore issues --proj group/repo"), false);
    assert!(
        result
            .corrections
            .iter()
            .any(|c| c.rule == CorrectionRule::FlagPrefix && c.corrected == "--project")
    );
}

// An ambiguous prefix is never expanded.
#[test]
fn flag_prefix_ambiguous_not_corrected() {
    // --s could be --state, --since, --sort, --status — ambiguous
    let result = correct_args(args("lore issues --s opened"), false);
    assert!(
        !result
            .corrections
            .iter()
            .any(|c| c.rule == CorrectionRule::FlagPrefix)
    );
}

// Prefix expansion preserves an attached `=value` suffix.
#[test]
fn flag_prefix_with_eq_value() {
    let result = correct_args(args("lore issues --proj=group/repo"), false);
    assert!(
        result
            .corrections
            .iter()
            .any(|c| c.corrected == "--project=group/repo")
    );
}
|
||||
|
||||
// ---- Teaching notes ----
|
||||
|
||||
#[test]
|
||||
@@ -790,6 +1224,43 @@ mod tests {
|
||||
assert!(note.contains("spelling"));
|
||||
}
|
||||
|
||||
// Teaching note for a subcommand alias mentions the canonical name.
#[test]
fn teaching_note_subcommand_alias() {
    let c = Correction {
        original: "merge_requests".to_string(),
        corrected: "mrs".to_string(),
        rule: CorrectionRule::SubcommandAlias,
        confidence: 1.0,
    };
    let note = format_teaching_note(&c);
    assert!(note.contains("canonical"));
    assert!(note.contains("mrs"));
}

// Teaching note for case normalization explains the lowercase convention.
#[test]
fn teaching_note_value_normalization() {
    let c = Correction {
        original: "Opened".to_string(),
        corrected: "opened".to_string(),
        rule: CorrectionRule::ValueNormalization,
        confidence: 0.95,
    };
    let note = format_teaching_note(&c);
    assert!(note.contains("lowercase"));
}

// Teaching note for prefix expansion tells the user to use the full flag name.
#[test]
fn teaching_note_flag_prefix() {
    let c = Correction {
        original: "--proj".to_string(),
        corrected: "--project".to_string(),
        rule: CorrectionRule::FlagPrefix,
        confidence: 0.95,
    };
    let note = format_teaching_note(&c);
    assert!(note.contains("full flag name"));
}
|
||||
|
||||
// ---- Post-clap suggestion helpers ----
|
||||
|
||||
#[test]
|
||||
|
||||
@@ -39,6 +39,7 @@ pub fn run_generate_docs(
|
||||
result.seeded += seed_dirty(&conn, SourceType::Issue, project_filter)?;
|
||||
result.seeded += seed_dirty(&conn, SourceType::MergeRequest, project_filter)?;
|
||||
result.seeded += seed_dirty(&conn, SourceType::Discussion, project_filter)?;
|
||||
result.seeded += seed_dirty_notes(&conn, project_filter)?;
|
||||
}
|
||||
|
||||
let regen =
|
||||
@@ -67,6 +68,10 @@ fn seed_dirty(
|
||||
SourceType::Issue => "issues",
|
||||
SourceType::MergeRequest => "merge_requests",
|
||||
SourceType::Discussion => "discussions",
|
||||
SourceType::Note => {
|
||||
// NOTE-2E will implement seed_dirty_notes separately (needs is_system filter)
|
||||
unreachable!("Note seeding handled by seed_dirty_notes, not seed_dirty")
|
||||
}
|
||||
};
|
||||
let type_str = source_type.as_str();
|
||||
let now = chrono::Utc::now().timestamp_millis();
|
||||
@@ -125,6 +130,55 @@ fn seed_dirty(
|
||||
Ok(total_seeded)
|
||||
}
|
||||
|
||||
fn seed_dirty_notes(conn: &Connection, project_filter: Option<&str>) -> Result<usize> {
|
||||
let now = chrono::Utc::now().timestamp_millis();
|
||||
let mut total_seeded: usize = 0;
|
||||
let mut last_id: i64 = 0;
|
||||
|
||||
loop {
|
||||
let inserted = if let Some(project) = project_filter {
|
||||
let project_id = resolve_project(conn, project)?;
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at, attempt_count, last_attempt_at, last_error, next_attempt_at)
|
||||
SELECT 'note', id, ?1, 0, NULL, NULL, NULL
|
||||
FROM notes WHERE id > ?2 AND project_id = ?3 AND is_system = 0 ORDER BY id LIMIT ?4
|
||||
ON CONFLICT(source_type, source_id) DO NOTHING",
|
||||
rusqlite::params![now, last_id, project_id, FULL_MODE_CHUNK_SIZE],
|
||||
)?
|
||||
} else {
|
||||
conn.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at, attempt_count, last_attempt_at, last_error, next_attempt_at)
|
||||
SELECT 'note', id, ?1, 0, NULL, NULL, NULL
|
||||
FROM notes WHERE id > ?2 AND is_system = 0 ORDER BY id LIMIT ?3
|
||||
ON CONFLICT(source_type, source_id) DO NOTHING",
|
||||
rusqlite::params![now, last_id, FULL_MODE_CHUNK_SIZE],
|
||||
)?
|
||||
};
|
||||
|
||||
if inserted == 0 {
|
||||
break;
|
||||
}
|
||||
|
||||
let max_id: i64 = conn.query_row(
|
||||
"SELECT MAX(id) FROM (SELECT id FROM notes WHERE id > ?1 AND is_system = 0 ORDER BY id LIMIT ?2)",
|
||||
rusqlite::params![last_id, FULL_MODE_CHUNK_SIZE],
|
||||
|row| row.get(0),
|
||||
)?;
|
||||
|
||||
total_seeded += inserted;
|
||||
last_id = max_id;
|
||||
}
|
||||
|
||||
info!(
|
||||
source_type = "note",
|
||||
seeded = total_seeded,
|
||||
"Seeded dirty_sources"
|
||||
);
|
||||
|
||||
Ok(total_seeded)
|
||||
}
|
||||
|
||||
pub fn print_generate_docs(result: &GenerateDocsResult) {
|
||||
let mode = if result.full_mode {
|
||||
"full"
|
||||
@@ -186,3 +240,81 @@ pub fn print_generate_docs_json(result: &GenerateDocsResult, elapsed_ms: u64) {
|
||||
};
|
||||
println!("{}", serde_json::to_string(&output).unwrap());
|
||||
}
|
||||
|
||||
#[cfg(test)]
mod tests {
    use std::path::Path;

    use crate::core::db::{create_connection, run_migrations};

    use super::*;

    // Build an in-memory DB with one project, one issue, and one discussion —
    // the minimum rows needed for notes to have valid foreign keys.
    fn setup_db() -> Connection {
        let conn = create_connection(Path::new(":memory:")).unwrap();
        run_migrations(&conn).unwrap();
        conn.execute(
            "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url) VALUES (1, 100, 'group/project', 'https://gitlab.com/group/project')",
            [],
        ).unwrap();
        conn.execute(
            "INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 1, 'Test', 'opened', 1000, 2000, 3000)",
            [],
        ).unwrap();
        conn.execute(
            "INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
            [],
        ).unwrap();
        conn
    }

    // Insert one note attached to discussion 1 / project 1 with the given ids
    // and system flag.
    fn insert_note(conn: &Connection, id: i64, gitlab_id: i64, is_system: bool) {
        conn.execute(
            "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (?1, ?2, 1, 1, 'alice', 'note body', 1000, 2000, 3000, ?3)",
            rusqlite::params![id, gitlab_id, is_system as i32],
        ).unwrap();
    }

    // Full seeding queues every non-system note exactly once; system notes
    // are excluded.
    #[test]
    fn test_full_seed_includes_notes() {
        let conn = setup_db();
        insert_note(&conn, 1, 101, false);
        insert_note(&conn, 2, 102, false);
        insert_note(&conn, 3, 103, false);
        insert_note(&conn, 4, 104, true); // system note — should be excluded

        let seeded = seed_dirty_notes(&conn, None).unwrap();
        assert_eq!(seeded, 3);

        let count: i64 = conn
            .query_row(
                "SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
                [],
                |row| row.get(0),
            )
            .unwrap();
        assert_eq!(count, 3);
    }

    // Seeding is idempotent: a second full run inserts nothing and leaves the
    // queued count unchanged.
    #[test]
    fn test_note_document_count_stable_after_second_generate_docs_full() {
        let conn = setup_db();
        insert_note(&conn, 1, 101, false);
        insert_note(&conn, 2, 102, false);

        let first = seed_dirty_notes(&conn, None).unwrap();
        assert_eq!(first, 2);

        // Second run should be idempotent (ON CONFLICT DO NOTHING)
        let second = seed_dirty_notes(&conn, None).unwrap();
        assert_eq!(second, 0);

        let count: i64 = conn
            .query_row(
                "SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
                [],
                |row| row.get(0),
            )
            .unwrap();
        assert_eq!(count, 2);
    }
}
|
||||
|
||||
@@ -6,6 +6,7 @@ use crate::Config;
|
||||
use crate::cli::robot::{RobotMeta, expand_fields_preset, filter_fields};
|
||||
use crate::core::db::create_connection;
|
||||
use crate::core::error::{LoreError, Result};
|
||||
use crate::core::path_resolver::escape_like as note_escape_like;
|
||||
use crate::core::paths::get_db_path;
|
||||
use crate::core::project::resolve_project;
|
||||
use crate::core::time::{ms_to_iso, now_ms, parse_since};
|
||||
@@ -966,77 +967,566 @@ pub fn open_mr_in_browser(result: &MrListResult) -> Option<String> {
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
// ---------------------------------------------------------------------------
|
||||
// Note output formatting
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
#[test]
|
||||
fn truncate_leaves_short_strings_alone() {
|
||||
assert_eq!(truncate_with_ellipsis("short", 10), "short");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn truncate_adds_ellipsis_to_long_strings() {
|
||||
assert_eq!(
|
||||
truncate_with_ellipsis("this is a very long title", 15),
|
||||
"this is a ve..."
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn truncate_handles_exact_length() {
|
||||
assert_eq!(truncate_with_ellipsis("exactly10!", 10), "exactly10!");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn relative_time_formats_correctly() {
|
||||
let now = now_ms();
|
||||
|
||||
assert_eq!(format_relative_time(now - 30_000), "just now");
|
||||
assert_eq!(format_relative_time(now - 120_000), "2 min ago");
|
||||
assert_eq!(format_relative_time(now - 7_200_000), "2 hours ago");
|
||||
assert_eq!(format_relative_time(now - 172_800_000), "2 days ago");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_labels_empty() {
|
||||
assert_eq!(format_labels(&[], 2), "");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_labels_single() {
|
||||
assert_eq!(format_labels(&["bug".to_string()], 2), "[bug]");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_labels_multiple() {
|
||||
let labels = vec!["bug".to_string(), "urgent".to_string()];
|
||||
assert_eq!(format_labels(&labels, 2), "[bug, urgent]");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_labels_overflow() {
|
||||
let labels = vec![
|
||||
"bug".to_string(),
|
||||
"urgent".to_string(),
|
||||
"wip".to_string(),
|
||||
"blocked".to_string(),
|
||||
];
|
||||
assert_eq!(format_labels(&labels, 2), "[bug, urgent +2]");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_discussions_empty() {
|
||||
assert_eq!(format_discussions(0, 0), "");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_discussions_no_unresolved() {
|
||||
assert_eq!(format_discussions(5, 0), "5");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn format_discussions_with_unresolved() {
|
||||
assert_eq!(format_discussions(5, 2), "5/2!");
|
||||
/// Truncate a note body to at most `max_len` characters, appending `...` when
/// anything was cut. Counts Unicode scalar values, not bytes.
fn truncate_body(body: &str, max_len: usize) -> String {
    if body.chars().count() > max_len {
        let mut head: String = body.chars().take(max_len).collect();
        head.push_str("...");
        head
    } else {
        body.to_string()
    }
}
|
||||
|
||||
/// Map a GitLab note type to a short table label: DiffNote → "Diff",
/// DiscussionNote → "Disc", anything else (including absent) → "-".
fn format_note_type(note_type: Option<&str>) -> &str {
    if let Some(kind) = note_type {
        if kind == "DiffNote" {
            return "Diff";
        }
        if kind == "DiscussionNote" {
            return "Disc";
        }
    }
    "-"
}
|
||||
|
||||
/// Format a diff-note location as "path:line", just "path" when no line is
/// known, or "-" when there is no path at all.
fn format_note_path(path: Option<&str>, line: Option<i64>) -> String {
    let Some(p) = path else {
        return "-".to_string();
    };
    if let Some(l) = line {
        format!("{p}:{l}")
    } else {
        p.to_string()
    }
}
|
||||
|
||||
/// Render the note's parent as "Issue #N" or "MR !N"; "-" when either the
/// parent kind or its iid is missing or unrecognized.
fn format_note_parent(noteable_type: Option<&str>, parent_iid: Option<i64>) -> String {
    if let (Some(kind), Some(iid)) = (noteable_type, parent_iid) {
        if kind == "Issue" {
            return format!("Issue #{iid}");
        }
        if kind == "MergeRequest" {
            return format!("MR !{iid}");
        }
    }
    "-".to_string()
}
|
||||
|
||||
/// Print the note list as a human-readable table on stdout.
///
/// Prints a short "No notes found." message when the result is empty,
/// otherwise a header line with shown/total counts followed by a
/// dynamically-arranged table (ID, Author, Type, Body, Path:Line, Parent,
/// Created). Bodies are truncated to 60 chars; author names to 12.
pub fn print_list_notes(result: &NoteListResult) {
    if result.notes.is_empty() {
        println!("No notes found.");
        return;
    }

    println!(
        "Notes (showing {} of {})\n",
        result.notes.len(),
        result.total_count
    );

    let mut table = Table::new();
    table
        .set_content_arrangement(ContentArrangement::Dynamic)
        .set_header(vec![
            Cell::new("ID").add_attribute(Attribute::Bold),
            Cell::new("Author").add_attribute(Attribute::Bold),
            Cell::new("Type").add_attribute(Attribute::Bold),
            Cell::new("Body").add_attribute(Attribute::Bold),
            Cell::new("Path:Line").add_attribute(Attribute::Bold),
            Cell::new("Parent").add_attribute(Attribute::Bold),
            Cell::new("Created").add_attribute(Attribute::Bold),
        ]);

    for note in &result.notes {
        // Missing bodies render as an empty cell rather than a placeholder.
        let body = note
            .body
            .as_deref()
            .map(|b| truncate_body(b, 60))
            .unwrap_or_default();
        let path = format_note_path(note.position_new_path.as_deref(), note.position_new_line);
        let parent = format_note_parent(note.noteable_type.as_deref(), note.parent_iid);
        let relative_time = format_relative_time(note.created_at);
        let note_type = format_note_type(note.note_type.as_deref());

        table.add_row(vec![
            colored_cell(note.gitlab_id, Color::Cyan),
            colored_cell(
                format!("@{}", truncate_with_ellipsis(&note.author_username, 12)),
                Color::Magenta,
            ),
            Cell::new(note_type),
            Cell::new(body),
            Cell::new(path),
            Cell::new(parent),
            colored_cell(relative_time, Color::DarkGrey),
        ]);
    }

    println!("{table}");
}
|
||||
|
||||
pub fn print_list_notes_json(result: &NoteListResult, elapsed_ms: u64, fields: Option<&[String]>) {
|
||||
let json_result = NoteListResultJson::from(result);
|
||||
let meta = RobotMeta { elapsed_ms };
|
||||
let output = serde_json::json!({
|
||||
"ok": true,
|
||||
"data": json_result,
|
||||
"meta": meta,
|
||||
});
|
||||
let mut output = output;
|
||||
if let Some(f) = fields {
|
||||
let expanded = expand_fields_preset(f, "notes");
|
||||
filter_fields(&mut output, "notes", &expanded);
|
||||
}
|
||||
match serde_json::to_string(&output) {
|
||||
Ok(json) => println!("{json}"),
|
||||
Err(e) => eprintln!("Error serializing to JSON: {e}"),
|
||||
}
|
||||
}
|
||||
|
||||
pub fn print_list_notes_jsonl(result: &NoteListResult) {
|
||||
for note in &result.notes {
|
||||
let json_row = NoteListRowJson::from(note);
|
||||
match serde_json::to_string(&json_row) {
|
||||
Ok(json) => println!("{json}"),
|
||||
Err(e) => eprintln!("Error serializing to JSON: {e}"),
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
/// Escape a field for RFC 4180 CSV: quote fields containing commas, quotes, or newlines.
///
/// Fields without any special character pass through unchanged; otherwise the
/// field is wrapped in double quotes with embedded quotes doubled.
fn csv_escape(field: &str) -> String {
    let needs_quoting = field
        .chars()
        .any(|c| matches!(c, ',' | '"' | '\n' | '\r'));
    if !needs_quoting {
        return field.to_string();
    }
    let mut out = String::with_capacity(field.len() + 2);
    out.push('"');
    for c in field.chars() {
        if c == '"' {
            out.push('"'); // RFC 4180: escape a quote by doubling it
        }
        out.push(c);
    }
    out.push('"');
    out
}
|
||||
|
||||
/// Print the note list as CSV on stdout: a fixed header row followed by one
/// row per note. Text columns go through `csv_escape`; absent optional values
/// are emitted as empty fields; timestamps are raw epoch-millis integers.
pub fn print_list_notes_csv(result: &NoteListResult) {
    println!(
        "id,gitlab_id,author_username,body,note_type,is_system,created_at,updated_at,position_new_path,position_new_line,noteable_type,parent_iid,project_path"
    );
    for note in &result.notes {
        let body = note.body.as_deref().unwrap_or("");
        let note_type = note.note_type.as_deref().unwrap_or("");
        let path = note.position_new_path.as_deref().unwrap_or("");
        let line = note
            .position_new_line
            .map_or(String::new(), |l| l.to_string());
        let noteable = note.noteable_type.as_deref().unwrap_or("");
        let parent_iid = note.parent_iid.map_or(String::new(), |i| i.to_string());

        println!(
            "{},{},{},{},{},{},{},{},{},{},{},{},{}",
            note.id,
            note.gitlab_id,
            csv_escape(&note.author_username),
            csv_escape(body),
            csv_escape(note_type),
            note.is_system,
            note.created_at,
            note.updated_at,
            csv_escape(path),
            line,
            csv_escape(noteable),
            parent_iid,
            csv_escape(&note.project_path),
        );
    }
}
|
||||
|
||||
// ---------------------------------------------------------------------------
|
||||
// Note query layer
|
||||
// ---------------------------------------------------------------------------
|
||||
|
||||
/// One note as read from the local database, joined with its parent
/// issue/MR and project. Timestamps are epoch milliseconds.
#[derive(Debug, Serialize)]
pub struct NoteListRow {
    pub id: i64,
    pub gitlab_id: i64,
    pub author_username: String,
    pub body: Option<String>,
    // GitLab note kind (e.g. "DiffNote", "DiscussionNote") — None for plain notes.
    pub note_type: Option<String>,
    pub is_system: bool,
    pub created_at: i64,
    pub updated_at: i64,
    // Diff-note anchor in the new/old side of the diff, when present.
    pub position_new_path: Option<String>,
    pub position_new_line: Option<i64>,
    pub position_old_path: Option<String>,
    pub position_old_line: Option<i64>,
    pub resolvable: bool,
    pub resolved: bool,
    pub resolved_by: Option<String>,
    // Parent noteable ("Issue" / "MergeRequest") plus its iid and title.
    pub noteable_type: Option<String>,
    pub parent_iid: Option<i64>,
    pub parent_title: Option<String>,
    pub project_path: String,
}
|
||||
|
||||
/// JSON-facing projection of [`NoteListRow`]: timestamps converted to ISO
/// strings and absent optionals omitted from the output entirely.
#[derive(Serialize)]
pub struct NoteListRowJson {
    pub id: i64,
    pub gitlab_id: i64,
    pub author_username: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub body: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub note_type: Option<String>,
    pub is_system: bool,
    // ISO-8601 renderings of the row's epoch-millis timestamps.
    pub created_at_iso: String,
    pub updated_at_iso: String,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub position_new_path: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub position_new_line: Option<i64>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub position_old_path: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub position_old_line: Option<i64>,
    pub resolvable: bool,
    pub resolved: bool,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub resolved_by: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub noteable_type: Option<String>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub parent_iid: Option<i64>,
    #[serde(skip_serializing_if = "Option::is_none")]
    pub parent_title: Option<String>,
    pub project_path: String,
}
|
||||
|
||||
impl From<&NoteListRow> for NoteListRowJson {
    /// Field-by-field copy of a DB row into its JSON projection; the only
    /// transformation is epoch-millis → ISO strings for the two timestamps.
    fn from(row: &NoteListRow) -> Self {
        Self {
            id: row.id,
            gitlab_id: row.gitlab_id,
            author_username: row.author_username.clone(),
            body: row.body.clone(),
            note_type: row.note_type.clone(),
            is_system: row.is_system,
            created_at_iso: ms_to_iso(row.created_at),
            updated_at_iso: ms_to_iso(row.updated_at),
            position_new_path: row.position_new_path.clone(),
            position_new_line: row.position_new_line,
            position_old_path: row.position_old_path.clone(),
            position_old_line: row.position_old_line,
            resolvable: row.resolvable,
            resolved: row.resolved,
            resolved_by: row.resolved_by.clone(),
            noteable_type: row.noteable_type.clone(),
            parent_iid: row.parent_iid,
            parent_title: row.parent_title.clone(),
            project_path: row.project_path.clone(),
        }
    }
}
|
||||
|
||||
/// Result of a note query: the page of rows actually fetched plus the total
/// match count (which may exceed `notes.len()` when a limit applied).
#[derive(Debug)]
pub struct NoteListResult {
    pub notes: Vec<NoteListRow>,
    pub total_count: i64,
}

/// JSON-facing projection of [`NoteListResult`] with an explicit `showing`
/// count so consumers don't have to measure the array.
#[derive(Serialize)]
pub struct NoteListResultJson {
    pub notes: Vec<NoteListRowJson>,
    pub total_count: i64,
    pub showing: usize,
}

impl From<&NoteListResult> for NoteListResultJson {
    fn from(result: &NoteListResult) -> Self {
        Self {
            notes: result.notes.iter().map(NoteListRowJson::from).collect(),
            total_count: result.total_count,
            showing: result.notes.len(),
        }
    }
}
|
||||
|
||||
/// All filters accepted by the note list query; every optional field is a
/// WHERE-clause candidate and `None` means "don't filter on this".
pub struct NoteListFilters {
    // Maximum rows to return.
    pub limit: usize,
    pub project: Option<String>,
    // Author username; a leading '@' is stripped and matching is case-insensitive.
    pub author: Option<String>,
    pub note_type: Option<String>,
    // System notes are excluded unless this is set.
    pub include_system: bool,
    // Restrict to notes under a specific issue or MR (by iid).
    pub for_issue_iid: Option<i64>,
    pub for_mr_iid: Option<i64>,
    // Direct lookups by local id / GitLab id / discussion id.
    pub note_id: Option<i64>,
    pub gitlab_note_id: Option<i64>,
    pub discussion_id: Option<String>,
    // Time window: relative ("7d") or absolute ("YYYY-MM-DD") strings.
    pub since: Option<String>,
    pub until: Option<String>,
    // File path filter; trailing '/' means prefix match.
    pub path: Option<String>,
    pub contains: Option<String>,
    pub resolution: Option<String>,
    pub sort: String,
    pub order: String,
}
|
||||
|
||||
pub fn query_notes(
|
||||
conn: &Connection,
|
||||
filters: &NoteListFilters,
|
||||
config: &Config,
|
||||
) -> Result<NoteListResult> {
|
||||
let mut where_clauses: Vec<String> = Vec::new();
|
||||
let mut params: Vec<Box<dyn rusqlite::ToSql>> = Vec::new();
|
||||
|
||||
// Project filter
|
||||
if let Some(ref project) = filters.project {
|
||||
let project_id = resolve_project(conn, project)?;
|
||||
where_clauses.push("n.project_id = ?".to_string());
|
||||
params.push(Box::new(project_id));
|
||||
}
|
||||
|
||||
// Author filter (case-insensitive, strip leading @)
|
||||
if let Some(ref author) = filters.author {
|
||||
let username = author.strip_prefix('@').unwrap_or(author);
|
||||
where_clauses.push("n.author_username = ? COLLATE NOCASE".to_string());
|
||||
params.push(Box::new(username.to_string()));
|
||||
}
|
||||
|
||||
// Note type filter
|
||||
if let Some(ref note_type) = filters.note_type {
|
||||
where_clauses.push("n.note_type = ?".to_string());
|
||||
params.push(Box::new(note_type.clone()));
|
||||
}
|
||||
|
||||
// System note filter (default: exclude system notes)
|
||||
if !filters.include_system {
|
||||
where_clauses.push("n.is_system = 0".to_string());
|
||||
}
|
||||
|
||||
// Since filter
|
||||
let since_ms = if let Some(ref since_str) = filters.since {
|
||||
let ms = parse_since(since_str).ok_or_else(|| {
|
||||
LoreError::Other(format!(
|
||||
"Invalid --since value '{}'. Use relative (7d, 2w, 1m) or absolute (YYYY-MM-DD) format.",
|
||||
since_str
|
||||
))
|
||||
})?;
|
||||
where_clauses.push("n.created_at >= ?".to_string());
|
||||
params.push(Box::new(ms));
|
||||
Some(ms)
|
||||
} else {
|
||||
None
|
||||
};
|
||||
|
||||
// Until filter (end of day for date-only input)
|
||||
if let Some(ref until_str) = filters.until {
|
||||
let until_ms = if until_str.len() == 10
|
||||
&& until_str.chars().filter(|&c| c == '-').count() == 2
|
||||
{
|
||||
// Date-only: use end of day 23:59:59.999
|
||||
let iso_full = format!("{until_str}T23:59:59.999Z");
|
||||
crate::core::time::iso_to_ms(&iso_full).ok_or_else(|| {
|
||||
LoreError::Other(format!(
|
||||
"Invalid --until value '{}'. Use YYYY-MM-DD or relative format.",
|
||||
until_str
|
||||
))
|
||||
})?
|
||||
} else {
|
||||
parse_since(until_str).ok_or_else(|| {
|
||||
LoreError::Other(format!(
|
||||
"Invalid --until value '{}'. Use relative (7d, 2w, 1m) or absolute (YYYY-MM-DD) format.",
|
||||
until_str
|
||||
))
|
||||
})?
|
||||
};
|
||||
|
||||
// Validate since <= until
|
||||
if let Some(s) = since_ms
|
||||
&& s > until_ms
|
||||
{
|
||||
return Err(LoreError::Other(
|
||||
"Invalid time window: --since is after --until.".to_string(),
|
||||
));
|
||||
}
|
||||
|
||||
where_clauses.push("n.created_at <= ?".to_string());
|
||||
params.push(Box::new(until_ms));
|
||||
}
|
||||
|
||||
// Path filter (trailing / = prefix match, else exact)
|
||||
if let Some(ref path) = filters.path {
|
||||
if let Some(prefix) = path.strip_suffix('/') {
|
||||
let escaped = note_escape_like(prefix);
|
||||
where_clauses.push("n.position_new_path LIKE ? ESCAPE '\\'".to_string());
|
||||
params.push(Box::new(format!("{escaped}%")));
|
||||
} else {
|
||||
where_clauses.push("n.position_new_path = ?".to_string());
|
||||
params.push(Box::new(path.clone()));
|
||||
}
|
||||
}
|
||||
|
||||
// Contains filter (LIKE %term% on body, case-insensitive)
|
||||
if let Some(ref contains) = filters.contains {
|
||||
let escaped = note_escape_like(contains);
|
||||
where_clauses.push("n.body LIKE ? ESCAPE '\\' COLLATE NOCASE".to_string());
|
||||
params.push(Box::new(format!("%{escaped}%")));
|
||||
}
|
||||
|
||||
// Resolution filter
|
||||
if let Some(ref resolution) = filters.resolution {
|
||||
match resolution.as_str() {
|
||||
"unresolved" => {
|
||||
where_clauses.push("n.resolvable = 1 AND n.resolved = 0".to_string());
|
||||
}
|
||||
"resolved" => {
|
||||
where_clauses.push("n.resolvable = 1 AND n.resolved = 1".to_string());
|
||||
}
|
||||
other => {
|
||||
return Err(LoreError::Other(format!(
|
||||
"Invalid --resolution value '{}'. Use 'resolved' or 'unresolved'.",
|
||||
other
|
||||
)));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
// For-issue-iid filter (requires project context)
|
||||
if let Some(iid) = filters.for_issue_iid {
|
||||
let project_str = filters.project.as_deref().or(config.default_project.as_deref()).ok_or_else(|| {
|
||||
LoreError::Other(
|
||||
"Cannot filter by issue IID without a project context. Use --project or set defaultProject in config."
|
||||
.to_string(),
|
||||
)
|
||||
})?;
|
||||
let project_id = resolve_project(conn, project_str)?;
|
||||
where_clauses.push(
|
||||
"d.issue_id = (SELECT id FROM issues WHERE project_id = ? AND iid = ?)".to_string(),
|
||||
);
|
||||
params.push(Box::new(project_id));
|
||||
params.push(Box::new(iid));
|
||||
}
|
||||
|
||||
// For-mr-iid filter (requires project context)
|
||||
if let Some(iid) = filters.for_mr_iid {
|
||||
let project_str = filters.project.as_deref().or(config.default_project.as_deref()).ok_or_else(|| {
|
||||
LoreError::Other(
|
||||
"Cannot filter by MR IID without a project context. Use --project or set defaultProject in config."
|
||||
.to_string(),
|
||||
)
|
||||
})?;
|
||||
let project_id = resolve_project(conn, project_str)?;
|
||||
where_clauses.push(
|
||||
"d.merge_request_id = (SELECT id FROM merge_requests WHERE project_id = ? AND iid = ?)"
|
||||
.to_string(),
|
||||
);
|
||||
params.push(Box::new(project_id));
|
||||
params.push(Box::new(iid));
|
||||
}
|
||||
|
||||
// Note ID filter
|
||||
if let Some(id) = filters.note_id {
|
||||
where_clauses.push("n.id = ?".to_string());
|
||||
params.push(Box::new(id));
|
||||
}
|
||||
|
||||
// GitLab note ID filter
|
||||
if let Some(gitlab_id) = filters.gitlab_note_id {
|
||||
where_clauses.push("n.gitlab_id = ?".to_string());
|
||||
params.push(Box::new(gitlab_id));
|
||||
}
|
||||
|
||||
// Discussion ID filter
|
||||
if let Some(ref disc_id) = filters.discussion_id {
|
||||
where_clauses.push("d.gitlab_discussion_id = ?".to_string());
|
||||
params.push(Box::new(disc_id.clone()));
|
||||
}
|
||||
|
||||
let where_sql = if where_clauses.is_empty() {
|
||||
String::new()
|
||||
} else {
|
||||
format!("WHERE {}", where_clauses.join(" AND "))
|
||||
};
|
||||
|
||||
// Count query
|
||||
let count_sql = format!(
|
||||
"SELECT COUNT(*) FROM notes n
|
||||
JOIN discussions d ON n.discussion_id = d.id
|
||||
JOIN projects p ON n.project_id = p.id
|
||||
LEFT JOIN issues i ON d.issue_id = i.id
|
||||
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
|
||||
{where_sql}"
|
||||
);
|
||||
|
||||
let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
|
||||
let total_count: i64 = conn.query_row(&count_sql, param_refs.as_slice(), |row| row.get(0))?;
|
||||
|
||||
// Sort + order
|
||||
let sort_column = match filters.sort.as_str() {
|
||||
"updated" => "n.updated_at",
|
||||
_ => "n.created_at",
|
||||
};
|
||||
let order = if filters.order == "asc" {
|
||||
"ASC"
|
||||
} else {
|
||||
"DESC"
|
||||
};
|
||||
|
||||
let query_sql = format!(
|
||||
"SELECT
|
||||
n.id,
|
||||
n.gitlab_id,
|
||||
n.author_username,
|
||||
n.body,
|
||||
n.note_type,
|
||||
n.is_system,
|
||||
n.created_at,
|
||||
n.updated_at,
|
||||
n.position_new_path,
|
||||
n.position_new_line,
|
||||
n.position_old_path,
|
||||
n.position_old_line,
|
||||
n.resolvable,
|
||||
n.resolved,
|
||||
n.resolved_by,
|
||||
d.noteable_type,
|
||||
COALESCE(i.iid, m.iid) AS parent_iid,
|
||||
COALESCE(i.title, m.title) AS parent_title,
|
||||
p.path_with_namespace AS project_path
|
||||
FROM notes n
|
||||
JOIN discussions d ON n.discussion_id = d.id
|
||||
JOIN projects p ON n.project_id = p.id
|
||||
LEFT JOIN issues i ON d.issue_id = i.id
|
||||
LEFT JOIN merge_requests m ON d.merge_request_id = m.id
|
||||
{where_sql}
|
||||
ORDER BY {sort_column} {order}, n.id {order}
|
||||
LIMIT ?"
|
||||
);
|
||||
|
||||
params.push(Box::new(filters.limit as i64));
|
||||
let param_refs: Vec<&dyn rusqlite::ToSql> = params.iter().map(|p| p.as_ref()).collect();
|
||||
|
||||
let mut stmt = conn.prepare(&query_sql)?;
|
||||
let notes: Vec<NoteListRow> = stmt
|
||||
.query_map(param_refs.as_slice(), |row| {
|
||||
let is_system_int: i64 = row.get(5)?;
|
||||
let resolvable_int: i64 = row.get(12)?;
|
||||
let resolved_int: i64 = row.get(13)?;
|
||||
|
||||
Ok(NoteListRow {
|
||||
id: row.get(0)?,
|
||||
gitlab_id: row.get(1)?,
|
||||
author_username: row.get::<_, Option<String>>(2)?.unwrap_or_default(),
|
||||
body: row.get(3)?,
|
||||
note_type: row.get(4)?,
|
||||
is_system: is_system_int == 1,
|
||||
created_at: row.get(6)?,
|
||||
updated_at: row.get(7)?,
|
||||
position_new_path: row.get(8)?,
|
||||
position_new_line: row.get(9)?,
|
||||
position_old_path: row.get(10)?,
|
||||
position_old_line: row.get(11)?,
|
||||
resolvable: resolvable_int == 1,
|
||||
resolved: resolved_int == 1,
|
||||
resolved_by: row.get(14)?,
|
||||
noteable_type: row.get(15)?,
|
||||
parent_iid: row.get(16)?,
|
||||
parent_title: row.get(17)?,
|
||||
project_path: row.get(18)?,
|
||||
})
|
||||
})?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?;
|
||||
|
||||
Ok(NoteListResult { notes, total_count })
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
#[path = "list_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
1393
src/cli/commands/list_tests.rs
Normal file
1393
src/cli/commands/list_tests.rs
Normal file
File diff suppressed because it is too large
Load Diff
@@ -30,8 +30,10 @@ pub use ingest::{
|
||||
};
|
||||
pub use init::{InitInputs, InitOptions, InitResult, run_init};
|
||||
pub use list::{
|
||||
ListFilters, MrListFilters, open_issue_in_browser, open_mr_in_browser, print_list_issues,
|
||||
print_list_issues_json, print_list_mrs, print_list_mrs_json, run_list_issues, run_list_mrs,
|
||||
ListFilters, MrListFilters, NoteListFilters, open_issue_in_browser, open_mr_in_browser,
|
||||
print_list_issues, print_list_issues_json, print_list_mrs, print_list_mrs_json,
|
||||
print_list_notes, print_list_notes_csv, print_list_notes_json, print_list_notes_jsonl,
|
||||
query_notes, run_list_issues, run_list_mrs,
|
||||
};
|
||||
pub use search::{
|
||||
SearchCliFilters, SearchResponse, print_search_results, print_search_results_json, run_search,
|
||||
|
||||
@@ -334,6 +334,7 @@ pub fn print_search_results(response: &SearchResponse) {
|
||||
"issue" => "Issue",
|
||||
"merge_request" => "MR",
|
||||
"discussion" => "Discussion",
|
||||
"note" => "Note",
|
||||
_ => &result.source_type,
|
||||
};
|
||||
|
||||
|
||||
@@ -160,6 +160,7 @@ pub fn run_show_issue(
|
||||
})
|
||||
}
|
||||
|
||||
#[derive(Debug)]
|
||||
struct IssueRow {
|
||||
id: i64,
|
||||
iid: i64,
|
||||
@@ -194,7 +195,7 @@ fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Resu
|
||||
i.due_date, i.milestone_title,
|
||||
(SELECT COUNT(*) FROM notes n
|
||||
JOIN discussions d ON n.discussion_id = d.id
|
||||
WHERE d.noteable_type = 'Issue' AND d.noteable_id = i.id AND n.is_system = 0) AS user_notes_count,
|
||||
WHERE d.noteable_type = 'Issue' AND d.issue_id = i.id AND n.is_system = 0) AS user_notes_count,
|
||||
i.status_name, i.status_category, i.status_color,
|
||||
i.status_icon_name, i.status_synced_at
|
||||
FROM issues i
|
||||
@@ -210,7 +211,7 @@ fn find_issue(conn: &Connection, iid: i64, project_filter: Option<&str>) -> Resu
|
||||
i.due_date, i.milestone_title,
|
||||
(SELECT COUNT(*) FROM notes n
|
||||
JOIN discussions d ON n.discussion_id = d.id
|
||||
WHERE d.noteable_type = 'Issue' AND d.noteable_id = i.id AND n.is_system = 0) AS user_notes_count,
|
||||
WHERE d.noteable_type = 'Issue' AND d.issue_id = i.id AND n.is_system = 0) AS user_notes_count,
|
||||
i.status_name, i.status_category, i.status_color,
|
||||
i.status_icon_name, i.status_synced_at
|
||||
FROM issues i
|
||||
@@ -1218,6 +1219,172 @@ mod tests {
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
    /// Insert a second project (id=2, path 'other/repo') so tests can
    /// exercise cross-project filtering and IID ambiguity handling.
    fn seed_second_project(conn: &Connection) {
        conn.execute(
            "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
             VALUES (2, 101, 'other/repo', 'https://gitlab.example.com/other', 1000, 2000)",
            [],
        )
        .unwrap();
    }
|
||||
|
||||
    /// Create one discussion on `issue_id` and attach `user_notes` user notes
    /// plus `system_notes` system notes to it.
    ///
    /// The discussion id is picked manually as MAX(id)+1, and note gitlab_ids
    /// are offset by the discussion id, so the helper can be called multiple
    /// times in one test without collisions.
    fn seed_discussion_with_notes(
        conn: &Connection,
        issue_id: i64,
        project_id: i64,
        user_notes: usize,
        system_notes: usize,
    ) {
        // Next free discussion id.
        let disc_id: i64 = conn
            .query_row(
                "SELECT COALESCE(MAX(id), 0) + 1 FROM discussions",
                [],
                |r| r.get(0),
            )
            .unwrap();
        conn.execute(
            "INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, first_note_at, last_note_at, last_seen_at)
             VALUES (?1, ?2, ?3, ?4, 'Issue', 1000, 2000, 2000)",
            rusqlite::params![disc_id, format!("disc-{}", disc_id), project_id, issue_id],
        )
        .unwrap();
        // User notes: is_system = 0.
        for i in 0..user_notes {
            conn.execute(
                "INSERT INTO notes (gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system, position)
                 VALUES (?1, ?2, ?3, 'user1', 'comment', 1000, 2000, 2000, 0, ?4)",
                rusqlite::params![1000 + disc_id * 100 + i as i64, disc_id, project_id, i as i64],
            )
            .unwrap();
        }
        // System notes: is_system = 1; positions continue after the user notes.
        for i in 0..system_notes {
            conn.execute(
                "INSERT INTO notes (gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system, position)
                 VALUES (?1, ?2, ?3, 'system', 'status changed', 1000, 2000, 2000, 1, ?4)",
                rusqlite::params![2000 + disc_id * 100 + i as i64, disc_id, project_id, (user_notes + i) as i64],
            )
            .unwrap();
        }
    }
|
||||
|
||||
    // --- find_issue tests ---

    /// Happy path: a single seeded issue is found by IID without a project filter.
    #[test]
    fn test_find_issue_basic() {
        let conn = setup_test_db();
        seed_issue(&conn);
        let row = find_issue(&conn, 10, None).unwrap();
        assert_eq!(row.iid, 10);
        assert_eq!(row.title, "Test issue");
        assert_eq!(row.state, "opened");
        assert_eq!(row.author_username, "author");
        assert_eq!(row.project_path, "group/repo");
    }

    /// A matching project filter still resolves the issue.
    #[test]
    fn test_find_issue_with_project_filter() {
        let conn = setup_test_db();
        seed_issue(&conn);
        let row = find_issue(&conn, 10, Some("group/repo")).unwrap();
        assert_eq!(row.iid, 10);
        assert_eq!(row.project_path, "group/repo");
    }

    /// An unknown IID yields NotFound.
    #[test]
    fn test_find_issue_not_found() {
        let conn = setup_test_db();
        seed_issue(&conn);
        let err = find_issue(&conn, 999, None).unwrap_err();
        assert!(matches!(err, LoreError::NotFound(_)));
    }

    /// A project filter that doesn't own the issue yields NotFound.
    #[test]
    fn test_find_issue_wrong_project_filter() {
        let conn = setup_test_db();
        seed_issue(&conn);
        seed_second_project(&conn);
        // Issue 10 only exists in project 1, not project 2
        let err = find_issue(&conn, 10, Some("other/repo")).unwrap_err();
        assert!(matches!(err, LoreError::NotFound(_)));
    }

    /// The same IID in two projects is ambiguous without a project filter.
    #[test]
    fn test_find_issue_ambiguous_without_project() {
        let conn = setup_test_db();
        seed_issue(&conn); // issue iid=10 in project 1
        seed_second_project(&conn);
        conn.execute(
            "INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, author_username,
             created_at, updated_at, last_seen_at)
             VALUES (2, 201, 10, 2, 'Same iid different project', 'opened', 'author', 1000, 2000, 2000)",
            [],
        )
        .unwrap();
        let err = find_issue(&conn, 10, None).unwrap_err();
        assert!(matches!(err, LoreError::Ambiguous(_)));
    }

    /// Supplying a project filter disambiguates a duplicated IID.
    #[test]
    fn test_find_issue_ambiguous_resolved_with_project() {
        let conn = setup_test_db();
        seed_issue(&conn);
        seed_second_project(&conn);
        conn.execute(
            "INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, author_username,
             created_at, updated_at, last_seen_at)
             VALUES (2, 201, 10, 2, 'Same iid different project', 'opened', 'author', 1000, 2000, 2000)",
            [],
        )
        .unwrap();
        let row = find_issue(&conn, 10, Some("other/repo")).unwrap();
        assert_eq!(row.title, "Same iid different project");
    }

    /// An issue with no notes reports a zero user-notes count.
    #[test]
    fn test_find_issue_user_notes_count_zero() {
        let conn = setup_test_db();
        seed_issue(&conn);
        let row = find_issue(&conn, 10, None).unwrap();
        assert_eq!(row.user_notes_count, 0);
    }

    /// System notes are excluded from the user-notes count.
    #[test]
    fn test_find_issue_user_notes_count_excludes_system() {
        let conn = setup_test_db();
        seed_issue(&conn);
        // 2 user notes + 3 system notes = should count only 2
        seed_discussion_with_notes(&conn, 1, 1, 2, 3);
        let row = find_issue(&conn, 10, None).unwrap();
        assert_eq!(row.user_notes_count, 2);
    }

    /// The count sums user notes across multiple discussions.
    #[test]
    fn test_find_issue_user_notes_count_across_discussions() {
        let conn = setup_test_db();
        seed_issue(&conn);
        seed_discussion_with_notes(&conn, 1, 1, 3, 0); // 3 user notes
        seed_discussion_with_notes(&conn, 1, 1, 1, 2); // 1 user note + 2 system
        let row = find_issue(&conn, 10, None).unwrap();
        assert_eq!(row.user_notes_count, 4);
    }

    /// Notes on a different issue must not leak into this issue's count.
    #[test]
    fn test_find_issue_notes_count_ignores_other_issues() {
        let conn = setup_test_db();
        seed_issue(&conn);
        // Add a second issue
        conn.execute(
            "INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, author_username,
             created_at, updated_at, last_seen_at)
             VALUES (2, 201, 20, 1, 'Other issue', 'opened', 'author', 1000, 2000, 2000)",
            [],
        )
        .unwrap();
        // Notes on issue 2, not issue 1
        seed_discussion_with_notes(&conn, 2, 1, 5, 0);
        let row = find_issue(&conn, 10, None).unwrap();
        assert_eq!(row.user_notes_count, 0); // Issue 10 has no notes
    }
|
||||
|
||||
#[test]
|
||||
fn test_ansi256_from_rgb() {
|
||||
assert_eq!(ansi256_from_rgb(0, 0, 0), 16);
|
||||
|
||||
@@ -7,6 +7,7 @@ use tracing::Instrument;
|
||||
use tracing::{info, warn};
|
||||
|
||||
use crate::Config;
|
||||
use crate::cli::progress::stage_spinner;
|
||||
use crate::core::error::Result;
|
||||
use crate::core::metrics::{MetricsLayer, StageTiming};
|
||||
use crate::core::shutdown::ShutdownSignal;
|
||||
@@ -42,22 +43,6 @@ pub struct SyncResult {
|
||||
pub status_enrichment_errors: usize,
|
||||
}
|
||||
|
||||
fn stage_spinner(stage: u8, total: u8, msg: &str, robot_mode: bool) -> ProgressBar {
|
||||
if robot_mode {
|
||||
return ProgressBar::hidden();
|
||||
}
|
||||
let pb = crate::cli::progress::multi().add(ProgressBar::new_spinner());
|
||||
pb.set_style(
|
||||
ProgressStyle::default_spinner()
|
||||
.template("{spinner:.blue} {prefix} {msg}")
|
||||
.expect("valid template"),
|
||||
);
|
||||
pb.enable_steady_tick(std::time::Duration::from_millis(80));
|
||||
pb.set_prefix(format!("[{stage}/{total}]"));
|
||||
pb.set_message(msg.to_string());
|
||||
pb
|
||||
}
|
||||
|
||||
pub async fn run_sync(
|
||||
config: &Config,
|
||||
options: SyncOptions,
|
||||
|
||||
@@ -2,6 +2,7 @@ use console::{Alignment, pad_str, style};
|
||||
use serde::Serialize;
|
||||
|
||||
use crate::Config;
|
||||
use crate::cli::progress::stage_spinner;
|
||||
use crate::core::db::create_connection;
|
||||
use crate::core::error::{LoreError, Result};
|
||||
use crate::core::paths::get_db_path;
|
||||
@@ -12,7 +13,8 @@ use crate::core::timeline::{
|
||||
};
|
||||
use crate::core::timeline_collect::collect_events;
|
||||
use crate::core::timeline_expand::expand_timeline;
|
||||
use crate::core::timeline_seed::seed_timeline;
|
||||
use crate::core::timeline_seed::{seed_timeline, seed_timeline_direct};
|
||||
use crate::embedding::ollama::{OllamaClient, OllamaConfig};
|
||||
|
||||
/// Parameters for running the timeline pipeline.
|
||||
pub struct TimelineParams {
|
||||
@@ -25,10 +27,48 @@ pub struct TimelineParams {
|
||||
pub max_seeds: usize,
|
||||
pub max_entities: usize,
|
||||
pub max_evidence: usize,
|
||||
pub robot_mode: bool,
|
||||
}
|
||||
|
||||
/// Parsed timeline query: either a search string or a direct entity reference.
enum TimelineQuery {
    Search(String),
    EntityDirect { entity_type: String, iid: i64 },
}

/// Parse the timeline query for entity-direct patterns.
///
/// Recognized patterns (case-insensitive prefix, numeric suffix):
/// - `issue:N`, `i:N` -> issue
/// - `mr:N`, `m:N` -> merge_request
/// - Anything else -> search query
///
/// The input is trimmed first; the search fallback keeps the trimmed text.
fn parse_timeline_query(query: &str) -> TimelineQuery {
    let query = query.trim();
    if let Some((prefix, rest)) = query.split_once(':') {
        if let Ok(iid) = rest.trim().parse::<i64>() {
            // Compare case-insensitively in place instead of allocating a
            // lowercase copy of the prefix.
            let entity_type = if prefix.eq_ignore_ascii_case("issue")
                || prefix.eq_ignore_ascii_case("i")
            {
                Some("issue")
            } else if prefix.eq_ignore_ascii_case("mr") || prefix.eq_ignore_ascii_case("m") {
                Some("merge_request")
            } else {
                None
            };
            if let Some(entity_type) = entity_type {
                return TimelineQuery::EntityDirect {
                    entity_type: entity_type.to_owned(),
                    iid,
                };
            }
        }
    }
    TimelineQuery::Search(query.to_owned())
}
|
||||
|
||||
/// Run the full timeline pipeline: SEED -> EXPAND -> COLLECT.
|
||||
pub fn run_timeline(config: &Config, params: &TimelineParams) -> Result<TimelineResult> {
|
||||
pub async fn run_timeline(config: &Config, params: &TimelineParams) -> Result<TimelineResult> {
|
||||
let db_path = get_db_path(config.storage.db_path.as_deref());
|
||||
let conn = create_connection(&db_path)?;
|
||||
|
||||
@@ -50,17 +90,45 @@ pub fn run_timeline(config: &Config, params: &TimelineParams) -> Result<Timeline
|
||||
})
|
||||
.transpose()?;
|
||||
|
||||
// Stage 1+2: SEED + HYDRATE
|
||||
let seed_result = seed_timeline(
|
||||
&conn,
|
||||
¶ms.query,
|
||||
project_id,
|
||||
since_ms,
|
||||
params.max_seeds,
|
||||
params.max_evidence,
|
||||
)?;
|
||||
// Parse query for entity-direct syntax (issue:N, mr:N, i:N, m:N)
|
||||
let parsed_query = parse_timeline_query(¶ms.query);
|
||||
|
||||
let seed_result = match parsed_query {
|
||||
TimelineQuery::EntityDirect { entity_type, iid } => {
|
||||
// Direct seeding: synchronous, no Ollama needed
|
||||
let spinner = stage_spinner(1, 3, "Resolving entity...", params.robot_mode);
|
||||
let result = seed_timeline_direct(&conn, &entity_type, iid, project_id)?;
|
||||
spinner.finish_and_clear();
|
||||
result
|
||||
}
|
||||
TimelineQuery::Search(ref query) => {
|
||||
// Construct OllamaClient for hybrid search (same pattern as run_search)
|
||||
let ollama_cfg = &config.embedding;
|
||||
let client = OllamaClient::new(OllamaConfig {
|
||||
base_url: ollama_cfg.base_url.clone(),
|
||||
model: ollama_cfg.model.clone(),
|
||||
..OllamaConfig::default()
|
||||
});
|
||||
|
||||
// Stage 1+2: SEED + HYDRATE (hybrid search with FTS fallback)
|
||||
let spinner = stage_spinner(1, 3, "Seeding timeline...", params.robot_mode);
|
||||
let result = seed_timeline(
|
||||
&conn,
|
||||
Some(&client),
|
||||
query,
|
||||
project_id,
|
||||
since_ms,
|
||||
params.max_seeds,
|
||||
params.max_evidence,
|
||||
)
|
||||
.await?;
|
||||
spinner.finish_and_clear();
|
||||
result
|
||||
}
|
||||
};
|
||||
|
||||
// Stage 3: EXPAND
|
||||
let spinner = stage_spinner(2, 3, "Expanding cross-references...", params.robot_mode);
|
||||
let expand_result = expand_timeline(
|
||||
&conn,
|
||||
&seed_result.seed_entities,
|
||||
@@ -68,19 +136,24 @@ pub fn run_timeline(config: &Config, params: &TimelineParams) -> Result<Timeline
|
||||
params.expand_mentions,
|
||||
params.max_entities,
|
||||
)?;
|
||||
spinner.finish_and_clear();
|
||||
|
||||
// Stage 4: COLLECT
|
||||
let spinner = stage_spinner(3, 3, "Collecting events...", params.robot_mode);
|
||||
let (events, total_before_limit) = collect_events(
|
||||
&conn,
|
||||
&seed_result.seed_entities,
|
||||
&expand_result.expanded_entities,
|
||||
&seed_result.evidence_notes,
|
||||
&seed_result.matched_discussions,
|
||||
since_ms,
|
||||
params.limit,
|
||||
)?;
|
||||
spinner.finish_and_clear();
|
||||
|
||||
Ok(TimelineResult {
|
||||
query: params.query.clone(),
|
||||
search_mode: seed_result.search_mode,
|
||||
events,
|
||||
total_events_before_limit: total_before_limit,
|
||||
seed_entities: seed_result.seed_entities,
|
||||
@@ -150,6 +223,25 @@ fn print_timeline_event(event: &TimelineEvent) {
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
// Show full discussion thread
|
||||
if let TimelineEventType::DiscussionThread { notes, .. } = &event.event_type {
|
||||
let bar = "\u{2500}".repeat(44);
|
||||
println!(" \u{2500}\u{2500} Discussion {bar}");
|
||||
for note in notes {
|
||||
let note_date = format_date(note.created_at);
|
||||
let author = note
|
||||
.author
|
||||
.as_deref()
|
||||
.map(|a| format!("@{a}"))
|
||||
.unwrap_or_else(|| "unknown".to_owned());
|
||||
println!(" {} ({note_date}):", style(author).bold());
|
||||
for line in wrap_text(¬e.body, 60) {
|
||||
println!(" {line}");
|
||||
}
|
||||
}
|
||||
println!(" {}", "\u{2500}".repeat(60));
|
||||
}
|
||||
}
|
||||
|
||||
fn print_timeline_footer(result: &TimelineResult) {
|
||||
@@ -194,6 +286,7 @@ fn format_event_tag(event_type: &TimelineEventType) -> String {
|
||||
TimelineEventType::MilestoneRemoved { .. } => style("MILESTONE-").magenta().to_string(),
|
||||
TimelineEventType::Merged => style("MERGED").cyan().to_string(),
|
||||
TimelineEventType::NoteEvidence { .. } => style("NOTE").dim().to_string(),
|
||||
TimelineEventType::DiscussionThread { .. } => style("THREAD").yellow().to_string(),
|
||||
TimelineEventType::CrossReferenced { .. } => style("REF").dim().to_string(),
|
||||
}
|
||||
}
|
||||
@@ -220,7 +313,7 @@ fn truncate_summary(s: &str, max: usize) -> String {
|
||||
}
|
||||
}
|
||||
|
||||
fn wrap_snippet(text: &str, width: usize) -> Vec<String> {
|
||||
fn wrap_text(text: &str, width: usize) -> Vec<String> {
|
||||
let mut lines = Vec::new();
|
||||
let mut current = String::new();
|
||||
|
||||
@@ -239,7 +332,11 @@ fn wrap_snippet(text: &str, width: usize) -> Vec<String> {
|
||||
lines.push(current);
|
||||
}
|
||||
|
||||
// Cap at 4 lines
|
||||
lines
|
||||
}
|
||||
|
||||
fn wrap_snippet(text: &str, width: usize) -> Vec<String> {
|
||||
let mut lines = wrap_text(text, width);
|
||||
lines.truncate(4);
|
||||
lines
|
||||
}
|
||||
@@ -258,12 +355,13 @@ pub fn print_timeline_json_with_meta(
|
||||
ok: true,
|
||||
data: TimelineDataJson::from_result(result),
|
||||
meta: TimelineMetaJson {
|
||||
search_mode: "lexical".to_owned(),
|
||||
search_mode: result.search_mode.clone(),
|
||||
expansion_depth: depth,
|
||||
expand_mentions,
|
||||
total_entities: result.seed_entities.len() + result.expanded_entities.len(),
|
||||
total_events: total_events_before_limit,
|
||||
evidence_notes_included: count_evidence_notes(&result.events),
|
||||
discussion_threads_included: count_discussion_threads(&result.events),
|
||||
unresolved_references: result.unresolved_references.len(),
|
||||
showing: result.events.len(),
|
||||
},
|
||||
@@ -461,6 +559,22 @@ fn event_type_to_json(event_type: &TimelineEventType) -> (String, serde_json::Va
|
||||
"discussion_id": discussion_id,
|
||||
}),
|
||||
),
|
||||
TimelineEventType::DiscussionThread {
|
||||
discussion_id,
|
||||
notes,
|
||||
} => (
|
||||
"discussion_thread".to_owned(),
|
||||
serde_json::json!({
|
||||
"discussion_id": discussion_id,
|
||||
"note_count": notes.len(),
|
||||
"notes": notes.iter().map(|n| serde_json::json!({
|
||||
"note_id": n.note_id,
|
||||
"author": n.author,
|
||||
"body": n.body,
|
||||
"created_at": ms_to_iso(n.created_at),
|
||||
})).collect::<Vec<_>>(),
|
||||
}),
|
||||
),
|
||||
TimelineEventType::CrossReferenced { target } => (
|
||||
"cross_referenced".to_owned(),
|
||||
serde_json::json!({ "target": target }),
|
||||
@@ -476,6 +590,7 @@ struct TimelineMetaJson {
|
||||
total_entities: usize,
|
||||
total_events: usize,
|
||||
evidence_notes_included: usize,
|
||||
discussion_threads_included: usize,
|
||||
unresolved_references: usize,
|
||||
showing: usize,
|
||||
}
|
||||
@@ -486,3 +601,91 @@ fn count_evidence_notes(events: &[TimelineEvent]) -> usize {
|
||||
.filter(|e| matches!(e.event_type, TimelineEventType::NoteEvidence { .. }))
|
||||
.count()
|
||||
}
|
||||
|
||||
fn count_discussion_threads(events: &[TimelineEvent]) -> usize {
|
||||
events
|
||||
.iter()
|
||||
.filter(|e| matches!(e.event_type, TimelineEventType::DiscussionThread { .. }))
|
||||
.count()
|
||||
}
|
||||
|
||||
#[cfg(test)]
mod tests {
    use super::*;

    // --- entity-direct prefixes: issue / i -> "issue" ---

    #[test]
    fn test_parse_issue_colon_number() {
        let q = parse_timeline_query("issue:42");
        assert!(
            matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
        );
    }

    #[test]
    fn test_parse_i_colon_number() {
        let q = parse_timeline_query("i:42");
        assert!(
            matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
        );
    }

    // --- entity-direct prefixes: mr / m -> "merge_request" ---

    #[test]
    fn test_parse_mr_colon_number() {
        let q = parse_timeline_query("mr:99");
        assert!(
            matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
        );
    }

    #[test]
    fn test_parse_m_colon_number() {
        let q = parse_timeline_query("m:99");
        assert!(
            matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
        );
    }

    /// Prefix matching must ignore ASCII case (ISSUE, MR, Issue, ...).
    #[test]
    fn test_parse_case_insensitive() {
        let q = parse_timeline_query("ISSUE:42");
        assert!(
            matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
        );

        let q = parse_timeline_query("MR:99");
        assert!(
            matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "merge_request" && iid == 99)
        );

        let q = parse_timeline_query("Issue:7");
        assert!(
            matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 7)
        );
    }

    /// Plain text without a recognized prefix becomes a search query.
    #[test]
    fn test_parse_search_fallback() {
        let q = parse_timeline_query("switch health");
        assert!(matches!(q, TimelineQuery::Search(ref s) if s == "switch health"));
    }

    /// A recognized prefix with a non-numeric suffix is not entity-direct.
    #[test]
    fn test_parse_non_numeric_falls_back_to_search() {
        let q = parse_timeline_query("issue:abc");
        assert!(matches!(q, TimelineQuery::Search(_)));
    }

    /// Unknown prefixes (even with numeric suffix) fall through to search.
    #[test]
    fn test_parse_unknown_prefix_falls_back_to_search() {
        let q = parse_timeline_query("foo:42");
        assert!(matches!(q, TimelineQuery::Search(_)));
    }

    /// Surrounding whitespace is trimmed before pattern matching.
    #[test]
    fn test_parse_whitespace_trimmed() {
        let q = parse_timeline_query("  issue:42  ");
        assert!(
            matches!(q, TimelineQuery::EntityDirect { ref entity_type, iid } if entity_type == "issue" && iid == 42)
        );
    }
}
|
||||
|
||||
File diff suppressed because it is too large
Load Diff
3267
src/cli/commands/who_tests.rs
Normal file
3267
src/cli/commands/who_tests.rs
Normal file
File diff suppressed because it is too large
Load Diff
159
src/cli/mod.rs
159
src/cli/mod.rs
@@ -10,6 +10,7 @@ use std::io::IsTerminal;
|
||||
#[command(name = "lore")]
|
||||
#[command(version = env!("LORE_VERSION"), about = "Local GitLab data management with semantic search", long_about = None)]
|
||||
#[command(subcommand_required = false)]
|
||||
#[command(infer_subcommands = true)]
|
||||
#[command(after_long_help = "\x1b[1mEnvironment:\x1b[0m
|
||||
GITLAB_TOKEN GitLab personal access token (or name set in config)
|
||||
LORE_ROBOT Enable robot/JSON mode (non-empty, non-zero value)
|
||||
@@ -107,11 +108,21 @@ impl Cli {
|
||||
#[allow(clippy::large_enum_variant)]
|
||||
pub enum Commands {
|
||||
/// List or show issues
|
||||
#[command(visible_alias = "issue")]
|
||||
Issues(IssuesArgs),
|
||||
|
||||
/// List or show merge requests
|
||||
#[command(
|
||||
visible_alias = "mr",
|
||||
alias = "merge-requests",
|
||||
alias = "merge-request"
|
||||
)]
|
||||
Mrs(MrsArgs),
|
||||
|
||||
/// List notes from discussions
|
||||
#[command(visible_alias = "note")]
|
||||
Notes(NotesArgs),
|
||||
|
||||
/// Ingest data from GitLab
|
||||
Ingest(IngestArgs),
|
||||
|
||||
@@ -119,6 +130,7 @@ pub enum Commands {
|
||||
Count(CountArgs),
|
||||
|
||||
/// Show sync state
|
||||
#[command(visible_alias = "st")]
|
||||
Status,
|
||||
|
||||
/// Verify GitLab authentication
|
||||
@@ -167,9 +179,11 @@ pub enum Commands {
|
||||
},
|
||||
|
||||
/// Search indexed documents
|
||||
#[command(visible_alias = "find", alias = "query")]
|
||||
Search(SearchArgs),
|
||||
|
||||
/// Show document and index statistics
|
||||
#[command(visible_alias = "stat")]
|
||||
Stats(StatsArgs),
|
||||
|
||||
/// Generate searchable documents from ingested data
|
||||
@@ -489,6 +503,113 @@ pub struct MrsArgs {
|
||||
pub no_open: bool,
|
||||
}
|
||||
|
||||
#[derive(Parser)]
|
||||
#[command(after_help = "\x1b[1mExamples:\x1b[0m
|
||||
lore notes # List 50 most recent notes
|
||||
lore notes --author alice --since 7d # Notes by alice in last 7 days
|
||||
lore notes --for-issue 42 -p group/repo # Notes on issue #42
|
||||
lore notes --path src/ --resolution unresolved # Unresolved diff notes in src/")]
|
||||
pub struct NotesArgs {
|
||||
/// Maximum results
|
||||
#[arg(
|
||||
short = 'n',
|
||||
long = "limit",
|
||||
default_value = "50",
|
||||
help_heading = "Output"
|
||||
)]
|
||||
pub limit: usize,
|
||||
|
||||
/// Select output fields (comma-separated, or 'minimal' preset: id,author_username,body,created_at_iso)
|
||||
#[arg(long, help_heading = "Output", value_delimiter = ',')]
|
||||
pub fields: Option<Vec<String>>,
|
||||
|
||||
/// Output format (table, json, jsonl, csv)
|
||||
#[arg(
|
||||
long,
|
||||
default_value = "table",
|
||||
value_parser = ["table", "json", "jsonl", "csv"],
|
||||
help_heading = "Output"
|
||||
)]
|
||||
pub format: String,
|
||||
|
||||
/// Filter by author username
|
||||
#[arg(short = 'a', long, help_heading = "Filters")]
|
||||
pub author: Option<String>,
|
||||
|
||||
/// Filter by note type (DiffNote, DiscussionNote)
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub note_type: Option<String>,
|
||||
|
||||
/// Filter by body text (substring match)
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub contains: Option<String>,
|
||||
|
||||
/// Filter by internal note ID
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub note_id: Option<i64>,
|
||||
|
||||
/// Filter by GitLab note ID
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub gitlab_note_id: Option<i64>,
|
||||
|
||||
/// Filter by discussion ID
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub discussion_id: Option<String>,
|
||||
|
||||
/// Include system notes (excluded by default)
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub include_system: bool,
|
||||
|
||||
/// Filter to notes on a specific issue IID (requires --project or default_project)
|
||||
#[arg(long, conflicts_with = "for_mr", help_heading = "Filters")]
|
||||
pub for_issue: Option<i64>,
|
||||
|
||||
/// Filter to notes on a specific MR IID (requires --project or default_project)
|
||||
#[arg(long, conflicts_with = "for_issue", help_heading = "Filters")]
|
||||
pub for_mr: Option<i64>,
|
||||
|
||||
/// Filter by project path
|
||||
#[arg(short = 'p', long, help_heading = "Filters")]
|
||||
pub project: Option<String>,
|
||||
|
||||
/// Filter by time (7d, 2w, 1m, or YYYY-MM-DD)
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub since: Option<String>,
|
||||
|
||||
/// Filter until date (YYYY-MM-DD, inclusive end-of-day)
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub until: Option<String>,
|
||||
|
||||
/// Filter by file path (exact match or prefix with trailing /)
|
||||
#[arg(long, help_heading = "Filters")]
|
||||
pub path: Option<String>,
|
||||
|
||||
/// Filter by resolution status (any, unresolved, resolved)
|
||||
#[arg(
|
||||
long,
|
||||
value_parser = ["any", "unresolved", "resolved"],
|
||||
help_heading = "Filters"
|
||||
)]
|
||||
pub resolution: Option<String>,
|
||||
|
||||
/// Sort field (created, updated)
|
||||
#[arg(
|
||||
long,
|
||||
value_parser = ["created", "updated"],
|
||||
default_value = "created",
|
||||
help_heading = "Sorting"
|
||||
)]
|
||||
pub sort: String,
|
||||
|
||||
/// Sort ascending (default: descending)
|
||||
#[arg(long, help_heading = "Sorting")]
|
||||
pub asc: bool,
|
||||
|
||||
/// Open first matching item in browser
|
||||
#[arg(long, help_heading = "Actions")]
|
||||
pub open: bool,
|
||||
}
|
||||
|
||||
#[derive(Parser)]
|
||||
pub struct IngestArgs {
|
||||
/// Entity to ingest (issues, mrs). Omit to ingest everything
|
||||
@@ -556,8 +677,8 @@ pub struct SearchArgs {
|
||||
#[arg(long, default_value = "hybrid", value_parser = ["lexical", "hybrid", "semantic"], help_heading = "Mode")]
|
||||
pub mode: String,
|
||||
|
||||
/// Filter by source type (issue, mr, discussion)
|
||||
#[arg(long = "type", value_name = "TYPE", value_parser = ["issue", "mr", "discussion"], help_heading = "Filters")]
|
||||
/// Filter by source type (issue, mr, discussion, note)
|
||||
#[arg(long = "type", value_name = "TYPE", value_parser = ["issue", "mr", "discussion", "note"], help_heading = "Filters")]
|
||||
pub source_type: Option<String>,
|
||||
|
||||
/// Filter by author username
|
||||
@@ -684,11 +805,14 @@ pub struct EmbedArgs {
|
||||
|
||||
#[derive(Parser)]
|
||||
#[command(after_help = "\x1b[1mExamples:\x1b[0m
|
||||
lore timeline 'deployment' # Events related to deployments
|
||||
lore timeline 'deployment' # Search-based seeding
|
||||
lore timeline issue:42 # Direct: issue #42 and related entities
|
||||
lore timeline i:42 # Shorthand for issue:42
|
||||
lore timeline mr:99 # Direct: MR !99 and related entities
|
||||
lore timeline 'auth' --since 30d -p group/repo # Scoped to project and time
|
||||
lore timeline 'migration' --depth 2 --expand-mentions # Deep cross-reference expansion")]
|
||||
pub struct TimelineArgs {
|
||||
/// Search query (keywords to find in issues, MRs, and discussions)
|
||||
/// Search text or entity reference (issue:N, i:N, mr:N, m:N)
|
||||
pub query: String,
|
||||
|
||||
/// Scope to a specific project (fuzzy match)
|
||||
@@ -795,11 +919,36 @@ pub struct WhoArgs {
|
||||
pub fields: Option<Vec<String>>,
|
||||
|
||||
/// Show per-MR detail breakdown (expert mode only)
|
||||
#[arg(long, help_heading = "Output", overrides_with = "no_detail")]
|
||||
#[arg(
|
||||
long,
|
||||
help_heading = "Output",
|
||||
overrides_with = "no_detail",
|
||||
conflicts_with = "explain_score"
|
||||
)]
|
||||
pub detail: bool,
|
||||
|
||||
#[arg(long = "no-detail", hide = true, overrides_with = "detail")]
|
||||
pub no_detail: bool,
|
||||
|
||||
/// Score as if "now" is this date (ISO 8601 or duration like 30d). Expert mode only.
|
||||
#[arg(long = "as-of", help_heading = "Scoring")]
|
||||
pub as_of: Option<String>,
|
||||
|
||||
/// Show per-component score breakdown in output. Expert mode only.
|
||||
#[arg(long = "explain-score", help_heading = "Scoring")]
|
||||
pub explain_score: bool,
|
||||
|
||||
/// Include bot users in results (normally excluded via scoring.excluded_usernames).
|
||||
#[arg(long = "include-bots", help_heading = "Scoring")]
|
||||
pub include_bots: bool,
|
||||
|
||||
/// Remove the default time window (query all history). Conflicts with --since.
|
||||
#[arg(
|
||||
long = "all-history",
|
||||
help_heading = "Filters",
|
||||
conflicts_with = "since"
|
||||
)]
|
||||
pub all_history: bool,
|
||||
}
|
||||
|
||||
#[derive(Parser)]
|
||||
|
||||
@@ -1,4 +1,4 @@
|
||||
use indicatif::MultiProgress;
|
||||
use indicatif::{MultiProgress, ProgressBar, ProgressStyle};
|
||||
use std::io::Write;
|
||||
use std::sync::LazyLock;
|
||||
use tracing_subscriber::fmt::MakeWriter;
|
||||
@@ -9,6 +9,26 @@ pub fn multi() -> &'static MultiProgress {
|
||||
&MULTI
|
||||
}
|
||||
|
||||
/// Create a spinner for a numbered pipeline stage.
|
||||
///
|
||||
/// Returns a hidden (no-op) bar in robot mode so callers can use
|
||||
/// the same code path regardless of output mode.
|
||||
pub fn stage_spinner(stage: u8, total: u8, msg: &str, robot_mode: bool) -> ProgressBar {
|
||||
if robot_mode {
|
||||
return ProgressBar::hidden();
|
||||
}
|
||||
let pb = multi().add(ProgressBar::new_spinner());
|
||||
pb.set_style(
|
||||
ProgressStyle::default_spinner()
|
||||
.template("{spinner:.blue} {prefix} {msg}")
|
||||
.expect("valid template"),
|
||||
);
|
||||
pb.enable_steady_tick(std::time::Duration::from_millis(80));
|
||||
pb.set_prefix(format!("[{stage}/{total}]"));
|
||||
pb.set_message(msg.to_string());
|
||||
pb
|
||||
}
|
||||
|
||||
#[derive(Clone)]
|
||||
pub struct SuspendingWriter;
|
||||
|
||||
@@ -50,7 +70,6 @@ impl<'a> MakeWriter<'a> for SuspendingWriter {
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use indicatif::ProgressBar;
|
||||
|
||||
#[test]
|
||||
fn multi_returns_same_instance() {
|
||||
@@ -88,4 +107,35 @@ mod tests {
|
||||
let w = MakeWriter::make_writer(&writer);
|
||||
drop(w);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn stage_spinner_robot_mode_returns_hidden() {
|
||||
let pb = stage_spinner(1, 3, "Testing...", true);
|
||||
assert!(pb.is_hidden());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn stage_spinner_human_mode_sets_properties() {
|
||||
// In non-TTY test environments, MultiProgress may report bars as
|
||||
// hidden. Verify the human-mode code path by checking that prefix
|
||||
// and message are configured (robot-mode returns a bare hidden bar).
|
||||
let pb = stage_spinner(1, 3, "Testing...", false);
|
||||
assert_eq!(pb.prefix(), "[1/3]");
|
||||
assert_eq!(pb.message(), "Testing...");
|
||||
pb.finish_and_clear();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn stage_spinner_sets_prefix_format() {
|
||||
let pb = stage_spinner(2, 5, "Working...", false);
|
||||
assert_eq!(pb.prefix(), "[2/5]");
|
||||
pb.finish_and_clear();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn stage_spinner_sets_message() {
|
||||
let pb = stage_spinner(1, 3, "Seeding timeline...", false);
|
||||
assert_eq!(pb.message(), "Seeding timeline...");
|
||||
pb.finish_and_clear();
|
||||
}
|
||||
}
|
||||
|
||||
@@ -64,6 +64,10 @@ pub fn expand_fields_preset(fields: &[String], entity: &str) -> Vec<String> {
|
||||
.iter()
|
||||
.map(|s| (*s).to_string())
|
||||
.collect(),
|
||||
"notes" => ["id", "author_username", "body", "created_at_iso"]
|
||||
.iter()
|
||||
.map(|s| (*s).to_string())
|
||||
.collect(),
|
||||
_ => fields.to_vec(),
|
||||
}
|
||||
} else {
|
||||
@@ -82,3 +86,25 @@ pub fn strip_schemas(commands: &mut serde_json::Value) {
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_expand_fields_preset_notes() {
|
||||
let fields = vec!["minimal".to_string()];
|
||||
let expanded = expand_fields_preset(&fields, "notes");
|
||||
assert_eq!(
|
||||
expanded,
|
||||
["id", "author_username", "body", "created_at_iso"]
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_fields_preset_passthrough() {
|
||||
let fields = vec!["id".to_string(), "body".to_string()];
|
||||
let expanded = expand_fields_preset(&fields, "notes");
|
||||
assert_eq!(expanded, ["id", "body"]);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -164,6 +164,38 @@ pub struct ScoringConfig {
|
||||
/// Bonus points per individual inline review comment (DiffNote).
|
||||
#[serde(rename = "noteBonus")]
|
||||
pub note_bonus: i64,
|
||||
|
||||
/// Points per MR where the user was assigned as a reviewer.
|
||||
#[serde(rename = "reviewerAssignmentWeight")]
|
||||
pub reviewer_assignment_weight: i64,
|
||||
|
||||
/// Half-life in days for author contribution decay.
|
||||
#[serde(rename = "authorHalfLifeDays")]
|
||||
pub author_half_life_days: u32,
|
||||
|
||||
/// Half-life in days for reviewer contribution decay.
|
||||
#[serde(rename = "reviewerHalfLifeDays")]
|
||||
pub reviewer_half_life_days: u32,
|
||||
|
||||
/// Half-life in days for reviewer assignment decay.
|
||||
#[serde(rename = "reviewerAssignmentHalfLifeDays")]
|
||||
pub reviewer_assignment_half_life_days: u32,
|
||||
|
||||
/// Half-life in days for note/comment contribution decay.
|
||||
#[serde(rename = "noteHalfLifeDays")]
|
||||
pub note_half_life_days: u32,
|
||||
|
||||
/// Multiplier applied to scores from closed (not merged) MRs.
|
||||
#[serde(rename = "closedMrMultiplier")]
|
||||
pub closed_mr_multiplier: f64,
|
||||
|
||||
/// Minimum character count for a review note to earn note_bonus.
|
||||
#[serde(rename = "reviewerMinNoteChars")]
|
||||
pub reviewer_min_note_chars: u32,
|
||||
|
||||
/// Usernames excluded from expert/scoring results.
|
||||
#[serde(rename = "excludedUsernames")]
|
||||
pub excluded_usernames: Vec<String>,
|
||||
}
|
||||
|
||||
impl Default for ScoringConfig {
|
||||
@@ -172,6 +204,14 @@ impl Default for ScoringConfig {
|
||||
author_weight: 25,
|
||||
reviewer_weight: 10,
|
||||
note_bonus: 1,
|
||||
reviewer_assignment_weight: 3,
|
||||
author_half_life_days: 180,
|
||||
reviewer_half_life_days: 90,
|
||||
reviewer_assignment_half_life_days: 45,
|
||||
note_half_life_days: 45,
|
||||
closed_mr_multiplier: 0.5,
|
||||
reviewer_min_note_chars: 20,
|
||||
excluded_usernames: vec![],
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -287,6 +327,55 @@ fn validate_scoring(scoring: &ScoringConfig) -> Result<()> {
|
||||
details: "scoring.noteBonus must be >= 0".to_string(),
|
||||
});
|
||||
}
|
||||
if scoring.reviewer_assignment_weight < 0 {
|
||||
return Err(LoreError::ConfigInvalid {
|
||||
details: "scoring.reviewerAssignmentWeight must be >= 0".to_string(),
|
||||
});
|
||||
}
|
||||
if scoring.author_half_life_days == 0 || scoring.author_half_life_days > 3650 {
|
||||
return Err(LoreError::ConfigInvalid {
|
||||
details: "scoring.authorHalfLifeDays must be in 1..=3650".to_string(),
|
||||
});
|
||||
}
|
||||
if scoring.reviewer_half_life_days == 0 || scoring.reviewer_half_life_days > 3650 {
|
||||
return Err(LoreError::ConfigInvalid {
|
||||
details: "scoring.reviewerHalfLifeDays must be in 1..=3650".to_string(),
|
||||
});
|
||||
}
|
||||
if scoring.reviewer_assignment_half_life_days == 0
|
||||
|| scoring.reviewer_assignment_half_life_days > 3650
|
||||
{
|
||||
return Err(LoreError::ConfigInvalid {
|
||||
details: "scoring.reviewerAssignmentHalfLifeDays must be in 1..=3650".to_string(),
|
||||
});
|
||||
}
|
||||
if scoring.note_half_life_days == 0 || scoring.note_half_life_days > 3650 {
|
||||
return Err(LoreError::ConfigInvalid {
|
||||
details: "scoring.noteHalfLifeDays must be in 1..=3650".to_string(),
|
||||
});
|
||||
}
|
||||
if !scoring.closed_mr_multiplier.is_finite()
|
||||
|| scoring.closed_mr_multiplier <= 0.0
|
||||
|| scoring.closed_mr_multiplier > 1.0
|
||||
{
|
||||
return Err(LoreError::ConfigInvalid {
|
||||
details: "scoring.closedMrMultiplier must be finite and in (0.0, 1.0]".to_string(),
|
||||
});
|
||||
}
|
||||
if scoring.reviewer_min_note_chars > 4096 {
|
||||
return Err(LoreError::ConfigInvalid {
|
||||
details: "scoring.reviewerMinNoteChars must be <= 4096".to_string(),
|
||||
});
|
||||
}
|
||||
if scoring
|
||||
.excluded_usernames
|
||||
.iter()
|
||||
.any(|u| u.trim().is_empty())
|
||||
{
|
||||
return Err(LoreError::ConfigInvalid {
|
||||
details: "scoring.excludedUsernames entries must be non-empty".to_string(),
|
||||
});
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
@@ -561,4 +650,140 @@ mod tests {
|
||||
"set default_project should be present: {json}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_rejects_zero_half_life() {
|
||||
let scoring = ScoringConfig {
|
||||
author_half_life_days: 0,
|
||||
..Default::default()
|
||||
};
|
||||
let err = validate_scoring(&scoring).unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("authorHalfLifeDays"),
|
||||
"unexpected error: {msg}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_rejects_absurd_half_life() {
|
||||
let scoring = ScoringConfig {
|
||||
author_half_life_days: 5000,
|
||||
..Default::default()
|
||||
};
|
||||
let err = validate_scoring(&scoring).unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("authorHalfLifeDays"),
|
||||
"unexpected error: {msg}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_rejects_nan_multiplier() {
|
||||
let scoring = ScoringConfig {
|
||||
closed_mr_multiplier: f64::NAN,
|
||||
..Default::default()
|
||||
};
|
||||
let err = validate_scoring(&scoring).unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("closedMrMultiplier"),
|
||||
"unexpected error: {msg}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_rejects_zero_multiplier() {
|
||||
let scoring = ScoringConfig {
|
||||
closed_mr_multiplier: 0.0,
|
||||
..Default::default()
|
||||
};
|
||||
let err = validate_scoring(&scoring).unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("closedMrMultiplier"),
|
||||
"unexpected error: {msg}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_rejects_negative_reviewer_assignment_weight() {
|
||||
let scoring = ScoringConfig {
|
||||
reviewer_assignment_weight: -1,
|
||||
..Default::default()
|
||||
};
|
||||
let err = validate_scoring(&scoring).unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("reviewerAssignmentWeight"),
|
||||
"unexpected error: {msg}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_rejects_oversized_min_note_chars() {
|
||||
let scoring = ScoringConfig {
|
||||
reviewer_min_note_chars: 5000,
|
||||
..Default::default()
|
||||
};
|
||||
let err = validate_scoring(&scoring).unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("reviewerMinNoteChars"),
|
||||
"unexpected error: {msg}"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_rejects_empty_excluded_username() {
|
||||
let scoring = ScoringConfig {
|
||||
excluded_usernames: vec!["valid".to_string(), " ".to_string()],
|
||||
..Default::default()
|
||||
};
|
||||
let err = validate_scoring(&scoring).unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(msg.contains("excludedUsernames"), "unexpected error: {msg}");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_accepts_valid_new_fields() {
|
||||
let scoring = ScoringConfig {
|
||||
author_half_life_days: 365,
|
||||
reviewer_half_life_days: 180,
|
||||
reviewer_assignment_half_life_days: 90,
|
||||
note_half_life_days: 60,
|
||||
closed_mr_multiplier: 0.5,
|
||||
reviewer_min_note_chars: 20,
|
||||
reviewer_assignment_weight: 3,
|
||||
excluded_usernames: vec!["bot-user".to_string()],
|
||||
..Default::default()
|
||||
};
|
||||
validate_scoring(&scoring).unwrap();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_accepts_boundary_half_life() {
|
||||
// 1 and 3650 are both valid boundaries
|
||||
let scoring_min = ScoringConfig {
|
||||
author_half_life_days: 1,
|
||||
..Default::default()
|
||||
};
|
||||
validate_scoring(&scoring_min).unwrap();
|
||||
|
||||
let scoring_max = ScoringConfig {
|
||||
author_half_life_days: 3650,
|
||||
..Default::default()
|
||||
};
|
||||
validate_scoring(&scoring_max).unwrap();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_config_validation_accepts_multiplier_at_one() {
|
||||
let scoring = ScoringConfig {
|
||||
closed_mr_multiplier: 1.0,
|
||||
..Default::default()
|
||||
};
|
||||
validate_scoring(&scoring).unwrap();
|
||||
}
|
||||
}
|
||||
|
||||
@@ -69,10 +69,26 @@ const MIGRATIONS: &[(&str, &str)] = &[
|
||||
"021",
|
||||
include_str!("../../migrations/021_work_item_status.sql"),
|
||||
),
|
||||
(
|
||||
"022",
|
||||
include_str!("../../migrations/022_notes_query_index.sql"),
|
||||
),
|
||||
(
|
||||
"023",
|
||||
include_str!("../../migrations/023_issue_detail_fields.sql"),
|
||||
),
|
||||
(
|
||||
"024",
|
||||
include_str!("../../migrations/024_note_documents.sql"),
|
||||
),
|
||||
(
|
||||
"025",
|
||||
include_str!("../../migrations/025_note_dirty_backfill.sql"),
|
||||
),
|
||||
(
|
||||
"026",
|
||||
include_str!("../../migrations/026_scoring_indexes.sql"),
|
||||
),
|
||||
];
|
||||
|
||||
pub fn create_connection(db_path: &Path) -> Result<Connection> {
|
||||
@@ -316,3 +332,7 @@ pub fn get_schema_version(conn: &Connection) -> i32 {
|
||||
)
|
||||
.unwrap_or(0)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
#[path = "db_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
632
src/core/db_tests.rs
Normal file
632
src/core/db_tests.rs
Normal file
@@ -0,0 +1,632 @@
|
||||
use super::*;
|
||||
|
||||
fn setup_migrated_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn index_exists(conn: &Connection, index_name: &str) -> bool {
|
||||
conn.query_row(
|
||||
"SELECT COUNT(*) > 0 FROM sqlite_master WHERE type='index' AND name=?1",
|
||||
[index_name],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap_or(false)
|
||||
}
|
||||
|
||||
fn column_exists(conn: &Connection, table: &str, column: &str) -> bool {
|
||||
let sql = format!("PRAGMA table_info({})", table);
|
||||
let mut stmt = conn.prepare(&sql).unwrap();
|
||||
let columns: Vec<String> = stmt
|
||||
.query_map([], |row| row.get::<_, String>(1))
|
||||
.unwrap()
|
||||
.filter_map(|r| r.ok())
|
||||
.collect();
|
||||
columns.contains(&column.to_string())
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_migration_022_indexes_exist() {
|
||||
let conn = setup_migrated_db();
|
||||
|
||||
// New indexes from migration 022
|
||||
assert!(
|
||||
index_exists(&conn, "idx_notes_user_created"),
|
||||
"idx_notes_user_created should exist"
|
||||
);
|
||||
assert!(
|
||||
index_exists(&conn, "idx_notes_project_created"),
|
||||
"idx_notes_project_created should exist"
|
||||
);
|
||||
assert!(
|
||||
index_exists(&conn, "idx_notes_author_id"),
|
||||
"idx_notes_author_id should exist"
|
||||
);
|
||||
|
||||
// Discussion JOIN indexes (idx_discussions_issue_id is new;
|
||||
// idx_discussions_mr_id already existed from migration 006 but
|
||||
// IF NOT EXISTS makes it safe)
|
||||
assert!(
|
||||
index_exists(&conn, "idx_discussions_issue_id"),
|
||||
"idx_discussions_issue_id should exist"
|
||||
);
|
||||
assert!(
|
||||
index_exists(&conn, "idx_discussions_mr_id"),
|
||||
"idx_discussions_mr_id should exist"
|
||||
);
|
||||
|
||||
// author_id column on notes
|
||||
assert!(
|
||||
column_exists(&conn, "notes", "author_id"),
|
||||
"notes.author_id column should exist"
|
||||
);
|
||||
}
|
||||
|
||||
// -- Helper: insert a minimal project for FK satisfaction --
|
||||
fn insert_test_project(conn: &Connection) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) \
|
||||
VALUES (1000, 'test/project', 'https://example.com/test/project')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
// -- Helper: insert a minimal issue --
|
||||
fn insert_test_issue(conn: &Connection, project_id: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (gitlab_id, project_id, iid, state, author_username, \
|
||||
created_at, updated_at, last_seen_at) \
|
||||
VALUES (100, ?1, 1, 'opened', 'alice', 1000, 1000, 1000)",
|
||||
[project_id],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
// -- Helper: insert a minimal discussion --
|
||||
fn insert_test_discussion(conn: &Connection, project_id: i64, issue_id: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, \
|
||||
noteable_type, last_seen_at) \
|
||||
VALUES ('disc-001', ?1, ?2, 'Issue', 1000)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
// -- Helper: insert a minimal non-system note --
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn insert_test_note(
|
||||
conn: &Connection,
|
||||
gitlab_id: i64,
|
||||
discussion_id: i64,
|
||||
project_id: i64,
|
||||
is_system: bool,
|
||||
) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO notes (gitlab_id, discussion_id, project_id, is_system, \
|
||||
author_username, body, created_at, updated_at, last_seen_at) \
|
||||
VALUES (?1, ?2, ?3, ?4, 'alice', 'note body', 1000, 1000, 1000)",
|
||||
rusqlite::params![gitlab_id, discussion_id, project_id, is_system as i32],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
// -- Helper: insert a document --
|
||||
fn insert_test_document(
|
||||
conn: &Connection,
|
||||
source_type: &str,
|
||||
source_id: i64,
|
||||
project_id: i64,
|
||||
) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) \
|
||||
VALUES (?1, ?2, ?3, 'test content', 'hash123')",
|
||||
rusqlite::params![source_type, source_id, project_id],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_migration_024_allows_note_source_type() {
|
||||
let conn = setup_migrated_db();
|
||||
let pid = insert_test_project(&conn);
|
||||
|
||||
// Should succeed -- 'note' is now allowed
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) \
|
||||
VALUES ('note', 1, ?1, 'note content', 'hash-note')",
|
||||
[pid],
|
||||
)
|
||||
.expect("INSERT with source_type='note' into documents should succeed");
|
||||
|
||||
// dirty_sources should also accept 'note'
|
||||
conn.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at) \
|
||||
VALUES ('note', 1, 1000)",
|
||||
[],
|
||||
)
|
||||
.expect("INSERT with source_type='note' into dirty_sources should succeed");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_migration_024_preserves_existing_data() {
|
||||
// Run migrations up to 023 only, insert data, then apply 024
|
||||
// Migration 024 is at index 23 (0-based). Use hardcoded index so adding
|
||||
// later migrations doesn't silently shift what this test exercises.
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
|
||||
// Apply migrations 001-023 (indices 0..23)
|
||||
run_migrations_up_to(&conn, 23);
|
||||
|
||||
let pid = insert_test_project(&conn);
|
||||
|
||||
// Insert a document with existing source_type
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash, title) \
|
||||
VALUES ('issue', 1, ?1, 'issue content', 'hash-issue', 'Test Issue')",
|
||||
[pid],
|
||||
)
|
||||
.unwrap();
|
||||
let doc_id: i64 = conn.last_insert_rowid();
|
||||
|
||||
// Insert junction data
|
||||
conn.execute(
|
||||
"INSERT INTO document_labels (document_id, label_name) VALUES (?1, 'bug')",
|
||||
[doc_id],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO document_paths (document_id, path) VALUES (?1, 'src/main.rs')",
|
||||
[doc_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Insert dirty_sources row
|
||||
conn.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at) VALUES ('issue', 1, 1000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Now apply migration 024 (index 23) -- the table-rebuild migration
|
||||
run_single_migration(&conn, 23);
|
||||
|
||||
// Verify document still exists with correct data
|
||||
let (st, content, title): (String, String, String) = conn
|
||||
.query_row(
|
||||
"SELECT source_type, content_text, title FROM documents WHERE id = ?1",
|
||||
[doc_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(st, "issue");
|
||||
assert_eq!(content, "issue content");
|
||||
assert_eq!(title, "Test Issue");
|
||||
|
||||
// Verify junction data preserved
|
||||
let label_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM document_labels WHERE document_id = ?1",
|
||||
[doc_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(label_count, 1);
|
||||
|
||||
let path_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM document_paths WHERE document_id = ?1",
|
||||
[doc_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(path_count, 1);
|
||||
|
||||
// Verify dirty_sources preserved
|
||||
let dirty_count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM dirty_sources", [], |row| row.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(dirty_count, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_migration_024_fts_triggers_intact() {
|
||||
let conn = setup_migrated_db();
|
||||
let pid = insert_test_project(&conn);
|
||||
|
||||
// Insert a document after migration -- FTS trigger should fire
|
||||
let doc_id = insert_test_document(&conn, "note", 1, pid);
|
||||
|
||||
// Verify FTS entry exists
|
||||
let fts_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'test'",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert!(fts_count > 0, "FTS trigger should have created an entry");
|
||||
|
||||
// Verify update trigger works
|
||||
conn.execute(
|
||||
"UPDATE documents SET content_text = 'updated content' WHERE id = ?1",
|
||||
[doc_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let fts_updated: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'updated'",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert!(
|
||||
fts_updated > 0,
|
||||
"FTS update trigger should reflect new content"
|
||||
);
|
||||
|
||||
// Verify delete trigger works
|
||||
conn.execute("DELETE FROM documents WHERE id = ?1", [doc_id])
|
||||
.unwrap();
|
||||
|
||||
let fts_after_delete: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM documents_fts WHERE documents_fts MATCH 'updated'",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
fts_after_delete, 0,
|
||||
"FTS delete trigger should remove the entry"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_migration_024_row_counts_preserved() {
|
||||
let conn = setup_migrated_db();
|
||||
|
||||
// After full migration, tables should exist and be queryable
|
||||
let doc_count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM documents", [], |row| row.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(doc_count, 0, "Fresh DB should have 0 documents");
|
||||
|
||||
let dirty_count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM dirty_sources", [], |row| row.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(dirty_count, 0, "Fresh DB should have 0 dirty_sources");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_migration_024_integrity_checks_pass() {
|
||||
let conn = setup_migrated_db();
|
||||
|
||||
// PRAGMA integrity_check
|
||||
let integrity: String = conn
|
||||
.query_row("PRAGMA integrity_check", [], |row| row.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(integrity, "ok", "Database integrity check should pass");
|
||||
|
||||
// PRAGMA foreign_key_check (returns rows only if there are violations)
|
||||
let fk_violations: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM pragma_foreign_key_check", [], |row| {
|
||||
row.get(0)
|
||||
})
|
||||
.unwrap();
|
||||
assert_eq!(fk_violations, 0, "No foreign key violations should exist");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_migration_024_note_delete_trigger_cleans_document() {
|
||||
let conn = setup_migrated_db();
|
||||
let pid = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, pid);
|
||||
let disc_id = insert_test_discussion(&conn, pid, issue_id);
|
||||
let note_id = insert_test_note(&conn, 200, disc_id, pid, false);
|
||||
|
||||
// Create a document for this note
|
||||
insert_test_document(&conn, "note", note_id, pid);
|
||||
|
||||
let doc_before: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
|
||||
[note_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(doc_before, 1);
|
||||
|
||||
// Delete the note -- trigger should remove the document
|
||||
conn.execute("DELETE FROM notes WHERE id = ?1", [note_id])
|
||||
.unwrap();
|
||||
|
||||
let doc_after: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
|
||||
[note_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
doc_after, 0,
|
||||
"notes_ad_cleanup trigger should delete the document"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
fn test_migration_024_note_system_flip_trigger_cleans_document() {
    let conn = setup_migrated_db();
    let pid = insert_test_project(&conn);
    let issue_id = insert_test_issue(&conn, pid);
    let disc_id = insert_test_discussion(&conn, pid, issue_id);
    let note_id = insert_test_note(&conn, 201, disc_id, pid, false);

    // Attach a generated document so the update trigger has something to clean up.
    insert_test_document(&conn, "note", note_id, pid);

    // Counts documents currently linked to this note.
    let doc_count = |conn: &Connection| -> i64 {
        conn.query_row(
            "SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
            [note_id],
            |row| row.get(0),
        )
        .unwrap()
    };

    assert_eq!(doc_count(&conn), 1);

    // Flip is_system from 0 to 1 -- trigger should remove the document
    conn.execute("UPDATE notes SET is_system = 1 WHERE id = ?1", [note_id])
        .unwrap();

    assert_eq!(
        doc_count(&conn),
        0,
        "notes_au_system_cleanup trigger should delete the document"
    );
}
|
||||
|
||||
#[test]
fn test_migration_024_system_note_delete_trigger_does_not_fire() {
    let conn = setup_migrated_db();
    let pid = insert_test_project(&conn);
    let issue_id = insert_test_issue(&conn, pid);
    let disc_id = insert_test_discussion(&conn, pid, issue_id);

    // Insert a system note (is_system = true)
    let note_id = insert_test_note(&conn, 202, disc_id, pid, true);

    // Manually insert a document (shouldn't exist for system notes in practice,
    // but we test the trigger guard)
    insert_test_document(&conn, "note", note_id, pid);

    // Counts documents currently linked to this note.
    let doc_count = |conn: &Connection| -> i64 {
        conn.query_row(
            "SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?1",
            [note_id],
            |row| row.get(0),
        )
        .unwrap()
    };

    assert_eq!(doc_count(&conn), 1);

    // Delete system note -- trigger has WHEN old.is_system = 0 so it should NOT fire
    conn.execute("DELETE FROM notes WHERE id = ?1", [note_id])
        .unwrap();

    assert_eq!(
        doc_count(&conn),
        1,
        "notes_ad_cleanup trigger should NOT fire for system notes"
    );
}
|
||||
|
||||
/// Run migrations only up to version `up_to` (inclusive).
|
||||
fn run_migrations_up_to(conn: &Connection, up_to: usize) {
|
||||
conn.execute_batch(
|
||||
"CREATE TABLE IF NOT EXISTS schema_version ( \
|
||||
version INTEGER PRIMARY KEY, applied_at INTEGER NOT NULL, description TEXT);",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
for (version_str, sql) in &MIGRATIONS[..up_to] {
|
||||
let version: i32 = version_str.parse().unwrap();
|
||||
conn.execute_batch(sql).unwrap();
|
||||
conn.execute(
|
||||
"INSERT OR REPLACE INTO schema_version (version, applied_at, description) \
|
||||
VALUES (?1, strftime('%s', 'now') * 1000, ?2)",
|
||||
rusqlite::params![version, version_str],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
}
|
||||
|
||||
/// Run a single migration by index (0-based).
|
||||
fn run_single_migration(conn: &Connection, index: usize) {
|
||||
let (version_str, sql) = MIGRATIONS[index];
|
||||
let version: i32 = version_str.parse().unwrap();
|
||||
conn.execute_batch(sql).unwrap();
|
||||
conn.execute(
|
||||
"INSERT OR REPLACE INTO schema_version (version, applied_at, description) \
|
||||
VALUES (?1, strftime('%s', 'now') * 1000, ?2)",
|
||||
rusqlite::params![version, version_str],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
#[test]
fn test_migration_025_backfills_existing_notes() {
    let conn = create_connection(Path::new(":memory:")).unwrap();
    // Run all migrations through 024 (index 0..24)
    run_migrations_up_to(&conn, 24);

    let pid = insert_test_project(&conn);
    let issue_id = insert_test_issue(&conn, pid);
    let disc_id = insert_test_discussion(&conn, pid, issue_id);

    // Insert 5 non-system notes
    for i in 1..=5 {
        insert_test_note(&conn, 300 + i, disc_id, pid, false);
    }
    // Insert 2 system notes
    for i in 1..=2 {
        insert_test_note(&conn, 400 + i, disc_id, pid, true);
    }

    // Run migration 025 (MIGRATIONS index 24)
    run_single_migration(&conn, 24);

    let dirty_count: i64 = conn
        .query_row(
            "SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
            [],
            |row| row.get(0),
        )
        .unwrap();
    assert_eq!(
        dirty_count, 5,
        "Migration 025 should backfill 5 non-system notes"
    );

    // All note ids the migration queued, regardless of note kind. (Previously
    // misnamed `system_note_ids`, which suggested the opposite of what the
    // query selects.)
    let queued_note_ids: Vec<i64> = {
        let mut stmt = conn
            .prepare(
                "SELECT source_id FROM dirty_sources WHERE source_type = 'note' ORDER BY source_id",
            )
            .unwrap();
        stmt.query_map([], |row| row.get(0))
            .unwrap()
            .collect::<std::result::Result<Vec<_>, _>>()
            .unwrap()
    };
    // Ids of the system notes themselves, for the exclusion check below.
    let all_system_note_ids: Vec<i64> = {
        let mut stmt = conn
            .prepare("SELECT id FROM notes WHERE is_system = 1 ORDER BY id")
            .unwrap();
        stmt.query_map([], |row| row.get(0))
            .unwrap()
            .collect::<std::result::Result<Vec<_>, _>>()
            .unwrap()
    };
    // Verify system notes were not backfilled into the dirty queue.
    for sys_id in &all_system_note_ids {
        assert!(
            !queued_note_ids.contains(sys_id),
            "System note id {} should not be in dirty_sources",
            sys_id
        );
    }
}
|
||||
|
||||
#[test]
fn test_migration_025_idempotent_with_existing_documents() {
    let conn = create_connection(Path::new(":memory:")).unwrap();
    run_migrations_up_to(&conn, 24);

    let pid = insert_test_project(&conn);
    let issue_id = insert_test_issue(&conn, pid);
    let disc_id = insert_test_discussion(&conn, pid, issue_id);

    // Three non-system notes; the first two already have generated documents,
    // so only the third should be picked up by the backfill.
    let note_ids: Vec<i64> = (1..=3)
        .map(|i| insert_test_note(&conn, 500 + i, disc_id, pid, false))
        .collect();
    insert_test_document(&conn, "note", note_ids[0], pid);
    insert_test_document(&conn, "note", note_ids[1], pid);

    // Run migration 025
    run_single_migration(&conn, 24);

    let dirty_count = conn
        .query_row(
            "SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
            [],
            |row| row.get::<_, i64>(0),
        )
        .unwrap();
    assert_eq!(
        dirty_count, 1,
        "Only the note without a document should be backfilled"
    );

    // Verify the correct note was queued
    let queued_id = conn
        .query_row(
            "SELECT source_id FROM dirty_sources WHERE source_type = 'note'",
            [],
            |row| row.get::<_, i64>(0),
        )
        .unwrap();
    assert_eq!(queued_id, note_ids[2]);
}
|
||||
|
||||
#[test]
fn test_migration_025_skips_notes_already_in_dirty_queue() {
    let conn = create_connection(Path::new(":memory:")).unwrap();
    run_migrations_up_to(&conn, 24);

    let pid = insert_test_project(&conn);
    let issue_id = insert_test_issue(&conn, pid);
    let disc_id = insert_test_discussion(&conn, pid, issue_id);

    // Three non-system notes; queue the first one up front with a
    // recognizable queued_at value so overwrites would be detectable.
    let note_ids: Vec<i64> = (1..=3)
        .map(|i| insert_test_note(&conn, 600 + i, disc_id, pid, false))
        .collect();
    conn.execute(
        "INSERT INTO dirty_sources (source_type, source_id, queued_at) VALUES ('note', ?1, 999)",
        [note_ids[0]],
    )
    .unwrap();

    // Run migration 025
    run_single_migration(&conn, 24);

    let dirty_count = conn
        .query_row(
            "SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
            [],
            |row| row.get::<_, i64>(0),
        )
        .unwrap();
    assert_eq!(
        dirty_count, 3,
        "All 3 notes should be in dirty_sources (1 pre-existing + 2 new)"
    );

    // The pre-queued row must survive untouched: conflict handling in the
    // migration must not overwrite queued_at.
    let original_queued_at = conn
        .query_row(
            "SELECT queued_at FROM dirty_sources WHERE source_type = 'note' AND source_id = ?1",
            [note_ids[0]],
            |row| row.get::<_, i64>(0),
        )
        .unwrap();
    assert_eq!(
        original_queued_at, 999,
        "ON CONFLICT DO NOTHING should preserve the original queued_at"
    );
}
|
||||
71
src/core/file_history.rs
Normal file
71
src/core/file_history.rs
Normal file
@@ -0,0 +1,71 @@
|
||||
use std::collections::HashSet;
|
||||
use std::collections::VecDeque;
|
||||
|
||||
use rusqlite::Connection;
|
||||
|
||||
use super::error::Result;
|
||||
|
||||
/// Resolves a file path through its rename history in `mr_file_changes`.
|
||||
///
|
||||
/// BFS in both directions: forward (`old_path` -> `new_path`) and backward
|
||||
/// (`new_path` -> `old_path`). Returns all equivalent paths including the
|
||||
/// original, sorted for determinism. Cycles are detected via a visited set.
|
||||
///
|
||||
/// `max_hops` limits the BFS depth (distance from the starting path).
|
||||
pub fn resolve_rename_chain(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
path: &str,
|
||||
max_hops: usize,
|
||||
) -> Result<Vec<String>> {
|
||||
let mut visited: HashSet<String> = HashSet::new();
|
||||
visited.insert(path.to_string());
|
||||
|
||||
if max_hops == 0 {
|
||||
return Ok(vec![path.to_string()]);
|
||||
}
|
||||
|
||||
let mut queue: VecDeque<(String, usize)> = VecDeque::new();
|
||||
queue.push_back((path.to_string(), 0));
|
||||
|
||||
let forward_sql = "\
|
||||
SELECT DISTINCT mfc.new_path FROM mr_file_changes mfc \
|
||||
WHERE mfc.project_id = ?1 AND mfc.old_path = ?2 AND mfc.change_type = 'renamed'";
|
||||
let backward_sql = "\
|
||||
SELECT DISTINCT mfc.old_path FROM mr_file_changes mfc \
|
||||
WHERE mfc.project_id = ?1 AND mfc.new_path = ?2 AND mfc.change_type = 'renamed'";
|
||||
|
||||
while let Some((current, depth)) = queue.pop_front() {
|
||||
if depth >= max_hops {
|
||||
continue;
|
||||
}
|
||||
|
||||
// Forward: current was the old name -> discover new names
|
||||
let mut fwd_stmt = conn.prepare_cached(forward_sql)?;
|
||||
let forward: Vec<String> = fwd_stmt
|
||||
.query_map(rusqlite::params![project_id, ¤t], |row| row.get(0))?
|
||||
.filter_map(std::result::Result::ok)
|
||||
.collect();
|
||||
|
||||
// Backward: current was the new name -> discover old names
|
||||
let mut bwd_stmt = conn.prepare_cached(backward_sql)?;
|
||||
let backward: Vec<String> = bwd_stmt
|
||||
.query_map(rusqlite::params![project_id, ¤t], |row| row.get(0))?
|
||||
.filter_map(std::result::Result::ok)
|
||||
.collect();
|
||||
|
||||
for discovered in forward.into_iter().chain(backward) {
|
||||
if visited.insert(discovered.clone()) {
|
||||
queue.push_back((discovered, depth + 1));
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
let mut paths: Vec<String> = visited.into_iter().collect();
|
||||
paths.sort();
|
||||
Ok(paths)
|
||||
}
|
||||
|
||||
// Unit tests live in a sibling file (via #[path]) and are compiled only in
// test builds.
#[cfg(test)]
#[path = "file_history_tests.rs"]
mod tests;
|
||||
274
src/core/file_history_tests.rs
Normal file
274
src/core/file_history_tests.rs
Normal file
@@ -0,0 +1,274 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
/// Seeds project id 1 plus merge request id 1 and returns the project id.
///
/// NOTE(review): the merge request row appears to exist so that
/// `mr_file_changes.merge_request_id` has a valid target -- confirm against
/// the schema's foreign keys.
fn seed_project(conn: &Connection) -> i64 {
    conn.execute(
        "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
         VALUES (1, 100, 'group/repo', 'https://gitlab.example.com/group/repo', 1000, 2000)",
        [],
    )
    .unwrap();

    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (1, 300, 5, 1, 'Rename MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
        [],
    )
    .unwrap();

    1 // project_id
}
|
||||
|
||||
/// Records a `renamed` file change (`old_path` -> `new_path`) for `mr_id`
/// in project 1 (the project created by `seed_project`).
fn insert_rename(conn: &Connection, mr_id: i64, old_path: &str, new_path: &str) {
    conn.execute(
        "INSERT INTO mr_file_changes (merge_request_id, project_id, old_path, new_path, change_type)
         VALUES (?1, 1, ?2, ?3, 'renamed')",
        rusqlite::params![mr_id, old_path, new_path],
    )
    .unwrap();
}
|
||||
|
||||
#[test]
fn test_no_renames_returns_original_path() {
    let conn = setup_test_db();
    let project_id = seed_project(&conn);

    // With no rename rows at all, the BFS discovers nothing beyond the input.
    let resolved = resolve_rename_chain(&conn, project_id, "src/auth.rs", 10).unwrap();
    assert_eq!(resolved, ["src/auth.rs"]);
}
|
||||
|
||||
#[test]
fn test_forward_chain() {
    // a.rs -> b.rs -> c.rs, starting from a.rs finds all three
    let conn = setup_test_db();
    let project_id = seed_project(&conn);

    insert_rename(&conn, 1, "src/a.rs", "src/b.rs");

    // Need a second MR for the next rename
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (2, 301, 6, 1, 'Rename MR 2', 'merged', 3000, 4000, 4000, 'feature2', 'main')",
        [],
    )
    .unwrap();
    insert_rename(&conn, 2, "src/b.rs", "src/c.rs");

    let mut resolved = resolve_rename_chain(&conn, project_id, "src/a.rs", 10).unwrap();
    resolved.sort();
    assert_eq!(resolved, ["src/a.rs", "src/b.rs", "src/c.rs"]);
}
|
||||
|
||||
#[test]
fn test_backward_chain() {
    // a.rs -> b.rs -> c.rs, starting from c.rs finds all three
    let conn = setup_test_db();
    let project_id = seed_project(&conn);

    insert_rename(&conn, 1, "src/a.rs", "src/b.rs");

    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (2, 301, 6, 1, 'Rename MR 2', 'merged', 3000, 4000, 4000, 'feature2', 'main')",
        [],
    )
    .unwrap();
    insert_rename(&conn, 2, "src/b.rs", "src/c.rs");

    // Resolution must walk old_path <- new_path edges too.
    let mut resolved = resolve_rename_chain(&conn, project_id, "src/c.rs", 10).unwrap();
    resolved.sort();
    assert_eq!(resolved, ["src/a.rs", "src/b.rs", "src/c.rs"]);
}
|
||||
|
||||
#[test]
fn test_cycle_detection() {
    // a -> b -> a: terminates without infinite loop
    let conn = setup_test_db();
    let project_id = seed_project(&conn);

    insert_rename(&conn, 1, "src/a.rs", "src/b.rs");

    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (2, 301, 6, 1, 'Rename back', 'merged', 3000, 4000, 4000, 'feature2', 'main')",
        [],
    )
    .unwrap();
    // Second MR renames b.rs straight back, forming a cycle.
    insert_rename(&conn, 2, "src/b.rs", "src/a.rs");

    let mut resolved = resolve_rename_chain(&conn, project_id, "src/a.rs", 10).unwrap();
    resolved.sort();
    assert_eq!(resolved, ["src/a.rs", "src/b.rs"]);
}
|
||||
|
||||
#[test]
fn test_max_hops_zero_returns_original() {
    let conn = setup_test_db();
    let project_id = seed_project(&conn);

    // A rename exists, but max_hops = 0 forbids following it.
    insert_rename(&conn, 1, "src/a.rs", "src/b.rs");

    let resolved = resolve_rename_chain(&conn, project_id, "src/a.rs", 0).unwrap();
    assert_eq!(resolved, ["src/a.rs"]);
}
|
||||
|
||||
#[test]
fn test_max_hops_bounded() {
    // Chain: a -> b -> c -> d -> e (4 hops)
    // With max_hops=2, should find exactly {a, b, c} (original + 2 depth levels)
    let conn = setup_test_db();
    let project_id = seed_project(&conn);

    let paths = ["src/a.rs", "src/b.rs", "src/c.rs", "src/d.rs", "src/e.rs"];
    for (i, pair) in paths.windows(2).enumerate() {
        // seed_project already created MR id 1; each later hop gets its own MR.
        if i > 0 {
            conn.execute(
                "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
                 created_at, updated_at, last_seen_at, source_branch, target_branch)
                 VALUES (?1, ?2, ?3, 1, 'MR', 'merged', ?4, ?5, ?5, 'feat', 'main')",
                rusqlite::params![
                    (i + 1) as i64,
                    (300 + i) as i64,
                    (5 + i) as i64,
                    (1000 * (i + 1)) as i64,
                    (2000 * (i + 1)) as i64,
                ],
            )
            .unwrap();
        }
        #[allow(clippy::cast_possible_wrap)]
        insert_rename(&conn, (i + 1) as i64, pair[0], pair[1]);
    }

    let within_two = resolve_rename_chain(&conn, project_id, "src/a.rs", 2).unwrap();
    assert_eq!(within_two, ["src/a.rs", "src/b.rs", "src/c.rs"]);

    // Depth 1 should find only {a, b}
    let within_one = resolve_rename_chain(&conn, project_id, "src/a.rs", 1).unwrap();
    assert_eq!(within_one, ["src/a.rs", "src/b.rs"]);
}
|
||||
|
||||
#[test]
fn test_diamond_pattern() {
    // Diamond: a -> b, a -> c, b -> d, c -> d
    // From a with max_hops=2, should find all four: {a, b, c, d}
    // Each rename lives in its own MR so the rows are independent.
    let conn = setup_test_db();
    let project_id = seed_project(&conn);

    // MR 1: a -> b
    insert_rename(&conn, 1, "src/a.rs", "src/b.rs");

    // MR 2: a -> c
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (2, 301, 6, 1, 'MR 2', 'merged', 2000, 3000, 3000, 'feat2', 'main')",
        [],
    )
    .unwrap();
    insert_rename(&conn, 2, "src/a.rs", "src/c.rs");

    // MR 3: b -> d
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (3, 302, 7, 1, 'MR 3', 'merged', 3000, 4000, 4000, 'feat3', 'main')",
        [],
    )
    .unwrap();
    insert_rename(&conn, 3, "src/b.rs", "src/d.rs");

    // MR 4: c -> d
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (4, 303, 8, 1, 'MR 4', 'merged', 4000, 5000, 5000, 'feat4', 'main')",
        [],
    )
    .unwrap();
    insert_rename(&conn, 4, "src/c.rs", "src/d.rs");

    // max_hops=2: a(0) -> {b,c}(1) -> {d}(2) — all four found
    let result = resolve_rename_chain(&conn, project_id, "src/a.rs", 2).unwrap();
    assert_eq!(result, ["src/a.rs", "src/b.rs", "src/c.rs", "src/d.rs"]);

    // max_hops=1: a(0) -> {b,c}(1) — d at depth 2 excluded
    let result1 = resolve_rename_chain(&conn, project_id, "src/a.rs", 1).unwrap();
    assert_eq!(result1, ["src/a.rs", "src/b.rs", "src/c.rs"]);
}
|
||||
|
||||
#[test]
fn test_branching_renames() {
    // a.rs was renamed to b.rs in one MR and c.rs in another
    let conn = setup_test_db();
    let project_id = seed_project(&conn);

    insert_rename(&conn, 1, "src/a.rs", "src/b.rs");

    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (2, 301, 6, 1, 'Rename MR 2', 'merged', 3000, 4000, 4000, 'feature2', 'main')",
        [],
    )
    .unwrap();
    // Conflicting second rename of the same source path.
    insert_rename(&conn, 2, "src/a.rs", "src/c.rs");

    // Both branches of the fan-out are equivalent names for a.rs.
    let mut resolved = resolve_rename_chain(&conn, project_id, "src/a.rs", 10).unwrap();
    resolved.sort();
    assert_eq!(resolved, ["src/a.rs", "src/b.rs", "src/c.rs"]);
}
|
||||
|
||||
#[test]
fn test_different_project_isolation() {
    // Renames in project 2 should not leak into project 1 queries
    let conn = setup_test_db();
    let _project_id = seed_project(&conn);

    // Create project 2
    conn.execute(
        "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
         VALUES (2, 200, 'other/repo', 'https://gitlab.example.com/other/repo', 1000, 2000)",
        [],
    )
    .unwrap();
    // MR id 2 belongs to project 2 so its file changes are scoped there.
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (2, 301, 5, 2, 'Other MR', 'merged', 1000, 2000, 2000, 'feat', 'main')",
        [],
    )
    .unwrap();

    // Rename in project 1
    insert_rename(&conn, 1, "src/a.rs", "src/b.rs");

    // Rename in project 2 (different mr_id and project_id); same old_path
    // on purpose, to prove the project_id filter is what separates them.
    conn.execute(
        "INSERT INTO mr_file_changes (merge_request_id, project_id, old_path, new_path, change_type)
         VALUES (2, 2, 'src/a.rs', 'src/z.rs', 'renamed')",
        [],
    )
    .unwrap();

    // Query project 1 -- should NOT see z.rs
    let mut result = resolve_rename_chain(&conn, 1, "src/a.rs", 10).unwrap();
    result.sort();
    assert_eq!(result, ["src/a.rs", "src/b.rs"]);

    // Query project 2 -- should NOT see b.rs
    let mut result2 = resolve_rename_chain(&conn, 2, "src/a.rs", 10).unwrap();
    result2.sort();
    assert_eq!(result2, ["src/a.rs", "src/z.rs"]);
}
|
||||
@@ -4,10 +4,12 @@ pub mod db;
|
||||
pub mod dependent_queue;
|
||||
pub mod error;
|
||||
pub mod events_db;
|
||||
pub mod file_history;
|
||||
pub mod lock;
|
||||
pub mod logging;
|
||||
pub mod metrics;
|
||||
pub mod note_parser;
|
||||
pub mod path_resolver;
|
||||
pub mod paths;
|
||||
pub mod payloads;
|
||||
pub mod project;
|
||||
|
||||
@@ -22,20 +22,34 @@ pub struct ExtractResult {
|
||||
pub parse_failures: usize,
|
||||
}
|
||||
|
||||
// GitLab system notes include the entity type word: "mentioned in issue #5"
|
||||
// or "mentioned in merge request !730". The word is mandatory in real data,
|
||||
// but we also keep the old bare-sigil form as a fallback (no data uses it today,
|
||||
// but other GitLab instances might differ).
|
||||
static MENTIONED_RE: LazyLock<Regex> = LazyLock::new(|| {
|
||||
Regex::new(
|
||||
r"mentioned in (?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
|
||||
r"mentioned in (?:issue |merge request )?(?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
|
||||
)
|
||||
.expect("mentioned regex is valid")
|
||||
});
|
||||
|
||||
static CLOSED_BY_RE: LazyLock<Regex> = LazyLock::new(|| {
|
||||
Regex::new(
|
||||
r"closed by (?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
|
||||
r"closed by (?:issue |merge request )?(?:(?P<project>[\w][\w.\-]*(?:/[\w][\w.\-]*)+))?(?P<sigil>[#!])(?P<iid>\d+)",
|
||||
)
|
||||
.expect("closed_by regex is valid")
|
||||
});
|
||||
|
||||
/// Matches full GitLab URLs like:
|
||||
/// `https://gitlab.example.com/group/project/-/issues/123`
|
||||
/// `https://gitlab.example.com/group/sub/project/-/merge_requests/456`
|
||||
static GITLAB_URL_RE: LazyLock<Regex> = LazyLock::new(|| {
|
||||
Regex::new(
|
||||
r"https?://[^\s/]+/(?P<project>[^\s]+?)/-/(?P<entity_type>issues|merge_requests)/(?P<iid>\d+)",
|
||||
)
|
||||
.expect("gitlab url regex is valid")
|
||||
});
|
||||
|
||||
pub fn parse_cross_refs(body: &str) -> Vec<ParsedCrossRef> {
|
||||
let mut refs = Vec::new();
|
||||
|
||||
@@ -54,6 +68,47 @@ pub fn parse_cross_refs(body: &str) -> Vec<ParsedCrossRef> {
|
||||
refs
|
||||
}
|
||||
|
||||
/// Extract cross-references from GitLab URLs in free-text bodies (descriptions, user notes).
|
||||
pub fn parse_url_refs(body: &str) -> Vec<ParsedCrossRef> {
|
||||
let mut refs = Vec::new();
|
||||
let mut seen = std::collections::HashSet::new();
|
||||
|
||||
for caps in GITLAB_URL_RE.captures_iter(body) {
|
||||
let Some(entity_type_raw) = caps.name("entity_type").map(|m| m.as_str()) else {
|
||||
continue;
|
||||
};
|
||||
let Some(iid_str) = caps.name("iid").map(|m| m.as_str()) else {
|
||||
continue;
|
||||
};
|
||||
let Some(project) = caps.name("project").map(|m| m.as_str()) else {
|
||||
continue;
|
||||
};
|
||||
let Ok(iid) = iid_str.parse::<i64>() else {
|
||||
continue;
|
||||
};
|
||||
|
||||
let target_entity_type = match entity_type_raw {
|
||||
"issues" => "issue",
|
||||
"merge_requests" => "merge_request",
|
||||
_ => continue,
|
||||
};
|
||||
|
||||
let key = (target_entity_type, project.to_owned(), iid);
|
||||
if !seen.insert(key) {
|
||||
continue; // deduplicate within same body
|
||||
}
|
||||
|
||||
refs.push(ParsedCrossRef {
|
||||
reference_type: "mentioned".to_owned(),
|
||||
target_entity_type: target_entity_type.to_owned(),
|
||||
target_iid: iid,
|
||||
target_project_path: Some(project.to_owned()),
|
||||
});
|
||||
}
|
||||
|
||||
refs
|
||||
}
|
||||
|
||||
fn capture_to_cross_ref(
|
||||
caps: ®ex::Captures<'_>,
|
||||
reference_type: &str,
|
||||
@@ -233,331 +288,189 @@ fn resolve_cross_project_entity(
|
||||
resolve_entity_id(conn, project_id, entity_type, iid)
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
/// Extract cross-references from issue and MR descriptions (GitLab URLs only).
|
||||
pub fn extract_refs_from_descriptions(conn: &Connection, project_id: i64) -> Result<ExtractResult> {
|
||||
let mut result = ExtractResult::default();
|
||||
|
||||
#[test]
|
||||
fn test_parse_mentioned_in_mr() {
|
||||
let refs = parse_cross_refs("mentioned in !567");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 567);
|
||||
assert!(refs[0].target_project_path.is_none());
|
||||
let mut insert_stmt = conn.prepare_cached(
|
||||
"INSERT OR IGNORE INTO entity_references
|
||||
(project_id, source_entity_type, source_entity_id,
|
||||
target_entity_type, target_entity_id,
|
||||
target_project_path, target_entity_iid,
|
||||
reference_type, source_method, created_at)
|
||||
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, 'description_parse', ?9)",
|
||||
)?;
|
||||
|
||||
let now = now_ms();
|
||||
|
||||
// Issues with descriptions
|
||||
let mut issue_stmt = conn.prepare_cached(
|
||||
"SELECT id, iid, description FROM issues
|
||||
WHERE project_id = ?1 AND description IS NOT NULL AND description != ''",
|
||||
)?;
|
||||
let issues: Vec<(i64, i64, String)> = issue_stmt
|
||||
.query_map([project_id], |row| {
|
||||
Ok((row.get(0)?, row.get(1)?, row.get(2)?))
|
||||
})?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?;
|
||||
|
||||
for (entity_id, _iid, description) in &issues {
|
||||
insert_url_refs(
|
||||
conn,
|
||||
&mut insert_stmt,
|
||||
&mut result,
|
||||
project_id,
|
||||
"issue",
|
||||
*entity_id,
|
||||
description,
|
||||
now,
|
||||
)?;
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_mentioned_in_issue() {
|
||||
let refs = parse_cross_refs("mentioned in #234");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_entity_type, "issue");
|
||||
assert_eq!(refs[0].target_iid, 234);
|
||||
assert!(refs[0].target_project_path.is_none());
|
||||
// Merge requests with descriptions
|
||||
let mut mr_stmt = conn.prepare_cached(
|
||||
"SELECT id, iid, description FROM merge_requests
|
||||
WHERE project_id = ?1 AND description IS NOT NULL AND description != ''",
|
||||
)?;
|
||||
let mrs: Vec<(i64, i64, String)> = mr_stmt
|
||||
.query_map([project_id], |row| {
|
||||
Ok((row.get(0)?, row.get(1)?, row.get(2)?))
|
||||
})?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?;
|
||||
|
||||
for (entity_id, _iid, description) in &mrs {
|
||||
insert_url_refs(
|
||||
conn,
|
||||
&mut insert_stmt,
|
||||
&mut result,
|
||||
project_id,
|
||||
"merge_request",
|
||||
*entity_id,
|
||||
description,
|
||||
now,
|
||||
)?;
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_mentioned_cross_project() {
|
||||
let refs = parse_cross_refs("mentioned in group/repo!789");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 789);
|
||||
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_mentioned_cross_project_issue() {
|
||||
let refs = parse_cross_refs("mentioned in group/repo#123");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_entity_type, "issue");
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_closed_by_mr() {
|
||||
let refs = parse_cross_refs("closed by !567");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "closes");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 567);
|
||||
assert!(refs[0].target_project_path.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_closed_by_cross_project() {
|
||||
let refs = parse_cross_refs("closed by group/repo!789");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "closes");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 789);
|
||||
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_multiple_refs() {
|
||||
let refs = parse_cross_refs("mentioned in !123 and mentioned in #456");
|
||||
assert_eq!(refs.len(), 2);
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
assert_eq!(refs[1].target_entity_type, "issue");
|
||||
assert_eq!(refs[1].target_iid, 456);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_no_refs() {
|
||||
let refs = parse_cross_refs("Updated the description");
|
||||
assert!(refs.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_non_english_note() {
|
||||
let refs = parse_cross_refs("a ajout\u{00e9} l'\u{00e9}tiquette ~bug");
|
||||
assert!(refs.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_multi_level_group_path() {
|
||||
let refs = parse_cross_refs("mentioned in top/sub/project#123");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("top/sub/project")
|
||||
);
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_deeply_nested_group_path() {
|
||||
let refs = parse_cross_refs("mentioned in a/b/c/d/e!42");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_project_path.as_deref(), Some("a/b/c/d/e"));
|
||||
assert_eq!(refs[0].target_iid, 42);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_hyphenated_project_path() {
|
||||
let refs = parse_cross_refs("mentioned in my-group/my-project#99");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("my-group/my-project")
|
||||
if result.inserted > 0 || result.skipped_unresolvable > 0 {
|
||||
debug!(
|
||||
inserted = result.inserted,
|
||||
unresolvable = result.skipped_unresolvable,
|
||||
"Description cross-reference extraction complete"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_dotted_project_path() {
|
||||
let refs = parse_cross_refs("mentioned in visiostack.io/backend#123");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("visiostack.io/backend")
|
||||
);
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_dotted_nested_project_path() {
|
||||
let refs = parse_cross_refs("closed by my.org/sub.group/my.project!42");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("my.org/sub.group/my.project")
|
||||
);
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 42);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_self_reference_is_valid() {
|
||||
let refs = parse_cross_refs("mentioned in #123");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_mixed_mentioned_and_closed() {
|
||||
let refs = parse_cross_refs("mentioned in !10 and closed by !20");
|
||||
assert_eq!(refs.len(), 2);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_iid, 10);
|
||||
assert_eq!(refs[1].reference_type, "closes");
|
||||
assert_eq!(refs[1].target_iid, 20);
|
||||
}
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
|
||||
let conn = create_connection(std::path::Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn seed_test_data(conn: &Connection) -> i64 {
|
||||
let now = now_ms();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (1, 100, 'group/test-project', 'https://gitlab.com/group/test-project', ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (10, 1000, 1, 123, 'Test Issue', 'opened', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (11, 1001, 1, 456, 'Another Issue', 'opened', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
|
||||
VALUES (20, 2000, 1, 789, 'Test MR', 'opened', 'feat', 'main', 'dev', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
|
||||
VALUES (30, 'disc-aaa', 1, 10, 'Issue', ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, merge_request_id, noteable_type, last_seen_at)
|
||||
VALUES (31, 'disc-bbb', 1, 20, 'MergeRequest', ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (40, 4000, 30, 1, 1, 'mentioned in !789', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (41, 4001, 31, 1, 1, 'mentioned in #456', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (42, 4002, 30, 1, 0, 'mentioned in !999', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (43, 4003, 30, 1, 1, 'added label ~bug', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (44, 4004, 30, 1, 1, 'mentioned in other/project#999', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
1
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_from_system_notes_integration() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = seed_test_data(&conn);
|
||||
|
||||
let result = extract_refs_from_system_notes(&conn, project_id).unwrap();
|
||||
|
||||
assert_eq!(result.inserted, 2, "Two same-project refs should resolve");
|
||||
assert_eq!(
|
||||
result.skipped_unresolvable, 1,
|
||||
"One cross-project ref should be unresolvable"
|
||||
);
|
||||
assert_eq!(
|
||||
result.parse_failures, 1,
|
||||
"One system note has no cross-ref pattern"
|
||||
);
|
||||
|
||||
let ref_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1 AND source_method = 'note_parse'",
|
||||
[project_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(ref_count, 3, "Should have 3 entity_references rows total");
|
||||
|
||||
let unresolved_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE target_entity_id IS NULL AND source_method = 'note_parse'",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
unresolved_count, 1,
|
||||
"Should have 1 unresolved cross-project ref"
|
||||
);
|
||||
|
||||
let (path, iid): (String, i64) = conn
|
||||
.query_row(
|
||||
"SELECT target_project_path, target_entity_iid FROM entity_references WHERE target_entity_id IS NULL",
|
||||
[],
|
||||
|row| Ok((row.get(0)?, row.get(1)?)),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(path, "other/project");
|
||||
assert_eq!(iid, 999);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_idempotent() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = seed_test_data(&conn);
|
||||
|
||||
let result1 = extract_refs_from_system_notes(&conn, project_id).unwrap();
|
||||
let result2 = extract_refs_from_system_notes(&conn, project_id).unwrap();
|
||||
|
||||
assert_eq!(result2.inserted, 0);
|
||||
assert_eq!(result2.skipped_unresolvable, 0);
|
||||
|
||||
let total: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE source_method = 'note_parse'",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
total,
|
||||
(result1.inserted + result1.skipped_unresolvable) as i64
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_empty_project() {
|
||||
let conn = setup_test_db();
|
||||
let result = extract_refs_from_system_notes(&conn, 999).unwrap();
|
||||
assert_eq!(result.inserted, 0);
|
||||
assert_eq!(result.skipped_unresolvable, 0);
|
||||
assert_eq!(result.parse_failures, 0);
|
||||
}
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
/// Extract cross-references from user (non-system) notes (GitLab URLs only).
|
||||
pub fn extract_refs_from_user_notes(conn: &Connection, project_id: i64) -> Result<ExtractResult> {
|
||||
let mut result = ExtractResult::default();
|
||||
|
||||
let mut note_stmt = conn.prepare_cached(
|
||||
"SELECT n.id, n.body, d.noteable_type,
|
||||
COALESCE(d.issue_id, d.merge_request_id) AS entity_id
|
||||
FROM notes n
|
||||
JOIN discussions d ON n.discussion_id = d.id
|
||||
WHERE n.is_system = 0
|
||||
AND n.project_id = ?1
|
||||
AND n.body IS NOT NULL",
|
||||
)?;
|
||||
|
||||
let notes: Vec<(i64, String, String, i64)> = note_stmt
|
||||
.query_map([project_id], |row| {
|
||||
Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?))
|
||||
})?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?;
|
||||
|
||||
if notes.is_empty() {
|
||||
return Ok(result);
|
||||
}
|
||||
|
||||
let mut insert_stmt = conn.prepare_cached(
|
||||
"INSERT OR IGNORE INTO entity_references
|
||||
(project_id, source_entity_type, source_entity_id,
|
||||
target_entity_type, target_entity_id,
|
||||
target_project_path, target_entity_iid,
|
||||
reference_type, source_method, created_at)
|
||||
VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, 'note_parse', ?9)",
|
||||
)?;
|
||||
|
||||
let now = now_ms();
|
||||
|
||||
for (_, body, noteable_type, entity_id) in ¬es {
|
||||
let source_entity_type = noteable_type_to_entity_type(noteable_type);
|
||||
insert_url_refs(
|
||||
conn,
|
||||
&mut insert_stmt,
|
||||
&mut result,
|
||||
project_id,
|
||||
source_entity_type,
|
||||
*entity_id,
|
||||
body,
|
||||
now,
|
||||
)?;
|
||||
}
|
||||
|
||||
if result.inserted > 0 || result.skipped_unresolvable > 0 {
|
||||
debug!(
|
||||
inserted = result.inserted,
|
||||
unresolvable = result.skipped_unresolvable,
|
||||
"User note cross-reference extraction complete"
|
||||
);
|
||||
}
|
||||
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
/// Shared helper: parse URL refs from a body and insert into entity_references.
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn insert_url_refs(
|
||||
conn: &Connection,
|
||||
insert_stmt: &mut rusqlite::CachedStatement<'_>,
|
||||
result: &mut ExtractResult,
|
||||
project_id: i64,
|
||||
source_entity_type: &str,
|
||||
source_entity_id: i64,
|
||||
body: &str,
|
||||
now: i64,
|
||||
) -> Result<()> {
|
||||
let url_refs = parse_url_refs(body);
|
||||
|
||||
for xref in &url_refs {
|
||||
let target_entity_id = if let Some(ref path) = xref.target_project_path {
|
||||
resolve_cross_project_entity(conn, path, &xref.target_entity_type, xref.target_iid)
|
||||
} else {
|
||||
resolve_entity_id(conn, project_id, &xref.target_entity_type, xref.target_iid)
|
||||
};
|
||||
|
||||
let rows_changed = insert_stmt.execute(rusqlite::params![
|
||||
project_id,
|
||||
source_entity_type,
|
||||
source_entity_id,
|
||||
xref.target_entity_type,
|
||||
target_entity_id,
|
||||
xref.target_project_path,
|
||||
if target_entity_id.is_none() {
|
||||
Some(xref.target_iid)
|
||||
} else {
|
||||
None
|
||||
},
|
||||
xref.reference_type,
|
||||
now,
|
||||
])?;
|
||||
|
||||
if rows_changed > 0 {
|
||||
if target_entity_id.is_none() {
|
||||
result.skipped_unresolvable += 1;
|
||||
} else {
|
||||
result.inserted += 1;
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
#[path = "note_parser_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
770
src/core/note_parser_tests.rs
Normal file
770
src/core/note_parser_tests.rs
Normal file
@@ -0,0 +1,770 @@
|
||||
use super::*;
|
||||
|
||||
// --- parse_cross_refs: real GitLab system note format ---
|
||||
|
||||
#[test]
|
||||
fn test_parse_mentioned_in_mr() {
|
||||
let refs = parse_cross_refs("mentioned in merge request !567");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 567);
|
||||
assert!(refs[0].target_project_path.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_mentioned_in_issue() {
|
||||
let refs = parse_cross_refs("mentioned in issue #234");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_entity_type, "issue");
|
||||
assert_eq!(refs[0].target_iid, 234);
|
||||
assert!(refs[0].target_project_path.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_mentioned_cross_project() {
|
||||
let refs = parse_cross_refs("mentioned in merge request group/repo!789");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 789);
|
||||
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_mentioned_cross_project_issue() {
|
||||
let refs = parse_cross_refs("mentioned in issue group/repo#123");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_entity_type, "issue");
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_closed_by_mr() {
|
||||
let refs = parse_cross_refs("closed by merge request !567");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "closes");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 567);
|
||||
assert!(refs[0].target_project_path.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_closed_by_cross_project() {
|
||||
let refs = parse_cross_refs("closed by merge request group/repo!789");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "closes");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 789);
|
||||
assert_eq!(refs[0].target_project_path.as_deref(), Some("group/repo"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_multiple_refs() {
|
||||
let refs = parse_cross_refs("mentioned in merge request !123 and mentioned in issue #456");
|
||||
assert_eq!(refs.len(), 2);
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
assert_eq!(refs[1].target_entity_type, "issue");
|
||||
assert_eq!(refs[1].target_iid, 456);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_no_refs() {
|
||||
let refs = parse_cross_refs("Updated the description");
|
||||
assert!(refs.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_non_english_note() {
|
||||
let refs = parse_cross_refs("a ajout\u{00e9} l'\u{00e9}tiquette ~bug");
|
||||
assert!(refs.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_multi_level_group_path() {
|
||||
let refs = parse_cross_refs("mentioned in issue top/sub/project#123");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("top/sub/project")
|
||||
);
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_deeply_nested_group_path() {
|
||||
let refs = parse_cross_refs("mentioned in merge request a/b/c/d/e!42");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_project_path.as_deref(), Some("a/b/c/d/e"));
|
||||
assert_eq!(refs[0].target_iid, 42);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_hyphenated_project_path() {
|
||||
let refs = parse_cross_refs("mentioned in issue my-group/my-project#99");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("my-group/my-project")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_dotted_project_path() {
|
||||
let refs = parse_cross_refs("mentioned in issue visiostack.io/backend#123");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("visiostack.io/backend")
|
||||
);
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_dotted_nested_project_path() {
|
||||
let refs = parse_cross_refs("closed by merge request my.org/sub.group/my.project!42");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("my.org/sub.group/my.project")
|
||||
);
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 42);
|
||||
}
|
||||
|
||||
// Bare-sigil fallback (no "issue"/"merge request" word) still works
|
||||
#[test]
|
||||
fn test_parse_bare_sigil_fallback() {
|
||||
let refs = parse_cross_refs("mentioned in #123");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
assert_eq!(refs[0].target_entity_type, "issue");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_bare_sigil_closed_by() {
|
||||
let refs = parse_cross_refs("closed by !567");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].reference_type, "closes");
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 567);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parse_mixed_mentioned_and_closed() {
|
||||
let refs = parse_cross_refs("mentioned in merge request !10 and closed by merge request !20");
|
||||
assert_eq!(refs.len(), 2);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
assert_eq!(refs[0].target_iid, 10);
|
||||
assert_eq!(refs[1].reference_type, "closes");
|
||||
assert_eq!(refs[1].target_iid, 20);
|
||||
}
|
||||
|
||||
// --- parse_url_refs ---
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_same_project_issue() {
|
||||
let refs = parse_url_refs(
|
||||
"See https://gitlab.visiostack.com/vs/typescript-code/-/issues/3537 for details",
|
||||
);
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_entity_type, "issue");
|
||||
assert_eq!(refs[0].target_iid, 3537);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("vs/typescript-code")
|
||||
);
|
||||
assert_eq!(refs[0].reference_type, "mentioned");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_merge_request() {
|
||||
let refs =
|
||||
parse_url_refs("https://gitlab.visiostack.com/vs/typescript-code/-/merge_requests/3548");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 3548);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("vs/typescript-code")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_cross_project() {
|
||||
let refs = parse_url_refs(
|
||||
"Related: https://gitlab.visiostack.com/vs/python-code/-/merge_requests/5203",
|
||||
);
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 5203);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("vs/python-code")
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_with_anchor() {
|
||||
let refs =
|
||||
parse_url_refs("https://gitlab.visiostack.com/vs/typescript-code/-/issues/123#note_456");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_entity_type, "issue");
|
||||
assert_eq!(refs[0].target_iid, 123);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_markdown_link() {
|
||||
let refs = parse_url_refs(
|
||||
"Check [this MR](https://gitlab.visiostack.com/vs/typescript-code/-/merge_requests/100) for context",
|
||||
);
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(refs[0].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[0].target_iid, 100);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_multiple_urls() {
|
||||
let body =
|
||||
"See https://gitlab.com/a/b/-/issues/1 and https://gitlab.com/a/b/-/merge_requests/2";
|
||||
let refs = parse_url_refs(body);
|
||||
assert_eq!(refs.len(), 2);
|
||||
assert_eq!(refs[0].target_entity_type, "issue");
|
||||
assert_eq!(refs[0].target_iid, 1);
|
||||
assert_eq!(refs[1].target_entity_type, "merge_request");
|
||||
assert_eq!(refs[1].target_iid, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_deduplicates() {
|
||||
let body = "See https://gitlab.com/a/b/-/issues/1 and again https://gitlab.com/a/b/-/issues/1";
|
||||
let refs = parse_url_refs(body);
|
||||
assert_eq!(
|
||||
refs.len(),
|
||||
1,
|
||||
"Duplicate URLs in same body should be deduplicated"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_non_gitlab_urls_ignored() {
|
||||
let refs = parse_url_refs(
|
||||
"Check https://google.com/search?q=test and https://github.com/org/repo/issues/1",
|
||||
);
|
||||
assert!(refs.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_url_ref_deeply_nested_project() {
|
||||
let refs = parse_url_refs("https://gitlab.com/org/sub/deep/project/-/issues/42");
|
||||
assert_eq!(refs.len(), 1);
|
||||
assert_eq!(
|
||||
refs[0].target_project_path.as_deref(),
|
||||
Some("org/sub/deep/project")
|
||||
);
|
||||
assert_eq!(refs[0].target_iid, 42);
|
||||
}
|
||||
|
||||
// --- Integration tests: system notes (updated for real format) ---
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
|
||||
let conn = create_connection(std::path::Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn seed_test_data(conn: &Connection) -> i64 {
|
||||
let now = now_ms();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (1, 100, 'group/test-project', 'https://gitlab.com/group/test-project', ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (10, 1000, 1, 123, 'Test Issue', 'opened', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (11, 1001, 1, 456, 'Another Issue', 'opened', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
|
||||
VALUES (20, 2000, 1, 789, 'Test MR', 'opened', 'feat', 'main', 'dev', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
|
||||
VALUES (30, 'disc-aaa', 1, 10, 'Issue', ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, merge_request_id, noteable_type, last_seen_at)
|
||||
VALUES (31, 'disc-bbb', 1, 20, 'MergeRequest', ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// System note: real GitLab format "mentioned in merge request !789"
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (40, 4000, 30, 1, 1, 'mentioned in merge request !789', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// System note: real GitLab format "mentioned in issue #456"
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (41, 4001, 31, 1, 1, 'mentioned in issue #456', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// User note (is_system=0) — should NOT be processed by system note extractor
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (42, 4002, 30, 1, 0, 'mentioned in merge request !999', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// System note with no cross-ref pattern
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (43, 4003, 30, 1, 1, 'added label ~bug', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// System note: cross-project ref
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
|
||||
VALUES (44, 4004, 30, 1, 1, 'mentioned in issue other/project#999', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
1
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_from_system_notes_integration() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = seed_test_data(&conn);
|
||||
|
||||
let result = extract_refs_from_system_notes(&conn, project_id).unwrap();
|
||||
|
||||
assert_eq!(result.inserted, 2, "Two same-project refs should resolve");
|
||||
assert_eq!(
|
||||
result.skipped_unresolvable, 1,
|
||||
"One cross-project ref should be unresolvable"
|
||||
);
|
||||
assert_eq!(
|
||||
result.parse_failures, 1,
|
||||
"One system note has no cross-ref pattern"
|
||||
);
|
||||
|
||||
let ref_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1 AND source_method = 'note_parse'",
|
||||
[project_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(ref_count, 3, "Should have 3 entity_references rows total");
|
||||
|
||||
let unresolved_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE target_entity_id IS NULL AND source_method = 'note_parse'",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
unresolved_count, 1,
|
||||
"Should have 1 unresolved cross-project ref"
|
||||
);
|
||||
|
||||
let (path, iid): (String, i64) = conn
|
||||
.query_row(
|
||||
"SELECT target_project_path, target_entity_iid FROM entity_references WHERE target_entity_id IS NULL",
|
||||
[],
|
||||
|row| Ok((row.get(0)?, row.get(1)?)),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(path, "other/project");
|
||||
assert_eq!(iid, 999);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_idempotent() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = seed_test_data(&conn);
|
||||
|
||||
let result1 = extract_refs_from_system_notes(&conn, project_id).unwrap();
|
||||
let result2 = extract_refs_from_system_notes(&conn, project_id).unwrap();
|
||||
|
||||
assert_eq!(result2.inserted, 0);
|
||||
assert_eq!(result2.skipped_unresolvable, 0);
|
||||
|
||||
let total: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE source_method = 'note_parse'",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
total,
|
||||
(result1.inserted + result1.skipped_unresolvable) as i64
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_empty_project() {
|
||||
let conn = setup_test_db();
|
||||
let result = extract_refs_from_system_notes(&conn, 999).unwrap();
|
||||
assert_eq!(result.inserted, 0);
|
||||
assert_eq!(result.skipped_unresolvable, 0);
|
||||
assert_eq!(result.parse_failures, 0);
|
||||
}
|
||||
|
||||
// --- Integration tests: description extraction ---
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_from_descriptions_issue() {
|
||||
let conn = setup_test_db();
|
||||
let now = now_ms();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Issue with MR reference in description
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
|
||||
VALUES (10, 1000, 1, 3537, 'Test Issue', 'opened',
|
||||
'Related to https://gitlab.com/vs/typescript-code/-/merge_requests/3548',
|
||||
?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// The target MR so it resolves
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
|
||||
VALUES (20, 2000, 1, 3548, 'Fix MR', 'merged', 'fix', 'main', 'dev', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let result = extract_refs_from_descriptions(&conn, 1).unwrap();
|
||||
|
||||
assert_eq!(result.inserted, 1, "Should insert 1 description ref");
|
||||
assert_eq!(result.skipped_unresolvable, 0);
|
||||
|
||||
let method: String = conn
|
||||
.query_row(
|
||||
"SELECT source_method FROM entity_references WHERE project_id = 1",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(method, "description_parse");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_from_descriptions_mr() {
|
||||
let conn = setup_test_db();
|
||||
let now = now_ms();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (10, 1000, 1, 100, 'Target Issue', 'opened', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, description, created_at, updated_at, last_seen_at)
|
||||
VALUES (20, 2000, 1, 200, 'Fixing MR', 'merged', 'fix', 'main', 'dev',
|
||||
'Fixes https://gitlab.com/vs/typescript-code/-/issues/100',
|
||||
?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let result = extract_refs_from_descriptions(&conn, 1).unwrap();
|
||||
|
||||
assert_eq!(result.inserted, 1);
|
||||
|
||||
let (src_type, tgt_type): (String, String) = conn
|
||||
.query_row(
|
||||
"SELECT source_entity_type, target_entity_type FROM entity_references WHERE project_id = 1",
|
||||
[],
|
||||
|row| Ok((row.get(0)?, row.get(1)?)),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(src_type, "merge_request");
|
||||
assert_eq!(tgt_type, "issue");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_from_descriptions_idempotent() {
|
||||
let conn = setup_test_db();
|
||||
let now = now_ms();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
|
||||
VALUES (10, 1000, 1, 1, 'Issue', 'opened',
|
||||
'See https://gitlab.com/vs/code/-/merge_requests/2', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
|
||||
VALUES (20, 2000, 1, 2, 'MR', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
|
||||
[now],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let r1 = extract_refs_from_descriptions(&conn, 1).unwrap();
|
||||
assert_eq!(r1.inserted, 1);
|
||||
|
||||
let r2 = extract_refs_from_descriptions(&conn, 1).unwrap();
|
||||
assert_eq!(r2.inserted, 0, "Second run should insert 0 (idempotent)");
|
||||
}
|
||||
|
||||
#[test]
fn test_extract_refs_from_descriptions_cross_project_unresolved() {
    // A URL pointing at a project that is not in the local DB should be stored
    // as an unresolved reference (target_entity_id NULL) rather than dropped,
    // preserving the path/iid so it can be resolved later.
    let conn = setup_test_db();
    let now = now_ms();

    conn.execute(
        "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
         VALUES (1, 100, 'vs/typescript-code', 'https://gitlab.com/vs/typescript-code', ?1, ?1)",
        [now],
    )
    .unwrap();

    // Description references an MR in 'vs/other-project', which is not synced.
    conn.execute(
        "INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, description, created_at, updated_at, last_seen_at)
         VALUES (10, 1000, 1, 1, 'Issue', 'opened',
         'See https://gitlab.com/vs/other-project/-/merge_requests/99', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    let result = extract_refs_from_descriptions(&conn, 1).unwrap();

    assert_eq!(result.inserted, 0);
    assert_eq!(
        result.skipped_unresolvable, 1,
        "Cross-project ref with no matching project should be unresolvable"
    );

    // The unresolved row still records where the reference points.
    let (path, iid): (String, i64) = conn
        .query_row(
            "SELECT target_project_path, target_entity_iid FROM entity_references WHERE target_entity_id IS NULL",
            [],
            |row| Ok((row.get(0)?, row.get(1)?)),
        )
        .unwrap();
    assert_eq!(path, "vs/other-project");
    assert_eq!(iid, 99);
}
|
||||
|
||||
// --- Integration tests: user note extraction ---
|
||||
|
||||
#[test]
fn test_extract_refs_from_user_notes_with_url() {
    // A non-system note (is_system = 0) containing a full GitLab URL should
    // produce one reference with source_method = 'note_parse'.
    let conn = setup_test_db();
    let now = now_ms();

    conn.execute(
        "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
         VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
        [now],
    )
    .unwrap();

    // Source issue that hosts the discussion.
    conn.execute(
        "INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
         VALUES (10, 1000, 1, 50, 'Source Issue', 'opened', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    // Target MR !60 referenced from the note body.
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
         VALUES (20, 2000, 1, 60, 'Target MR', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    // Notes hang off a discussion, which hangs off the issue.
    conn.execute(
        "INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
         VALUES (30, 'disc-user', 1, 10, 'Issue', ?1)",
        [now],
    )
    .unwrap();

    // User note with a URL
    conn.execute(
        "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
         VALUES (40, 4000, 30, 1, 0,
         'This is related to https://gitlab.com/vs/code/-/merge_requests/60',
         ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    let result = extract_refs_from_user_notes(&conn, 1).unwrap();

    assert_eq!(result.inserted, 1);

    // The extraction method must be attributed to note parsing.
    let method: String = conn
        .query_row(
            "SELECT source_method FROM entity_references WHERE project_id = 1",
            [],
            |row| row.get(0),
        )
        .unwrap();
    assert_eq!(method, "note_parse");
}
|
||||
|
||||
#[test]
fn test_extract_refs_from_user_notes_no_system_note_patterns() {
    // User notes are parsed for full URLs only; system-note phrasing such as
    // "mentioned in merge request !999" must NOT be interpreted in user notes.
    let conn = setup_test_db();
    let now = now_ms();

    conn.execute(
        "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
         VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
        [now],
    )
    .unwrap();

    conn.execute(
        "INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
         VALUES (10, 1000, 1, 50, 'Source', 'opened', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    // MR !999 exists, so if the pattern were (wrongly) applied it WOULD resolve.
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
         VALUES (20, 2000, 1, 999, 'Target', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    conn.execute(
        "INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
         VALUES (30, 'disc-x', 1, 10, 'Issue', ?1)",
        [now],
    )
    .unwrap();

    // User note with system-note-like text but no URL — should NOT extract
    // (user notes only use URL parsing, not system note pattern matching)
    conn.execute(
        "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
         VALUES (40, 4000, 30, 1, 0, 'mentioned in merge request !999', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    let result = extract_refs_from_user_notes(&conn, 1).unwrap();

    assert_eq!(
        result.inserted, 0,
        "User notes should only parse URLs, not system note patterns"
    );
}
|
||||
|
||||
#[test]
fn test_extract_refs_from_user_notes_idempotent() {
    // Re-running user-note extraction over unchanged data must not insert
    // duplicate entity_references rows.
    let conn = setup_test_db();
    let now = now_ms();

    conn.execute(
        "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
         VALUES (1, 100, 'vs/code', 'https://gitlab.com/vs/code', ?1, ?1)",
        [now],
    )
    .unwrap();

    conn.execute(
        "INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at)
         VALUES (10, 1000, 1, 1, 'Src', 'opened', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    // Target MR !2 for the URL in the note body below.
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, project_id, iid, title, state, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at)
         VALUES (20, 2000, 1, 2, 'Tgt', 'opened', 'x', 'main', 'dev', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    conn.execute(
        "INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at)
         VALUES (30, 'disc-y', 1, 10, 'Issue', ?1)",
        [now],
    )
    .unwrap();

    // User note (is_system = 0) containing a resolvable URL.
    conn.execute(
        "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, is_system, body, created_at, updated_at, last_seen_at)
         VALUES (40, 4000, 30, 1, 0,
         'See https://gitlab.com/vs/code/-/merge_requests/2', ?1, ?1, ?1)",
        [now],
    )
    .unwrap();

    let r1 = extract_refs_from_user_notes(&conn, 1).unwrap();
    assert_eq!(r1.inserted, 1);

    let r2 = extract_refs_from_user_notes(&conn, 1).unwrap();
    assert_eq!(r2.inserted, 0, "Second extraction should be idempotent");
}
|
||||
244
src/core/path_resolver.rs
Normal file
244
src/core/path_resolver.rs
Normal file
@@ -0,0 +1,244 @@
|
||||
use rusqlite::Connection;
|
||||
|
||||
use super::error::{LoreError, Result};
|
||||
|
||||
// ─── SQL Helpers ─────────────────────────────────────────────────────────────
|
||||
|
||||
/// Escape LIKE metacharacters (`%`, `_`, `\`) so user input matches literally.
/// All queries using this must include `ESCAPE '\'`.
pub fn escape_like(input: &str) -> String {
    // Single pass over the characters: prefix each metacharacter with a
    // backslash, copy everything else through unchanged.
    let mut escaped = String::with_capacity(input.len());
    for ch in input.chars() {
        if matches!(ch, '\\' | '%' | '_') {
            escaped.push('\\');
        }
        escaped.push(ch);
    }
    escaped
}
|
||||
|
||||
/// Normalize user-supplied repo paths to match stored DiffNote / file-change paths.
/// - trims whitespace
/// - strips leading "./" and "/" (repo-relative paths)
/// - converts '\' to '/' when no '/' present (Windows paste)
/// - collapses repeated "//"
pub fn normalize_repo_path(input: &str) -> String {
    let trimmed = input.trim();

    // Windows backslash normalization (only when no forward slashes present).
    let slashed = if trimmed.contains('\\') && !trimmed.contains('/') {
        trimmed.replace('\\', "/")
    } else {
        trimmed.to_string()
    };

    // Strip any run of leading "./" segments, then leading '/' characters.
    // Order matters: "./" stripping happens before '/' stripping, matching
    // the expected handling of inputs like "././src/foo".
    let mut rest: &str = &slashed;
    while let Some(stripped) = rest.strip_prefix("./") {
        rest = stripped;
    }
    let rest = rest.trim_start_matches('/');

    // Collapse repeated "//" until none remain ("///" needs two passes).
    let mut out = rest.to_string();
    while out.contains("//") {
        out = out.replace("//", "/");
    }
    out
}
|
||||
|
||||
// ─── Path Query Resolution ──────────────────────────────────────────────────
|
||||
|
||||
/// Describes how to match a user-supplied path in SQL.
///
/// Produced by [`build_path_query`]. Callers bind `value` as a query
/// parameter and pick the comparison operator based on `is_prefix`.
#[derive(Debug)]
pub struct PathQuery {
    /// The parameter value to bind.
    /// When `is_prefix` is true this is already LIKE-escaped and ends in `/%`;
    /// when false it is the raw (unescaped) exact path.
    pub value: String,
    /// If true: use `LIKE value ESCAPE '\'`. If false: use `= value`.
    pub is_prefix: bool,
}
|
||||
|
||||
/// Result of a suffix probe against the DB.
///
/// Returned by [`suffix_probe`]; consumed by [`build_path_query`] to decide
/// between auto-resolution, an ambiguity error, or falling back to heuristics.
pub enum SuffixResult {
    /// Suffix probe was not attempted (conditions not met).
    NotAttempted,
    /// No paths matched the suffix.
    NoMatch,
    /// Exactly one distinct path matched — auto-resolve.
    Unique(String),
    /// Multiple distinct paths matched — user must disambiguate.
    /// Holds up to 11 candidates (probe LIMIT) for the error message.
    Ambiguous(Vec<String>),
}
|
||||
|
||||
/// Build a path query from a user-supplied path, with project-scoped DB probes.
///
/// Resolution strategy (in priority order):
/// 1. Trailing `/` → directory prefix (LIKE `path/%`)
/// 2. Exact match probe against notes + `mr_file_changes` → exact (= `path`)
/// 3. Directory prefix probe → prefix (LIKE `path/%`)
/// 4. Suffix probe for bare filenames → auto-resolve or ambiguity error
/// 5. Heuristic fallback: `.` in last segment → file, else → directory prefix
///
/// # Errors
/// Returns `LoreError::Ambiguous` when the suffix probe finds several distinct
/// candidate paths; propagates SQLite errors from `suffix_probe`.
pub fn build_path_query(
    conn: &Connection,
    path: &str,
    project_id: Option<i64>,
) -> Result<PathQuery> {
    // Drop a trailing '/', but remember it: an explicit trailing slash is an
    // unambiguous "this is a directory" signal and skips all file probes.
    let trimmed = path.trim_end_matches('/');
    let last_segment = trimmed.rsplit('/').next().unwrap_or(trimmed);
    let is_root = !trimmed.contains('/');
    let forced_dir = path.ends_with('/');
    // Heuristic is now only a fallback; probes decide first when ambiguous.
    let looks_like_file = !forced_dir && (is_root || last_segment.contains('.'));

    // Probe 1: exact file exists in DiffNotes OR mr_file_changes (project-scoped)
    // Checks both new_path and old_path to support querying renamed files.
    // NOTE(review): `.is_ok()` treats genuine query errors the same as
    // "no rows" — acceptable for a best-effort probe, but worth confirming.
    let exact_exists = conn
        .query_row(
            "SELECT 1 FROM notes INDEXED BY idx_notes_diffnote_path_created
             WHERE note_type = 'DiffNote'
               AND is_system = 0
               AND (position_new_path = ?1 OR position_old_path = ?1)
               AND (?2 IS NULL OR project_id = ?2)
             LIMIT 1",
            rusqlite::params![trimmed, project_id],
            |_| Ok(()),
        )
        .is_ok()
        || conn
            .query_row(
                "SELECT 1 FROM mr_file_changes
                 WHERE (new_path = ?1 OR old_path = ?1)
                   AND (?2 IS NULL OR project_id = ?2)
                 LIMIT 1",
                rusqlite::params![trimmed, project_id],
                |_| Ok(()),
            )
            .is_ok();

    // Probe 2: directory prefix exists in DiffNotes OR mr_file_changes
    // (project-scoped). Skipped once probe 1 classified the input as a file,
    // or when the user forced directory mode with a trailing slash.
    let prefix_exists = if !forced_dir && !exact_exists {
        // Escape LIKE metacharacters in the user input before building the pattern.
        let escaped = escape_like(trimmed);
        let pat = format!("{escaped}/%");
        conn.query_row(
            "SELECT 1 FROM notes INDEXED BY idx_notes_diffnote_path_created
             WHERE note_type = 'DiffNote'
               AND is_system = 0
               AND (position_new_path LIKE ?1 ESCAPE '\\' OR position_old_path LIKE ?1 ESCAPE '\\')
               AND (?2 IS NULL OR project_id = ?2)
             LIMIT 1",
            rusqlite::params![pat, project_id],
            |_| Ok(()),
        )
        .is_ok()
            || conn
                .query_row(
                    "SELECT 1 FROM mr_file_changes
                     WHERE (new_path LIKE ?1 ESCAPE '\\' OR old_path LIKE ?1 ESCAPE '\\')
                       AND (?2 IS NULL OR project_id = ?2)
                     LIMIT 1",
                    rusqlite::params![pat, project_id],
                    |_| Ok(()),
                )
                .is_ok()
    } else {
        false
    };

    // Probe 3: suffix match — user typed a bare filename or partial path that
    // doesn't exist as-is. Search for full paths ending with /input (or equal to input).
    // This handles "login.rs" matching "src/auth/login.rs".
    let suffix_resolved = if !forced_dir && !exact_exists && !prefix_exists && looks_like_file {
        suffix_probe(conn, trimmed, project_id)?
    } else {
        SuffixResult::NotAttempted
    };

    match suffix_resolved {
        // One candidate: silently upgrade the bare filename to its full path.
        SuffixResult::Unique(full_path) => Ok(PathQuery {
            value: full_path,
            is_prefix: false,
        }),
        // Several candidates: refuse to guess; surface them for the user.
        SuffixResult::Ambiguous(candidates) => {
            let list = candidates
                .iter()
                .map(|p| format!("  {p}"))
                .collect::<Vec<_>>()
                .join("\n");
            Err(LoreError::Ambiguous(format!(
                "'{trimmed}' matches multiple paths. Use the full path or -p to scope:\n{list}"
            )))
        }
        SuffixResult::NotAttempted | SuffixResult::NoMatch => {
            // Original logic: exact > prefix > heuristic
            let is_file = if forced_dir {
                false
            } else if exact_exists {
                true
            } else if prefix_exists {
                false
            } else {
                looks_like_file
            };

            if is_file {
                // Exact comparison binds the raw path; no LIKE escaping needed.
                Ok(PathQuery {
                    value: trimmed.to_string(),
                    is_prefix: false,
                })
            } else {
                // Prefix comparison needs the escaped pattern form.
                let escaped = escape_like(trimmed);
                Ok(PathQuery {
                    value: format!("{escaped}/%"),
                    is_prefix: true,
                })
            }
        }
    }
}
|
||||
|
||||
/// Probe both notes and mr_file_changes for paths ending with the given suffix.
/// Searches both new_path and old_path columns to support renamed file resolution.
/// Returns up to 11 distinct candidates (enough to detect ambiguity + show a useful list).
///
/// # Errors
/// Propagates SQLite prepare/query errors.
pub fn suffix_probe(
    conn: &Connection,
    suffix: &str,
    project_id: Option<i64>,
) -> Result<SuffixResult> {
    // `%/suffix` matches full paths ending with the suffix; the extra `= ?2`
    // branch also matches root-level files equal to the suffix itself.
    let escaped = escape_like(suffix);
    let suffix_pat = format!("%/{escaped}");

    // UNION (not UNION ALL) deduplicates a path that appears in several
    // sources, so the same file in notes AND file changes is one candidate.
    let mut stmt = conn.prepare_cached(
        "SELECT DISTINCT full_path FROM (
            SELECT position_new_path AS full_path
            FROM notes INDEXED BY idx_notes_diffnote_path_created
            WHERE note_type = 'DiffNote'
              AND is_system = 0
              AND (position_new_path LIKE ?1 ESCAPE '\\' OR position_new_path = ?2)
              AND (?3 IS NULL OR project_id = ?3)
            UNION
            SELECT new_path AS full_path FROM mr_file_changes
            WHERE (new_path LIKE ?1 ESCAPE '\\' OR new_path = ?2)
              AND (?3 IS NULL OR project_id = ?3)
            UNION
            SELECT position_old_path AS full_path FROM notes
            WHERE note_type = 'DiffNote'
              AND is_system = 0
              AND position_old_path IS NOT NULL
              AND (position_old_path LIKE ?1 ESCAPE '\\' OR position_old_path = ?2)
              AND (?3 IS NULL OR project_id = ?3)
            UNION
            SELECT old_path AS full_path FROM mr_file_changes
            WHERE old_path IS NOT NULL
              AND (old_path LIKE ?1 ESCAPE '\\' OR old_path = ?2)
              AND (?3 IS NULL OR project_id = ?3)
        )
        ORDER BY full_path
        LIMIT 11",
    )?;

    let candidates: Vec<String> = stmt
        .query_map(rusqlite::params![suffix_pat, suffix, project_id], |row| {
            row.get(0)
        })?
        .collect::<std::result::Result<Vec<_>, _>>()?;

    // 0 → fall back to heuristics; 1 → auto-resolve; 2+ → caller must disambiguate.
    match candidates.len() {
        0 => Ok(SuffixResult::NoMatch),
        1 => Ok(SuffixResult::Unique(candidates.into_iter().next().unwrap())),
        _ => Ok(SuffixResult::Ambiguous(candidates)),
    }
}
|
||||
|
||||
#[cfg(test)]
|
||||
#[path = "path_resolver_tests.rs"]
|
||||
mod tests;
|
||||
290
src/core/path_resolver_tests.rs
Normal file
290
src/core/path_resolver_tests.rs
Normal file
@@ -0,0 +1,290 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
// Fresh in-memory SQLite DB with the full migration set applied, so probes
// see the real schema (including idx_notes_diffnote_path_created).
fn setup_test_db() -> Connection {
    let conn = create_connection(Path::new(":memory:")).unwrap();
    run_migrations(&conn).unwrap();
    conn
}
|
||||
|
||||
// Insert a minimal project row; reuses `id` as the GitLab project id.
fn seed_project(conn: &Connection, id: i64) {
    conn.execute(
        "INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
         VALUES (?1, ?1, 'group/repo', 'https://gl.example.com/group/repo', 1000, 2000)",
        rusqlite::params![id],
    )
    .unwrap();
}
|
||||
|
||||
// Insert a minimal merged MR so file changes / discussions have a parent row.
fn seed_mr(conn: &Connection, mr_id: i64, project_id: i64) {
    conn.execute(
        "INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, \
         created_at, updated_at, last_seen_at, source_branch, target_branch)
         VALUES (?1, ?1, ?1, ?2, 'MR', 'merged', 1000, 2000, 2000, 'feat', 'main')",
        rusqlite::params![mr_id, project_id],
    )
    .unwrap();
}
|
||||
|
||||
// Record `path` as a modified file on the given MR (new_path only; old_path
// stays NULL, i.e. not a rename).
fn seed_file_change(conn: &Connection, mr_id: i64, project_id: i64, path: &str) {
    conn.execute(
        "INSERT INTO mr_file_changes (merge_request_id, project_id, new_path, change_type)
         VALUES (?1, ?2, ?3, 'modified')",
        rusqlite::params![mr_id, project_id, path],
    )
    .unwrap();
}
|
||||
|
||||
// Insert a user DiffNote anchored at `path` (position_new_path). `id` is
// reused for the discussion, note, and gitlab ids to keep fixtures compact.
fn seed_diffnote(conn: &Connection, id: i64, project_id: i64, path: &str) {
    // Need a discussion first (MergeRequest type, linked to mr_id=1)
    conn.execute(
        "INSERT OR IGNORE INTO discussions (id, gitlab_discussion_id, project_id, \
         merge_request_id, noteable_type, resolvable, resolved, last_seen_at, last_note_at)
         VALUES (?1, ?2, ?3, 1, 'MergeRequest', 1, 0, 2000, 2000)",
        rusqlite::params![id, format!("disc-{id}"), project_id],
    )
    .unwrap();
    // is_system = 0 so the path probes (which filter system notes out) see it.
    conn.execute(
        "INSERT INTO notes (id, gitlab_id, discussion_id, project_id, note_type, is_system, \
         author_username, body, created_at, updated_at, last_seen_at, position_new_path)
         VALUES (?1, ?1, ?1, ?2, 'DiffNote', 0, 'user', 'note', 1000, 2000, 2000, ?3)",
        rusqlite::params![id, project_id, path],
    )
    .unwrap();
}
|
||||
|
||||
// ─── escape_like ─────────────────────────────────────────────────────────────
|
||||
|
||||
#[test]
fn test_escape_like() {
    // Each LIKE metacharacter gets a literal-backslash prefix; plain paths pass through.
    assert_eq!(escape_like("normal/path"), "normal/path");
    assert_eq!(escape_like("has_underscore"), "has\\_underscore");
    assert_eq!(escape_like("has%percent"), "has\\%percent");
    assert_eq!(escape_like("has\\backslash"), "has\\\\backslash");
}
|
||||
|
||||
// ─── normalize_repo_path ─────────────────────────────────────────────────────
|
||||
|
||||
#[test]
fn test_normalize_repo_path() {
    // Leading "./" and "/" are stripped; backslashes convert only when the
    // input has no forward slashes; "//" collapses; surrounding whitespace trims.
    assert_eq!(normalize_repo_path("./src/foo/"), "src/foo/");
    assert_eq!(normalize_repo_path("/src/foo/"), "src/foo/");
    assert_eq!(normalize_repo_path("././src/foo"), "src/foo");
    assert_eq!(normalize_repo_path("src\\foo\\bar.rs"), "src/foo/bar.rs");
    // Mixed separators: backslash is preserved because a '/' is present.
    assert_eq!(normalize_repo_path("src/foo\\bar"), "src/foo\\bar");
    assert_eq!(normalize_repo_path("src//foo//bar/"), "src/foo/bar/");
    assert_eq!(normalize_repo_path("  src/foo/  "), "src/foo/");
    assert_eq!(normalize_repo_path("src/foo/bar.rs"), "src/foo/bar.rs");
    assert_eq!(normalize_repo_path(""), "");
}
|
||||
|
||||
// ─── build_path_query heuristics (no DB data) ──────────────────────────────
|
||||
|
||||
#[test]
fn test_trailing_slash_is_prefix() {
    // An explicit trailing '/' forces directory-prefix mode with no DB probes.
    let conn = setup_test_db();
    let pq = build_path_query(&conn, "src/auth/", None).unwrap();
    assert_eq!(pq.value, "src/auth/%");
    assert!(pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_no_dot_in_last_segment_is_prefix() {
    // With an empty DB, a dotless last segment falls back to the directory heuristic.
    let conn = setup_test_db();
    let pq = build_path_query(&conn, "src/auth", None).unwrap();
    assert_eq!(pq.value, "src/auth/%");
    assert!(pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_file_extension_is_exact() {
    // A '.' in the last segment triggers the file heuristic → exact match.
    let conn = setup_test_db();
    let pq = build_path_query(&conn, "src/auth/login.rs", None).unwrap();
    assert_eq!(pq.value, "src/auth/login.rs");
    assert!(!pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_root_file_is_exact() {
    // Root-level names (no '/') are treated as files by the heuristic.
    let conn = setup_test_db();
    let pq = build_path_query(&conn, "README.md", None).unwrap();
    assert_eq!(pq.value, "README.md");
    assert!(!pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_dotless_root_file_is_exact() {
    // Root-level heuristic applies even without an extension (Makefile, LICENSE).
    let conn = setup_test_db();
    let pq = build_path_query(&conn, "Makefile", None).unwrap();
    assert_eq!(pq.value, "Makefile");
    assert!(!pq.is_prefix);

    let pq = build_path_query(&conn, "LICENSE", None).unwrap();
    assert_eq!(pq.value, "LICENSE");
    assert!(!pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_metacharacters_escaped_in_prefix() {
    // Prefix patterns are LIKE-escaped, so '_' in the path becomes '\_'.
    let conn = setup_test_db();
    let pq = build_path_query(&conn, "src/test_files/", None).unwrap();
    assert_eq!(pq.value, "src/test\\_files/%");
    assert!(pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_exact_value_not_escaped() {
    // Exact matches use '=', so the value must NOT carry LIKE escapes.
    let conn = setup_test_db();
    let pq = build_path_query(&conn, "README_with_underscore.md", None).unwrap();
    assert_eq!(pq.value, "README_with_underscore.md");
    assert!(!pq.is_prefix);
}
|
||||
|
||||
// ─── build_path_query DB probes ─────────────────────────────────────────────
|
||||
|
||||
#[test]
fn test_db_probe_detects_dotless_file() {
    // "src/Dockerfile" has no dot in last segment -> normally prefix.
    // DB probe detects it's actually a file.
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_mr(&conn, 1, 1);
    seed_diffnote(&conn, 1, 1, "src/Dockerfile");

    let pq = build_path_query(&conn, "src/Dockerfile", None).unwrap();
    assert_eq!(pq.value, "src/Dockerfile");
    assert!(!pq.is_prefix);

    // Without DB data -> falls through to prefix
    let empty = setup_test_db();
    let pq2 = build_path_query(&empty, "src/Dockerfile", None).unwrap();
    assert!(pq2.is_prefix);
}
|
||||
|
||||
#[test]
fn test_db_probe_via_file_changes() {
    // Exact match via mr_file_changes even without notes — both sources feed probe 1.
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_mr(&conn, 1, 1);
    seed_file_change(&conn, 1, 1, "src/Dockerfile");

    let pq = build_path_query(&conn, "src/Dockerfile", None).unwrap();
    assert_eq!(pq.value, "src/Dockerfile");
    assert!(!pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_db_probe_project_scoped() {
    // Passing Some(project_id) restricts the probes to that project's rows.
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_project(&conn, 2);
    seed_mr(&conn, 1, 1);
    seed_diffnote(&conn, 1, 1, "infra/Makefile");

    // Unscoped: finds it
    assert!(
        !build_path_query(&conn, "infra/Makefile", None)
            .unwrap()
            .is_prefix
    );
    // Scoped to project 1: finds it
    assert!(
        !build_path_query(&conn, "infra/Makefile", Some(1))
            .unwrap()
            .is_prefix
    );
    // Scoped to project 2: no data -> prefix
    assert!(
        build_path_query(&conn, "infra/Makefile", Some(2))
            .unwrap()
            .is_prefix
    );
}
|
||||
|
||||
// ─── suffix resolution ──────────────────────────────────────────────────────
|
||||
|
||||
#[test]
fn test_suffix_resolves_bare_filename() {
    // A bare filename with a single suffix match auto-resolves to the full path.
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_mr(&conn, 1, 1);
    seed_file_change(&conn, 1, 1, "src/auth/login.rs");

    let pq = build_path_query(&conn, "login.rs", None).unwrap();
    assert_eq!(pq.value, "src/auth/login.rs");
    assert!(!pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_suffix_resolves_partial_path() {
    // Suffix resolution also works for multi-segment partial paths.
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_mr(&conn, 1, 1);
    seed_file_change(&conn, 1, 1, "src/auth/login.rs");

    let pq = build_path_query(&conn, "auth/login.rs", None).unwrap();
    assert_eq!(pq.value, "src/auth/login.rs");
    assert!(!pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_suffix_ambiguous_returns_error() {
    // Two distinct suffix matches must produce an error listing both candidates.
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_mr(&conn, 1, 1);
    seed_file_change(&conn, 1, 1, "src/auth/utils.rs");
    seed_file_change(&conn, 1, 1, "src/db/utils.rs");

    let err = build_path_query(&conn, "utils.rs", None).unwrap_err();
    let msg = err.to_string();
    assert!(msg.contains("src/auth/utils.rs"), "candidates: {msg}");
    assert!(msg.contains("src/db/utils.rs"), "candidates: {msg}");
}
|
||||
|
||||
#[test]
fn test_suffix_scoped_to_project() {
    // Project scoping can turn an otherwise-ambiguous suffix into a unique match.
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_project(&conn, 2);
    seed_mr(&conn, 1, 1);
    seed_mr(&conn, 2, 2);
    seed_file_change(&conn, 1, 1, "src/utils.rs");
    seed_file_change(&conn, 2, 2, "lib/utils.rs");

    // Unscoped: ambiguous
    assert!(build_path_query(&conn, "utils.rs", None).is_err());

    // Scoped to project 1: resolves
    let pq = build_path_query(&conn, "utils.rs", Some(1)).unwrap();
    assert_eq!(pq.value, "src/utils.rs");
}
|
||||
|
||||
#[test]
fn test_suffix_deduplicates_across_sources() {
    // Same path in notes AND file_changes -> single match, not ambiguous
    // (the probe's UNION/DISTINCT collapses duplicates).
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_mr(&conn, 1, 1);
    seed_file_change(&conn, 1, 1, "src/auth/login.rs");
    seed_diffnote(&conn, 1, 1, "src/auth/login.rs");

    let pq = build_path_query(&conn, "login.rs", None).unwrap();
    assert_eq!(pq.value, "src/auth/login.rs");
    assert!(!pq.is_prefix);
}
|
||||
|
||||
#[test]
fn test_exact_match_preferred_over_suffix() {
    // The exact-match probe wins before the suffix probe ever runs, so a
    // root-level README.md is not confused by docs/README.md.
    let conn = setup_test_db();
    seed_project(&conn, 1);
    seed_mr(&conn, 1, 1);
    seed_file_change(&conn, 1, 1, "README.md");
    seed_file_change(&conn, 1, 1, "docs/README.md");

    // "README.md" exists as exact match -> no ambiguity
    let pq = build_path_query(&conn, "README.md", None).unwrap();
    assert_eq!(pq.value, "README.md");
    assert!(!pq.is_prefix);
}
|
||||
@@ -95,110 +95,5 @@ pub fn read_payload(conn: &Connection, id: i64) -> Result<Option<serde_json::Val
|
||||
}
|
||||
|
||||
#[cfg(test)]
mod tests {
    use super::*;
    use crate::core::db::create_connection;
    use tempfile::tempdir;

    // On-disk temp DB with a hand-rolled raw_payloads schema. The schema is
    // duplicated here so these tests do not depend on the migration set —
    // NOTE(review): keep in sync with the real migration if it changes.
    fn setup_test_db() -> Connection {
        let dir = tempdir().unwrap();
        let db_path = dir.path().join("test.db");
        let conn = create_connection(&db_path).unwrap();

        conn.execute_batch(
            "CREATE TABLE raw_payloads (
                id INTEGER PRIMARY KEY,
                source TEXT NOT NULL,
                project_id INTEGER,
                resource_type TEXT NOT NULL,
                gitlab_id TEXT NOT NULL,
                fetched_at INTEGER NOT NULL,
                content_encoding TEXT NOT NULL DEFAULT 'identity',
                payload_hash TEXT NOT NULL,
                payload BLOB NOT NULL
            );
            CREATE UNIQUE INDEX uq_raw_payloads_dedupe
                ON raw_payloads(project_id, resource_type, gitlab_id, payload_hash);",
        )
        .unwrap();

        conn
    }

    #[test]
    fn test_store_and_read_payload() {
        // Uncompressed round-trip: store JSON bytes, read them back as JSON.
        let conn = setup_test_db();
        let payload = serde_json::json!({"title": "Test Issue", "id": 123});
        let json_bytes = serde_json::to_vec(&payload).unwrap();

        let id = store_payload(
            &conn,
            StorePayloadOptions {
                project_id: Some(1),
                resource_type: "issue",
                gitlab_id: "123",
                json_bytes: &json_bytes,
                compress: false,
            },
        )
        .unwrap();

        let result = read_payload(&conn, id).unwrap().unwrap();
        assert_eq!(result["title"], "Test Issue");
    }

    #[test]
    fn test_compression_roundtrip() {
        // compress: true must be transparent to read_payload; a highly
        // repetitive payload exercises the compression path.
        let conn = setup_test_db();
        let payload = serde_json::json!({"data": "x".repeat(1000)});
        let json_bytes = serde_json::to_vec(&payload).unwrap();

        let id = store_payload(
            &conn,
            StorePayloadOptions {
                project_id: Some(1),
                resource_type: "issue",
                gitlab_id: "456",
                json_bytes: &json_bytes,
                compress: true,
            },
        )
        .unwrap();

        let result = read_payload(&conn, id).unwrap().unwrap();
        assert_eq!(result["data"], "x".repeat(1000));
    }

    #[test]
    fn test_deduplication() {
        // Storing the identical payload twice (same project/type/id/hash)
        // must return the same row id, per uq_raw_payloads_dedupe.
        let conn = setup_test_db();
        let payload = serde_json::json!({"id": 789});
        let json_bytes = serde_json::to_vec(&payload).unwrap();

        let id1 = store_payload(
            &conn,
            StorePayloadOptions {
                project_id: Some(1),
                resource_type: "issue",
                gitlab_id: "789",
                json_bytes: &json_bytes,
                compress: false,
            },
        )
        .unwrap();

        let id2 = store_payload(
            &conn,
            StorePayloadOptions {
                project_id: Some(1),
                resource_type: "issue",
                gitlab_id: "789",
                json_bytes: &json_bytes,
                compress: false,
            },
        )
        .unwrap();

        assert_eq!(id1, id2);
    }
}
|
||||
#[path = "payloads_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
105
src/core/payloads_tests.rs
Normal file
105
src/core/payloads_tests.rs
Normal file
@@ -0,0 +1,105 @@
|
||||
use super::*;
|
||||
use crate::core::db::create_connection;
|
||||
use tempfile::tempdir;
|
||||
|
||||
// On-disk temp DB with a hand-rolled raw_payloads schema. Duplicated here so
// these tests do not depend on the migration set — NOTE(review): keep in sync
// with the real migration if it changes.
fn setup_test_db() -> Connection {
    let dir = tempdir().unwrap();
    let db_path = dir.path().join("test.db");
    let conn = create_connection(&db_path).unwrap();

    conn.execute_batch(
        "CREATE TABLE raw_payloads (
            id INTEGER PRIMARY KEY,
            source TEXT NOT NULL,
            project_id INTEGER,
            resource_type TEXT NOT NULL,
            gitlab_id TEXT NOT NULL,
            fetched_at INTEGER NOT NULL,
            content_encoding TEXT NOT NULL DEFAULT 'identity',
            payload_hash TEXT NOT NULL,
            payload BLOB NOT NULL
        );
        CREATE UNIQUE INDEX uq_raw_payloads_dedupe
            ON raw_payloads(project_id, resource_type, gitlab_id, payload_hash);",
    )
    .unwrap();

    conn
}
|
||||
|
||||
#[test]
fn test_store_and_read_payload() {
    // Uncompressed round-trip: store JSON bytes, read them back as JSON.
    let conn = setup_test_db();
    let payload = serde_json::json!({"title": "Test Issue", "id": 123});
    let json_bytes = serde_json::to_vec(&payload).unwrap();

    let id = store_payload(
        &conn,
        StorePayloadOptions {
            project_id: Some(1),
            resource_type: "issue",
            gitlab_id: "123",
            json_bytes: &json_bytes,
            compress: false,
        },
    )
    .unwrap();

    let result = read_payload(&conn, id).unwrap().unwrap();
    assert_eq!(result["title"], "Test Issue");
}
|
||||
|
||||
#[test]
fn test_compression_roundtrip() {
    // compress: true must be transparent to read_payload; a highly repetitive
    // payload exercises the compression path.
    let conn = setup_test_db();
    let payload = serde_json::json!({"data": "x".repeat(1000)});
    let json_bytes = serde_json::to_vec(&payload).unwrap();

    let id = store_payload(
        &conn,
        StorePayloadOptions {
            project_id: Some(1),
            resource_type: "issue",
            gitlab_id: "456",
            json_bytes: &json_bytes,
            compress: true,
        },
    )
    .unwrap();

    let result = read_payload(&conn, id).unwrap().unwrap();
    assert_eq!(result["data"], "x".repeat(1000));
}
|
||||
|
||||
#[test]
fn test_deduplication() {
    // Storing the identical payload twice (same project/type/id/hash) must
    // return the same row id, per the uq_raw_payloads_dedupe unique index.
    let conn = setup_test_db();
    let payload = serde_json::json!({"id": 789});
    let json_bytes = serde_json::to_vec(&payload).unwrap();

    let id1 = store_payload(
        &conn,
        StorePayloadOptions {
            project_id: Some(1),
            resource_type: "issue",
            gitlab_id: "789",
            json_bytes: &json_bytes,
            compress: false,
        },
    )
    .unwrap();

    let id2 = store_payload(
        &conn,
        StorePayloadOptions {
            project_id: Some(1),
            resource_type: "issue",
            gitlab_id: "789",
            json_bytes: &json_bytes,
            compress: false,
        },
    )
    .unwrap();

    assert_eq!(id1, id2);
}
|
||||
@@ -1,6 +1,7 @@
|
||||
use rusqlite::Connection;
|
||||
|
||||
use super::error::{LoreError, Result};
|
||||
use super::path_resolver::escape_like;
|
||||
|
||||
pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
|
||||
let exact = conn.query_row(
|
||||
@@ -106,169 +107,6 @@ pub fn resolve_project(conn: &Connection, project_str: &str) -> Result<i64> {
|
||||
|
||||
/// Escape LIKE metacharacters so `%` and `_` in user input are treated as
|
||||
/// literals. All queries using this must include `ESCAPE '\'`.
|
||||
fn escape_like(input: &str) -> String {
|
||||
input
|
||||
.replace('\\', "\\\\")
|
||||
.replace('%', "\\%")
|
||||
.replace('_', "\\_")
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
fn setup_db() -> Connection {
|
||||
let conn = Connection::open_in_memory().unwrap();
|
||||
conn.execute_batch(
|
||||
"
|
||||
CREATE TABLE projects (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_project_id INTEGER UNIQUE NOT NULL,
|
||||
path_with_namespace TEXT NOT NULL,
|
||||
default_branch TEXT,
|
||||
web_url TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
CREATE INDEX idx_projects_path ON projects(path_with_namespace);
|
||||
",
|
||||
)
|
||||
.unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_project(conn: &Connection, id: i64, path: &str) {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (?1, ?2, ?3)",
|
||||
rusqlite::params![id, id * 100, path],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_exact_match() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
let id = resolve_project(&conn, "backend/auth-service").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_case_insensitive() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
let id = resolve_project(&conn, "Backend/Auth-Service").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_suffix_unambiguous() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
insert_project(&conn, 2, "frontend/web-ui");
|
||||
let id = resolve_project(&conn, "auth-service").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_suffix_ambiguous() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
insert_project(&conn, 2, "frontend/auth-service");
|
||||
let err = resolve_project(&conn, "auth-service").unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("ambiguous"),
|
||||
"Expected ambiguous error, got: {}",
|
||||
msg
|
||||
);
|
||||
assert!(msg.contains("backend/auth-service"));
|
||||
assert!(msg.contains("frontend/auth-service"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_substring_unambiguous() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "vs/python-code");
|
||||
insert_project(&conn, 2, "vs/typescript-code");
|
||||
let id = resolve_project(&conn, "typescript").unwrap();
|
||||
assert_eq!(id, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_substring_case_insensitive() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "vs/python-code");
|
||||
insert_project(&conn, 2, "vs/typescript-code");
|
||||
let id = resolve_project(&conn, "TypeScript").unwrap();
|
||||
assert_eq!(id, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_substring_ambiguous() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "vs/python-code");
|
||||
insert_project(&conn, 2, "vs/typescript-code");
|
||||
let err = resolve_project(&conn, "code").unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("ambiguous"),
|
||||
"Expected ambiguous error, got: {}",
|
||||
msg
|
||||
);
|
||||
assert!(msg.contains("vs/python-code"));
|
||||
assert!(msg.contains("vs/typescript-code"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_suffix_preferred_over_substring() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
insert_project(&conn, 2, "backend/auth-service-v2");
|
||||
let id = resolve_project(&conn, "auth-service").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_no_match() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
let err = resolve_project(&conn, "nonexistent").unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("not found"),
|
||||
"Expected not found error, got: {}",
|
||||
msg
|
||||
);
|
||||
assert!(msg.contains("backend/auth-service"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_empty_projects() {
|
||||
let conn = setup_db();
|
||||
let err = resolve_project(&conn, "anything").unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(msg.contains("No projects have been synced"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_underscore_not_wildcard() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/my_project");
|
||||
insert_project(&conn, 2, "backend/my-project");
|
||||
// `_` in user input must not match `-` (LIKE wildcard behavior)
|
||||
let id = resolve_project(&conn, "my_project").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_percent_not_wildcard() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/a%b");
|
||||
insert_project(&conn, 2, "backend/axyzb");
|
||||
// `%` in user input must not match arbitrary strings
|
||||
let id = resolve_project(&conn, "a%b").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
}
|
||||
#[path = "project_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
156
src/core/project_tests.rs
Normal file
156
src/core/project_tests.rs
Normal file
@@ -0,0 +1,156 @@
|
||||
use super::*;
|
||||
|
||||
fn setup_db() -> Connection {
|
||||
let conn = Connection::open_in_memory().unwrap();
|
||||
conn.execute_batch(
|
||||
"
|
||||
CREATE TABLE projects (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_project_id INTEGER UNIQUE NOT NULL,
|
||||
path_with_namespace TEXT NOT NULL,
|
||||
default_branch TEXT,
|
||||
web_url TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
CREATE INDEX idx_projects_path ON projects(path_with_namespace);
|
||||
",
|
||||
)
|
||||
.unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_project(conn: &Connection, id: i64, path: &str) {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (?1, ?2, ?3)",
|
||||
rusqlite::params![id, id * 100, path],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_exact_match() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
let id = resolve_project(&conn, "backend/auth-service").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_case_insensitive() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
let id = resolve_project(&conn, "Backend/Auth-Service").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_suffix_unambiguous() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
insert_project(&conn, 2, "frontend/web-ui");
|
||||
let id = resolve_project(&conn, "auth-service").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_suffix_ambiguous() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
insert_project(&conn, 2, "frontend/auth-service");
|
||||
let err = resolve_project(&conn, "auth-service").unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("ambiguous"),
|
||||
"Expected ambiguous error, got: {}",
|
||||
msg
|
||||
);
|
||||
assert!(msg.contains("backend/auth-service"));
|
||||
assert!(msg.contains("frontend/auth-service"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_substring_unambiguous() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "vs/python-code");
|
||||
insert_project(&conn, 2, "vs/typescript-code");
|
||||
let id = resolve_project(&conn, "typescript").unwrap();
|
||||
assert_eq!(id, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_substring_case_insensitive() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "vs/python-code");
|
||||
insert_project(&conn, 2, "vs/typescript-code");
|
||||
let id = resolve_project(&conn, "TypeScript").unwrap();
|
||||
assert_eq!(id, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_substring_ambiguous() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "vs/python-code");
|
||||
insert_project(&conn, 2, "vs/typescript-code");
|
||||
let err = resolve_project(&conn, "code").unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("ambiguous"),
|
||||
"Expected ambiguous error, got: {}",
|
||||
msg
|
||||
);
|
||||
assert!(msg.contains("vs/python-code"));
|
||||
assert!(msg.contains("vs/typescript-code"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_suffix_preferred_over_substring() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
insert_project(&conn, 2, "backend/auth-service-v2");
|
||||
let id = resolve_project(&conn, "auth-service").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_no_match() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/auth-service");
|
||||
let err = resolve_project(&conn, "nonexistent").unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(
|
||||
msg.contains("not found"),
|
||||
"Expected not found error, got: {}",
|
||||
msg
|
||||
);
|
||||
assert!(msg.contains("backend/auth-service"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_empty_projects() {
|
||||
let conn = setup_db();
|
||||
let err = resolve_project(&conn, "anything").unwrap_err();
|
||||
let msg = err.to_string();
|
||||
assert!(msg.contains("No projects have been synced"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_underscore_not_wildcard() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/my_project");
|
||||
insert_project(&conn, 2, "backend/my-project");
|
||||
// `_` in user input must not match `-` (LIKE wildcard behavior)
|
||||
let id = resolve_project(&conn, "my_project").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_percent_not_wildcard() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "backend/a%b");
|
||||
insert_project(&conn, 2, "backend/axyzb");
|
||||
// `%` in user input must not match arbitrary strings
|
||||
let id = resolve_project(&conn, "a%b").unwrap();
|
||||
assert_eq!(id, 1);
|
||||
}
|
||||
@@ -122,430 +122,5 @@ pub fn count_references_for_source(
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn seed_project_issue_mr(conn: &Connection) -> (i64, i64, i64) {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (1, 100, 'group/repo', 'https://gitlab.example.com/group/repo', 1000, 2000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (1, 200, 10, 1, 'Test issue', 'closed', 1000, 2000, 2000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
|
||||
VALUES (1, 300, 5, 1, 'Test MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
(1, 1, 1)
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_from_state_events_basic() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count, 1, "Should insert exactly one reference");
|
||||
|
||||
let (src_type, src_id, tgt_type, tgt_id, ref_type, method): (
|
||||
String,
|
||||
i64,
|
||||
String,
|
||||
i64,
|
||||
String,
|
||||
String,
|
||||
) = conn
|
||||
.query_row(
|
||||
"SELECT source_entity_type, source_entity_id,
|
||||
target_entity_type, target_entity_id,
|
||||
reference_type, source_method
|
||||
FROM entity_references WHERE project_id = ?1",
|
||||
[project_id],
|
||||
|row| {
|
||||
Ok((
|
||||
row.get(0)?,
|
||||
row.get(1)?,
|
||||
row.get(2)?,
|
||||
row.get(3)?,
|
||||
row.get(4)?,
|
||||
row.get(5)?,
|
||||
))
|
||||
},
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(src_type, "merge_request");
|
||||
assert_eq!(src_id, mr_id, "Source should be the MR's local DB id");
|
||||
assert_eq!(tgt_type, "issue");
|
||||
assert_eq!(tgt_id, issue_id, "Target should be the issue's local DB id");
|
||||
assert_eq!(ref_type, "closes");
|
||||
assert_eq!(method, "api");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_dedup_with_closes_issues() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO entity_references
|
||||
(project_id, source_entity_type, source_entity_id,
|
||||
target_entity_type, target_entity_id,
|
||||
reference_type, source_method, created_at)
|
||||
VALUES (?1, 'merge_request', ?2, 'issue', ?3, 'closes', 'api', 3000)",
|
||||
rusqlite::params![project_id, mr_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count, 0, "Should not insert duplicate reference");
|
||||
|
||||
let total: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
|
||||
[project_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(total, 1, "Should still have exactly one reference");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_no_source_mr() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, NULL)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count, 0, "Should not create refs when no source MR");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_mr_not_synced() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (2, ?1, ?2, NULL, 'closed', 3000, 999)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(
|
||||
count, 0,
|
||||
"Should not create ref when MR is not synced locally"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_idempotent() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count1 = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count1, 1);
|
||||
|
||||
let count2 = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count2, 0, "Second run should insert nothing (idempotent)");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_multiple_events_same_mr_issue() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (2, ?1, ?2, NULL, 'closed', 4000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert!(count <= 2, "At most 2 inserts attempted");
|
||||
|
||||
let total: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
|
||||
[project_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
total, 1,
|
||||
"Only one unique reference should exist for same MR->issue pair"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_scoped_to_project() {
|
||||
let conn = setup_test_db();
|
||||
seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (2, 101, 'group/other', 'https://gitlab.example.com/group/other', 1000, 2000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (2, 201, 10, 2, 'Other issue', 'closed', 1000, 2000, 2000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
|
||||
VALUES (2, 301, 5, 2, 'Other MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, 1, 1, NULL, 'closed', 3000, 5)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (2, 2, 2, NULL, 'closed', 3000, 5)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, 1).unwrap();
|
||||
assert_eq!(count, 1);
|
||||
|
||||
let total: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM entity_references", [], |row| {
|
||||
row.get(0)
|
||||
})
|
||||
.unwrap();
|
||||
assert_eq!(total, 1, "Only project 1 refs should be created");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_insert_entity_reference_creates_row() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let ref_ = EntityReference {
|
||||
project_id,
|
||||
source_entity_type: "merge_request",
|
||||
source_entity_id: mr_id,
|
||||
target_entity_type: "issue",
|
||||
target_entity_id: Some(issue_id),
|
||||
target_project_path: None,
|
||||
target_entity_iid: None,
|
||||
reference_type: "closes",
|
||||
source_method: "api",
|
||||
};
|
||||
|
||||
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
|
||||
assert!(inserted);
|
||||
|
||||
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
|
||||
assert_eq!(count, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_insert_entity_reference_idempotent() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let ref_ = EntityReference {
|
||||
project_id,
|
||||
source_entity_type: "merge_request",
|
||||
source_entity_id: mr_id,
|
||||
target_entity_type: "issue",
|
||||
target_entity_id: Some(issue_id),
|
||||
target_project_path: None,
|
||||
target_entity_iid: None,
|
||||
reference_type: "closes",
|
||||
source_method: "api",
|
||||
};
|
||||
|
||||
let first = insert_entity_reference(&conn, &ref_).unwrap();
|
||||
assert!(first);
|
||||
|
||||
let second = insert_entity_reference(&conn, &ref_).unwrap();
|
||||
assert!(!second, "Duplicate insert should be ignored");
|
||||
|
||||
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
|
||||
assert_eq!(count, 1, "Still just one reference");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_insert_entity_reference_cross_project_unresolved() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, _issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let ref_ = EntityReference {
|
||||
project_id,
|
||||
source_entity_type: "merge_request",
|
||||
source_entity_id: mr_id,
|
||||
target_entity_type: "issue",
|
||||
target_entity_id: None,
|
||||
target_project_path: Some("other-group/other-project"),
|
||||
target_entity_iid: Some(99),
|
||||
reference_type: "closes",
|
||||
source_method: "api",
|
||||
};
|
||||
|
||||
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
|
||||
assert!(inserted);
|
||||
|
||||
let (target_id, target_path, target_iid): (Option<i64>, Option<String>, Option<i64>) = conn
|
||||
.query_row(
|
||||
"SELECT target_entity_id, target_project_path, target_entity_iid \
|
||||
FROM entity_references WHERE source_entity_id = ?1",
|
||||
[mr_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert!(target_id.is_none());
|
||||
assert_eq!(target_path, Some("other-group/other-project".to_string()));
|
||||
assert_eq!(target_iid, Some(99));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_insert_multiple_closes_references() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (10, 210, 11, ?1, 'Second issue', 'opened', 1000, 2000, 2000)",
|
||||
rusqlite::params![project_id],
|
||||
)
|
||||
.unwrap();
|
||||
let issue_id_2 = 10i64;
|
||||
|
||||
for target_id in [issue_id, issue_id_2] {
|
||||
let ref_ = EntityReference {
|
||||
project_id,
|
||||
source_entity_type: "merge_request",
|
||||
source_entity_id: mr_id,
|
||||
target_entity_type: "issue",
|
||||
target_entity_id: Some(target_id),
|
||||
target_project_path: None,
|
||||
target_entity_iid: None,
|
||||
reference_type: "closes",
|
||||
source_method: "api",
|
||||
};
|
||||
insert_entity_reference(&conn, &ref_).unwrap();
|
||||
}
|
||||
|
||||
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
|
||||
assert_eq!(count, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_issue_local_id_found() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let resolved = resolve_issue_local_id(&conn, project_id, 10).unwrap();
|
||||
assert_eq!(resolved, Some(issue_id));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_issue_local_id_not_found() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, _issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let resolved = resolve_issue_local_id(&conn, project_id, 999).unwrap();
|
||||
assert!(resolved.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_project_path_found() {
|
||||
let conn = setup_test_db();
|
||||
seed_project_issue_mr(&conn);
|
||||
|
||||
let path = resolve_project_path(&conn, 100).unwrap();
|
||||
assert_eq!(path, Some("group/repo".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_project_path_not_found() {
|
||||
let conn = setup_test_db();
|
||||
|
||||
let path = resolve_project_path(&conn, 999).unwrap();
|
||||
assert!(path.is_none());
|
||||
}
|
||||
}
|
||||
#[path = "references_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
425
src/core/references_tests.rs
Normal file
425
src/core/references_tests.rs
Normal file
@@ -0,0 +1,425 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn seed_project_issue_mr(conn: &Connection) -> (i64, i64, i64) {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (1, 100, 'group/repo', 'https://gitlab.example.com/group/repo', 1000, 2000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (1, 200, 10, 1, 'Test issue', 'closed', 1000, 2000, 2000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
|
||||
VALUES (1, 300, 5, 1, 'Test MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
(1, 1, 1)
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_from_state_events_basic() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count, 1, "Should insert exactly one reference");
|
||||
|
||||
let (src_type, src_id, tgt_type, tgt_id, ref_type, method): (
|
||||
String,
|
||||
i64,
|
||||
String,
|
||||
i64,
|
||||
String,
|
||||
String,
|
||||
) = conn
|
||||
.query_row(
|
||||
"SELECT source_entity_type, source_entity_id,
|
||||
target_entity_type, target_entity_id,
|
||||
reference_type, source_method
|
||||
FROM entity_references WHERE project_id = ?1",
|
||||
[project_id],
|
||||
|row| {
|
||||
Ok((
|
||||
row.get(0)?,
|
||||
row.get(1)?,
|
||||
row.get(2)?,
|
||||
row.get(3)?,
|
||||
row.get(4)?,
|
||||
row.get(5)?,
|
||||
))
|
||||
},
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(src_type, "merge_request");
|
||||
assert_eq!(src_id, mr_id, "Source should be the MR's local DB id");
|
||||
assert_eq!(tgt_type, "issue");
|
||||
assert_eq!(tgt_id, issue_id, "Target should be the issue's local DB id");
|
||||
assert_eq!(ref_type, "closes");
|
||||
assert_eq!(method, "api");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_dedup_with_closes_issues() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO entity_references
|
||||
(project_id, source_entity_type, source_entity_id,
|
||||
target_entity_type, target_entity_id,
|
||||
reference_type, source_method, created_at)
|
||||
VALUES (?1, 'merge_request', ?2, 'issue', ?3, 'closes', 'api', 3000)",
|
||||
rusqlite::params![project_id, mr_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count, 0, "Should not insert duplicate reference");
|
||||
|
||||
let total: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
|
||||
[project_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(total, 1, "Should still have exactly one reference");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_no_source_mr() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, NULL)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count, 0, "Should not create refs when no source MR");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_mr_not_synced() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (2, ?1, ?2, NULL, 'closed', 3000, 999)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(
|
||||
count, 0,
|
||||
"Should not create ref when MR is not synced locally"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_idempotent() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count1 = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count1, 1);
|
||||
|
||||
let count2 = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert_eq!(count2, 0, "Second run should insert nothing (idempotent)");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_multiple_events_same_mr_issue() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, ?1, ?2, NULL, 'closed', 3000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (2, ?1, ?2, NULL, 'closed', 4000, 5)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, project_id).unwrap();
|
||||
assert!(count <= 2, "At most 2 inserts attempted");
|
||||
|
||||
let total: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM entity_references WHERE project_id = ?1",
|
||||
[project_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
total, 1,
|
||||
"Only one unique reference should exist for same MR->issue pair"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_extract_refs_scoped_to_project() {
|
||||
let conn = setup_test_db();
|
||||
seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (id, gitlab_project_id, path_with_namespace, web_url, created_at, updated_at)
|
||||
VALUES (2, 101, 'group/other', 'https://gitlab.example.com/group/other', 1000, 2000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (2, 201, 10, 2, 'Other issue', 'closed', 1000, 2000, 2000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at, source_branch, target_branch)
|
||||
VALUES (2, 301, 5, 2, 'Other MR', 'merged', 1000, 2000, 2000, 'feature', 'main')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (1, 1, 1, NULL, 'closed', 3000, 5)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events
|
||||
(gitlab_id, project_id, issue_id, merge_request_id, state,
|
||||
created_at, source_merge_request_iid)
|
||||
VALUES (2, 2, 2, NULL, 'closed', 3000, 5)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let count = extract_refs_from_state_events(&conn, 1).unwrap();
|
||||
assert_eq!(count, 1);
|
||||
|
||||
let total: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM entity_references", [], |row| {
|
||||
row.get(0)
|
||||
})
|
||||
.unwrap();
|
||||
assert_eq!(total, 1, "Only project 1 refs should be created");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_insert_entity_reference_creates_row() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let ref_ = EntityReference {
|
||||
project_id,
|
||||
source_entity_type: "merge_request",
|
||||
source_entity_id: mr_id,
|
||||
target_entity_type: "issue",
|
||||
target_entity_id: Some(issue_id),
|
||||
target_project_path: None,
|
||||
target_entity_iid: None,
|
||||
reference_type: "closes",
|
||||
source_method: "api",
|
||||
};
|
||||
|
||||
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
|
||||
assert!(inserted);
|
||||
|
||||
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
|
||||
assert_eq!(count, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_insert_entity_reference_idempotent() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let ref_ = EntityReference {
|
||||
project_id,
|
||||
source_entity_type: "merge_request",
|
||||
source_entity_id: mr_id,
|
||||
target_entity_type: "issue",
|
||||
target_entity_id: Some(issue_id),
|
||||
target_project_path: None,
|
||||
target_entity_iid: None,
|
||||
reference_type: "closes",
|
||||
source_method: "api",
|
||||
};
|
||||
|
||||
let first = insert_entity_reference(&conn, &ref_).unwrap();
|
||||
assert!(first);
|
||||
|
||||
let second = insert_entity_reference(&conn, &ref_).unwrap();
|
||||
assert!(!second, "Duplicate insert should be ignored");
|
||||
|
||||
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
|
||||
assert_eq!(count, 1, "Still just one reference");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_insert_entity_reference_cross_project_unresolved() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, _issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let ref_ = EntityReference {
|
||||
project_id,
|
||||
source_entity_type: "merge_request",
|
||||
source_entity_id: mr_id,
|
||||
target_entity_type: "issue",
|
||||
target_entity_id: None,
|
||||
target_project_path: Some("other-group/other-project"),
|
||||
target_entity_iid: Some(99),
|
||||
reference_type: "closes",
|
||||
source_method: "api",
|
||||
};
|
||||
|
||||
let inserted = insert_entity_reference(&conn, &ref_).unwrap();
|
||||
assert!(inserted);
|
||||
|
||||
let (target_id, target_path, target_iid): (Option<i64>, Option<String>, Option<i64>) = conn
|
||||
.query_row(
|
||||
"SELECT target_entity_id, target_project_path, target_entity_iid \
|
||||
FROM entity_references WHERE source_entity_id = ?1",
|
||||
[mr_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert!(target_id.is_none());
|
||||
assert_eq!(target_path, Some("other-group/other-project".to_string()));
|
||||
assert_eq!(target_iid, Some(99));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_insert_multiple_closes_references() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, iid, project_id, title, state, created_at, updated_at, last_seen_at)
|
||||
VALUES (10, 210, 11, ?1, 'Second issue', 'opened', 1000, 2000, 2000)",
|
||||
rusqlite::params![project_id],
|
||||
)
|
||||
.unwrap();
|
||||
let issue_id_2 = 10i64;
|
||||
|
||||
for target_id in [issue_id, issue_id_2] {
|
||||
let ref_ = EntityReference {
|
||||
project_id,
|
||||
source_entity_type: "merge_request",
|
||||
source_entity_id: mr_id,
|
||||
target_entity_type: "issue",
|
||||
target_entity_id: Some(target_id),
|
||||
target_project_path: None,
|
||||
target_entity_iid: None,
|
||||
reference_type: "closes",
|
||||
source_method: "api",
|
||||
};
|
||||
insert_entity_reference(&conn, &ref_).unwrap();
|
||||
}
|
||||
|
||||
let count = count_references_for_source(&conn, "merge_request", mr_id).unwrap();
|
||||
assert_eq!(count, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_issue_local_id_found() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let resolved = resolve_issue_local_id(&conn, project_id, 10).unwrap();
|
||||
assert_eq!(resolved, Some(issue_id));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_issue_local_id_not_found() {
|
||||
let conn = setup_test_db();
|
||||
let (project_id, _issue_id, _mr_id) = seed_project_issue_mr(&conn);
|
||||
|
||||
let resolved = resolve_issue_local_id(&conn, project_id, 999).unwrap();
|
||||
assert!(resolved.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_project_path_found() {
|
||||
let conn = setup_test_db();
|
||||
seed_project_issue_mr(&conn);
|
||||
|
||||
let path = resolve_project_path(&conn, 100).unwrap();
|
||||
assert_eq!(path, Some("group/repo".to_string()));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_project_path_not_found() {
|
||||
let conn = setup_test_db();
|
||||
|
||||
let path = resolve_project_path(&conn, 999).unwrap();
|
||||
assert!(path.is_none());
|
||||
}
|
||||
@@ -66,153 +66,5 @@ impl SyncRunRecorder {
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sync_run_recorder_start() {
|
||||
let conn = setup_test_db();
|
||||
let recorder = SyncRunRecorder::start(&conn, "sync", "abc12345").unwrap();
|
||||
assert!(recorder.row_id > 0);
|
||||
|
||||
let (status, command, run_id): (String, String, String) = conn
|
||||
.query_row(
|
||||
"SELECT status, command, run_id FROM sync_runs WHERE id = ?1",
|
||||
[recorder.row_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(status, "running");
|
||||
assert_eq!(command, "sync");
|
||||
assert_eq!(run_id, "abc12345");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sync_run_recorder_succeed() {
|
||||
let conn = setup_test_db();
|
||||
let recorder = SyncRunRecorder::start(&conn, "sync", "def67890").unwrap();
|
||||
let row_id = recorder.row_id;
|
||||
|
||||
let metrics = vec![StageTiming {
|
||||
name: "ingest".to_string(),
|
||||
project: None,
|
||||
elapsed_ms: 1200,
|
||||
items_processed: 50,
|
||||
items_skipped: 0,
|
||||
errors: 2,
|
||||
rate_limit_hits: 0,
|
||||
retries: 0,
|
||||
sub_stages: vec![],
|
||||
}];
|
||||
|
||||
recorder.succeed(&conn, &metrics, 50, 2).unwrap();
|
||||
|
||||
let (status, finished_at, metrics_json, total_items, total_errors): (
|
||||
String,
|
||||
Option<i64>,
|
||||
Option<String>,
|
||||
i64,
|
||||
i64,
|
||||
) = conn
|
||||
.query_row(
|
||||
"SELECT status, finished_at, metrics_json, total_items_processed, total_errors
|
||||
FROM sync_runs WHERE id = ?1",
|
||||
[row_id],
|
||||
|row| {
|
||||
Ok((
|
||||
row.get(0)?,
|
||||
row.get(1)?,
|
||||
row.get(2)?,
|
||||
row.get(3)?,
|
||||
row.get(4)?,
|
||||
))
|
||||
},
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(status, "succeeded");
|
||||
assert!(finished_at.is_some());
|
||||
assert!(metrics_json.is_some());
|
||||
assert_eq!(total_items, 50);
|
||||
assert_eq!(total_errors, 2);
|
||||
|
||||
let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
|
||||
assert_eq!(parsed.len(), 1);
|
||||
assert_eq!(parsed[0].name, "ingest");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sync_run_recorder_fail() {
|
||||
let conn = setup_test_db();
|
||||
let recorder = SyncRunRecorder::start(&conn, "ingest issues", "fail0001").unwrap();
|
||||
let row_id = recorder.row_id;
|
||||
|
||||
recorder.fail(&conn, "GitLab auth failed", None).unwrap();
|
||||
|
||||
let (status, finished_at, error, metrics_json): (
|
||||
String,
|
||||
Option<i64>,
|
||||
Option<String>,
|
||||
Option<String>,
|
||||
) = conn
|
||||
.query_row(
|
||||
"SELECT status, finished_at, error, metrics_json
|
||||
FROM sync_runs WHERE id = ?1",
|
||||
[row_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(status, "failed");
|
||||
assert!(finished_at.is_some());
|
||||
assert_eq!(error.as_deref(), Some("GitLab auth failed"));
|
||||
assert!(metrics_json.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sync_run_recorder_fail_with_partial_metrics() {
|
||||
let conn = setup_test_db();
|
||||
let recorder = SyncRunRecorder::start(&conn, "sync", "part0001").unwrap();
|
||||
let row_id = recorder.row_id;
|
||||
|
||||
let partial_metrics = vec![StageTiming {
|
||||
name: "ingest_issues".to_string(),
|
||||
project: Some("group/repo".to_string()),
|
||||
elapsed_ms: 800,
|
||||
items_processed: 30,
|
||||
items_skipped: 0,
|
||||
errors: 0,
|
||||
rate_limit_hits: 1,
|
||||
retries: 0,
|
||||
sub_stages: vec![],
|
||||
}];
|
||||
|
||||
recorder
|
||||
.fail(&conn, "Embedding failed", Some(&partial_metrics))
|
||||
.unwrap();
|
||||
|
||||
let (status, metrics_json): (String, Option<String>) = conn
|
||||
.query_row(
|
||||
"SELECT status, metrics_json FROM sync_runs WHERE id = ?1",
|
||||
[row_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?)),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(status, "failed");
|
||||
assert!(metrics_json.is_some());
|
||||
|
||||
let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
|
||||
assert_eq!(parsed.len(), 1);
|
||||
assert_eq!(parsed[0].name, "ingest_issues");
|
||||
}
|
||||
}
|
||||
#[path = "sync_run_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
148
src/core/sync_run_tests.rs
Normal file
148
src/core/sync_run_tests.rs
Normal file
@@ -0,0 +1,148 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sync_run_recorder_start() {
|
||||
let conn = setup_test_db();
|
||||
let recorder = SyncRunRecorder::start(&conn, "sync", "abc12345").unwrap();
|
||||
assert!(recorder.row_id > 0);
|
||||
|
||||
let (status, command, run_id): (String, String, String) = conn
|
||||
.query_row(
|
||||
"SELECT status, command, run_id FROM sync_runs WHERE id = ?1",
|
||||
[recorder.row_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?)),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(status, "running");
|
||||
assert_eq!(command, "sync");
|
||||
assert_eq!(run_id, "abc12345");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sync_run_recorder_succeed() {
|
||||
let conn = setup_test_db();
|
||||
let recorder = SyncRunRecorder::start(&conn, "sync", "def67890").unwrap();
|
||||
let row_id = recorder.row_id;
|
||||
|
||||
let metrics = vec![StageTiming {
|
||||
name: "ingest".to_string(),
|
||||
project: None,
|
||||
elapsed_ms: 1200,
|
||||
items_processed: 50,
|
||||
items_skipped: 0,
|
||||
errors: 2,
|
||||
rate_limit_hits: 0,
|
||||
retries: 0,
|
||||
sub_stages: vec![],
|
||||
}];
|
||||
|
||||
recorder.succeed(&conn, &metrics, 50, 2).unwrap();
|
||||
|
||||
let (status, finished_at, metrics_json, total_items, total_errors): (
|
||||
String,
|
||||
Option<i64>,
|
||||
Option<String>,
|
||||
i64,
|
||||
i64,
|
||||
) = conn
|
||||
.query_row(
|
||||
"SELECT status, finished_at, metrics_json, total_items_processed, total_errors
|
||||
FROM sync_runs WHERE id = ?1",
|
||||
[row_id],
|
||||
|row| {
|
||||
Ok((
|
||||
row.get(0)?,
|
||||
row.get(1)?,
|
||||
row.get(2)?,
|
||||
row.get(3)?,
|
||||
row.get(4)?,
|
||||
))
|
||||
},
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(status, "succeeded");
|
||||
assert!(finished_at.is_some());
|
||||
assert!(metrics_json.is_some());
|
||||
assert_eq!(total_items, 50);
|
||||
assert_eq!(total_errors, 2);
|
||||
|
||||
let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
|
||||
assert_eq!(parsed.len(), 1);
|
||||
assert_eq!(parsed[0].name, "ingest");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sync_run_recorder_fail() {
|
||||
let conn = setup_test_db();
|
||||
let recorder = SyncRunRecorder::start(&conn, "ingest issues", "fail0001").unwrap();
|
||||
let row_id = recorder.row_id;
|
||||
|
||||
recorder.fail(&conn, "GitLab auth failed", None).unwrap();
|
||||
|
||||
let (status, finished_at, error, metrics_json): (
|
||||
String,
|
||||
Option<i64>,
|
||||
Option<String>,
|
||||
Option<String>,
|
||||
) = conn
|
||||
.query_row(
|
||||
"SELECT status, finished_at, error, metrics_json
|
||||
FROM sync_runs WHERE id = ?1",
|
||||
[row_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?, row.get(2)?, row.get(3)?)),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(status, "failed");
|
||||
assert!(finished_at.is_some());
|
||||
assert_eq!(error.as_deref(), Some("GitLab auth failed"));
|
||||
assert!(metrics_json.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sync_run_recorder_fail_with_partial_metrics() {
|
||||
let conn = setup_test_db();
|
||||
let recorder = SyncRunRecorder::start(&conn, "sync", "part0001").unwrap();
|
||||
let row_id = recorder.row_id;
|
||||
|
||||
let partial_metrics = vec![StageTiming {
|
||||
name: "ingest_issues".to_string(),
|
||||
project: Some("group/repo".to_string()),
|
||||
elapsed_ms: 800,
|
||||
items_processed: 30,
|
||||
items_skipped: 0,
|
||||
errors: 0,
|
||||
rate_limit_hits: 1,
|
||||
retries: 0,
|
||||
sub_stages: vec![],
|
||||
}];
|
||||
|
||||
recorder
|
||||
.fail(&conn, "Embedding failed", Some(&partial_metrics))
|
||||
.unwrap();
|
||||
|
||||
let (status, metrics_json): (String, Option<String>) = conn
|
||||
.query_row(
|
||||
"SELECT status, metrics_json FROM sync_runs WHERE id = ?1",
|
||||
[row_id],
|
||||
|row| Ok((row.get(0)?, row.get(1)?)),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(status, "failed");
|
||||
assert!(metrics_json.is_some());
|
||||
|
||||
let parsed: Vec<StageTiming> = serde_json::from_str(&metrics_json.unwrap()).unwrap();
|
||||
assert_eq!(parsed.len(), 1);
|
||||
assert_eq!(parsed[0].name, "ingest_issues");
|
||||
}
|
||||
@@ -17,21 +17,27 @@ pub fn now_ms() -> i64 {
|
||||
}
|
||||
|
||||
pub fn parse_since(input: &str) -> Option<i64> {
|
||||
parse_since_from(input, now_ms())
|
||||
}
|
||||
|
||||
/// Like `parse_since` but durations are relative to `reference_ms` instead of now.
|
||||
/// Absolute dates/timestamps are returned as-is regardless of `reference_ms`.
|
||||
pub fn parse_since_from(input: &str, reference_ms: i64) -> Option<i64> {
|
||||
let input = input.trim();
|
||||
|
||||
if let Some(num_str) = input.strip_suffix('d') {
|
||||
let days: i64 = num_str.parse().ok()?;
|
||||
return Some(now_ms() - (days * 24 * 60 * 60 * 1000));
|
||||
return Some(reference_ms - (days * 24 * 60 * 60 * 1000));
|
||||
}
|
||||
|
||||
if let Some(num_str) = input.strip_suffix('w') {
|
||||
let weeks: i64 = num_str.parse().ok()?;
|
||||
return Some(now_ms() - (weeks * 7 * 24 * 60 * 60 * 1000));
|
||||
return Some(reference_ms - (weeks * 7 * 24 * 60 * 60 * 1000));
|
||||
}
|
||||
|
||||
if let Some(num_str) = input.strip_suffix('m') {
|
||||
let months: i64 = num_str.parse().ok()?;
|
||||
return Some(now_ms() - (months * 30 * 24 * 60 * 60 * 1000));
|
||||
return Some(reference_ms - (months * 30 * 24 * 60 * 60 * 1000));
|
||||
}
|
||||
|
||||
if input.len() == 10 && input.chars().filter(|&c| c == '-').count() == 2 {
|
||||
|
||||
@@ -49,6 +49,21 @@ impl Ord for TimelineEvent {
|
||||
}
|
||||
}
|
||||
|
||||
/// Maximum characters per note body in a discussion thread.
|
||||
pub const THREAD_NOTE_MAX_CHARS: usize = 2000;
|
||||
|
||||
/// Maximum notes per discussion thread before truncation.
|
||||
pub const THREAD_MAX_NOTES: usize = 50;
|
||||
|
||||
/// A single note within a discussion thread.
|
||||
#[derive(Debug, Clone, PartialEq, Eq, PartialOrd, Ord, Serialize)]
|
||||
pub struct ThreadNote {
|
||||
pub note_id: i64,
|
||||
pub author: Option<String>,
|
||||
pub body: String,
|
||||
pub created_at: i64,
|
||||
}
|
||||
|
||||
/// Per spec Section 3.3. Serde tagged enum for JSON output.
|
||||
///
|
||||
/// Variant declaration order defines the sort order within a timestamp+entity
|
||||
@@ -78,11 +93,39 @@ pub enum TimelineEventType {
|
||||
snippet: String,
|
||||
discussion_id: Option<i64>,
|
||||
},
|
||||
DiscussionThread {
|
||||
discussion_id: i64,
|
||||
notes: Vec<ThreadNote>,
|
||||
},
|
||||
CrossReferenced {
|
||||
target: String,
|
||||
},
|
||||
}
|
||||
|
||||
/// Truncate a string to at most `max_chars` characters on a safe UTF-8 boundary.
|
||||
pub(crate) fn truncate_to_chars(s: &str, max_chars: usize) -> String {
|
||||
let char_count = s.chars().count();
|
||||
if char_count <= max_chars {
|
||||
return s.to_owned();
|
||||
}
|
||||
|
||||
let byte_end = s
|
||||
.char_indices()
|
||||
.nth(max_chars)
|
||||
.map(|(i, _)| i)
|
||||
.unwrap_or(s.len());
|
||||
s[..byte_end].to_owned()
|
||||
}
|
||||
|
||||
/// A discussion matched during the seed phase, to be collected as a full thread.
|
||||
#[derive(Debug, Clone)]
|
||||
pub struct MatchedDiscussion {
|
||||
pub discussion_id: i64,
|
||||
pub entity_type: String,
|
||||
pub entity_id: i64,
|
||||
pub project_id: i64,
|
||||
}
|
||||
|
||||
/// Internal entity reference used across pipeline stages.
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
pub struct EntityRef {
|
||||
@@ -118,6 +161,8 @@ pub struct UnresolvedRef {
|
||||
#[derive(Debug, Clone, Serialize)]
|
||||
pub struct TimelineResult {
|
||||
pub query: String,
|
||||
/// The search mode actually used for seeding (e.g. "hybrid", "lexical", "lexical (hybrid fallback)").
|
||||
pub search_mode: String,
|
||||
pub events: Vec<TimelineEvent>,
|
||||
/// Total events before the `--limit` was applied (for meta.total_events vs meta.showing).
|
||||
#[serde(skip)]
|
||||
@@ -166,6 +211,77 @@ pub fn resolve_entity_ref(
|
||||
}
|
||||
}
|
||||
|
||||
/// Resolve an entity by its user-facing IID (e.g. issue #42) to a full [`EntityRef`].
|
||||
///
|
||||
/// Unlike [`resolve_entity_ref`] which takes an internal DB id, this takes the
|
||||
/// GitLab IID that users see. Used by entity-direct timeline seeding (`issue:42`).
|
||||
///
|
||||
/// When `project_id` is `Some`, the query is scoped to that project (disambiguates
|
||||
/// duplicate IIDs across projects).
|
||||
///
|
||||
/// Returns `LoreError::NotFound` when no match exists, `LoreError::Ambiguous` when
|
||||
/// the same IID exists in multiple projects (suggest `--project`).
|
||||
pub fn resolve_entity_by_iid(
|
||||
conn: &Connection,
|
||||
entity_type: &str,
|
||||
iid: i64,
|
||||
project_id: Option<i64>,
|
||||
) -> Result<EntityRef> {
|
||||
let table = match entity_type {
|
||||
"issue" => "issues",
|
||||
"merge_request" => "merge_requests",
|
||||
_ => {
|
||||
return Err(super::error::LoreError::NotFound(format!(
|
||||
"Unknown entity type: {entity_type}"
|
||||
)));
|
||||
}
|
||||
};
|
||||
|
||||
let sql = format!(
|
||||
"SELECT e.id, e.iid, p.path_with_namespace
|
||||
FROM {table} e
|
||||
JOIN projects p ON p.id = e.project_id
|
||||
WHERE e.iid = ?1 AND (?2 IS NULL OR e.project_id = ?2)"
|
||||
);
|
||||
|
||||
let mut stmt = conn.prepare(&sql)?;
|
||||
let rows: Vec<(i64, i64, String)> = stmt
|
||||
.query_map(rusqlite::params![iid, project_id], |row| {
|
||||
Ok((
|
||||
row.get::<_, i64>(0)?,
|
||||
row.get::<_, i64>(1)?,
|
||||
row.get::<_, String>(2)?,
|
||||
))
|
||||
})?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?;
|
||||
|
||||
match rows.len() {
|
||||
0 => {
|
||||
let sigil = if entity_type == "issue" { "#" } else { "!" };
|
||||
Err(super::error::LoreError::NotFound(format!(
|
||||
"{entity_type} {sigil}{iid} not found"
|
||||
)))
|
||||
}
|
||||
1 => {
|
||||
let (entity_id, entity_iid, project_path) = rows.into_iter().next().unwrap();
|
||||
Ok(EntityRef {
|
||||
entity_type: entity_type.to_owned(),
|
||||
entity_id,
|
||||
entity_iid,
|
||||
project_path,
|
||||
})
|
||||
}
|
||||
_ => {
|
||||
let projects: Vec<&str> = rows.iter().map(|(_, _, p)| p.as_str()).collect();
|
||||
let sigil = if entity_type == "issue" { "#" } else { "!" };
|
||||
Err(super::error::LoreError::Ambiguous(format!(
|
||||
"{entity_type} {sigil}{iid} exists in multiple projects: {}. Use --project to specify.",
|
||||
projects.join(", ")
|
||||
)))
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
@@ -248,7 +364,7 @@ mod tests {
|
||||
|
||||
#[test]
|
||||
fn test_timeline_event_type_variant_count() {
|
||||
// Verify all 9 variants serialize without panic
|
||||
// Verify all 10 variants serialize without panic
|
||||
let variants: Vec<TimelineEventType> = vec![
|
||||
TimelineEventType::Created,
|
||||
TimelineEventType::StateChanged {
|
||||
@@ -272,13 +388,198 @@ mod tests {
|
||||
snippet: "text".to_owned(),
|
||||
discussion_id: None,
|
||||
},
|
||||
TimelineEventType::DiscussionThread {
|
||||
discussion_id: 1,
|
||||
notes: vec![ThreadNote {
|
||||
note_id: 1,
|
||||
author: Some("alice".to_owned()),
|
||||
body: "hello".to_owned(),
|
||||
created_at: 1000,
|
||||
}],
|
||||
},
|
||||
TimelineEventType::CrossReferenced {
|
||||
target: "!567".to_owned(),
|
||||
},
|
||||
];
|
||||
assert_eq!(variants.len(), 9);
|
||||
assert_eq!(variants.len(), 10);
|
||||
for v in &variants {
|
||||
serde_json::to_value(v).unwrap();
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_discussion_thread_serializes_tagged() {
|
||||
let event_type = TimelineEventType::DiscussionThread {
|
||||
discussion_id: 42,
|
||||
notes: vec![
|
||||
ThreadNote {
|
||||
note_id: 1,
|
||||
author: Some("alice".to_owned()),
|
||||
body: "first note".to_owned(),
|
||||
created_at: 1000,
|
||||
},
|
||||
ThreadNote {
|
||||
note_id: 2,
|
||||
author: Some("bob".to_owned()),
|
||||
body: "second note".to_owned(),
|
||||
created_at: 2000,
|
||||
},
|
||||
],
|
||||
};
|
||||
let json = serde_json::to_value(&event_type).unwrap();
|
||||
assert_eq!(json["kind"], "discussion_thread");
|
||||
assert_eq!(json["discussion_id"], 42);
|
||||
assert_eq!(json["notes"].as_array().unwrap().len(), 2);
|
||||
assert_eq!(json["notes"][0]["note_id"], 1);
|
||||
assert_eq!(json["notes"][0]["author"], "alice");
|
||||
assert_eq!(json["notes"][0]["body"], "first note");
|
||||
assert_eq!(json["notes"][1]["note_id"], 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_discussion_thread_sort_order() {
|
||||
// DiscussionThread should sort after NoteEvidence, before CrossReferenced
|
||||
let note_ev = TimelineEventType::NoteEvidence {
|
||||
note_id: 1,
|
||||
snippet: "a".to_owned(),
|
||||
discussion_id: None,
|
||||
};
|
||||
let thread = TimelineEventType::DiscussionThread {
|
||||
discussion_id: 1,
|
||||
notes: vec![],
|
||||
};
|
||||
let cross_ref = TimelineEventType::CrossReferenced {
|
||||
target: "!1".to_owned(),
|
||||
};
|
||||
|
||||
assert!(note_ev < thread);
|
||||
assert!(thread < cross_ref);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_thread_note_ord() {
|
||||
let a = ThreadNote {
|
||||
note_id: 1,
|
||||
author: Some("alice".to_owned()),
|
||||
body: "first".to_owned(),
|
||||
created_at: 1000,
|
||||
};
|
||||
let b = ThreadNote {
|
||||
note_id: 2,
|
||||
author: Some("bob".to_owned()),
|
||||
body: "second".to_owned(),
|
||||
created_at: 2000,
|
||||
};
|
||||
// ThreadNote derives Ord — note_id is the first field, so ordering is by note_id
|
||||
assert!(a < b);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_truncate_to_chars() {
|
||||
assert_eq!(truncate_to_chars("hello", 200), "hello");
|
||||
let long = "a".repeat(300);
|
||||
assert_eq!(truncate_to_chars(&long, 200).chars().count(), 200);
|
||||
}
|
||||
|
||||
// ─── resolve_entity_by_iid tests ────────────────────────────────────────
|
||||
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_project(conn: &Connection, gitlab_id: i64, path: &str) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (?1, ?2, ?3)",
|
||||
rusqlite::params![gitlab_id, path, format!("https://gitlab.com/{path}")],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test issue', 'opened', 'alice', 1000, 2000, 3000)",
|
||||
rusqlite::params![project_id * 10000 + iid, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
|
||||
rusqlite::params![project_id * 10000 + iid, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_entity_by_iid_issue() {
|
||||
let conn = setup_db();
|
||||
let project_id = insert_project(&conn, 1, "group/project");
|
||||
let entity_id = insert_issue(&conn, project_id, 42);
|
||||
|
||||
let result = resolve_entity_by_iid(&conn, "issue", 42, None).unwrap();
|
||||
assert_eq!(result.entity_type, "issue");
|
||||
assert_eq!(result.entity_id, entity_id);
|
||||
assert_eq!(result.entity_iid, 42);
|
||||
assert_eq!(result.project_path, "group/project");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_entity_by_iid_mr() {
|
||||
let conn = setup_db();
|
||||
let project_id = insert_project(&conn, 1, "group/project");
|
||||
let entity_id = insert_mr(&conn, project_id, 99);
|
||||
|
||||
let result = resolve_entity_by_iid(&conn, "merge_request", 99, None).unwrap();
|
||||
assert_eq!(result.entity_type, "merge_request");
|
||||
assert_eq!(result.entity_id, entity_id);
|
||||
assert_eq!(result.entity_iid, 99);
|
||||
assert_eq!(result.project_path, "group/project");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_entity_by_iid_not_found() {
|
||||
let conn = setup_db();
|
||||
insert_project(&conn, 1, "group/project");
|
||||
|
||||
let result = resolve_entity_by_iid(&conn, "issue", 999, None);
|
||||
assert!(result.is_err());
|
||||
let err = result.unwrap_err();
|
||||
assert!(matches!(err, crate::core::error::LoreError::NotFound(_)));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_entity_by_iid_ambiguous() {
|
||||
let conn = setup_db();
|
||||
let proj1 = insert_project(&conn, 1, "group/project-a");
|
||||
let proj2 = insert_project(&conn, 2, "group/project-b");
|
||||
insert_issue(&conn, proj1, 42);
|
||||
insert_issue(&conn, proj2, 42);
|
||||
|
||||
let result = resolve_entity_by_iid(&conn, "issue", 42, None);
|
||||
assert!(result.is_err());
|
||||
let err = result.unwrap_err();
|
||||
assert!(matches!(err, crate::core::error::LoreError::Ambiguous(_)));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_resolve_entity_by_iid_project_scoped() {
|
||||
let conn = setup_db();
|
||||
let proj1 = insert_project(&conn, 1, "group/project-a");
|
||||
let proj2 = insert_project(&conn, 2, "group/project-b");
|
||||
insert_issue(&conn, proj1, 42);
|
||||
let entity_id_b = insert_issue(&conn, proj2, 42);
|
||||
|
||||
let result = resolve_entity_by_iid(&conn, "issue", 42, Some(proj2)).unwrap();
|
||||
assert_eq!(result.entity_id, entity_id_b);
|
||||
assert_eq!(result.project_path, "group/project-b");
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1,20 +1,27 @@
|
||||
use rusqlite::Connection;
|
||||
|
||||
use std::collections::HashSet;
|
||||
|
||||
use crate::core::error::{LoreError, Result};
|
||||
use crate::core::timeline::{EntityRef, ExpandedEntityRef, TimelineEvent, TimelineEventType};
|
||||
use crate::core::timeline::{
|
||||
EntityRef, ExpandedEntityRef, MatchedDiscussion, THREAD_MAX_NOTES, THREAD_NOTE_MAX_CHARS,
|
||||
ThreadNote, TimelineEvent, TimelineEventType, truncate_to_chars,
|
||||
};
|
||||
|
||||
/// Collect all events for seed and expanded entities, interleave chronologically.
|
||||
///
|
||||
/// Steps 4-5 of the timeline pipeline:
|
||||
/// 1. For each entity, collect Created, StateChanged, Label, Milestone, Merged events
|
||||
/// 2. Merge in evidence notes from the seed phase
|
||||
/// 3. Sort chronologically with stable tiebreak
|
||||
/// 4. Apply --since filter and --limit
|
||||
/// 2. Collect discussion threads from matched discussions
|
||||
/// 3. Merge in evidence notes from the seed phase
|
||||
/// 4. Sort chronologically with stable tiebreak
|
||||
/// 5. Apply --since filter and --limit
|
||||
pub fn collect_events(
|
||||
conn: &Connection,
|
||||
seed_entities: &[EntityRef],
|
||||
expanded_entities: &[ExpandedEntityRef],
|
||||
evidence_notes: &[TimelineEvent],
|
||||
matched_discussions: &[MatchedDiscussion],
|
||||
since_ms: Option<i64>,
|
||||
limit: usize,
|
||||
) -> Result<(Vec<TimelineEvent>, usize)> {
|
||||
@@ -30,6 +37,10 @@ pub fn collect_events(
|
||||
collect_entity_events(conn, &expanded.entity_ref, false, &mut all_events)?;
|
||||
}
|
||||
|
||||
// Collect discussion threads
|
||||
let entity_lookup = build_entity_lookup(seed_entities, expanded_entities);
|
||||
collect_discussion_threads(conn, matched_discussions, &entity_lookup, &mut all_events)?;
|
||||
|
||||
// Add evidence notes from seed phase
|
||||
all_events.extend(evidence_notes.iter().cloned());
|
||||
|
||||
@@ -369,327 +380,117 @@ fn entity_id_column(entity: &EntityRef) -> Result<(&'static str, i64)> {
|
||||
}
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
/// Lookup key: (entity_type, entity_id) -> (iid, project_path)
|
||||
type EntityLookup = std::collections::HashMap<(String, i64), (i64, String)>;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_project(conn: &Connection) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, web_url) VALUES (?1, ?2, ?3, 'Auth bug', 'opened', 'alice', 1000, 2000, 3000, 'https://gitlab.com/group/project/-/issues/1')",
|
||||
rusqlite::params![iid * 100, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_mr(conn: &Connection, project_id: i64, iid: i64, merged_at: Option<i64>) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, merged_at, merge_user_username, web_url) VALUES (?1, ?2, ?3, 'Fix auth', 'merged', 'bob', 1000, 5000, 6000, ?4, 'charlie', 'https://gitlab.com/group/project/-/merge_requests/10')",
|
||||
rusqlite::params![iid * 100, project_id, iid, merged_at],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn make_entity_ref(entity_type: &str, entity_id: i64, iid: i64) -> EntityRef {
|
||||
EntityRef {
|
||||
entity_type: entity_type.to_owned(),
|
||||
entity_id,
|
||||
entity_iid: iid,
|
||||
project_path: "group/project".to_owned(),
|
||||
}
|
||||
}
|
||||
|
||||
fn insert_state_event(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
state: &str,
|
||||
created_at: i64,
|
||||
) {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events (gitlab_id, project_id, issue_id, merge_request_id, state, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, 'alice', ?6)",
|
||||
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, state, created_at],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_label_event(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
action: &str,
|
||||
label_name: Option<&str>,
|
||||
created_at: i64,
|
||||
) {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_label_events (gitlab_id, project_id, issue_id, merge_request_id, action, label_name, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 'alice', ?7)",
|
||||
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, action, label_name, created_at],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_milestone_event(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
action: &str,
|
||||
milestone_title: Option<&str>,
|
||||
created_at: i64,
|
||||
) {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_milestone_events (gitlab_id, project_id, issue_id, merge_request_id, action, milestone_title, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 'alice', ?7)",
|
||||
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, action, milestone_title, created_at],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_creation_event() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
|
||||
assert_eq!(events.len(), 1);
|
||||
assert!(matches!(events[0].event_type, TimelineEventType::Created));
|
||||
assert_eq!(events[0].timestamp, 1000);
|
||||
assert_eq!(events[0].actor, Some("alice".to_owned()));
|
||||
assert!(events[0].is_seed);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_state_events() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 4000);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
|
||||
|
||||
// Created + 2 state changes = 3
|
||||
assert_eq!(events.len(), 3);
|
||||
assert!(matches!(events[0].event_type, TimelineEventType::Created));
|
||||
assert!(matches!(
|
||||
events[1].event_type,
|
||||
TimelineEventType::StateChanged { ref state } if state == "closed"
|
||||
));
|
||||
assert!(matches!(
|
||||
events[2].event_type,
|
||||
TimelineEventType::StateChanged { ref state } if state == "reopened"
|
||||
));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_merged_dedup() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let mr_id = insert_mr(&conn, project_id, 10, Some(5000));
|
||||
|
||||
// Also add a state event for 'merged' — this should NOT produce a StateChanged
|
||||
insert_state_event(&conn, project_id, None, Some(mr_id), "merged", 5000);
|
||||
|
||||
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
|
||||
|
||||
// Should have Created + Merged (not Created + StateChanged{merged} + Merged)
|
||||
let merged_count = events
|
||||
.iter()
|
||||
.filter(|e| matches!(e.event_type, TimelineEventType::Merged))
|
||||
.count();
|
||||
let state_merged_count = events
|
||||
.iter()
|
||||
.filter(|e| matches!(&e.event_type, TimelineEventType::StateChanged { state } if state == "merged"))
|
||||
.count();
|
||||
|
||||
assert_eq!(merged_count, 1);
|
||||
assert_eq!(state_merged_count, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_null_label_fallback() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
insert_label_event(&conn, project_id, Some(issue_id), None, "add", None, 2000);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
|
||||
|
||||
let label_event = events.iter().find(|e| {
|
||||
matches!(&e.event_type, TimelineEventType::LabelAdded { label } if label == "[deleted label]")
|
||||
});
|
||||
assert!(label_event.is_some());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_null_milestone_fallback() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
insert_milestone_event(&conn, project_id, Some(issue_id), None, "add", None, 2000);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
|
||||
|
||||
let ms_event = events.iter().find(|e| {
|
||||
matches!(&e.event_type, TimelineEventType::MilestoneSet { milestone } if milestone == "[deleted milestone]")
|
||||
});
|
||||
assert!(ms_event.is_some());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_since_filter() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 5000);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
|
||||
// Since 4000: should exclude Created (1000) and closed (3000)
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], Some(4000), 100).unwrap();
|
||||
assert_eq!(events.len(), 1);
|
||||
assert_eq!(events[0].timestamp, 5000);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_chronological_sort() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let mr_id = insert_mr(&conn, project_id, 10, Some(4000));
|
||||
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
|
||||
insert_label_event(
|
||||
&conn,
|
||||
project_id,
|
||||
None,
|
||||
Some(mr_id),
|
||||
"add",
|
||||
Some("bug"),
|
||||
2000,
|
||||
fn build_entity_lookup(seeds: &[EntityRef], expanded: &[ExpandedEntityRef]) -> EntityLookup {
|
||||
let mut lookup = EntityLookup::new();
|
||||
for e in seeds {
|
||||
lookup.insert(
|
||||
(e.entity_type.clone(), e.entity_id),
|
||||
(e.entity_iid, e.project_path.clone()),
|
||||
);
|
||||
|
||||
let seeds = vec![
|
||||
make_entity_ref("issue", issue_id, 1),
|
||||
make_entity_ref("merge_request", mr_id, 10),
|
||||
];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
|
||||
|
||||
// Verify chronological order
|
||||
for window in events.windows(2) {
|
||||
assert!(window[0].timestamp <= window[1].timestamp);
|
||||
}
|
||||
}
|
||||
for exp in expanded {
|
||||
let e = &exp.entity_ref;
|
||||
lookup.insert(
|
||||
(e.entity_type.clone(), e.entity_id),
|
||||
(e.entity_iid, e.project_path.clone()),
|
||||
);
|
||||
}
|
||||
lookup
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_respects_limit() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
/// Collect full discussion threads for matched discussions.
|
||||
fn collect_discussion_threads(
|
||||
conn: &Connection,
|
||||
matched_discussions: &[MatchedDiscussion],
|
||||
entity_lookup: &EntityLookup,
|
||||
events: &mut Vec<TimelineEvent>,
|
||||
) -> Result<()> {
|
||||
// Deduplicate by discussion_id
|
||||
let mut seen = HashSet::new();
|
||||
|
||||
for i in 0..20 {
|
||||
insert_state_event(
|
||||
&conn,
|
||||
project_id,
|
||||
Some(issue_id),
|
||||
None,
|
||||
"closed",
|
||||
3000 + i * 100,
|
||||
);
|
||||
let mut stmt = conn.prepare(
|
||||
"SELECT id, author_username, body, created_at FROM notes
|
||||
WHERE discussion_id = ?1 AND is_system = 0
|
||||
ORDER BY created_at ASC",
|
||||
)?;
|
||||
|
||||
for disc in matched_discussions {
|
||||
if !seen.insert(disc.discussion_id) {
|
||||
continue;
|
||||
}
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, total) = collect_events(&conn, &seeds, &[], &[], None, 5).unwrap();
|
||||
assert_eq!(events.len(), 5);
|
||||
// 20 state changes + 1 created = 21 total before limit
|
||||
assert_eq!(total, 21);
|
||||
}
|
||||
let (iid, project_path) =
|
||||
match entity_lookup.get(&(disc.entity_type.clone(), disc.entity_id)) {
|
||||
Some(val) => val.clone(),
|
||||
None => continue, // entity not in seed or expanded set
|
||||
};
|
||||
|
||||
#[test]
|
||||
fn test_collect_evidence_notes_included() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let rows = stmt.query_map(rusqlite::params![disc.discussion_id], |row| {
|
||||
Ok((
|
||||
row.get::<_, i64>(0)?, // id
|
||||
row.get::<_, Option<String>>(1)?, // author_username
|
||||
row.get::<_, Option<String>>(2)?, // body
|
||||
row.get::<_, i64>(3)?, // created_at
|
||||
))
|
||||
})?;
|
||||
|
||||
let evidence = vec![TimelineEvent {
|
||||
timestamp: 2500,
|
||||
entity_type: "issue".to_owned(),
|
||||
entity_id: issue_id,
|
||||
entity_iid: 1,
|
||||
project_path: "group/project".to_owned(),
|
||||
event_type: TimelineEventType::NoteEvidence {
|
||||
note_id: 42,
|
||||
snippet: "relevant note".to_owned(),
|
||||
discussion_id: Some(1),
|
||||
let mut notes = Vec::new();
|
||||
for row_result in rows {
|
||||
let (note_id, author, body, created_at) = row_result?;
|
||||
let body = truncate_to_chars(body.as_deref().unwrap_or(""), THREAD_NOTE_MAX_CHARS);
|
||||
notes.push(ThreadNote {
|
||||
note_id,
|
||||
author,
|
||||
body,
|
||||
created_at,
|
||||
});
|
||||
}
|
||||
|
||||
// Skip empty threads (all notes were system notes)
|
||||
if notes.is_empty() {
|
||||
continue;
|
||||
}
|
||||
|
||||
let first_created_at = notes[0].created_at;
|
||||
|
||||
// Cap notes per thread
|
||||
let total_notes = notes.len();
|
||||
if total_notes > THREAD_MAX_NOTES {
|
||||
notes.truncate(THREAD_MAX_NOTES);
|
||||
notes.push(ThreadNote {
|
||||
note_id: -1,
|
||||
author: None,
|
||||
body: format!("[{} more notes not shown]", total_notes - THREAD_MAX_NOTES),
|
||||
created_at: notes.last().map_or(first_created_at, |n| n.created_at),
|
||||
});
|
||||
}
|
||||
|
||||
let note_count = notes.len();
|
||||
let actor = notes.first().and_then(|n| n.author.clone());
|
||||
|
||||
events.push(TimelineEvent {
|
||||
timestamp: first_created_at,
|
||||
entity_type: disc.entity_type.clone(),
|
||||
entity_id: disc.entity_id,
|
||||
entity_iid: iid,
|
||||
project_path,
|
||||
event_type: TimelineEventType::DiscussionThread {
|
||||
discussion_id: disc.discussion_id,
|
||||
notes,
|
||||
},
|
||||
summary: "Note by alice".to_owned(),
|
||||
actor: Some("alice".to_owned()),
|
||||
summary: format!("Discussion ({note_count} notes)"),
|
||||
actor,
|
||||
url: None,
|
||||
is_seed: true,
|
||||
}];
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &evidence, None, 100).unwrap();
|
||||
|
||||
let note_event = events.iter().find(|e| {
|
||||
matches!(
|
||||
&e.event_type,
|
||||
TimelineEventType::NoteEvidence { note_id, .. } if *note_id == 42
|
||||
)
|
||||
});
|
||||
assert!(note_event.is_some());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_merged_fallback_to_state_event() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
// MR with merged_at = NULL
|
||||
let mr_id = insert_mr(&conn, project_id, 10, None);
|
||||
|
||||
// But has a state event for 'merged'
|
||||
insert_state_event(&conn, project_id, None, Some(mr_id), "merged", 5000);
|
||||
|
||||
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], None, 100).unwrap();
|
||||
|
||||
let merged = events
|
||||
.iter()
|
||||
.find(|e| matches!(e.event_type, TimelineEventType::Merged));
|
||||
assert!(merged.is_some());
|
||||
assert_eq!(merged.unwrap().timestamp, 5000);
|
||||
}
|
||||
Ok(())
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
#[path = "timeline_collect_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
704
src/core/timeline_collect_tests.rs
Normal file
704
src/core/timeline_collect_tests.rs
Normal file
@@ -0,0 +1,704 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_project(conn: &Connection) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, web_url) VALUES (?1, ?2, ?3, 'Auth bug', 'opened', 'alice', 1000, 2000, 3000, 'https://gitlab.com/group/project/-/issues/1')",
|
||||
rusqlite::params![iid * 100, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_mr(conn: &Connection, project_id: i64, iid: i64, merged_at: Option<i64>) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, merged_at, merge_user_username, web_url) VALUES (?1, ?2, ?3, 'Fix auth', 'merged', 'bob', 1000, 5000, 6000, ?4, 'charlie', 'https://gitlab.com/group/project/-/merge_requests/10')",
|
||||
rusqlite::params![iid * 100, project_id, iid, merged_at],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn make_entity_ref(entity_type: &str, entity_id: i64, iid: i64) -> EntityRef {
|
||||
EntityRef {
|
||||
entity_type: entity_type.to_owned(),
|
||||
entity_id,
|
||||
entity_iid: iid,
|
||||
project_path: "group/project".to_owned(),
|
||||
}
|
||||
}
|
||||
|
||||
fn insert_state_event(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
state: &str,
|
||||
created_at: i64,
|
||||
) {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_state_events (gitlab_id, project_id, issue_id, merge_request_id, state, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, 'alice', ?6)",
|
||||
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, state, created_at],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_label_event(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
action: &str,
|
||||
label_name: Option<&str>,
|
||||
created_at: i64,
|
||||
) {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_label_events (gitlab_id, project_id, issue_id, merge_request_id, action, label_name, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 'alice', ?7)",
|
||||
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, action, label_name, created_at],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_milestone_event(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
action: &str,
|
||||
milestone_title: Option<&str>,
|
||||
created_at: i64,
|
||||
) {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO resource_milestone_events (gitlab_id, project_id, issue_id, merge_request_id, action, milestone_title, actor_username, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, 'alice', ?7)",
|
||||
rusqlite::params![gitlab_id, project_id, issue_id, mr_id, action, milestone_title, created_at],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_creation_event() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
|
||||
assert_eq!(events.len(), 1);
|
||||
assert!(matches!(events[0].event_type, TimelineEventType::Created));
|
||||
assert_eq!(events[0].timestamp, 1000);
|
||||
assert_eq!(events[0].actor, Some("alice".to_owned()));
|
||||
assert!(events[0].is_seed);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_state_events() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 4000);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
|
||||
|
||||
// Created + 2 state changes = 3
|
||||
assert_eq!(events.len(), 3);
|
||||
assert!(matches!(events[0].event_type, TimelineEventType::Created));
|
||||
assert!(matches!(
|
||||
events[1].event_type,
|
||||
TimelineEventType::StateChanged { ref state } if state == "closed"
|
||||
));
|
||||
assert!(matches!(
|
||||
events[2].event_type,
|
||||
TimelineEventType::StateChanged { ref state } if state == "reopened"
|
||||
));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_merged_dedup() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let mr_id = insert_mr(&conn, project_id, 10, Some(5000));
|
||||
|
||||
// Also add a state event for 'merged' — this should NOT produce a StateChanged
|
||||
insert_state_event(&conn, project_id, None, Some(mr_id), "merged", 5000);
|
||||
|
||||
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
|
||||
|
||||
// Should have Created + Merged (not Created + StateChanged{merged} + Merged)
|
||||
let merged_count = events
|
||||
.iter()
|
||||
.filter(|e| matches!(e.event_type, TimelineEventType::Merged))
|
||||
.count();
|
||||
let state_merged_count = events
|
||||
.iter()
|
||||
.filter(|e| matches!(&e.event_type, TimelineEventType::StateChanged { state } if state == "merged"))
|
||||
.count();
|
||||
|
||||
assert_eq!(merged_count, 1);
|
||||
assert_eq!(state_merged_count, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_null_label_fallback() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
insert_label_event(&conn, project_id, Some(issue_id), None, "add", None, 2000);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
|
||||
|
||||
let label_event = events.iter().find(|e| {
|
||||
matches!(&e.event_type, TimelineEventType::LabelAdded { label } if label == "[deleted label]")
|
||||
});
|
||||
assert!(label_event.is_some());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_null_milestone_fallback() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
insert_milestone_event(&conn, project_id, Some(issue_id), None, "add", None, 2000);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
|
||||
|
||||
let ms_event = events.iter().find(|e| {
|
||||
matches!(&e.event_type, TimelineEventType::MilestoneSet { milestone } if milestone == "[deleted milestone]")
|
||||
});
|
||||
assert!(ms_event.is_some());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_since_filter() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 5000);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
|
||||
// Since 4000: should exclude Created (1000) and closed (3000)
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], Some(4000), 100).unwrap();
|
||||
assert_eq!(events.len(), 1);
|
||||
assert_eq!(events[0].timestamp, 5000);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_chronological_sort() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let mr_id = insert_mr(&conn, project_id, 10, Some(4000));
|
||||
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);
|
||||
insert_label_event(
|
||||
&conn,
|
||||
project_id,
|
||||
None,
|
||||
Some(mr_id),
|
||||
"add",
|
||||
Some("bug"),
|
||||
2000,
|
||||
);
|
||||
|
||||
let seeds = vec![
|
||||
make_entity_ref("issue", issue_id, 1),
|
||||
make_entity_ref("merge_request", mr_id, 10),
|
||||
];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
|
||||
|
||||
// Verify chronological order
|
||||
for window in events.windows(2) {
|
||||
assert!(window[0].timestamp <= window[1].timestamp);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_respects_limit() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
for i in 0..20 {
|
||||
insert_state_event(
|
||||
&conn,
|
||||
project_id,
|
||||
Some(issue_id),
|
||||
None,
|
||||
"closed",
|
||||
3000 + i * 100,
|
||||
);
|
||||
}
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, total) = collect_events(&conn, &seeds, &[], &[], &[], None, 5).unwrap();
|
||||
assert_eq!(events.len(), 5);
|
||||
// 20 state changes + 1 created = 21 total before limit
|
||||
assert_eq!(total, 21);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_evidence_notes_included() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
let evidence = vec![TimelineEvent {
|
||||
timestamp: 2500,
|
||||
entity_type: "issue".to_owned(),
|
||||
entity_id: issue_id,
|
||||
entity_iid: 1,
|
||||
project_path: "group/project".to_owned(),
|
||||
event_type: TimelineEventType::NoteEvidence {
|
||||
note_id: 42,
|
||||
snippet: "relevant note".to_owned(),
|
||||
discussion_id: Some(1),
|
||||
},
|
||||
summary: "Note by alice".to_owned(),
|
||||
actor: Some("alice".to_owned()),
|
||||
url: None,
|
||||
is_seed: true,
|
||||
}];
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &evidence, &[], None, 100).unwrap();
|
||||
|
||||
let note_event = events.iter().find(|e| {
|
||||
matches!(
|
||||
&e.event_type,
|
||||
TimelineEventType::NoteEvidence { note_id, .. } if *note_id == 42
|
||||
)
|
||||
});
|
||||
assert!(note_event.is_some());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_collect_merged_fallback_to_state_event() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
// MR with merged_at = NULL
|
||||
let mr_id = insert_mr(&conn, project_id, 10, None);
|
||||
|
||||
// But has a state event for 'merged'
|
||||
insert_state_event(&conn, project_id, None, Some(mr_id), "merged", 5000);
|
||||
|
||||
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
|
||||
let (events, _) = collect_events(&conn, &seeds, &[], &[], &[], None, 100).unwrap();
|
||||
|
||||
let merged = events
|
||||
.iter()
|
||||
.find(|e| matches!(e.event_type, TimelineEventType::Merged));
|
||||
assert!(merged.is_some());
|
||||
assert_eq!(merged.unwrap().timestamp, 5000);
|
||||
}
|
||||
|
||||
// ─── Discussion thread tests ────────────────────────────────────────────────
|
||||
|
||||
fn insert_discussion(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
) -> i64 {
|
||||
let noteable_type = if issue_id.is_some() {
|
||||
"Issue"
|
||||
} else {
|
||||
"MergeRequest"
|
||||
};
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, merge_request_id, noteable_type, last_seen_at) VALUES (?1, ?2, ?3, ?4, ?5, 0)",
|
||||
rusqlite::params![format!("disc_{}", rand::random::<u32>()), project_id, issue_id, mr_id, noteable_type],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn insert_note(
|
||||
conn: &Connection,
|
||||
discussion_id: i64,
|
||||
project_id: i64,
|
||||
author: &str,
|
||||
body: &str,
|
||||
is_system: bool,
|
||||
created_at: i64,
|
||||
) -> i64 {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (gitlab_id, discussion_id, project_id, is_system, author_username, body, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?7, ?7)",
|
||||
rusqlite::params![gitlab_id, discussion_id, project_id, is_system as i32, author, body, created_at],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn make_matched_discussion(
|
||||
discussion_id: i64,
|
||||
entity_type: &str,
|
||||
entity_id: i64,
|
||||
project_id: i64,
|
||||
) -> MatchedDiscussion {
|
||||
MatchedDiscussion {
|
||||
discussion_id,
|
||||
entity_type: entity_type.to_owned(),
|
||||
entity_id,
|
||||
project_id,
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
fn test_collect_discussion_thread_basic() {
    let db = setup_test_db();
    let project = insert_project(&db);
    let issue = insert_issue(&db, project, 1);
    let discussion = insert_discussion(&db, project, Some(issue), None);

    // Three ordinary user notes at strictly increasing timestamps.
    for (who, text, ts) in [
        ("alice", "First note", 2000),
        ("bob", "Reply here", 3000),
        ("alice", "Follow up", 4000),
    ] {
        insert_note(&db, discussion, project, who, text, false, ts);
    }

    let seeds = [make_entity_ref("issue", issue, 1)];
    let matched = [make_matched_discussion(discussion, "issue", issue, project)];

    let (events, _) = collect_events(&db, &seeds, &[], &[], &matched, None, 100).unwrap();

    // Exactly one thread event is expected for the matched discussion.
    let thread = events
        .iter()
        .find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
        .expect("Should have a DiscussionThread event");

    match &thread.event_type {
        TimelineEventType::DiscussionThread {
            discussion_id,
            notes,
        } => {
            assert_eq!(*discussion_id, discussion);
            assert_eq!(notes.len(), 3);
            assert_eq!(notes[0].author.as_deref(), Some("alice"));
            assert_eq!(notes[0].body, "First note");
            assert_eq!(notes[1].author.as_deref(), Some("bob"));
            assert_eq!(notes[2].body, "Follow up");
        }
        _ => panic!("Expected DiscussionThread variant"),
    }
}
|
||||
|
||||
#[test]
// A discussion containing system notes (label changes etc., is_system = true)
// must surface only the human-authored notes in the resulting thread.
fn test_collect_discussion_thread_skips_system_notes() {
    let conn = setup_test_db();
    let project_id = insert_project(&conn);
    let issue_id = insert_issue(&conn, project_id, 1);
    let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);

    // Two user notes sandwiching one system note.
    insert_note(
        &conn,
        disc_id,
        project_id,
        "alice",
        "User note",
        false,
        2000,
    );
    insert_note(
        &conn,
        disc_id,
        project_id,
        "system",
        "added label ~bug",
        true,
        3000,
    );
    insert_note(
        &conn,
        disc_id,
        project_id,
        "bob",
        "Another user note",
        false,
        4000,
    );

    let seeds = [make_entity_ref("issue", issue_id, 1)];
    let discussions = [make_matched_discussion(
        disc_id, "issue", issue_id, project_id,
    )];

    let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();

    let thread = events
        .iter()
        .find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }));
    assert!(thread.is_some());

    if let TimelineEventType::DiscussionThread { notes, .. } = &thread.unwrap().event_type {
        // Only the two non-system notes survive, in timestamp order.
        assert_eq!(notes.len(), 2, "System notes should be filtered out");
        assert_eq!(notes[0].body, "User note");
        assert_eq!(notes[1].body, "Another user note");
    } else {
        panic!("Expected DiscussionThread");
    }
}
|
||||
|
||||
#[test]
fn test_collect_discussion_thread_empty_after_system_filter() {
    let db = setup_test_db();
    let project = insert_project(&db);
    let issue = insert_issue(&db, project, 1);
    let discussion = insert_discussion(&db, project, Some(issue), None);

    // A discussion made up entirely of system notes should vanish after filtering.
    insert_note(&db, discussion, project, "system", "added label", true, 2000);
    insert_note(&db, discussion, project, "system", "removed label", true, 3000);

    let seeds = [make_entity_ref("issue", issue, 1)];
    let matched = [make_matched_discussion(discussion, "issue", issue, project)];

    let (events, _) = collect_events(&db, &seeds, &[], &[], &matched, None, 100).unwrap();

    let thread_count = events
        .iter()
        .filter(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
        .count();
    assert_eq!(
        thread_count, 0,
        "All-system-note discussion should produce no thread"
    );
}
|
||||
|
||||
#[test]
fn test_collect_discussion_thread_body_truncation() {
    let db = setup_test_db();
    let project = insert_project(&db);
    let issue = insert_issue(&db, project, 1);
    let discussion = insert_discussion(&db, project, Some(issue), None);

    // One pathologically long note body (well past the per-note cap).
    let long_body = "x".repeat(10_000);
    insert_note(&db, discussion, project, "alice", &long_body, false, 2000);

    let seeds = [make_entity_ref("issue", issue, 1)];
    let matched = [make_matched_discussion(discussion, "issue", issue, project)];

    let (events, _) = collect_events(&db, &seeds, &[], &[], &matched, None, 100).unwrap();

    let thread = events
        .iter()
        .find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
        .unwrap();

    match &thread.event_type {
        TimelineEventType::DiscussionThread { notes, .. } => assert!(
            notes[0].body.chars().count() <= crate::core::timeline::THREAD_NOTE_MAX_CHARS,
            "Body should be truncated to THREAD_NOTE_MAX_CHARS"
        ),
        _ => panic!("Expected DiscussionThread"),
    }
}
|
||||
|
||||
#[test]
// Discussions longer than THREAD_MAX_NOTES are truncated, and a synthetic
// trailing note summarizes how many notes were dropped.
fn test_collect_discussion_thread_note_cap() {
    let conn = setup_test_db();
    let project_id = insert_project(&conn);
    let issue_id = insert_issue(&conn, project_id, 1);
    let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);

    // Insert 60 notes, exceeding THREAD_MAX_NOTES (50)
    for i in 0..60 {
        insert_note(
            &conn,
            disc_id,
            project_id,
            "alice",
            &format!("Note {i}"),
            false,
            // Strictly increasing timestamps keep note ordering deterministic.
            2000 + i * 100,
        );
    }

    let seeds = [make_entity_ref("issue", issue_id, 1)];
    let discussions = [make_matched_discussion(
        disc_id, "issue", issue_id, project_id,
    )];

    let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();

    let thread = events
        .iter()
        .find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
        .unwrap();

    if let TimelineEventType::DiscussionThread { notes, .. } = &thread.event_type {
        // 50 notes + 1 synthetic summary = 51
        assert_eq!(
            notes.len(),
            crate::core::timeline::THREAD_MAX_NOTES + 1,
            "Should cap at THREAD_MAX_NOTES + synthetic summary"
        );
        let last = notes.last().unwrap();
        assert!(last.body.contains("more notes not shown"));
    } else {
        panic!("Expected DiscussionThread");
    }
}
|
||||
|
||||
#[test]
fn test_collect_discussion_thread_timestamp_is_first_note() {
    let db = setup_test_db();
    let project = insert_project(&db);
    let issue = insert_issue(&db, project, 1);
    let discussion = insert_discussion(&db, project, Some(issue), None);

    insert_note(&db, discussion, project, "alice", "First", false, 5000);
    insert_note(&db, discussion, project, "bob", "Second", false, 8000);

    let seeds = [make_entity_ref("issue", issue, 1)];
    let matched = [make_matched_discussion(discussion, "issue", issue, project)];

    let (events, _) = collect_events(&db, &seeds, &[], &[], &matched, None, 100).unwrap();

    // The thread event is dated by its earliest note, not its latest.
    let thread = events
        .iter()
        .find(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
        .unwrap();
    assert_eq!(
        thread.timestamp, 5000,
        "Thread timestamp should be first note's created_at"
    );
}
|
||||
|
||||
#[test]
// Thread events must sort into the timeline by their (first-note) timestamp,
// interleaving correctly with Created and StateChanged events.
fn test_collect_discussion_thread_sort_position() {
    let conn = setup_test_db();
    let project_id = insert_project(&conn);
    let issue_id = insert_issue(&conn, project_id, 1);
    let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);

    // Note at t=2000 (between Created at t=1000 and state change at t=3000)
    insert_note(
        &conn,
        disc_id,
        project_id,
        "alice",
        "discussion",
        false,
        2000,
    );
    insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 3000);

    let seeds = [make_entity_ref("issue", issue_id, 1)];
    let discussions = [make_matched_discussion(
        disc_id, "issue", issue_id, project_id,
    )];

    let (events, _) = collect_events(&conn, &seeds, &[], &[], &discussions, None, 100).unwrap();

    // Expected order: Created(1000), DiscussionThread(2000), StateChanged(3000)
    assert!(events.len() >= 3);
    assert!(matches!(events[0].event_type, TimelineEventType::Created));
    assert!(matches!(
        events[1].event_type,
        TimelineEventType::DiscussionThread { .. }
    ));
    assert!(matches!(
        events[2].event_type,
        TimelineEventType::StateChanged { .. }
    ));
}
|
||||
|
||||
#[test]
fn test_collect_discussion_thread_dedup() {
    let db = setup_test_db();
    let project = insert_project(&db);
    let issue = insert_issue(&db, project, 1);
    let discussion = insert_discussion(&db, project, Some(issue), None);

    insert_note(&db, discussion, project, "alice", "hello", false, 2000);

    let seeds = [make_entity_ref("issue", issue, 1)];
    // Same discussion_id twice
    let matched = [
        make_matched_discussion(discussion, "issue", issue, project),
        make_matched_discussion(discussion, "issue", issue, project),
    ];

    let (events, _) = collect_events(&db, &seeds, &[], &[], &matched, None, 100).unwrap();

    let thread_count = events
        .iter()
        .filter(|e| matches!(&e.event_type, TimelineEventType::DiscussionThread { .. }))
        .count();
    assert_eq!(
        thread_count, 1,
        "Duplicate discussion_id should produce one thread"
    );
}
|
||||
@@ -248,310 +248,5 @@ fn find_incoming(
|
||||
}
|
||||
|
||||
#[cfg(test)]
mod tests {
    //! Unit tests for timeline expansion: walking the `entity_references`
    //! graph outward from seed entities, checking depth limits, mention
    //! filtering, entity caps, provenance tracking, and deduplication.

    use super::*;
    use crate::core::db::{create_connection, run_migrations};
    use std::path::Path;

    /// Open a fresh in-memory SQLite database with all migrations applied.
    fn setup_test_db() -> Connection {
        let conn = create_connection(Path::new(":memory:")).unwrap();
        run_migrations(&conn).unwrap();
        conn
    }

    /// Insert the single fixture project ('group/project'); returns its rowid.
    fn insert_project(conn: &Connection) -> i64 {
        conn.execute(
            "INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
            [],
        )
        .unwrap();
        conn.last_insert_rowid()
    }

    /// Insert an issue with the given `iid`; the GitLab id is derived as
    /// `iid * 100` so distinct iids never collide. Returns the local rowid.
    fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
        conn.execute(
            "INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test', 'opened', 'alice', 1000, 2000, 3000)",
            rusqlite::params![iid * 100, project_id, iid],
        )
        .unwrap();
        conn.last_insert_rowid()
    }

    /// Insert a merge request with the given `iid` (same id scheme as issues).
    fn insert_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
        conn.execute(
            "INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
            rusqlite::params![iid * 100, project_id, iid],
        )
        .unwrap();
        conn.last_insert_rowid()
    }

    /// Insert an entity-to-entity reference edge.
    /// `target_id = None` models a target not resolved to a local row.
    #[allow(clippy::too_many_arguments)]
    fn insert_ref(
        conn: &Connection,
        project_id: i64,
        source_type: &str,
        source_id: i64,
        target_type: &str,
        target_id: Option<i64>,
        ref_type: &str,
        source_method: &str,
    ) {
        conn.execute(
            "INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, target_entity_type, target_entity_id, reference_type, source_method, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, 1000)",
            rusqlite::params![project_id, source_type, source_id, target_type, target_id, ref_type, source_method],
        )
        .unwrap();
    }

    /// Build an `EntityRef` seed pointing into the fixture project.
    fn make_entity_ref(entity_type: &str, entity_id: i64, iid: i64) -> EntityRef {
        EntityRef {
            entity_type: entity_type.to_owned(),
            entity_id,
            entity_iid: iid,
            project_path: "group/project".to_owned(),
        }
    }

    // depth = 0 disables expansion entirely.
    #[test]
    fn test_expand_depth_zero() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);
        let seeds = vec![make_entity_ref("issue", issue_id, 1)];

        let result = expand_timeline(&conn, &seeds, 0, false, 100).unwrap();
        assert!(result.expanded_entities.is_empty());
        assert!(result.unresolved_references.is_empty());
    }

    // An incoming "closes" edge (MR -> issue) is discovered from the issue side.
    #[test]
    fn test_expand_finds_linked_entity() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);
        let mr_id = insert_mr(&conn, project_id, 10);

        // MR closes issue
        insert_ref(
            &conn,
            project_id,
            "merge_request",
            mr_id,
            "issue",
            Some(issue_id),
            "closes",
            "api",
        );

        let seeds = vec![make_entity_ref("issue", issue_id, 1)];
        let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();

        assert_eq!(result.expanded_entities.len(), 1);
        assert_eq!(
            result.expanded_entities[0].entity_ref.entity_type,
            "merge_request"
        );
        assert_eq!(result.expanded_entities[0].entity_ref.entity_iid, 10);
        assert_eq!(result.expanded_entities[0].depth, 1);
    }

    // The same edge is also traversable from the source (MR) side.
    #[test]
    fn test_expand_bidirectional() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);
        let mr_id = insert_mr(&conn, project_id, 10);

        // MR closes issue (MR is source, issue is target)
        insert_ref(
            &conn,
            project_id,
            "merge_request",
            mr_id,
            "issue",
            Some(issue_id),
            "closes",
            "api",
        );

        // Starting from MR should find the issue (outgoing)
        let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
        let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();

        assert_eq!(result.expanded_entities.len(), 1);
        assert_eq!(result.expanded_entities[0].entity_ref.entity_type, "issue");
    }

    // max_entities bounds how many linked entities expansion may add.
    #[test]
    fn test_expand_respects_max_entities() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);

        // Create 10 MRs that all close this issue
        for i in 2..=11 {
            let mr_id = insert_mr(&conn, project_id, i);
            insert_ref(
                &conn,
                project_id,
                "merge_request",
                mr_id,
                "issue",
                Some(issue_id),
                "closes",
                "api",
            );
        }

        let seeds = vec![make_entity_ref("issue", issue_id, 1)];
        let result = expand_timeline(&conn, &seeds, 1, false, 3).unwrap();

        assert!(result.expanded_entities.len() <= 3);
    }

    // "mentioned" edges are too weak to expand through unless explicitly enabled.
    #[test]
    fn test_expand_skips_mentions_by_default() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);
        let mr_id = insert_mr(&conn, project_id, 10);

        // MR mentions issue (should be skipped by default)
        insert_ref(
            &conn,
            project_id,
            "merge_request",
            mr_id,
            "issue",
            Some(issue_id),
            "mentioned",
            "note_parse",
        );

        let seeds = vec![make_entity_ref("issue", issue_id, 1)];
        let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
        assert!(result.expanded_entities.is_empty());
    }

    // With include_mentions = true, the same edge is followed.
    #[test]
    fn test_expand_includes_mentions_when_flagged() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);
        let mr_id = insert_mr(&conn, project_id, 10);

        // MR mentions issue
        insert_ref(
            &conn,
            project_id,
            "merge_request",
            mr_id,
            "issue",
            Some(issue_id),
            "mentioned",
            "note_parse",
        );

        let seeds = vec![make_entity_ref("issue", issue_id, 1)];
        let result = expand_timeline(&conn, &seeds, 1, true, 100).unwrap();
        assert_eq!(result.expanded_entities.len(), 1);
    }

    // Edges whose target has no local row are reported, not expanded.
    #[test]
    fn test_expand_collects_unresolved() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);

        // Unresolved cross-project reference
        conn.execute(
            "INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, target_entity_type, target_entity_id, target_project_path, target_entity_iid, reference_type, source_method, created_at) VALUES (?1, 'issue', ?2, 'issue', NULL, 'other/repo', 42, 'closes', 'description_parse', 1000)",
            rusqlite::params![project_id, issue_id],
        )
        .unwrap();

        let seeds = vec![make_entity_ref("issue", issue_id, 1)];
        let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();

        assert!(result.expanded_entities.is_empty());
        assert_eq!(result.unresolved_references.len(), 1);
        assert_eq!(
            result.unresolved_references[0].target_project,
            Some("other/repo".to_owned())
        );
        assert_eq!(result.unresolved_references[0].target_iid, Some(42));
    }

    // Each expanded entity records which edge and entity it was reached through.
    #[test]
    fn test_expand_tracks_provenance() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);
        let mr_id = insert_mr(&conn, project_id, 10);

        insert_ref(
            &conn,
            project_id,
            "merge_request",
            mr_id,
            "issue",
            Some(issue_id),
            "closes",
            "api",
        );

        let seeds = vec![make_entity_ref("issue", issue_id, 1)];
        let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();

        assert_eq!(result.expanded_entities.len(), 1);
        let expanded = &result.expanded_entities[0];
        assert_eq!(expanded.via_reference_type, "closes");
        assert_eq!(expanded.via_source_method, "api");
        assert_eq!(expanded.via_from.entity_type, "issue");
        assert_eq!(expanded.via_from.entity_id, issue_id);
    }

    // Multiple edges to the same entity add it only once.
    #[test]
    fn test_expand_no_duplicates() {
        let conn = setup_test_db();
        let project_id = insert_project(&conn);
        let issue_id = insert_issue(&conn, project_id, 1);
        let mr_id = insert_mr(&conn, project_id, 10);

        // Two references from MR to same issue (different methods)
        insert_ref(
            &conn,
            project_id,
            "merge_request",
            mr_id,
            "issue",
            Some(issue_id),
            "closes",
            "api",
        );
        insert_ref(
            &conn,
            project_id,
            "merge_request",
            mr_id,
            "issue",
            Some(issue_id),
            "related",
            "note_parse",
        );

        let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
        let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();

        // Should only appear once (first-come wins)
        assert_eq!(result.expanded_entities.len(), 1);
    }

    // No seeds -> nothing to expand from.
    #[test]
    fn test_expand_empty_seeds() {
        let conn = setup_test_db();
        let result = expand_timeline(&conn, &[], 1, false, 100).unwrap();
        assert!(result.expanded_entities.is_empty());
    }
}
|
||||
/// Unit tests for timeline expansion, kept in a sibling file via `#[path]`.
///
/// The `#[cfg(test)]` gate is required: the out-of-line module contains
/// `#[test]` functions and test-only helpers, and without the gate it would
/// be compiled into non-test builds (bloating the binary and breaking builds
/// that lack dev-dependencies).
#[cfg(test)]
#[path = "timeline_expand_tests.rs"]
mod tests;
|
||||
|
||||
305
src/core/timeline_expand_tests.rs
Normal file
305
src/core/timeline_expand_tests.rs
Normal file
@@ -0,0 +1,305 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_project(conn: &Connection) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test', 'opened', 'alice', 1000, 2000, 3000)",
|
||||
rusqlite::params![iid * 100, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
|
||||
rusqlite::params![iid * 100, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn insert_ref(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
source_type: &str,
|
||||
source_id: i64,
|
||||
target_type: &str,
|
||||
target_id: Option<i64>,
|
||||
ref_type: &str,
|
||||
source_method: &str,
|
||||
) {
|
||||
conn.execute(
|
||||
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, target_entity_type, target_entity_id, reference_type, source_method, created_at) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, 1000)",
|
||||
rusqlite::params![project_id, source_type, source_id, target_type, target_id, ref_type, source_method],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn make_entity_ref(entity_type: &str, entity_id: i64, iid: i64) -> EntityRef {
|
||||
EntityRef {
|
||||
entity_type: entity_type.to_owned(),
|
||||
entity_id,
|
||||
entity_iid: iid,
|
||||
project_path: "group/project".to_owned(),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_depth_zero() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
|
||||
let result = expand_timeline(&conn, &seeds, 0, false, 100).unwrap();
|
||||
assert!(result.expanded_entities.is_empty());
|
||||
assert!(result.unresolved_references.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_finds_linked_entity() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let mr_id = insert_mr(&conn, project_id, 10);
|
||||
|
||||
// MR closes issue
|
||||
insert_ref(
|
||||
&conn,
|
||||
project_id,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
"issue",
|
||||
Some(issue_id),
|
||||
"closes",
|
||||
"api",
|
||||
);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
|
||||
|
||||
assert_eq!(result.expanded_entities.len(), 1);
|
||||
assert_eq!(
|
||||
result.expanded_entities[0].entity_ref.entity_type,
|
||||
"merge_request"
|
||||
);
|
||||
assert_eq!(result.expanded_entities[0].entity_ref.entity_iid, 10);
|
||||
assert_eq!(result.expanded_entities[0].depth, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_bidirectional() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let mr_id = insert_mr(&conn, project_id, 10);
|
||||
|
||||
// MR closes issue (MR is source, issue is target)
|
||||
insert_ref(
|
||||
&conn,
|
||||
project_id,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
"issue",
|
||||
Some(issue_id),
|
||||
"closes",
|
||||
"api",
|
||||
);
|
||||
|
||||
// Starting from MR should find the issue (outgoing)
|
||||
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
|
||||
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
|
||||
|
||||
assert_eq!(result.expanded_entities.len(), 1);
|
||||
assert_eq!(result.expanded_entities[0].entity_ref.entity_type, "issue");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_respects_max_entities() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
// Create 10 MRs that all close this issue
|
||||
for i in 2..=11 {
|
||||
let mr_id = insert_mr(&conn, project_id, i);
|
||||
insert_ref(
|
||||
&conn,
|
||||
project_id,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
"issue",
|
||||
Some(issue_id),
|
||||
"closes",
|
||||
"api",
|
||||
);
|
||||
}
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let result = expand_timeline(&conn, &seeds, 1, false, 3).unwrap();
|
||||
|
||||
assert!(result.expanded_entities.len() <= 3);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_skips_mentions_by_default() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let mr_id = insert_mr(&conn, project_id, 10);
|
||||
|
||||
// MR mentions issue (should be skipped by default)
|
||||
insert_ref(
|
||||
&conn,
|
||||
project_id,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
"issue",
|
||||
Some(issue_id),
|
||||
"mentioned",
|
||||
"note_parse",
|
||||
);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
|
||||
assert!(result.expanded_entities.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_includes_mentions_when_flagged() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let mr_id = insert_mr(&conn, project_id, 10);
|
||||
|
||||
// MR mentions issue
|
||||
insert_ref(
|
||||
&conn,
|
||||
project_id,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
"issue",
|
||||
Some(issue_id),
|
||||
"mentioned",
|
||||
"note_parse",
|
||||
);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let result = expand_timeline(&conn, &seeds, 1, true, 100).unwrap();
|
||||
assert_eq!(result.expanded_entities.len(), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_collects_unresolved() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
|
||||
// Unresolved cross-project reference
|
||||
conn.execute(
|
||||
"INSERT INTO entity_references (project_id, source_entity_type, source_entity_id, target_entity_type, target_entity_id, target_project_path, target_entity_iid, reference_type, source_method, created_at) VALUES (?1, 'issue', ?2, 'issue', NULL, 'other/repo', 42, 'closes', 'description_parse', 1000)",
|
||||
rusqlite::params![project_id, issue_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
|
||||
|
||||
assert!(result.expanded_entities.is_empty());
|
||||
assert_eq!(result.unresolved_references.len(), 1);
|
||||
assert_eq!(
|
||||
result.unresolved_references[0].target_project,
|
||||
Some("other/repo".to_owned())
|
||||
);
|
||||
assert_eq!(result.unresolved_references[0].target_iid, Some(42));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_tracks_provenance() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let mr_id = insert_mr(&conn, project_id, 10);
|
||||
|
||||
insert_ref(
|
||||
&conn,
|
||||
project_id,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
"issue",
|
||||
Some(issue_id),
|
||||
"closes",
|
||||
"api",
|
||||
);
|
||||
|
||||
let seeds = vec![make_entity_ref("issue", issue_id, 1)];
|
||||
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
|
||||
|
||||
assert_eq!(result.expanded_entities.len(), 1);
|
||||
let expanded = &result.expanded_entities[0];
|
||||
assert_eq!(expanded.via_reference_type, "closes");
|
||||
assert_eq!(expanded.via_source_method, "api");
|
||||
assert_eq!(expanded.via_from.entity_type, "issue");
|
||||
assert_eq!(expanded.via_from.entity_id, issue_id);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_no_duplicates() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_project(&conn);
|
||||
let issue_id = insert_issue(&conn, project_id, 1);
|
||||
let mr_id = insert_mr(&conn, project_id, 10);
|
||||
|
||||
// Two references from MR to same issue (different methods)
|
||||
insert_ref(
|
||||
&conn,
|
||||
project_id,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
"issue",
|
||||
Some(issue_id),
|
||||
"closes",
|
||||
"api",
|
||||
);
|
||||
insert_ref(
|
||||
&conn,
|
||||
project_id,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
"issue",
|
||||
Some(issue_id),
|
||||
"related",
|
||||
"note_parse",
|
||||
);
|
||||
|
||||
let seeds = vec![make_entity_ref("merge_request", mr_id, 10)];
|
||||
let result = expand_timeline(&conn, &seeds, 1, false, 100).unwrap();
|
||||
|
||||
// Should only appear once (first-come wins)
|
||||
assert_eq!(result.expanded_entities.len(), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_expand_empty_seeds() {
|
||||
let conn = setup_test_db();
|
||||
let result = expand_timeline(&conn, &[], 1, false, 100).unwrap();
|
||||
assert!(result.expanded_entities.is_empty());
|
||||
}
|
||||
@@ -4,24 +4,34 @@ use rusqlite::Connection;
|
||||
use tracing::debug;
|
||||
|
||||
use crate::core::error::Result;
|
||||
use crate::core::timeline::{EntityRef, TimelineEvent, TimelineEventType, resolve_entity_ref};
|
||||
use crate::search::{FtsQueryMode, to_fts_query};
|
||||
use crate::core::timeline::{
|
||||
EntityRef, MatchedDiscussion, TimelineEvent, TimelineEventType, resolve_entity_by_iid,
|
||||
resolve_entity_ref, truncate_to_chars,
|
||||
};
|
||||
use crate::embedding::ollama::OllamaClient;
|
||||
use crate::search::{FtsQueryMode, SearchFilters, SearchMode, search_hybrid, to_fts_query};
|
||||
|
||||
/// Result of the seed + hydrate phases.
pub struct SeedResult {
    /// Deduplicated entities matched by the seed search; starting points for
    /// timeline assembly.
    pub seed_entities: Vec<EntityRef>,
    /// Top matched notes carried along as supporting evidence events.
    pub evidence_notes: Vec<TimelineEvent>,
    /// Discussions matched during seeding, to be collected as full threads.
    pub matched_discussions: Vec<MatchedDiscussion>,
    /// The search mode actually used (hybrid with fallback info).
    pub search_mode: String,
}
|
||||
|
||||
/// Run the SEED + HYDRATE phases of the timeline pipeline.
|
||||
///
|
||||
/// 1. SEED: FTS5 keyword search over documents -> matched document IDs
|
||||
/// 1. SEED: Hybrid search (FTS + vector via RRF) over documents -> matched document IDs
|
||||
/// 2. HYDRATE: Map document IDs -> source entities + top matched notes as evidence
|
||||
///
|
||||
/// When `client` is `None` or Ollama is unavailable, falls back to FTS-only search.
|
||||
/// Discussion documents are resolved to their parent entity (issue or MR).
|
||||
/// Entities are deduplicated. Evidence notes are capped at `max_evidence`.
|
||||
pub fn seed_timeline(
|
||||
pub async fn seed_timeline(
|
||||
conn: &Connection,
|
||||
client: Option<&OllamaClient>,
|
||||
query: &str,
|
||||
project_id: Option<i64>,
|
||||
since_ms: Option<i64>,
|
||||
@@ -33,81 +43,206 @@ pub fn seed_timeline(
|
||||
return Ok(SeedResult {
|
||||
seed_entities: Vec::new(),
|
||||
evidence_notes: Vec::new(),
|
||||
matched_discussions: Vec::new(),
|
||||
search_mode: "lexical".to_owned(),
|
||||
});
|
||||
}
|
||||
|
||||
let seed_entities = find_seed_entities(conn, &fts_query, project_id, since_ms, max_seeds)?;
|
||||
// Use hybrid search for seed entity discovery (better recall than FTS alone).
|
||||
// search_hybrid gracefully falls back to FTS-only when Ollama is unavailable.
|
||||
let filters = SearchFilters {
|
||||
project_id,
|
||||
updated_since: since_ms,
|
||||
limit: max_seeds.saturating_mul(3),
|
||||
..SearchFilters::default()
|
||||
};
|
||||
|
||||
let (hybrid_results, warnings) = search_hybrid(
|
||||
conn,
|
||||
client,
|
||||
query,
|
||||
SearchMode::Hybrid,
|
||||
&filters,
|
||||
FtsQueryMode::Safe,
|
||||
)
|
||||
.await?;
|
||||
|
||||
let search_mode = if warnings
|
||||
.iter()
|
||||
.any(|w| w.contains("falling back") || w.contains("FTS only"))
|
||||
{
|
||||
"lexical (hybrid fallback)".to_owned()
|
||||
} else if client.is_some() && !hybrid_results.is_empty() {
|
||||
"hybrid".to_owned()
|
||||
} else {
|
||||
"lexical".to_owned()
|
||||
};
|
||||
|
||||
for w in &warnings {
|
||||
debug!(warning = %w, "hybrid search warning during timeline seeding");
|
||||
}
|
||||
|
||||
let (seed_entities, matched_discussions) = resolve_documents_to_entities(
|
||||
conn,
|
||||
&hybrid_results
|
||||
.iter()
|
||||
.map(|r| r.document_id)
|
||||
.collect::<Vec<_>>(),
|
||||
max_seeds,
|
||||
)?;
|
||||
|
||||
// Evidence notes stay FTS-only (supplementary context, not worth a second embedding call)
|
||||
let evidence_notes = find_evidence_notes(conn, &fts_query, project_id, since_ms, max_evidence)?;
|
||||
|
||||
Ok(SeedResult {
|
||||
seed_entities,
|
||||
evidence_notes,
|
||||
matched_discussions,
|
||||
search_mode,
|
||||
})
|
||||
}
|
||||
|
||||
/// Find seed entities via FTS5 search, resolving discussions to their parent entity.
|
||||
fn find_seed_entities(
|
||||
/// Seed the timeline directly from an entity IID, bypassing search entirely.
|
||||
///
|
||||
/// Used for `issue:42` / `mr:99` syntax. Resolves the entity, gathers ALL its
|
||||
/// discussions, and returns a `SeedResult` compatible with the rest of the pipeline.
|
||||
pub fn seed_timeline_direct(
|
||||
conn: &Connection,
|
||||
fts_query: &str,
|
||||
entity_type: &str,
|
||||
iid: i64,
|
||||
project_id: Option<i64>,
|
||||
since_ms: Option<i64>,
|
||||
max_seeds: usize,
|
||||
) -> Result<Vec<EntityRef>> {
|
||||
let sql = r"
|
||||
) -> Result<SeedResult> {
|
||||
let entity_ref = resolve_entity_by_iid(conn, entity_type, iid, project_id)?;
|
||||
|
||||
// Gather all discussions for this entity (not search-matched, ALL of them)
|
||||
let entity_id_col = match entity_type {
|
||||
"issue" => "issue_id",
|
||||
"merge_request" => "merge_request_id",
|
||||
_ => {
|
||||
return Ok(SeedResult {
|
||||
seed_entities: vec![entity_ref],
|
||||
evidence_notes: Vec::new(),
|
||||
matched_discussions: Vec::new(),
|
||||
search_mode: "direct".to_owned(),
|
||||
});
|
||||
}
|
||||
};
|
||||
|
||||
let sql = format!("SELECT id, project_id FROM discussions WHERE {entity_id_col} = ?1");
|
||||
let mut stmt = conn.prepare(&sql)?;
|
||||
let matched_discussions: Vec<MatchedDiscussion> = stmt
|
||||
.query_map(rusqlite::params![entity_ref.entity_id], |row| {
|
||||
Ok(MatchedDiscussion {
|
||||
discussion_id: row.get(0)?,
|
||||
entity_type: entity_type.to_owned(),
|
||||
entity_id: entity_ref.entity_id,
|
||||
project_id: row.get(1)?,
|
||||
})
|
||||
})?
|
||||
.collect::<std::result::Result<Vec<_>, _>>()?;
|
||||
|
||||
Ok(SeedResult {
|
||||
seed_entities: vec![entity_ref],
|
||||
evidence_notes: Vec::new(),
|
||||
matched_discussions,
|
||||
search_mode: "direct".to_owned(),
|
||||
})
|
||||
}
|
||||
|
||||
/// Resolve a list of document IDs to deduplicated entity refs and matched discussions.
|
||||
/// Discussion and note documents are resolved to their parent entity (issue or MR).
|
||||
/// Returns (entities, matched_discussions).
|
||||
fn resolve_documents_to_entities(
|
||||
conn: &Connection,
|
||||
document_ids: &[i64],
|
||||
max_entities: usize,
|
||||
) -> Result<(Vec<EntityRef>, Vec<MatchedDiscussion>)> {
|
||||
if document_ids.is_empty() {
|
||||
return Ok((Vec::new(), Vec::new()));
|
||||
}
|
||||
|
||||
let placeholders: String = document_ids
|
||||
.iter()
|
||||
.map(|_| "?")
|
||||
.collect::<Vec<_>>()
|
||||
.join(",");
|
||||
let sql = format!(
|
||||
r"
|
||||
SELECT d.source_type, d.source_id, d.project_id,
|
||||
disc.issue_id, disc.merge_request_id
|
||||
FROM documents_fts
|
||||
JOIN documents d ON d.id = documents_fts.rowid
|
||||
COALESCE(disc.issue_id, note_disc.issue_id) AS issue_id,
|
||||
COALESCE(disc.merge_request_id, note_disc.merge_request_id) AS mr_id,
|
||||
COALESCE(disc.id, note_disc.id) AS discussion_id
|
||||
FROM documents d
|
||||
LEFT JOIN discussions disc ON disc.id = d.source_id AND d.source_type = 'discussion'
|
||||
WHERE documents_fts MATCH ?1
|
||||
AND (?2 IS NULL OR d.project_id = ?2)
|
||||
AND (?3 IS NULL OR d.updated_at >= ?3)
|
||||
ORDER BY rank
|
||||
LIMIT ?4
|
||||
";
|
||||
LEFT JOIN notes n ON n.id = d.source_id AND d.source_type = 'note'
|
||||
LEFT JOIN discussions note_disc ON note_disc.id = n.discussion_id AND d.source_type = 'note'
|
||||
WHERE d.id IN ({placeholders})
|
||||
ORDER BY CASE d.id {order_clause} END
|
||||
",
|
||||
order_clause = document_ids
|
||||
.iter()
|
||||
.enumerate()
|
||||
.map(|(i, id)| format!("WHEN {id} THEN {i}"))
|
||||
.collect::<Vec<_>>()
|
||||
.join(" "),
|
||||
);
|
||||
|
||||
let mut stmt = conn.prepare(sql)?;
|
||||
let rows = stmt.query_map(
|
||||
rusqlite::params![
|
||||
fts_query,
|
||||
project_id,
|
||||
since_ms,
|
||||
max_seeds.saturating_mul(3) as i64
|
||||
],
|
||||
|row| {
|
||||
Ok((
|
||||
row.get::<_, String>(0)?,
|
||||
row.get::<_, i64>(1)?,
|
||||
row.get::<_, i64>(2)?,
|
||||
row.get::<_, Option<i64>>(3)?,
|
||||
row.get::<_, Option<i64>>(4)?,
|
||||
))
|
||||
},
|
||||
)?;
|
||||
let mut stmt = conn.prepare(&sql)?;
|
||||
let params: Vec<&dyn rusqlite::types::ToSql> = document_ids
|
||||
.iter()
|
||||
.map(|id| id as &dyn rusqlite::types::ToSql)
|
||||
.collect();
|
||||
let rows = stmt.query_map(params.as_slice(), |row| {
|
||||
Ok((
|
||||
row.get::<_, String>(0)?, // source_type
|
||||
row.get::<_, i64>(1)?, // source_id
|
||||
row.get::<_, i64>(2)?, // project_id
|
||||
row.get::<_, Option<i64>>(3)?, // issue_id (coalesced)
|
||||
row.get::<_, Option<i64>>(4)?, // mr_id (coalesced)
|
||||
row.get::<_, Option<i64>>(5)?, // discussion_id (coalesced)
|
||||
))
|
||||
})?;
|
||||
|
||||
let mut seen = HashSet::new();
|
||||
let mut seen_entities = HashSet::new();
|
||||
let mut seen_discussions = HashSet::new();
|
||||
let mut entities = Vec::new();
|
||||
let mut matched_discussions = Vec::new();
|
||||
|
||||
for row_result in rows {
|
||||
let (source_type, source_id, proj_id, disc_issue_id, disc_mr_id) = row_result?;
|
||||
let (source_type, source_id, proj_id, disc_issue_id, disc_mr_id, discussion_id) =
|
||||
row_result?;
|
||||
|
||||
let (entity_type, entity_id) = match source_type.as_str() {
|
||||
"issue" => ("issue".to_owned(), source_id),
|
||||
"merge_request" => ("merge_request".to_owned(), source_id),
|
||||
"discussion" => {
|
||||
"discussion" | "note" => {
|
||||
if let Some(issue_id) = disc_issue_id {
|
||||
("issue".to_owned(), issue_id)
|
||||
} else if let Some(mr_id) = disc_mr_id {
|
||||
("merge_request".to_owned(), mr_id)
|
||||
} else {
|
||||
continue; // orphaned discussion
|
||||
continue; // orphaned discussion/note
|
||||
}
|
||||
}
|
||||
_ => continue,
|
||||
};
|
||||
|
||||
// Capture matched discussion (deduplicated)
|
||||
if let Some(disc_id) = discussion_id
|
||||
&& (source_type == "discussion" || source_type == "note")
|
||||
&& seen_discussions.insert(disc_id)
|
||||
{
|
||||
matched_discussions.push(MatchedDiscussion {
|
||||
discussion_id: disc_id,
|
||||
entity_type: entity_type.clone(),
|
||||
entity_id,
|
||||
project_id: proj_id,
|
||||
});
|
||||
}
|
||||
|
||||
// Entity dedup
|
||||
let key = (entity_type.clone(), entity_id);
|
||||
if !seen.insert(key) {
|
||||
if !seen_entities.insert(key) {
|
||||
continue;
|
||||
}
|
||||
|
||||
@@ -116,12 +251,12 @@ fn find_seed_entities(
|
||||
entities.push(entity_ref);
|
||||
}
|
||||
|
||||
if entities.len() >= max_seeds {
|
||||
if entities.len() >= max_entities {
|
||||
break;
|
||||
}
|
||||
}
|
||||
|
||||
Ok(entities)
|
||||
Ok((entities, matched_discussions))
|
||||
}
|
||||
|
||||
/// Find evidence notes: FTS5-matched discussion notes that provide context.
|
||||
@@ -217,336 +352,6 @@ fn find_evidence_notes(
|
||||
Ok(events)
|
||||
}
|
||||
|
||||
/// Truncate a string to at most `max_chars` characters on a safe UTF-8 boundary.
|
||||
fn truncate_to_chars(s: &str, max_chars: usize) -> String {
|
||||
let char_count = s.chars().count();
|
||||
if char_count <= max_chars {
|
||||
return s.to_owned();
|
||||
}
|
||||
|
||||
let byte_end = s
|
||||
.char_indices()
|
||||
.nth(max_chars)
|
||||
.map(|(i, _)| i)
|
||||
.unwrap_or(s.len());
|
||||
s[..byte_end].to_owned()
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_test_project(conn: &Connection) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_test_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test issue', 'opened', 'alice', 1000, 2000, 3000)",
|
||||
rusqlite::params![iid * 100, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_test_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
|
||||
rusqlite::params![iid * 100, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_document(
|
||||
conn: &Connection,
|
||||
source_type: &str,
|
||||
source_id: i64,
|
||||
project_id: i64,
|
||||
content: &str,
|
||||
) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) VALUES (?1, ?2, ?3, ?4, ?5)",
|
||||
rusqlite::params![source_type, source_id, project_id, content, format!("hash_{source_id}")],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_discussion(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
) -> i64 {
|
||||
let noteable_type = if issue_id.is_some() {
|
||||
"Issue"
|
||||
} else {
|
||||
"MergeRequest"
|
||||
};
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, merge_request_id, noteable_type, last_seen_at) VALUES (?1, ?2, ?3, ?4, ?5, 0)",
|
||||
rusqlite::params![format!("disc_{}", rand::random::<u32>()), project_id, issue_id, mr_id, noteable_type],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_note(
|
||||
conn: &Connection,
|
||||
discussion_id: i64,
|
||||
project_id: i64,
|
||||
body: &str,
|
||||
is_system: bool,
|
||||
) -> i64 {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (gitlab_id, discussion_id, project_id, is_system, author_username, body, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, ?4, 'alice', ?5, 5000, 5000, 5000)",
|
||||
rusqlite::params![gitlab_id, discussion_id, project_id, is_system as i32, body],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_empty_query_returns_empty() {
|
||||
let conn = setup_test_db();
|
||||
let result = seed_timeline(&conn, "", None, None, 50, 10).unwrap();
|
||||
assert!(result.seed_entities.is_empty());
|
||||
assert!(result.evidence_notes.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_no_matches_returns_empty() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue_id,
|
||||
project_id,
|
||||
"unrelated content here",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, "nonexistent_xyzzy_query", None, None, 50, 10).unwrap();
|
||||
assert!(result.seed_entities.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_finds_issue() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 42);
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue_id,
|
||||
project_id,
|
||||
"authentication error in login flow",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, "authentication", None, None, 50, 10).unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_type, "issue");
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 42);
|
||||
assert_eq!(result.seed_entities[0].project_path, "group/project");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_finds_mr() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let mr_id = insert_test_mr(&conn, project_id, 99);
|
||||
insert_document(
|
||||
&conn,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
project_id,
|
||||
"fix authentication bug",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, "authentication", None, None, 50, 10).unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_type, "merge_request");
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 99);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_deduplicates_entities() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 10);
|
||||
|
||||
// Two documents referencing the same issue
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue_id,
|
||||
project_id,
|
||||
"authentication error first doc",
|
||||
);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"authentication error second doc",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, "authentication", None, None, 50, 10).unwrap();
|
||||
// Should deduplicate: both map to the same issue
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 10);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_resolves_discussion_to_parent() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 7);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"deployment pipeline failed",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, "deployment", None, None, 50, 10).unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_type, "issue");
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 7);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_evidence_capped() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
|
||||
// Create 15 discussion documents with notes about "deployment"
|
||||
for i in 0..15 {
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
&format!("deployment issue number {i}"),
|
||||
);
|
||||
insert_note(
|
||||
&conn,
|
||||
disc_id,
|
||||
project_id,
|
||||
&format!("deployment note {i}"),
|
||||
false,
|
||||
);
|
||||
}
|
||||
|
||||
let result = seed_timeline(&conn, "deployment", None, None, 50, 5).unwrap();
|
||||
assert!(result.evidence_notes.len() <= 5);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_evidence_snippet_truncated() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"deployment configuration",
|
||||
);
|
||||
|
||||
let long_body = "x".repeat(500);
|
||||
insert_note(&conn, disc_id, project_id, &long_body, false);
|
||||
|
||||
let result = seed_timeline(&conn, "deployment", None, None, 50, 10).unwrap();
|
||||
assert!(!result.evidence_notes.is_empty());
|
||||
if let TimelineEventType::NoteEvidence { snippet, .. } =
|
||||
&result.evidence_notes[0].event_type
|
||||
{
|
||||
assert!(snippet.chars().count() <= 200);
|
||||
} else {
|
||||
panic!("Expected NoteEvidence");
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_seed_respects_project_filter() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
|
||||
// Insert a second project
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (2, 'other/repo', 'https://gitlab.com/other/repo')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
let project2_id = conn.last_insert_rowid();
|
||||
|
||||
let issue1_id = insert_test_issue(&conn, project_id, 1);
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue1_id,
|
||||
project_id,
|
||||
"authentication error",
|
||||
);
|
||||
|
||||
let issue2_id = insert_test_issue(&conn, project2_id, 2);
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue2_id,
|
||||
project2_id,
|
||||
"authentication error",
|
||||
);
|
||||
|
||||
// Filter to project 1 only
|
||||
let result =
|
||||
seed_timeline(&conn, "authentication", Some(project_id), None, 50, 10).unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].project_path, "group/project");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_truncate_to_chars_short() {
|
||||
assert_eq!(truncate_to_chars("hello", 200), "hello");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_truncate_to_chars_long() {
|
||||
let long = "a".repeat(300);
|
||||
let result = truncate_to_chars(&long, 200);
|
||||
assert_eq!(result.chars().count(), 200);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_truncate_to_chars_multibyte() {
|
||||
let s = "\u{1F600}".repeat(300); // emoji
|
||||
let result = truncate_to_chars(&s, 200);
|
||||
assert_eq!(result.chars().count(), 200);
|
||||
// Verify valid UTF-8
|
||||
assert!(std::str::from_utf8(result.as_bytes()).is_ok());
|
||||
}
|
||||
}
|
||||
#[path = "timeline_seed_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
512
src/core/timeline_seed_tests.rs
Normal file
512
src/core/timeline_seed_tests.rs
Normal file
@@ -0,0 +1,512 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup_test_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_test_project(conn: &Connection) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/project', 'https://gitlab.com/group/project')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_test_issue(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test issue', 'opened', 'alice', 1000, 2000, 3000)",
|
||||
rusqlite::params![iid * 100, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_test_mr(conn: &Connection, project_id: i64, iid: i64) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, 'Test MR', 'opened', 'bob', 1000, 2000, 3000)",
|
||||
rusqlite::params![iid * 100, project_id, iid],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_document(
|
||||
conn: &Connection,
|
||||
source_type: &str,
|
||||
source_id: i64,
|
||||
project_id: i64,
|
||||
content: &str,
|
||||
) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) VALUES (?1, ?2, ?3, ?4, ?5)",
|
||||
rusqlite::params![source_type, source_id, project_id, content, format!("hash_{source_id}")],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_discussion(
|
||||
conn: &Connection,
|
||||
project_id: i64,
|
||||
issue_id: Option<i64>,
|
||||
mr_id: Option<i64>,
|
||||
) -> i64 {
|
||||
let noteable_type = if issue_id.is_some() {
|
||||
"Issue"
|
||||
} else {
|
||||
"MergeRequest"
|
||||
};
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, merge_request_id, noteable_type, last_seen_at) VALUES (?1, ?2, ?3, ?4, ?5, 0)",
|
||||
rusqlite::params![format!("disc_{}", rand::random::<u32>()), project_id, issue_id, mr_id, noteable_type],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_note(
|
||||
conn: &Connection,
|
||||
discussion_id: i64,
|
||||
project_id: i64,
|
||||
body: &str,
|
||||
is_system: bool,
|
||||
) -> i64 {
|
||||
let gitlab_id: i64 = rand::random::<u32>().into();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (gitlab_id, discussion_id, project_id, is_system, author_username, body, created_at, updated_at, last_seen_at) VALUES (?1, ?2, ?3, ?4, 'alice', ?5, 5000, 5000, 5000)",
|
||||
rusqlite::params![gitlab_id, discussion_id, project_id, is_system as i32, body],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_empty_query_returns_empty() {
|
||||
let conn = setup_test_db();
|
||||
let result = seed_timeline(&conn, None, "", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(result.seed_entities.is_empty());
|
||||
assert!(result.evidence_notes.is_empty());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_no_matches_returns_empty() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue_id,
|
||||
project_id,
|
||||
"unrelated content here",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "nonexistent_xyzzy_query", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(result.seed_entities.is_empty());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_finds_issue() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 42);
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue_id,
|
||||
project_id,
|
||||
"authentication error in login flow",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "authentication", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_type, "issue");
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 42);
|
||||
assert_eq!(result.seed_entities[0].project_path, "group/project");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_finds_mr() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let mr_id = insert_test_mr(&conn, project_id, 99);
|
||||
insert_document(
|
||||
&conn,
|
||||
"merge_request",
|
||||
mr_id,
|
||||
project_id,
|
||||
"fix authentication bug",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "authentication", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_type, "merge_request");
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 99);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_deduplicates_entities() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 10);
|
||||
|
||||
// Two documents referencing the same issue
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue_id,
|
||||
project_id,
|
||||
"authentication error first doc",
|
||||
);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"authentication error second doc",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "authentication", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
// Should deduplicate: both map to the same issue
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 10);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_resolves_discussion_to_parent() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 7);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"deployment pipeline failed",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_type, "issue");
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 7);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_evidence_capped() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
|
||||
// Create 15 discussion documents with notes about "deployment"
|
||||
for i in 0..15 {
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
&format!("deployment issue number {i}"),
|
||||
);
|
||||
insert_note(
|
||||
&conn,
|
||||
disc_id,
|
||||
project_id,
|
||||
&format!("deployment note {i}"),
|
||||
false,
|
||||
);
|
||||
}
|
||||
|
||||
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 5)
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(result.evidence_notes.len() <= 5);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_evidence_snippet_truncated() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"deployment configuration",
|
||||
);
|
||||
|
||||
let long_body = "x".repeat(500);
|
||||
insert_note(&conn, disc_id, project_id, &long_body, false);
|
||||
|
||||
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(!result.evidence_notes.is_empty());
|
||||
if let TimelineEventType::NoteEvidence { snippet, .. } = &result.evidence_notes[0].event_type {
|
||||
assert!(snippet.chars().count() <= 200);
|
||||
} else {
|
||||
panic!("Expected NoteEvidence");
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_respects_project_filter() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
|
||||
// Insert a second project
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (2, 'other/repo', 'https://gitlab.com/other/repo')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
let project2_id = conn.last_insert_rowid();
|
||||
|
||||
let issue1_id = insert_test_issue(&conn, project_id, 1);
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue1_id,
|
||||
project_id,
|
||||
"authentication error",
|
||||
);
|
||||
|
||||
let issue2_id = insert_test_issue(&conn, project2_id, 2);
|
||||
insert_document(
|
||||
&conn,
|
||||
"issue",
|
||||
issue2_id,
|
||||
project2_id,
|
||||
"authentication error",
|
||||
);
|
||||
|
||||
// Filter to project 1 only
|
||||
let result = seed_timeline(
|
||||
&conn,
|
||||
None,
|
||||
"authentication",
|
||||
Some(project_id),
|
||||
None,
|
||||
50,
|
||||
10,
|
||||
)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].project_path, "group/project");
|
||||
}
|
||||
|
||||
// ─── Matched discussion tests ───────────────────────────────────────────────
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_captures_matched_discussions_from_discussion_doc() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"deployment pipeline authentication",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(result.matched_discussions.len(), 1);
|
||||
assert_eq!(result.matched_discussions[0].discussion_id, disc_id);
|
||||
assert_eq!(result.matched_discussions[0].entity_type, "issue");
|
||||
assert_eq!(result.matched_discussions[0].entity_id, issue_id);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_captures_matched_discussions_from_note_doc() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
let note_id = insert_note(&conn, disc_id, project_id, "note about deployment", false);
|
||||
insert_document(
|
||||
&conn,
|
||||
"note",
|
||||
note_id,
|
||||
project_id,
|
||||
"deployment configuration details",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
result.matched_discussions.len(),
|
||||
1,
|
||||
"Note doc should resolve to parent discussion"
|
||||
);
|
||||
assert_eq!(result.matched_discussions[0].discussion_id, disc_id);
|
||||
assert_eq!(result.matched_discussions[0].entity_type, "issue");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_deduplicates_matched_discussions() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 1);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
|
||||
// Two docs referencing the same discussion
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"deployment pipeline first doc",
|
||||
);
|
||||
let note_id = insert_note(&conn, disc_id, project_id, "deployment note", false);
|
||||
insert_document(
|
||||
&conn,
|
||||
"note",
|
||||
note_id,
|
||||
project_id,
|
||||
"deployment pipeline second doc",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(
|
||||
result.matched_discussions.len(),
|
||||
1,
|
||||
"Same discussion_id from two docs should deduplicate"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_seed_matched_discussions_have_correct_parent_entity() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let mr_id = insert_test_mr(&conn, project_id, 99);
|
||||
let disc_id = insert_discussion(&conn, project_id, None, Some(mr_id));
|
||||
insert_document(
|
||||
&conn,
|
||||
"discussion",
|
||||
disc_id,
|
||||
project_id,
|
||||
"deployment pipeline for merge request",
|
||||
);
|
||||
|
||||
let result = seed_timeline(&conn, None, "deployment", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert_eq!(result.matched_discussions.len(), 1);
|
||||
assert_eq!(result.matched_discussions[0].entity_type, "merge_request");
|
||||
assert_eq!(result.matched_discussions[0].entity_id, mr_id);
|
||||
}
|
||||
|
||||
// ─── seed_timeline_direct tests ─────────────────────────────────────────────
|
||||
|
||||
#[test]
|
||||
fn test_direct_seed_resolves_entity() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
insert_test_issue(&conn, project_id, 42);
|
||||
|
||||
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_type, "issue");
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 42);
|
||||
assert_eq!(result.seed_entities[0].project_path, "group/project");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_direct_seed_gathers_all_discussions() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 42);
|
||||
|
||||
// Create 3 discussions for this issue
|
||||
let disc1 = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
let disc2 = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
let disc3 = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
|
||||
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
|
||||
assert_eq!(result.matched_discussions.len(), 3);
|
||||
let disc_ids: Vec<i64> = result
|
||||
.matched_discussions
|
||||
.iter()
|
||||
.map(|d| d.discussion_id)
|
||||
.collect();
|
||||
assert!(disc_ids.contains(&disc1));
|
||||
assert!(disc_ids.contains(&disc2));
|
||||
assert!(disc_ids.contains(&disc3));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_direct_seed_no_evidence_notes() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let issue_id = insert_test_issue(&conn, project_id, 42);
|
||||
let disc_id = insert_discussion(&conn, project_id, Some(issue_id), None);
|
||||
insert_note(&conn, disc_id, project_id, "some note body", false);
|
||||
|
||||
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
|
||||
assert!(
|
||||
result.evidence_notes.is_empty(),
|
||||
"Direct seeding should not produce evidence notes"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_direct_seed_search_mode_is_direct() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
insert_test_issue(&conn, project_id, 42);
|
||||
|
||||
let result = seed_timeline_direct(&conn, "issue", 42, None).unwrap();
|
||||
assert_eq!(result.search_mode, "direct");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_direct_seed_not_found() {
|
||||
let conn = setup_test_db();
|
||||
insert_test_project(&conn);
|
||||
|
||||
let result = seed_timeline_direct(&conn, "issue", 999, None);
|
||||
assert!(result.is_err());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_direct_seed_mr() {
|
||||
let conn = setup_test_db();
|
||||
let project_id = insert_test_project(&conn);
|
||||
let mr_id = insert_test_mr(&conn, project_id, 99);
|
||||
let disc_id = insert_discussion(&conn, project_id, None, Some(mr_id));
|
||||
|
||||
let result = seed_timeline_direct(&conn, "merge_request", 99, None).unwrap();
|
||||
assert_eq!(result.seed_entities.len(), 1);
|
||||
assert_eq!(result.seed_entities[0].entity_type, "merge_request");
|
||||
assert_eq!(result.seed_entities[0].entity_iid, 99);
|
||||
assert_eq!(result.matched_discussions.len(), 1);
|
||||
assert_eq!(result.matched_discussions[0].discussion_id, disc_id);
|
||||
}
|
||||
File diff suppressed because it is too large
Load Diff
1303
src/documents/extractor_tests.rs
Normal file
1303
src/documents/extractor_tests.rs
Normal file
File diff suppressed because it is too large
Load Diff
@@ -3,8 +3,9 @@ mod regenerator;
|
||||
mod truncation;
|
||||
|
||||
pub use extractor::{
|
||||
DocumentData, SourceType, compute_content_hash, compute_list_hash, extract_discussion_document,
|
||||
extract_issue_document, extract_mr_document,
|
||||
DocumentData, ParentMetadataCache, SourceType, compute_content_hash, compute_list_hash,
|
||||
extract_discussion_document, extract_issue_document, extract_mr_document,
|
||||
extract_note_document, extract_note_document_cached,
|
||||
};
|
||||
pub use regenerator::{RegenerateResult, regenerate_dirty_documents};
|
||||
pub use truncation::{
|
||||
|
||||
@@ -4,8 +4,8 @@ use tracing::{debug, instrument, warn};
|
||||
|
||||
use crate::core::error::Result;
|
||||
use crate::documents::{
|
||||
DocumentData, SourceType, extract_discussion_document, extract_issue_document,
|
||||
extract_mr_document,
|
||||
DocumentData, ParentMetadataCache, SourceType, extract_discussion_document,
|
||||
extract_issue_document, extract_mr_document, extract_note_document_cached,
|
||||
};
|
||||
use crate::ingestion::dirty_tracker::{clear_dirty, get_dirty_sources, record_dirty_error};
|
||||
|
||||
@@ -27,6 +27,7 @@ pub fn regenerate_dirty_documents(
|
||||
let mut result = RegenerateResult::default();
|
||||
|
||||
let mut estimated_total: usize = 0;
|
||||
let mut cache = ParentMetadataCache::new();
|
||||
|
||||
loop {
|
||||
let dirty = get_dirty_sources(conn)?;
|
||||
@@ -41,7 +42,7 @@ pub fn regenerate_dirty_documents(
|
||||
estimated_total = estimated_total.max(processed_so_far + remaining);
|
||||
|
||||
for (source_type, source_id) in &dirty {
|
||||
match regenerate_one(conn, *source_type, *source_id) {
|
||||
match regenerate_one(conn, *source_type, *source_id, &mut cache) {
|
||||
Ok(changed) => {
|
||||
if changed {
|
||||
result.regenerated += 1;
|
||||
@@ -83,11 +84,17 @@ pub fn regenerate_dirty_documents(
|
||||
Ok(result)
|
||||
}
|
||||
|
||||
fn regenerate_one(conn: &Connection, source_type: SourceType, source_id: i64) -> Result<bool> {
|
||||
fn regenerate_one(
|
||||
conn: &Connection,
|
||||
source_type: SourceType,
|
||||
source_id: i64,
|
||||
cache: &mut ParentMetadataCache,
|
||||
) -> Result<bool> {
|
||||
let doc = match source_type {
|
||||
SourceType::Issue => extract_issue_document(conn, source_id)?,
|
||||
SourceType::MergeRequest => extract_mr_document(conn, source_id)?,
|
||||
SourceType::Discussion => extract_discussion_document(conn, source_id)?,
|
||||
SourceType::Note => extract_note_document_cached(conn, source_id, cache)?,
|
||||
};
|
||||
|
||||
let Some(doc) = doc else {
|
||||
@@ -122,11 +129,7 @@ fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<bool>
|
||||
)
|
||||
.optional()?;
|
||||
|
||||
let content_changed = match &existing {
|
||||
Some((_, old_content_hash, _, _)) => old_content_hash != &doc.content_hash,
|
||||
None => true,
|
||||
};
|
||||
|
||||
// Fast path: if all three hashes match, nothing changed at all.
|
||||
if let Some((_, ref old_content_hash, ref old_labels_hash, ref old_paths_hash)) = existing
|
||||
&& old_content_hash == &doc.content_hash
|
||||
&& old_labels_hash == &doc.labels_hash
|
||||
@@ -134,6 +137,7 @@ fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<bool>
|
||||
{
|
||||
return Ok(false);
|
||||
}
|
||||
// Past this point at least one hash differs, so the document will be updated.
|
||||
|
||||
let labels_json = serde_json::to_string(&doc.labels).unwrap_or_else(|_| "[]".to_string());
|
||||
|
||||
@@ -243,7 +247,8 @@ fn upsert_document_inner(conn: &Connection, doc: &DocumentData) -> Result<bool>
|
||||
}
|
||||
}
|
||||
|
||||
Ok(content_changed)
|
||||
// We passed the triple-hash fast path, so at least one hash differs.
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn delete_document(conn: &Connection, source_type: SourceType, source_id: i64) -> Result<()> {
|
||||
@@ -264,213 +269,5 @@ fn get_document_id(conn: &Connection, source_type: SourceType, source_id: i64) -
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::ingestion::dirty_tracker::mark_dirty;
|
||||
|
||||
fn setup_db() -> Connection {
|
||||
let conn = Connection::open_in_memory().unwrap();
|
||||
conn.execute_batch("
|
||||
CREATE TABLE projects (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_project_id INTEGER UNIQUE NOT NULL,
|
||||
path_with_namespace TEXT NOT NULL,
|
||||
default_branch TEXT,
|
||||
web_url TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project');
|
||||
|
||||
CREATE TABLE issues (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER UNIQUE NOT NULL,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
iid INTEGER NOT NULL,
|
||||
title TEXT,
|
||||
description TEXT,
|
||||
state TEXT NOT NULL,
|
||||
author_username TEXT,
|
||||
created_at INTEGER NOT NULL,
|
||||
updated_at INTEGER NOT NULL,
|
||||
last_seen_at INTEGER NOT NULL,
|
||||
discussions_synced_for_updated_at INTEGER,
|
||||
resource_events_synced_for_updated_at INTEGER,
|
||||
web_url TEXT,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
CREATE TABLE labels (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
name TEXT NOT NULL,
|
||||
color TEXT,
|
||||
description TEXT
|
||||
);
|
||||
CREATE TABLE issue_labels (
|
||||
issue_id INTEGER NOT NULL REFERENCES issues(id),
|
||||
label_id INTEGER NOT NULL REFERENCES labels(id),
|
||||
PRIMARY KEY(issue_id, label_id)
|
||||
);
|
||||
|
||||
CREATE TABLE documents (
|
||||
id INTEGER PRIMARY KEY,
|
||||
source_type TEXT NOT NULL,
|
||||
source_id INTEGER NOT NULL,
|
||||
project_id INTEGER NOT NULL,
|
||||
author_username TEXT,
|
||||
label_names TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
url TEXT,
|
||||
title TEXT,
|
||||
content_text TEXT NOT NULL,
|
||||
content_hash TEXT NOT NULL,
|
||||
labels_hash TEXT NOT NULL DEFAULT '',
|
||||
paths_hash TEXT NOT NULL DEFAULT '',
|
||||
is_truncated INTEGER NOT NULL DEFAULT 0,
|
||||
truncated_reason TEXT,
|
||||
UNIQUE(source_type, source_id)
|
||||
);
|
||||
CREATE TABLE document_labels (
|
||||
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
|
||||
label_name TEXT NOT NULL,
|
||||
PRIMARY KEY(document_id, label_name)
|
||||
);
|
||||
CREATE TABLE document_paths (
|
||||
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
|
||||
path TEXT NOT NULL,
|
||||
PRIMARY KEY(document_id, path)
|
||||
);
|
||||
CREATE TABLE dirty_sources (
|
||||
source_type TEXT NOT NULL,
|
||||
source_id INTEGER NOT NULL,
|
||||
queued_at INTEGER NOT NULL,
|
||||
attempt_count INTEGER NOT NULL DEFAULT 0,
|
||||
last_attempt_at INTEGER,
|
||||
last_error TEXT,
|
||||
next_attempt_at INTEGER,
|
||||
PRIMARY KEY(source_type, source_id)
|
||||
);
|
||||
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);
|
||||
").unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_creates_document() {
|
||||
let conn = setup_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, description, state, author_username, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test Issue', 'Description here', 'opened', 'alice', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 1);
|
||||
assert_eq!(result.unchanged, 0);
|
||||
assert_eq!(result.errored, 0);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM documents", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(count, 1);
|
||||
|
||||
let content: String = conn
|
||||
.query_row("SELECT content_text FROM documents", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert!(content.contains("[[Issue]] #42: Test Issue"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_unchanged() {
|
||||
let conn = setup_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, description, state, author_username, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test', 'Desc', 'opened', 'alice', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
let r1 = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(r1.regenerated, 1);
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
let r2 = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(r2.unchanged, 1);
|
||||
assert_eq!(r2.regenerated, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_deleted_source() {
|
||||
let conn = setup_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test', 'opened', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
regenerate_dirty_documents(&conn, None).unwrap();
|
||||
|
||||
conn.execute("PRAGMA foreign_keys = OFF", []).unwrap();
|
||||
conn.execute("DELETE FROM issues WHERE id = 1", []).unwrap();
|
||||
conn.execute("PRAGMA foreign_keys = ON", []).unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 1);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM documents", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(count, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_drains_queue() {
|
||||
let conn = setup_db();
|
||||
for i in 1..=10 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (?1, ?2, 1, ?1, 'Test', 'opened', 1000, 2000, 3000)",
|
||||
rusqlite::params![i, i * 10],
|
||||
).unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, i).unwrap();
|
||||
}
|
||||
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 10);
|
||||
|
||||
let dirty = get_dirty_sources(&conn).unwrap();
|
||||
assert!(dirty.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_triple_hash_fast_path() {
|
||||
let conn = setup_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test', 'opened', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO labels (id, project_id, name) VALUES (1, 1, 'bug')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO issue_labels (issue_id, label_id) VALUES (1, 1)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
regenerate_dirty_documents(&conn, None).unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.unchanged, 1);
|
||||
|
||||
let label_count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM document_labels", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(label_count, 1);
|
||||
}
|
||||
}
|
||||
#[path = "regenerator_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
520
src/documents/regenerator_tests.rs
Normal file
520
src/documents/regenerator_tests.rs
Normal file
@@ -0,0 +1,520 @@
|
||||
use super::*;
|
||||
use crate::ingestion::dirty_tracker::mark_dirty;
|
||||
|
||||
fn setup_db() -> Connection {
|
||||
let conn = Connection::open_in_memory().unwrap();
|
||||
conn.execute_batch("
|
||||
CREATE TABLE projects (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_project_id INTEGER UNIQUE NOT NULL,
|
||||
path_with_namespace TEXT NOT NULL,
|
||||
default_branch TEXT,
|
||||
web_url TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
INSERT INTO projects (id, gitlab_project_id, path_with_namespace) VALUES (1, 100, 'group/project');
|
||||
|
||||
CREATE TABLE issues (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER UNIQUE NOT NULL,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
iid INTEGER NOT NULL,
|
||||
title TEXT,
|
||||
description TEXT,
|
||||
state TEXT NOT NULL,
|
||||
author_username TEXT,
|
||||
created_at INTEGER NOT NULL,
|
||||
updated_at INTEGER NOT NULL,
|
||||
last_seen_at INTEGER NOT NULL,
|
||||
discussions_synced_for_updated_at INTEGER,
|
||||
resource_events_synced_for_updated_at INTEGER,
|
||||
web_url TEXT,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
CREATE TABLE labels (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
name TEXT NOT NULL,
|
||||
color TEXT,
|
||||
description TEXT
|
||||
);
|
||||
CREATE TABLE issue_labels (
|
||||
issue_id INTEGER NOT NULL REFERENCES issues(id),
|
||||
label_id INTEGER NOT NULL REFERENCES labels(id),
|
||||
PRIMARY KEY(issue_id, label_id)
|
||||
);
|
||||
|
||||
CREATE TABLE documents (
|
||||
id INTEGER PRIMARY KEY,
|
||||
source_type TEXT NOT NULL,
|
||||
source_id INTEGER NOT NULL,
|
||||
project_id INTEGER NOT NULL,
|
||||
author_username TEXT,
|
||||
label_names TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
url TEXT,
|
||||
title TEXT,
|
||||
content_text TEXT NOT NULL,
|
||||
content_hash TEXT NOT NULL,
|
||||
labels_hash TEXT NOT NULL DEFAULT '',
|
||||
paths_hash TEXT NOT NULL DEFAULT '',
|
||||
is_truncated INTEGER NOT NULL DEFAULT 0,
|
||||
truncated_reason TEXT,
|
||||
UNIQUE(source_type, source_id)
|
||||
);
|
||||
CREATE TABLE document_labels (
|
||||
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
|
||||
label_name TEXT NOT NULL,
|
||||
PRIMARY KEY(document_id, label_name)
|
||||
);
|
||||
CREATE TABLE document_paths (
|
||||
document_id INTEGER NOT NULL REFERENCES documents(id) ON DELETE CASCADE,
|
||||
path TEXT NOT NULL,
|
||||
PRIMARY KEY(document_id, path)
|
||||
);
|
||||
CREATE TABLE dirty_sources (
|
||||
source_type TEXT NOT NULL,
|
||||
source_id INTEGER NOT NULL,
|
||||
queued_at INTEGER NOT NULL,
|
||||
attempt_count INTEGER NOT NULL DEFAULT 0,
|
||||
last_attempt_at INTEGER,
|
||||
last_error TEXT,
|
||||
next_attempt_at INTEGER,
|
||||
PRIMARY KEY(source_type, source_id)
|
||||
);
|
||||
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);
|
||||
").unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_creates_document() {
|
||||
let conn = setup_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, description, state, author_username, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test Issue', 'Description here', 'opened', 'alice', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 1);
|
||||
assert_eq!(result.unchanged, 0);
|
||||
assert_eq!(result.errored, 0);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM documents", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(count, 1);
|
||||
|
||||
let content: String = conn
|
||||
.query_row("SELECT content_text FROM documents", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert!(content.contains("[[Issue]] #42: Test Issue"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_unchanged() {
|
||||
let conn = setup_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, description, state, author_username, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test', 'Desc', 'opened', 'alice', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
let r1 = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(r1.regenerated, 1);
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
let r2 = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(r2.unchanged, 1);
|
||||
assert_eq!(r2.regenerated, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_deleted_source() {
|
||||
let conn = setup_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test', 'opened', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
regenerate_dirty_documents(&conn, None).unwrap();
|
||||
|
||||
conn.execute("PRAGMA foreign_keys = OFF", []).unwrap();
|
||||
conn.execute("DELETE FROM issues WHERE id = 1", []).unwrap();
|
||||
conn.execute("PRAGMA foreign_keys = ON", []).unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 1);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM documents", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(count, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_drains_queue() {
|
||||
let conn = setup_db();
|
||||
for i in 1..=10 {
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (?1, ?2, 1, ?1, 'Test', 'opened', 1000, 2000, 3000)",
|
||||
rusqlite::params![i, i * 10],
|
||||
).unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, i).unwrap();
|
||||
}
|
||||
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 10);
|
||||
|
||||
let dirty = get_dirty_sources(&conn).unwrap();
|
||||
assert!(dirty.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_triple_hash_fast_path() {
|
||||
let conn = setup_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test', 'opened', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO labels (id, project_id, name) VALUES (1, 1, 'bug')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO issue_labels (issue_id, label_id) VALUES (1, 1)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
regenerate_dirty_documents(&conn, None).unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.unchanged, 1);
|
||||
|
||||
let label_count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM document_labels", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(label_count, 1);
|
||||
}
|
||||
|
||||
fn setup_note_db() -> Connection {
|
||||
let conn = setup_db();
|
||||
conn.execute_batch(
|
||||
"
|
||||
CREATE TABLE merge_requests (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER UNIQUE NOT NULL,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
iid INTEGER NOT NULL,
|
||||
title TEXT,
|
||||
description TEXT,
|
||||
state TEXT,
|
||||
draft INTEGER NOT NULL DEFAULT 0,
|
||||
author_username TEXT,
|
||||
source_branch TEXT,
|
||||
target_branch TEXT,
|
||||
head_sha TEXT,
|
||||
references_short TEXT,
|
||||
references_full TEXT,
|
||||
detailed_merge_status TEXT,
|
||||
merge_user_username TEXT,
|
||||
created_at INTEGER,
|
||||
updated_at INTEGER,
|
||||
merged_at INTEGER,
|
||||
closed_at INTEGER,
|
||||
last_seen_at INTEGER NOT NULL,
|
||||
discussions_synced_for_updated_at INTEGER,
|
||||
discussions_sync_last_attempt_at INTEGER,
|
||||
discussions_sync_attempts INTEGER DEFAULT 0,
|
||||
discussions_sync_last_error TEXT,
|
||||
resource_events_synced_for_updated_at INTEGER,
|
||||
web_url TEXT,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
CREATE TABLE mr_labels (
|
||||
merge_request_id INTEGER REFERENCES merge_requests(id),
|
||||
label_id INTEGER REFERENCES labels(id),
|
||||
PRIMARY KEY(merge_request_id, label_id)
|
||||
);
|
||||
CREATE TABLE discussions (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_discussion_id TEXT NOT NULL,
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
issue_id INTEGER REFERENCES issues(id),
|
||||
merge_request_id INTEGER,
|
||||
noteable_type TEXT NOT NULL,
|
||||
individual_note INTEGER NOT NULL DEFAULT 0,
|
||||
first_note_at INTEGER,
|
||||
last_note_at INTEGER,
|
||||
last_seen_at INTEGER NOT NULL,
|
||||
resolvable INTEGER NOT NULL DEFAULT 0,
|
||||
resolved INTEGER NOT NULL DEFAULT 0
|
||||
);
|
||||
CREATE TABLE notes (
|
||||
id INTEGER PRIMARY KEY,
|
||||
gitlab_id INTEGER UNIQUE NOT NULL,
|
||||
discussion_id INTEGER NOT NULL REFERENCES discussions(id),
|
||||
project_id INTEGER NOT NULL REFERENCES projects(id),
|
||||
note_type TEXT,
|
||||
is_system INTEGER NOT NULL DEFAULT 0,
|
||||
author_username TEXT,
|
||||
body TEXT,
|
||||
created_at INTEGER NOT NULL,
|
||||
updated_at INTEGER NOT NULL,
|
||||
last_seen_at INTEGER NOT NULL,
|
||||
position INTEGER,
|
||||
resolvable INTEGER NOT NULL DEFAULT 0,
|
||||
resolved INTEGER NOT NULL DEFAULT 0,
|
||||
resolved_by TEXT,
|
||||
resolved_at INTEGER,
|
||||
position_old_path TEXT,
|
||||
position_new_path TEXT,
|
||||
position_old_line INTEGER,
|
||||
position_new_line INTEGER,
|
||||
raw_payload_id INTEGER
|
||||
);
|
||||
",
|
||||
)
|
||||
.unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_note_document() {
|
||||
let conn = setup_note_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, web_url) VALUES (1, 10, 1, 42, 'Test Issue', 'opened', 'alice', 1000, 2000, 3000, 'https://example.com/issues/42')",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (1, 100, 1, 1, 'bob', 'This is a note', 1000, 2000, 3000, 0)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Note, 1).unwrap();
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 1);
|
||||
assert_eq!(result.unchanged, 0);
|
||||
assert_eq!(result.errored, 0);
|
||||
|
||||
let (source_type, content): (String, String) = conn
|
||||
.query_row(
|
||||
"SELECT source_type, content_text FROM documents WHERE source_id = 1",
|
||||
[],
|
||||
|r| Ok((r.get(0)?, r.get(1)?)),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(source_type, "note");
|
||||
assert!(content.contains("[[Note]]"));
|
||||
assert!(content.contains("author: @bob"));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_note_system_note_deletes() {
|
||||
let conn = setup_note_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at) VALUES (1, 10, 1, 42, 'Test', 'opened', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (1, 100, 1, 1, 'bot', 'assigned to @alice', 1000, 2000, 3000, 1)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
// Pre-insert a document for this note (simulating a previously-generated doc)
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) VALUES ('note', 1, 1, 'old content', 'oldhash')",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Note, 1).unwrap();
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 1);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM documents WHERE source_type = 'note'",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(count, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_regenerate_note_unchanged() {
|
||||
let conn = setup_note_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at, web_url) VALUES (1, 10, 1, 42, 'Test', 'opened', 1000, 2000, 3000, 'https://example.com/issues/42')",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (1, 100, 1, 1, 'bob', 'Some note', 1000, 2000, 3000, 0)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Note, 1).unwrap();
|
||||
let r1 = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(r1.regenerated, 1);
|
||||
|
||||
mark_dirty(&conn, SourceType::Note, 1).unwrap();
|
||||
let r2 = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(r2.unchanged, 1);
|
||||
assert_eq!(r2.regenerated, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_note_regeneration_batch_uses_cache() {
|
||||
let conn = setup_note_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, web_url) VALUES (1, 10, 1, 42, 'Shared Issue', 'opened', 'alice', 1000, 2000, 3000, 'https://example.com/issues/42')",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
for i in 1..=10 {
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (?1, ?2, 1, 1, 'bob', ?3, 1000, 2000, 3000, 0)",
|
||||
rusqlite::params![i, i * 100, format!("Note body {}", i)],
|
||||
).unwrap();
|
||||
mark_dirty(&conn, SourceType::Note, i).unwrap();
|
||||
}
|
||||
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 10);
|
||||
assert_eq!(result.errored, 0);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM documents WHERE source_type = 'note'",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(count, 10);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_note_regeneration_cache_consistent_with_direct_extraction() {
|
||||
let conn = setup_note_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, author_username, created_at, updated_at, last_seen_at, web_url) VALUES (1, 10, 1, 42, 'Consistency Check', 'opened', 'alice', 1000, 2000, 3000, 'https://example.com/issues/42')",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO labels (id, project_id, name) VALUES (1, 1, 'backend')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO issue_labels (issue_id, label_id) VALUES (1, 1)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (1, 100, 1, 1, 'bob', 'Some content', 1000, 2000, 3000, 0)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
use crate::documents::extract_note_document;
|
||||
let direct = extract_note_document(&conn, 1).unwrap().unwrap();
|
||||
|
||||
let mut cache = ParentMetadataCache::new();
|
||||
let cached = extract_note_document_cached(&conn, 1, &mut cache)
|
||||
.unwrap()
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(direct.content_text, cached.content_text);
|
||||
assert_eq!(direct.content_hash, cached.content_hash);
|
||||
assert_eq!(direct.labels, cached.labels);
|
||||
assert_eq!(direct.labels_hash, cached.labels_hash);
|
||||
assert_eq!(direct.paths_hash, cached.paths_hash);
|
||||
assert_eq!(direct.title, cached.title);
|
||||
assert_eq!(direct.url, cached.url);
|
||||
assert_eq!(direct.author_username, cached.author_username);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_note_regeneration_cache_invalidates_across_parents() {
|
||||
let conn = setup_note_db();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at, web_url) VALUES (1, 10, 1, 42, 'Issue Alpha', 'opened', 1000, 2000, 3000, 'https://example.com/issues/42')",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO issues (id, gitlab_id, project_id, iid, title, state, created_at, updated_at, last_seen_at, web_url) VALUES (2, 20, 1, 99, 'Issue Beta', 'opened', 1000, 2000, 3000, 'https://example.com/issues/99')",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (1, 'disc_1', 1, 1, 'Issue', 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (id, gitlab_discussion_id, project_id, issue_id, noteable_type, last_seen_at) VALUES (2, 'disc_2', 1, 2, 'Issue', 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (1, 100, 1, 1, 'bob', 'Alpha note', 1000, 2000, 3000, 0)",
|
||||
[],
|
||||
).unwrap();
|
||||
conn.execute(
|
||||
"INSERT INTO notes (id, gitlab_id, discussion_id, project_id, author_username, body, created_at, updated_at, last_seen_at, is_system) VALUES (2, 200, 2, 1, 'alice', 'Beta note', 1000, 2000, 3000, 0)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
mark_dirty(&conn, SourceType::Note, 1).unwrap();
|
||||
mark_dirty(&conn, SourceType::Note, 2).unwrap();
|
||||
|
||||
let result = regenerate_dirty_documents(&conn, None).unwrap();
|
||||
assert_eq!(result.regenerated, 2);
|
||||
assert_eq!(result.errored, 0);
|
||||
|
||||
let alpha_content: String = conn
|
||||
.query_row(
|
||||
"SELECT content_text FROM documents WHERE source_type = 'note' AND source_id = 1",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
let beta_content: String = conn
|
||||
.query_row(
|
||||
"SELECT content_text FROM documents WHERE source_type = 'note' AND source_id = 2",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert!(alpha_content.contains("parent_iid: 42"));
|
||||
assert!(alpha_content.contains("parent_title: Issue Alpha"));
|
||||
assert!(beta_content.contains("parent_iid: 99"));
|
||||
assert!(beta_content.contains("parent_title: Issue Beta"));
|
||||
}
|
||||
@@ -85,146 +85,5 @@ pub fn count_pending_documents(conn: &Connection, model_name: &str) -> Result<i6
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use std::path::Path;
|
||||
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use crate::embedding::pipeline::record_embedding_error;
|
||||
|
||||
const MODEL: &str = "nomic-embed-text";
|
||||
|
||||
fn setup_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_test_project(conn: &Connection) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url)
|
||||
VALUES (1, 'group/test', 'https://gitlab.example.com/group/test')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_test_document(conn: &Connection, project_id: i64, content: &str) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash)
|
||||
VALUES ('issue', 1, ?1, ?2, 'hash123')",
|
||||
rusqlite::params![project_id, content],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn retry_failed_delete_makes_doc_pending_again() {
|
||||
let conn = setup_db();
|
||||
let proj_id = insert_test_project(&conn);
|
||||
let doc_id = insert_test_document(&conn, proj_id, "some text content");
|
||||
|
||||
// Doc starts as pending
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert_eq!(pending.len(), 1, "Doc should be pending initially");
|
||||
|
||||
// Record an error — doc should no longer be pending
|
||||
record_embedding_error(
|
||||
&conn,
|
||||
doc_id,
|
||||
0,
|
||||
"hash123",
|
||||
"chunkhash",
|
||||
MODEL,
|
||||
"test error",
|
||||
)
|
||||
.unwrap();
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert!(
|
||||
pending.is_empty(),
|
||||
"Doc with error metadata should not be pending"
|
||||
);
|
||||
|
||||
// DELETE error rows (mimicking --retry-failed) — doc should become pending again
|
||||
conn.execute_batch(
|
||||
"DELETE FROM embeddings WHERE rowid / 1000 IN (
|
||||
SELECT DISTINCT document_id FROM embedding_metadata
|
||||
WHERE last_error IS NOT NULL
|
||||
);
|
||||
DELETE FROM embedding_metadata WHERE last_error IS NOT NULL;",
|
||||
)
|
||||
.unwrap();
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert_eq!(pending.len(), 1, "Doc should be pending again after DELETE");
|
||||
assert_eq!(pending[0].document_id, doc_id);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn empty_doc_with_error_not_pending() {
|
||||
let conn = setup_db();
|
||||
let proj_id = insert_test_project(&conn);
|
||||
let doc_id = insert_test_document(&conn, proj_id, "");
|
||||
|
||||
// Empty doc starts as pending
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert_eq!(pending.len(), 1, "Empty doc should be pending initially");
|
||||
|
||||
// Record an error for the empty doc
|
||||
record_embedding_error(
|
||||
&conn,
|
||||
doc_id,
|
||||
0,
|
||||
"hash123",
|
||||
"empty",
|
||||
MODEL,
|
||||
"Document has empty content",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Should no longer be pending
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert!(
|
||||
pending.is_empty(),
|
||||
"Empty doc with error metadata should not be pending"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn old_update_approach_leaves_doc_invisible() {
|
||||
// This test demonstrates WHY we use DELETE instead of UPDATE.
|
||||
// UPDATE clears last_error but the row still matches config params,
|
||||
// so the doc stays "not pending" — permanently invisible.
|
||||
let conn = setup_db();
|
||||
let proj_id = insert_test_project(&conn);
|
||||
let doc_id = insert_test_document(&conn, proj_id, "some text content");
|
||||
|
||||
// Record an error
|
||||
record_embedding_error(
|
||||
&conn,
|
||||
doc_id,
|
||||
0,
|
||||
"hash123",
|
||||
"chunkhash",
|
||||
MODEL,
|
||||
"test error",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Old approach: UPDATE to clear error
|
||||
conn.execute(
|
||||
"UPDATE embedding_metadata SET last_error = NULL, attempt_count = 0
|
||||
WHERE last_error IS NOT NULL",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Doc is NOT pending — it's permanently invisible! This is the bug.
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert!(
|
||||
pending.is_empty(),
|
||||
"UPDATE approach leaves doc invisible (this proves the bug)"
|
||||
);
|
||||
}
|
||||
}
|
||||
#[path = "change_detector_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
141
src/embedding/change_detector_tests.rs
Normal file
141
src/embedding/change_detector_tests.rs
Normal file
@@ -0,0 +1,141 @@
|
||||
use std::path::Path;
|
||||
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use crate::embedding::pipeline::record_embedding_error;
|
||||
|
||||
const MODEL: &str = "nomic-embed-text";
|
||||
|
||||
fn setup_db() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
fn insert_test_project(conn: &Connection) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url)
|
||||
VALUES (1, 'group/test', 'https://gitlab.example.com/group/test')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
fn insert_test_document(conn: &Connection, project_id: i64, content: &str) -> i64 {
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash)
|
||||
VALUES ('issue', 1, ?1, ?2, 'hash123')",
|
||||
rusqlite::params![project_id, content],
|
||||
)
|
||||
.unwrap();
|
||||
conn.last_insert_rowid()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn retry_failed_delete_makes_doc_pending_again() {
|
||||
let conn = setup_db();
|
||||
let proj_id = insert_test_project(&conn);
|
||||
let doc_id = insert_test_document(&conn, proj_id, "some text content");
|
||||
|
||||
// Doc starts as pending
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert_eq!(pending.len(), 1, "Doc should be pending initially");
|
||||
|
||||
// Record an error — doc should no longer be pending
|
||||
record_embedding_error(
|
||||
&conn,
|
||||
doc_id,
|
||||
0,
|
||||
"hash123",
|
||||
"chunkhash",
|
||||
MODEL,
|
||||
"test error",
|
||||
)
|
||||
.unwrap();
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert!(
|
||||
pending.is_empty(),
|
||||
"Doc with error metadata should not be pending"
|
||||
);
|
||||
|
||||
// DELETE error rows (mimicking --retry-failed) — doc should become pending again
|
||||
conn.execute_batch(
|
||||
"DELETE FROM embeddings WHERE rowid / 1000 IN (
|
||||
SELECT DISTINCT document_id FROM embedding_metadata
|
||||
WHERE last_error IS NOT NULL
|
||||
);
|
||||
DELETE FROM embedding_metadata WHERE last_error IS NOT NULL;",
|
||||
)
|
||||
.unwrap();
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert_eq!(pending.len(), 1, "Doc should be pending again after DELETE");
|
||||
assert_eq!(pending[0].document_id, doc_id);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn empty_doc_with_error_not_pending() {
|
||||
let conn = setup_db();
|
||||
let proj_id = insert_test_project(&conn);
|
||||
let doc_id = insert_test_document(&conn, proj_id, "");
|
||||
|
||||
// Empty doc starts as pending
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert_eq!(pending.len(), 1, "Empty doc should be pending initially");
|
||||
|
||||
// Record an error for the empty doc
|
||||
record_embedding_error(
|
||||
&conn,
|
||||
doc_id,
|
||||
0,
|
||||
"hash123",
|
||||
"empty",
|
||||
MODEL,
|
||||
"Document has empty content",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Should no longer be pending
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert!(
|
||||
pending.is_empty(),
|
||||
"Empty doc with error metadata should not be pending"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn old_update_approach_leaves_doc_invisible() {
|
||||
// This test demonstrates WHY we use DELETE instead of UPDATE.
|
||||
// UPDATE clears last_error but the row still matches config params,
|
||||
// so the doc stays "not pending" — permanently invisible.
|
||||
let conn = setup_db();
|
||||
let proj_id = insert_test_project(&conn);
|
||||
let doc_id = insert_test_document(&conn, proj_id, "some text content");
|
||||
|
||||
// Record an error
|
||||
record_embedding_error(
|
||||
&conn,
|
||||
doc_id,
|
||||
0,
|
||||
"hash123",
|
||||
"chunkhash",
|
||||
MODEL,
|
||||
"test error",
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Old approach: UPDATE to clear error
|
||||
conn.execute(
|
||||
"UPDATE embedding_metadata SET last_error = NULL, attempt_count = 0
|
||||
WHERE last_error IS NOT NULL",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Doc is NOT pending — it's permanently invisible! This is the bug.
|
||||
let pending = find_pending_documents(&conn, 100, 0, MODEL).unwrap();
|
||||
assert!(
|
||||
pending.is_empty(),
|
||||
"UPDATE approach leaves doc invisible (this proves the bug)"
|
||||
);
|
||||
}
|
||||
@@ -103,231 +103,5 @@ fn floor_char_boundary(s: &str, idx: usize) -> usize {
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_empty_content() {
|
||||
let chunks = split_into_chunks("");
|
||||
assert!(chunks.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_short_document_single_chunk() {
|
||||
let content = "Short document content.";
|
||||
let chunks = split_into_chunks(content);
|
||||
assert_eq!(chunks.len(), 1);
|
||||
assert_eq!(chunks[0].0, 0);
|
||||
assert_eq!(chunks[0].1, content);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_exactly_max_chars() {
|
||||
let content = "a".repeat(CHUNK_MAX_BYTES);
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert_eq!(chunks.len(), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_long_document_multiple_chunks() {
|
||||
let paragraph = "This is a paragraph of text.\n\n";
|
||||
let mut content = String::new();
|
||||
while content.len() < CHUNK_MAX_BYTES * 2 {
|
||||
content.push_str(paragraph);
|
||||
}
|
||||
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(
|
||||
chunks.len() >= 2,
|
||||
"Expected multiple chunks, got {}",
|
||||
chunks.len()
|
||||
);
|
||||
|
||||
for (i, (idx, _)) in chunks.iter().enumerate() {
|
||||
assert_eq!(*idx, i);
|
||||
}
|
||||
|
||||
assert!(!chunks.last().unwrap().1.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_chunk_overlap() {
|
||||
let paragraph = "This is paragraph content for testing chunk overlap behavior.\n\n";
|
||||
let mut content = String::new();
|
||||
while content.len() < CHUNK_MAX_BYTES + CHUNK_OVERLAP_CHARS + 1000 {
|
||||
content.push_str(paragraph);
|
||||
}
|
||||
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(chunks.len() >= 2);
|
||||
|
||||
if chunks.len() >= 2 {
|
||||
let end_of_first = &chunks[0].1;
|
||||
let start_of_second = &chunks[1].1;
|
||||
let overlap_region =
|
||||
&end_of_first[end_of_first.len().saturating_sub(CHUNK_OVERLAP_CHARS)..];
|
||||
assert!(
|
||||
start_of_second.starts_with(overlap_region)
|
||||
|| overlap_region.contains(&start_of_second[..100.min(start_of_second.len())]),
|
||||
"Expected overlap between chunks"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_no_paragraph_boundary() {
|
||||
let content = "word ".repeat(CHUNK_MAX_BYTES / 5 * 3);
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(chunks.len() >= 2);
|
||||
for (_, chunk) in &chunks {
|
||||
assert!(!chunk.is_empty());
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_chunk_indices_sequential() {
|
||||
let content = "a ".repeat(CHUNK_MAX_BYTES);
|
||||
let chunks = split_into_chunks(&content);
|
||||
for (i, (idx, _)) in chunks.iter().enumerate() {
|
||||
assert_eq!(*idx, i, "Chunk index mismatch at position {}", i);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_multibyte_characters_no_panic() {
|
||||
// Build content with multi-byte UTF-8 chars (smart quotes, emoji, CJK)
|
||||
// placed at positions likely to hit len()*2/3 and len()/2 boundaries
|
||||
let segment = "We\u{2019}ve gradually ar\u{2014}ranged the components. ";
|
||||
let mut content = String::new();
|
||||
while content.len() < CHUNK_MAX_BYTES * 3 {
|
||||
content.push_str(segment);
|
||||
}
|
||||
// Should not panic on multi-byte boundary
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(chunks.len() >= 2);
|
||||
for (_, chunk) in &chunks {
|
||||
assert!(!chunk.is_empty());
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_nbsp_at_overlap_boundary() {
|
||||
// Reproduce the exact crash: \u{a0} (non-breaking space, 2-byte UTF-8)
|
||||
// placed so that split_at - CHUNK_OVERLAP_CHARS lands mid-character
|
||||
let mut content = String::new();
|
||||
// Fill with ASCII up to near CHUNK_MAX_BYTES, then place \u{a0}
|
||||
// near where the overlap subtraction would land
|
||||
let target = CHUNK_MAX_BYTES - CHUNK_OVERLAP_CHARS;
|
||||
while content.len() < target - 2 {
|
||||
content.push('a');
|
||||
}
|
||||
content.push('\u{a0}'); // 2-byte char right at the overlap boundary
|
||||
while content.len() < CHUNK_MAX_BYTES * 3 {
|
||||
content.push('b');
|
||||
}
|
||||
// Should not panic
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(chunks.len() >= 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_box_drawing_heavy_content() {
|
||||
// Simulates a document with many box-drawing characters (3-byte UTF-8)
|
||||
// like the ─ (U+2500) character found in markdown tables
|
||||
let mut content = String::new();
|
||||
// Normal text header
|
||||
content.push_str("# Title\n\nSome description text.\n\n");
|
||||
// Table header with box drawing
|
||||
content.push('┌');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push('┬');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push_str("┐\n"); // clippy: push_str is correct here (multi-char)
|
||||
// Table rows
|
||||
for row in 0..50 {
|
||||
content.push_str(&format!("│ row {:<194}│ data {:<193}│\n", row, row));
|
||||
content.push('├');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push('┼');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push_str("┤\n"); // push_str for multi-char
|
||||
}
|
||||
content.push('└');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push('┴');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push_str("┘\n"); // push_str for multi-char
|
||||
|
||||
eprintln!(
|
||||
"Content size: {} bytes, {} chars",
|
||||
content.len(),
|
||||
content.chars().count()
|
||||
);
|
||||
let start = std::time::Instant::now();
|
||||
let chunks = split_into_chunks(&content);
|
||||
let elapsed = start.elapsed();
|
||||
eprintln!(
|
||||
"Chunking took {:?}, produced {} chunks",
|
||||
elapsed,
|
||||
chunks.len()
|
||||
);
|
||||
|
||||
// Should complete in reasonable time
|
||||
assert!(
|
||||
elapsed.as_secs() < 5,
|
||||
"Chunking took too long: {:?}",
|
||||
elapsed
|
||||
);
|
||||
assert!(!chunks.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_real_doc_18526_pattern() {
|
||||
// Reproduce exact pattern: long lines of ─ (3 bytes each, no spaces)
|
||||
// followed by newlines, creating a pattern where chunk windows
|
||||
// land in spaceless regions
|
||||
let mut content = String::new();
|
||||
content.push_str("Header text with spaces\n\n");
|
||||
// Create a very long line of ─ chars (2000+ bytes, exceeding CHUNK_MAX_BYTES)
|
||||
for _ in 0..800 {
|
||||
content.push('─'); // 3 bytes each = 2400 bytes
|
||||
}
|
||||
content.push('\n');
|
||||
content.push_str("Some more text.\n\n");
|
||||
// Another long run
|
||||
for _ in 0..800 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push('\n');
|
||||
content.push_str("End text.\n");
|
||||
|
||||
eprintln!("Content size: {} bytes", content.len());
|
||||
let start = std::time::Instant::now();
|
||||
let chunks = split_into_chunks(&content);
|
||||
let elapsed = start.elapsed();
|
||||
eprintln!(
|
||||
"Chunking took {:?}, produced {} chunks",
|
||||
elapsed,
|
||||
chunks.len()
|
||||
);
|
||||
|
||||
assert!(
|
||||
elapsed.as_secs() < 2,
|
||||
"Chunking took too long: {:?}",
|
||||
elapsed
|
||||
);
|
||||
assert!(!chunks.is_empty());
|
||||
}
|
||||
}
|
||||
#[path = "chunking_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
226
src/embedding/chunking_tests.rs
Normal file
226
src/embedding/chunking_tests.rs
Normal file
@@ -0,0 +1,226 @@
|
||||
use super::*;
|
||||
|
||||
#[test]
|
||||
fn test_empty_content() {
|
||||
let chunks = split_into_chunks("");
|
||||
assert!(chunks.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_short_document_single_chunk() {
|
||||
let content = "Short document content.";
|
||||
let chunks = split_into_chunks(content);
|
||||
assert_eq!(chunks.len(), 1);
|
||||
assert_eq!(chunks[0].0, 0);
|
||||
assert_eq!(chunks[0].1, content);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_exactly_max_chars() {
|
||||
let content = "a".repeat(CHUNK_MAX_BYTES);
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert_eq!(chunks.len(), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_long_document_multiple_chunks() {
|
||||
let paragraph = "This is a paragraph of text.\n\n";
|
||||
let mut content = String::new();
|
||||
while content.len() < CHUNK_MAX_BYTES * 2 {
|
||||
content.push_str(paragraph);
|
||||
}
|
||||
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(
|
||||
chunks.len() >= 2,
|
||||
"Expected multiple chunks, got {}",
|
||||
chunks.len()
|
||||
);
|
||||
|
||||
for (i, (idx, _)) in chunks.iter().enumerate() {
|
||||
assert_eq!(*idx, i);
|
||||
}
|
||||
|
||||
assert!(!chunks.last().unwrap().1.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_chunk_overlap() {
|
||||
let paragraph = "This is paragraph content for testing chunk overlap behavior.\n\n";
|
||||
let mut content = String::new();
|
||||
while content.len() < CHUNK_MAX_BYTES + CHUNK_OVERLAP_CHARS + 1000 {
|
||||
content.push_str(paragraph);
|
||||
}
|
||||
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(chunks.len() >= 2);
|
||||
|
||||
if chunks.len() >= 2 {
|
||||
let end_of_first = &chunks[0].1;
|
||||
let start_of_second = &chunks[1].1;
|
||||
let overlap_region =
|
||||
&end_of_first[end_of_first.len().saturating_sub(CHUNK_OVERLAP_CHARS)..];
|
||||
assert!(
|
||||
start_of_second.starts_with(overlap_region)
|
||||
|| overlap_region.contains(&start_of_second[..100.min(start_of_second.len())]),
|
||||
"Expected overlap between chunks"
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_no_paragraph_boundary() {
|
||||
let content = "word ".repeat(CHUNK_MAX_BYTES / 5 * 3);
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(chunks.len() >= 2);
|
||||
for (_, chunk) in &chunks {
|
||||
assert!(!chunk.is_empty());
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_chunk_indices_sequential() {
|
||||
let content = "a ".repeat(CHUNK_MAX_BYTES);
|
||||
let chunks = split_into_chunks(&content);
|
||||
for (i, (idx, _)) in chunks.iter().enumerate() {
|
||||
assert_eq!(*idx, i, "Chunk index mismatch at position {}", i);
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_multibyte_characters_no_panic() {
|
||||
// Build content with multi-byte UTF-8 chars (smart quotes, emoji, CJK)
|
||||
// placed at positions likely to hit len()*2/3 and len()/2 boundaries
|
||||
let segment = "We\u{2019}ve gradually ar\u{2014}ranged the components. ";
|
||||
let mut content = String::new();
|
||||
while content.len() < CHUNK_MAX_BYTES * 3 {
|
||||
content.push_str(segment);
|
||||
}
|
||||
// Should not panic on multi-byte boundary
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(chunks.len() >= 2);
|
||||
for (_, chunk) in &chunks {
|
||||
assert!(!chunk.is_empty());
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_nbsp_at_overlap_boundary() {
|
||||
// Reproduce the exact crash: \u{a0} (non-breaking space, 2-byte UTF-8)
|
||||
// placed so that split_at - CHUNK_OVERLAP_CHARS lands mid-character
|
||||
let mut content = String::new();
|
||||
// Fill with ASCII up to near CHUNK_MAX_BYTES, then place \u{a0}
|
||||
// near where the overlap subtraction would land
|
||||
let target = CHUNK_MAX_BYTES - CHUNK_OVERLAP_CHARS;
|
||||
while content.len() < target - 2 {
|
||||
content.push('a');
|
||||
}
|
||||
content.push('\u{a0}'); // 2-byte char right at the overlap boundary
|
||||
while content.len() < CHUNK_MAX_BYTES * 3 {
|
||||
content.push('b');
|
||||
}
|
||||
// Should not panic
|
||||
let chunks = split_into_chunks(&content);
|
||||
assert!(chunks.len() >= 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_box_drawing_heavy_content() {
|
||||
// Simulates a document with many box-drawing characters (3-byte UTF-8)
|
||||
// like the ─ (U+2500) character found in markdown tables
|
||||
let mut content = String::new();
|
||||
// Normal text header
|
||||
content.push_str("# Title\n\nSome description text.\n\n");
|
||||
// Table header with box drawing
|
||||
content.push('┌');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push('┬');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push_str("┐\n"); // clippy: push_str is correct here (multi-char)
|
||||
// Table rows
|
||||
for row in 0..50 {
|
||||
content.push_str(&format!("│ row {:<194}│ data {:<193}│\n", row, row));
|
||||
content.push('├');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push('┼');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push_str("┤\n"); // push_str for multi-char
|
||||
}
|
||||
content.push('└');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push('┴');
|
||||
for _ in 0..200 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push_str("┘\n"); // push_str for multi-char
|
||||
|
||||
eprintln!(
|
||||
"Content size: {} bytes, {} chars",
|
||||
content.len(),
|
||||
content.chars().count()
|
||||
);
|
||||
let start = std::time::Instant::now();
|
||||
let chunks = split_into_chunks(&content);
|
||||
let elapsed = start.elapsed();
|
||||
eprintln!(
|
||||
"Chunking took {:?}, produced {} chunks",
|
||||
elapsed,
|
||||
chunks.len()
|
||||
);
|
||||
|
||||
// Should complete in reasonable time
|
||||
assert!(
|
||||
elapsed.as_secs() < 5,
|
||||
"Chunking took too long: {:?}",
|
||||
elapsed
|
||||
);
|
||||
assert!(!chunks.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_real_doc_18526_pattern() {
|
||||
// Reproduce exact pattern: long lines of ─ (3 bytes each, no spaces)
|
||||
// followed by newlines, creating a pattern where chunk windows
|
||||
// land in spaceless regions
|
||||
let mut content = String::new();
|
||||
content.push_str("Header text with spaces\n\n");
|
||||
// Create a very long line of ─ chars (2000+ bytes, exceeding CHUNK_MAX_BYTES)
|
||||
for _ in 0..800 {
|
||||
content.push('─'); // 3 bytes each = 2400 bytes
|
||||
}
|
||||
content.push('\n');
|
||||
content.push_str("Some more text.\n\n");
|
||||
// Another long run
|
||||
for _ in 0..800 {
|
||||
content.push('─');
|
||||
}
|
||||
content.push('\n');
|
||||
content.push_str("End text.\n");
|
||||
|
||||
eprintln!("Content size: {} bytes", content.len());
|
||||
let start = std::time::Instant::now();
|
||||
let chunks = split_into_chunks(&content);
|
||||
let elapsed = start.elapsed();
|
||||
eprintln!(
|
||||
"Chunking took {:?}, produced {} chunks",
|
||||
elapsed,
|
||||
chunks.len()
|
||||
);
|
||||
|
||||
assert!(
|
||||
elapsed.as_secs() < 2,
|
||||
"Chunking took too long: {:?}",
|
||||
elapsed
|
||||
);
|
||||
assert!(!chunks.is_empty());
|
||||
}
|
||||
@@ -364,930 +364,5 @@ pub async fn fetch_issue_statuses_with_progress(
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::core::error::LoreError;
|
||||
use wiremock::matchers::{body_json, header, method, path};
|
||||
use wiremock::{Mock, MockServer, ResponseTemplate};
|
||||
|
||||
// ═══════════════════════════════════════════════════════════════════════
|
||||
// AC-1: GraphQL Client
|
||||
// ═══════════════════════════════════════════════════════════════════════
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_query_success() {
|
||||
let server = MockServer::start().await;
|
||||
let response_body = serde_json::json!({
|
||||
"data": { "project": { "id": "1" } }
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(&response_body))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "test-token");
|
||||
let result = client
|
||||
.query("{ project { id } }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.data["project"]["id"], "1");
|
||||
assert!(!result.had_partial_errors);
|
||||
assert!(result.first_partial_error.is_none());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_query_with_errors_no_data() {
|
||||
let server = MockServer::start().await;
|
||||
let response_body = serde_json::json!({
|
||||
"errors": [{ "message": "Field 'foo' not found" }]
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(&response_body))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "test-token");
|
||||
let err = client
|
||||
.query("{ foo }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::Other(msg) => {
|
||||
assert!(msg.contains("Field 'foo' not found"), "got: {msg}");
|
||||
}
|
||||
other => panic!("Expected LoreError::Other, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_auth_uses_bearer() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.and(header("Authorization", "Bearer my-secret-token"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({"data": {"ok": true}})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "my-secret-token");
|
||||
let result = client.query("{ ok }", serde_json::json!({})).await;
|
||||
assert!(result.is_ok());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_401_maps_to_auth_failed() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(401))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "bad-token");
|
||||
let err = client
|
||||
.query("{ me }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
assert!(matches!(err, LoreError::GitLabAuthFailed));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_403_maps_to_auth_failed() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(403))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "forbidden-token");
|
||||
let err = client
|
||||
.query("{ admin }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
assert!(matches!(err, LoreError::GitLabAuthFailed));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_404_maps_to_not_found() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(404))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::GitLabNotFound { resource } => {
|
||||
assert_eq!(resource, "GraphQL endpoint");
|
||||
}
|
||||
other => panic!("Expected GitLabNotFound, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_partial_data_with_errors_returns_data() {
|
||||
let server = MockServer::start().await;
|
||||
let response_body = serde_json::json!({
|
||||
"data": { "project": { "name": "test" } },
|
||||
"errors": [{ "message": "Some field failed" }]
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(&response_body))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let result = client
|
||||
.query("{ project { name } }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.data["project"]["name"], "test");
|
||||
assert!(result.had_partial_errors);
|
||||
assert_eq!(
|
||||
result.first_partial_error.as_deref(),
|
||||
Some("Some field failed")
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_retry_after_delta_seconds() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(429).insert_header("Retry-After", "120"))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::GitLabRateLimited { retry_after } => {
|
||||
assert_eq!(retry_after, 120);
|
||||
}
|
||||
other => panic!("Expected GitLabRateLimited, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_retry_after_http_date_format() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
let future = SystemTime::now() + Duration::from_secs(90);
|
||||
let date_str = httpdate::fmt_http_date(future);
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(429).insert_header("Retry-After", date_str))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::GitLabRateLimited { retry_after } => {
|
||||
assert!(
|
||||
(85..=95).contains(&retry_after),
|
||||
"retry_after={retry_after}"
|
||||
);
|
||||
}
|
||||
other => panic!("Expected GitLabRateLimited, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_retry_after_invalid_falls_back_to_60() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(429).insert_header("Retry-After", "garbage"))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::GitLabRateLimited { retry_after } => {
|
||||
assert_eq!(retry_after, 60);
|
||||
}
|
||||
other => panic!("Expected GitLabRateLimited, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_network_error() {
|
||||
let client = GraphqlClient::new("http://127.0.0.1:1", "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
assert!(
|
||||
matches!(err, LoreError::GitLabNetworkError { .. }),
|
||||
"Expected GitLabNetworkError, got: {err:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_request_body_format() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
let expected_body = serde_json::json!({
|
||||
"query": "{ project(fullPath: $path) { id } }",
|
||||
"variables": { "path": "group/repo" }
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.and(body_json(&expected_body))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200)
|
||||
.set_body_json(serde_json::json!({"data": {"project": {"id": "1"}}})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let result = client
|
||||
.query(
|
||||
"{ project(fullPath: $path) { id } }",
|
||||
serde_json::json!({"path": "group/repo"}),
|
||||
)
|
||||
.await;
|
||||
|
||||
assert!(result.is_ok(), "Body format mismatch: {result:?}");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_base_url_trailing_slash() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({"data": {"ok": true}})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let url_with_slash = format!("{}/", server.uri());
|
||||
let client = GraphqlClient::new(&url_with_slash, "token");
|
||||
let result = client.query("{ ok }", serde_json::json!({})).await;
|
||||
|
||||
assert!(result.is_ok());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_data_null_no_errors() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({"data": null})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::Other(msg) => {
|
||||
assert!(msg.contains("missing 'data' field"), "got: {msg}");
|
||||
}
|
||||
other => panic!("Expected LoreError::Other, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
// ═══════════════════════════════════════════════════════════════════════
|
||||
// AC-3: Status Fetcher
|
||||
// ═══════════════════════════════════════════════════════════════════════
|
||||
|
||||
/// Helper: build a GraphQL work-items response page with given issues.
|
||||
fn make_work_items_page(
|
||||
items: &[(i64, Option<&str>)],
|
||||
has_next_page: bool,
|
||||
end_cursor: Option<&str>,
|
||||
) -> serde_json::Value {
|
||||
let nodes: Vec<serde_json::Value> = items
|
||||
.iter()
|
||||
.map(|(iid, status_name)| {
|
||||
let mut widgets =
|
||||
vec![serde_json::json!({"__typename": "WorkItemWidgetDescription"})];
|
||||
if let Some(name) = status_name {
|
||||
widgets.push(serde_json::json!({
|
||||
"__typename": "WorkItemWidgetStatus",
|
||||
"status": {
|
||||
"name": name,
|
||||
"category": "IN_PROGRESS",
|
||||
"color": "#1f75cb",
|
||||
"iconName": "status-in-progress"
|
||||
}
|
||||
}));
|
||||
}
|
||||
serde_json::json!({
|
||||
"iid": iid.to_string(),
|
||||
"widgets": widgets,
|
||||
})
|
||||
})
|
||||
.collect();
|
||||
|
||||
serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": nodes,
|
||||
"pageInfo": {
|
||||
"endCursor": end_cursor,
|
||||
"hasNextPage": has_next_page,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
/// Helper: build a page where issue has status widget but status is null.
|
||||
fn make_null_status_widget_page(iid: i64) -> serde_json::Value {
|
||||
serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{
|
||||
"iid": iid.to_string(),
|
||||
"widgets": [
|
||||
{"__typename": "WorkItemWidgetStatus", "status": null}
|
||||
]
|
||||
}],
|
||||
"pageInfo": {
|
||||
"endCursor": null,
|
||||
"hasNextPage": false,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_pagination() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
// Page 1: returns cursor "cursor_page2"
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with({
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(1, Some("In progress")), (2, Some("To do"))],
|
||||
true,
|
||||
Some("cursor_page2"),
|
||||
))
|
||||
})
|
||||
.up_to_n_times(1)
|
||||
.expect(1)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
// Page 2: no more pages
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(3, Some("Done"))],
|
||||
false,
|
||||
None,
|
||||
)),
|
||||
)
|
||||
.expect(1)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 3);
|
||||
assert!(result.statuses.contains_key(&1));
|
||||
assert!(result.statuses.contains_key(&2));
|
||||
assert!(result.statuses.contains_key(&3));
|
||||
assert_eq!(result.all_fetched_iids.len(), 3);
|
||||
assert!(result.unsupported_reason.is_none());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_no_status_widget() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{
|
||||
"iid": "42",
|
||||
"widgets": [
|
||||
{"__typename": "WorkItemWidgetDescription"},
|
||||
{"__typename": "WorkItemWidgetLabels"}
|
||||
]
|
||||
}],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
}
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.statuses.is_empty(), "No status widget → no statuses");
|
||||
assert!(
|
||||
result.all_fetched_iids.contains(&42),
|
||||
"IID 42 should still be in all_fetched_iids"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_404_graceful() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(404))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.statuses.is_empty());
|
||||
assert!(result.all_fetched_iids.is_empty());
|
||||
assert!(matches!(
|
||||
result.unsupported_reason,
|
||||
Some(UnsupportedReason::GraphqlEndpointMissing)
|
||||
));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_403_graceful() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(403))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.statuses.is_empty());
|
||||
assert!(result.all_fetched_iids.is_empty());
|
||||
assert!(matches!(
|
||||
result.unsupported_reason,
|
||||
Some(UnsupportedReason::AuthForbidden)
|
||||
));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_unsupported_reason_none_on_success() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(1, Some("To do"))],
|
||||
false,
|
||||
None,
|
||||
)),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.unsupported_reason.is_none());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_typename_matching_ignores_non_status_widgets() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{
|
||||
"iid": "10",
|
||||
"widgets": [
|
||||
{"__typename": "WorkItemWidgetDescription"},
|
||||
{"__typename": "WorkItemWidgetLabels"},
|
||||
{"__typename": "WorkItemWidgetAssignees"},
|
||||
{
|
||||
"__typename": "WorkItemWidgetStatus",
|
||||
"status": {
|
||||
"name": "In progress",
|
||||
"category": "IN_PROGRESS"
|
||||
}
|
||||
}
|
||||
]
|
||||
}],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
}
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
assert_eq!(result.statuses[&10].name, "In progress");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_cursor_stall_aborts() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
let stall_response = serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{"iid": "1", "widgets": []}],
|
||||
"pageInfo": {"endCursor": "same_cursor", "hasNextPage": true}
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(stall_response))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(
|
||||
result.all_fetched_iids.contains(&1),
|
||||
"Should contain the one IID fetched before stall"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_complexity_error_reduces_page_size() {
|
||||
let server = MockServer::start().await;
|
||||
let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
|
||||
let call_count_clone = call_count.clone();
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(move |_req: &wiremock::Request| {
|
||||
let n =
|
||||
call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
|
||||
if n == 0 {
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"errors": [{"message": "Query has complexity of 300, which exceeds max complexity of 250"}]
|
||||
}))
|
||||
} else {
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(1, Some("In progress"))],
|
||||
false,
|
||||
None,
|
||||
))
|
||||
}
|
||||
})
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
assert_eq!(result.statuses[&1].name, "In progress");
|
||||
assert_eq!(call_count.load(std::sync::atomic::Ordering::SeqCst), 2);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_timeout_error_reduces_page_size() {
|
||||
let server = MockServer::start().await;
|
||||
let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
|
||||
let call_count_clone = call_count.clone();
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(move |_req: &wiremock::Request| {
|
||||
let n = call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
|
||||
if n == 0 {
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"errors": [{"message": "Query timeout after 30000ms"}]
|
||||
}))
|
||||
} else {
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(5, Some("Done"))],
|
||||
false,
|
||||
None,
|
||||
))
|
||||
}
|
||||
})
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
assert!(call_count.load(std::sync::atomic::Ordering::SeqCst) >= 2);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_smallest_page_still_fails() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"errors": [{"message": "Query has complexity of 9999"}]
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let err = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
assert!(
|
||||
matches!(err, LoreError::Other(_)),
|
||||
"Expected error after exhausting all page sizes, got: {err:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_page_size_resets_after_success() {
|
||||
let server = MockServer::start().await;
|
||||
let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
|
||||
let call_count_clone = call_count.clone();
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(move |_req: &wiremock::Request| {
|
||||
let n = call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
|
||||
match n {
|
||||
0 => {
|
||||
// Page 1 at size 100: success, has next page
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(1, Some("To do"))],
|
||||
true,
|
||||
Some("cursor_p2"),
|
||||
))
|
||||
}
|
||||
1 => {
|
||||
// Page 2 at size 100 (reset): complexity error
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"errors": [{"message": "Query has complexity of 300"}]
|
||||
}))
|
||||
}
|
||||
2 => {
|
||||
// Page 2 retry at size 50: success
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(2, Some("Done"))],
|
||||
false,
|
||||
None,
|
||||
))
|
||||
}
|
||||
_ => ResponseTemplate::new(500),
|
||||
}
|
||||
})
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 2);
|
||||
assert!(result.statuses.contains_key(&1));
|
||||
assert!(result.statuses.contains_key(&2));
|
||||
assert_eq!(call_count.load(std::sync::atomic::Ordering::SeqCst), 3);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_partial_errors_tracked() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{"iid": "1", "widgets": [
|
||||
{"__typename": "WorkItemWidgetStatus", "status": {"name": "To do"}}
|
||||
]}],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
},
|
||||
"errors": [{"message": "Rate limit warning: approaching limit"}]
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.partial_error_count, 1);
|
||||
assert_eq!(
|
||||
result.first_partial_error.as_deref(),
|
||||
Some("Rate limit warning: approaching limit")
|
||||
);
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_empty_project() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
}
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.statuses.is_empty());
|
||||
assert!(result.all_fetched_iids.is_empty());
|
||||
assert!(result.unsupported_reason.is_none());
|
||||
assert_eq!(result.partial_error_count, 0);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_null_status_in_widget() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(make_null_status_widget_page(42)),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(
|
||||
result.statuses.is_empty(),
|
||||
"Null status should not be in map"
|
||||
);
|
||||
assert!(
|
||||
result.all_fetched_iids.contains(&42),
|
||||
"IID should still be tracked in all_fetched_iids"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_non_numeric_iid_skipped() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [
|
||||
{
|
||||
"iid": "not_a_number",
|
||||
"widgets": [{"__typename": "WorkItemWidgetStatus", "status": {"name": "To do"}}]
|
||||
},
|
||||
{
|
||||
"iid": "7",
|
||||
"widgets": [{"__typename": "WorkItemWidgetStatus", "status": {"name": "Done"}}]
|
||||
}
|
||||
],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
}
|
||||
})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
assert!(result.statuses.contains_key(&7));
|
||||
assert_eq!(result.all_fetched_iids.len(), 1);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_null_cursor_with_has_next_aborts() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{"iid": "1", "widgets": []}],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": true}
|
||||
}
|
||||
}
|
||||
}
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.all_fetched_iids.len(), 1);
|
||||
}
|
||||
}
|
||||
#[path = "graphql_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
923
src/gitlab/graphql_tests.rs
Normal file
923
src/gitlab/graphql_tests.rs
Normal file
@@ -0,0 +1,923 @@
|
||||
use super::*;
|
||||
use crate::core::error::LoreError;
|
||||
use wiremock::matchers::{body_json, header, method, path};
|
||||
use wiremock::{Mock, MockServer, ResponseTemplate};
|
||||
|
||||
// ═══════════════════════════════════════════════════════════════════════
|
||||
// AC-1: GraphQL Client
|
||||
// ═══════════════════════════════════════════════════════════════════════
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_query_success() {
|
||||
let server = MockServer::start().await;
|
||||
let response_body = serde_json::json!({
|
||||
"data": { "project": { "id": "1" } }
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(&response_body))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "test-token");
|
||||
let result = client
|
||||
.query("{ project { id } }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.data["project"]["id"], "1");
|
||||
assert!(!result.had_partial_errors);
|
||||
assert!(result.first_partial_error.is_none());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_query_with_errors_no_data() {
|
||||
let server = MockServer::start().await;
|
||||
let response_body = serde_json::json!({
|
||||
"errors": [{ "message": "Field 'foo' not found" }]
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(&response_body))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "test-token");
|
||||
let err = client
|
||||
.query("{ foo }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::Other(msg) => {
|
||||
assert!(msg.contains("Field 'foo' not found"), "got: {msg}");
|
||||
}
|
||||
other => panic!("Expected LoreError::Other, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_auth_uses_bearer() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.and(header("Authorization", "Bearer my-secret-token"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({"data": {"ok": true}})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "my-secret-token");
|
||||
let result = client.query("{ ok }", serde_json::json!({})).await;
|
||||
assert!(result.is_ok());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_401_maps_to_auth_failed() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(401))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "bad-token");
|
||||
let err = client
|
||||
.query("{ me }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
assert!(matches!(err, LoreError::GitLabAuthFailed));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_403_maps_to_auth_failed() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(403))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "forbidden-token");
|
||||
let err = client
|
||||
.query("{ admin }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
assert!(matches!(err, LoreError::GitLabAuthFailed));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_404_maps_to_not_found() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(404))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::GitLabNotFound { resource } => {
|
||||
assert_eq!(resource, "GraphQL endpoint");
|
||||
}
|
||||
other => panic!("Expected GitLabNotFound, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_partial_data_with_errors_returns_data() {
|
||||
let server = MockServer::start().await;
|
||||
let response_body = serde_json::json!({
|
||||
"data": { "project": { "name": "test" } },
|
||||
"errors": [{ "message": "Some field failed" }]
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(&response_body))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let result = client
|
||||
.query("{ project { name } }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.data["project"]["name"], "test");
|
||||
assert!(result.had_partial_errors);
|
||||
assert_eq!(
|
||||
result.first_partial_error.as_deref(),
|
||||
Some("Some field failed")
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_retry_after_delta_seconds() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(429).insert_header("Retry-After", "120"))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::GitLabRateLimited { retry_after } => {
|
||||
assert_eq!(retry_after, 120);
|
||||
}
|
||||
other => panic!("Expected GitLabRateLimited, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_retry_after_http_date_format() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
let future = SystemTime::now() + Duration::from_secs(90);
|
||||
let date_str = httpdate::fmt_http_date(future);
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(429).insert_header("Retry-After", date_str))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::GitLabRateLimited { retry_after } => {
|
||||
assert!(
|
||||
(85..=95).contains(&retry_after),
|
||||
"retry_after={retry_after}"
|
||||
);
|
||||
}
|
||||
other => panic!("Expected GitLabRateLimited, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_retry_after_invalid_falls_back_to_60() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(429).insert_header("Retry-After", "garbage"))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::GitLabRateLimited { retry_after } => {
|
||||
assert_eq!(retry_after, 60);
|
||||
}
|
||||
other => panic!("Expected GitLabRateLimited, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_network_error() {
|
||||
let client = GraphqlClient::new("http://127.0.0.1:1", "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
assert!(
|
||||
matches!(err, LoreError::GitLabNetworkError { .. }),
|
||||
"Expected GitLabNetworkError, got: {err:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_request_body_format() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
let expected_body = serde_json::json!({
|
||||
"query": "{ project(fullPath: $path) { id } }",
|
||||
"variables": { "path": "group/repo" }
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.and(body_json(&expected_body))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200)
|
||||
.set_body_json(serde_json::json!({"data": {"project": {"id": "1"}}})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let result = client
|
||||
.query(
|
||||
"{ project(fullPath: $path) { id } }",
|
||||
serde_json::json!({"path": "group/repo"}),
|
||||
)
|
||||
.await;
|
||||
|
||||
assert!(result.is_ok(), "Body format mismatch: {result:?}");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_base_url_trailing_slash() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({"data": {"ok": true}})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let url_with_slash = format!("{}/", server.uri());
|
||||
let client = GraphqlClient::new(&url_with_slash, "token");
|
||||
let result = client.query("{ ok }", serde_json::json!({})).await;
|
||||
|
||||
assert!(result.is_ok());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_graphql_data_null_no_errors() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({"data": null})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "token");
|
||||
let err = client
|
||||
.query("{ x }", serde_json::json!({}))
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
match err {
|
||||
LoreError::Other(msg) => {
|
||||
assert!(msg.contains("missing 'data' field"), "got: {msg}");
|
||||
}
|
||||
other => panic!("Expected LoreError::Other, got: {other:?}"),
|
||||
}
|
||||
}
|
||||
|
||||
// ═══════════════════════════════════════════════════════════════════════
|
||||
// AC-3: Status Fetcher
|
||||
// ═══════════════════════════════════════════════════════════════════════
|
||||
|
||||
/// Helper: build a GraphQL work-items response page with given issues.
|
||||
fn make_work_items_page(
|
||||
items: &[(i64, Option<&str>)],
|
||||
has_next_page: bool,
|
||||
end_cursor: Option<&str>,
|
||||
) -> serde_json::Value {
|
||||
let nodes: Vec<serde_json::Value> = items
|
||||
.iter()
|
||||
.map(|(iid, status_name)| {
|
||||
let mut widgets = vec![serde_json::json!({"__typename": "WorkItemWidgetDescription"})];
|
||||
if let Some(name) = status_name {
|
||||
widgets.push(serde_json::json!({
|
||||
"__typename": "WorkItemWidgetStatus",
|
||||
"status": {
|
||||
"name": name,
|
||||
"category": "IN_PROGRESS",
|
||||
"color": "#1f75cb",
|
||||
"iconName": "status-in-progress"
|
||||
}
|
||||
}));
|
||||
}
|
||||
serde_json::json!({
|
||||
"iid": iid.to_string(),
|
||||
"widgets": widgets,
|
||||
})
|
||||
})
|
||||
.collect();
|
||||
|
||||
serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": nodes,
|
||||
"pageInfo": {
|
||||
"endCursor": end_cursor,
|
||||
"hasNextPage": has_next_page,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
/// Helper: build a page where issue has status widget but status is null.
|
||||
fn make_null_status_widget_page(iid: i64) -> serde_json::Value {
|
||||
serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{
|
||||
"iid": iid.to_string(),
|
||||
"widgets": [
|
||||
{"__typename": "WorkItemWidgetStatus", "status": null}
|
||||
]
|
||||
}],
|
||||
"pageInfo": {
|
||||
"endCursor": null,
|
||||
"hasNextPage": false,
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
})
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_pagination() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
// Page 1: returns cursor "cursor_page2"
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with({
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(1, Some("In progress")), (2, Some("To do"))],
|
||||
true,
|
||||
Some("cursor_page2"),
|
||||
))
|
||||
})
|
||||
.up_to_n_times(1)
|
||||
.expect(1)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
// Page 2: no more pages
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(3, Some("Done"))],
|
||||
false,
|
||||
None,
|
||||
)),
|
||||
)
|
||||
.expect(1)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 3);
|
||||
assert!(result.statuses.contains_key(&1));
|
||||
assert!(result.statuses.contains_key(&2));
|
||||
assert!(result.statuses.contains_key(&3));
|
||||
assert_eq!(result.all_fetched_iids.len(), 3);
|
||||
assert!(result.unsupported_reason.is_none());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_no_status_widget() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{
|
||||
"iid": "42",
|
||||
"widgets": [
|
||||
{"__typename": "WorkItemWidgetDescription"},
|
||||
{"__typename": "WorkItemWidgetLabels"}
|
||||
]
|
||||
}],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
}
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(
|
||||
result.statuses.is_empty(),
|
||||
"No status widget -> no statuses"
|
||||
);
|
||||
assert!(
|
||||
result.all_fetched_iids.contains(&42),
|
||||
"IID 42 should still be in all_fetched_iids"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_404_graceful() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(404))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.statuses.is_empty());
|
||||
assert!(result.all_fetched_iids.is_empty());
|
||||
assert!(matches!(
|
||||
result.unsupported_reason,
|
||||
Some(UnsupportedReason::GraphqlEndpointMissing)
|
||||
));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_403_graceful() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(403))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.statuses.is_empty());
|
||||
assert!(result.all_fetched_iids.is_empty());
|
||||
assert!(matches!(
|
||||
result.unsupported_reason,
|
||||
Some(UnsupportedReason::AuthForbidden)
|
||||
));
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_unsupported_reason_none_on_success() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(1, Some("To do"))],
|
||||
false,
|
||||
None,
|
||||
)),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.unsupported_reason.is_none());
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_typename_matching_ignores_non_status_widgets() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{
|
||||
"iid": "10",
|
||||
"widgets": [
|
||||
{"__typename": "WorkItemWidgetDescription"},
|
||||
{"__typename": "WorkItemWidgetLabels"},
|
||||
{"__typename": "WorkItemWidgetAssignees"},
|
||||
{
|
||||
"__typename": "WorkItemWidgetStatus",
|
||||
"status": {
|
||||
"name": "In progress",
|
||||
"category": "IN_PROGRESS"
|
||||
}
|
||||
}
|
||||
]
|
||||
}],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
}
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
assert_eq!(result.statuses[&10].name, "In progress");
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_cursor_stall_aborts() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
let stall_response = serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{"iid": "1", "widgets": []}],
|
||||
"pageInfo": {"endCursor": "same_cursor", "hasNextPage": true}
|
||||
}
|
||||
}
|
||||
}
|
||||
});
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(stall_response))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(
|
||||
result.all_fetched_iids.contains(&1),
|
||||
"Should contain the one IID fetched before stall"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_complexity_error_reduces_page_size() {
|
||||
let server = MockServer::start().await;
|
||||
let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
|
||||
let call_count_clone = call_count.clone();
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(move |_req: &wiremock::Request| {
|
||||
let n =
|
||||
call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
|
||||
if n == 0 {
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"errors": [{"message": "Query has complexity of 300, which exceeds max complexity of 250"}]
|
||||
}))
|
||||
} else {
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(1, Some("In progress"))],
|
||||
false,
|
||||
None,
|
||||
))
|
||||
}
|
||||
})
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
assert_eq!(result.statuses[&1].name, "In progress");
|
||||
assert_eq!(call_count.load(std::sync::atomic::Ordering::SeqCst), 2);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_timeout_error_reduces_page_size() {
|
||||
let server = MockServer::start().await;
|
||||
let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
|
||||
let call_count_clone = call_count.clone();
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(move |_req: &wiremock::Request| {
|
||||
let n = call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
|
||||
if n == 0 {
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"errors": [{"message": "Query timeout after 30000ms"}]
|
||||
}))
|
||||
} else {
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(5, Some("Done"))],
|
||||
false,
|
||||
None,
|
||||
))
|
||||
}
|
||||
})
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
assert!(call_count.load(std::sync::atomic::Ordering::SeqCst) >= 2);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_smallest_page_still_fails() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"errors": [{"message": "Query has complexity of 9999"}]
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let err = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap_err();
|
||||
|
||||
assert!(
|
||||
matches!(err, LoreError::Other(_)),
|
||||
"Expected error after exhausting all page sizes, got: {err:?}"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_page_size_resets_after_success() {
|
||||
let server = MockServer::start().await;
|
||||
let call_count = std::sync::Arc::new(std::sync::atomic::AtomicUsize::new(0));
|
||||
let call_count_clone = call_count.clone();
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(move |_req: &wiremock::Request| {
|
||||
let n = call_count_clone.fetch_add(1, std::sync::atomic::Ordering::SeqCst);
|
||||
match n {
|
||||
0 => {
|
||||
// Page 1 at size 100: success, has next page
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(1, Some("To do"))],
|
||||
true,
|
||||
Some("cursor_p2"),
|
||||
))
|
||||
}
|
||||
1 => {
|
||||
// Page 2 at size 100 (reset): complexity error
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"errors": [{"message": "Query has complexity of 300"}]
|
||||
}))
|
||||
}
|
||||
2 => {
|
||||
// Page 2 retry at size 50: success
|
||||
ResponseTemplate::new(200).set_body_json(make_work_items_page(
|
||||
&[(2, Some("Done"))],
|
||||
false,
|
||||
None,
|
||||
))
|
||||
}
|
||||
_ => ResponseTemplate::new(500),
|
||||
}
|
||||
})
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 2);
|
||||
assert!(result.statuses.contains_key(&1));
|
||||
assert!(result.statuses.contains_key(&2));
|
||||
assert_eq!(call_count.load(std::sync::atomic::Ordering::SeqCst), 3);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_partial_errors_tracked() {
|
||||
let server = MockServer::start().await;
|
||||
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{"iid": "1", "widgets": [
|
||||
{"__typename": "WorkItemWidgetStatus", "status": {"name": "To do"}}
|
||||
]}],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
},
|
||||
"errors": [{"message": "Rate limit warning: approaching limit"}]
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.partial_error_count, 1);
|
||||
assert_eq!(
|
||||
result.first_partial_error.as_deref(),
|
||||
Some("Rate limit warning: approaching limit")
|
||||
);
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_empty_project() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
}
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(result.statuses.is_empty());
|
||||
assert!(result.all_fetched_iids.is_empty());
|
||||
assert!(result.unsupported_reason.is_none());
|
||||
assert_eq!(result.partial_error_count, 0);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_null_status_in_widget() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(make_null_status_widget_page(42)))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert!(
|
||||
result.statuses.is_empty(),
|
||||
"Null status should not be in map"
|
||||
);
|
||||
assert!(
|
||||
result.all_fetched_iids.contains(&42),
|
||||
"IID should still be tracked in all_fetched_iids"
|
||||
);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_non_numeric_iid_skipped() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(
|
||||
ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [
|
||||
{
|
||||
"iid": "not_a_number",
|
||||
"widgets": [{"__typename": "WorkItemWidgetStatus", "status": {"name": "To do"}}]
|
||||
},
|
||||
{
|
||||
"iid": "7",
|
||||
"widgets": [{"__typename": "WorkItemWidgetStatus", "status": {"name": "Done"}}]
|
||||
}
|
||||
],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": false}
|
||||
}
|
||||
}
|
||||
}
|
||||
})),
|
||||
)
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.statuses.len(), 1);
|
||||
assert!(result.statuses.contains_key(&7));
|
||||
assert_eq!(result.all_fetched_iids.len(), 1);
|
||||
}
|
||||
|
||||
#[tokio::test]
|
||||
async fn test_fetch_statuses_null_cursor_with_has_next_aborts() {
|
||||
let server = MockServer::start().await;
|
||||
Mock::given(method("POST"))
|
||||
.and(path("/api/graphql"))
|
||||
.respond_with(ResponseTemplate::new(200).set_body_json(serde_json::json!({
|
||||
"data": {
|
||||
"project": {
|
||||
"workItems": {
|
||||
"nodes": [{"iid": "1", "widgets": []}],
|
||||
"pageInfo": {"endCursor": null, "hasNextPage": true}
|
||||
}
|
||||
}
|
||||
}
|
||||
})))
|
||||
.mount(&server)
|
||||
.await;
|
||||
|
||||
let client = GraphqlClient::new(&server.uri(), "tok123");
|
||||
let result = fetch_issue_statuses(&client, "group/project")
|
||||
.await
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(result.all_fetched_iids.len(), 1);
|
||||
}
|
||||
@@ -30,6 +30,7 @@ pub struct NormalizedNote {
|
||||
pub project_id: i64,
|
||||
pub note_type: Option<String>,
|
||||
pub is_system: bool,
|
||||
pub author_id: Option<i64>,
|
||||
pub author_username: String,
|
||||
pub body: String,
|
||||
pub created_at: i64,
|
||||
@@ -160,6 +161,7 @@ fn transform_single_note(
|
||||
project_id: local_project_id,
|
||||
note_type: note.note_type.clone(),
|
||||
is_system: note.system,
|
||||
author_id: Some(note.author.id),
|
||||
author_username: note.author.username.clone(),
|
||||
body: note.body.clone(),
|
||||
created_at: parse_timestamp(¬e.created_at),
|
||||
@@ -265,6 +267,7 @@ fn transform_single_note_strict(
|
||||
project_id: local_project_id,
|
||||
note_type: note.note_type.clone(),
|
||||
is_system: note.system,
|
||||
author_id: Some(note.author.id),
|
||||
author_username: note.author.username.clone(),
|
||||
body: note.body.clone(),
|
||||
created_at,
|
||||
|
||||
@@ -93,170 +93,5 @@ pub fn transform_issue(issue: &GitLabIssue) -> Result<IssueWithMetadata, Transfo
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::gitlab::types::{GitLabAuthor, GitLabMilestone};
|
||||
|
||||
fn make_test_issue() -> GitLabIssue {
|
||||
GitLabIssue {
|
||||
id: 12345,
|
||||
iid: 42,
|
||||
project_id: 100,
|
||||
title: "Test issue".to_string(),
|
||||
description: Some("Description here".to_string()),
|
||||
state: "opened".to_string(),
|
||||
created_at: "2024-01-15T10:00:00.000Z".to_string(),
|
||||
updated_at: "2024-01-20T15:30:00.000Z".to_string(),
|
||||
closed_at: None,
|
||||
author: GitLabAuthor {
|
||||
id: 1,
|
||||
username: "testuser".to_string(),
|
||||
name: "Test User".to_string(),
|
||||
},
|
||||
assignees: vec![],
|
||||
labels: vec!["bug".to_string(), "priority::high".to_string()],
|
||||
milestone: None,
|
||||
due_date: None,
|
||||
web_url: "https://gitlab.example.com/group/project/-/issues/42".to_string(),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn transforms_issue_with_all_fields() {
|
||||
let issue = make_test_issue();
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
|
||||
assert_eq!(result.issue.gitlab_id, 12345);
|
||||
assert_eq!(result.issue.iid, 42);
|
||||
assert_eq!(result.issue.project_id, 100);
|
||||
assert_eq!(result.issue.title, "Test issue");
|
||||
assert_eq!(
|
||||
result.issue.description,
|
||||
Some("Description here".to_string())
|
||||
);
|
||||
assert_eq!(result.issue.state, "opened");
|
||||
assert_eq!(result.issue.author_username, "testuser");
|
||||
assert_eq!(
|
||||
result.issue.web_url,
|
||||
"https://gitlab.example.com/group/project/-/issues/42"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn handles_missing_description() {
|
||||
let mut issue = make_test_issue();
|
||||
issue.description = None;
|
||||
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
assert!(result.issue.description.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extracts_label_names() {
|
||||
let issue = make_test_issue();
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
|
||||
assert_eq!(result.label_names.len(), 2);
|
||||
assert_eq!(result.label_names[0], "bug");
|
||||
assert_eq!(result.label_names[1], "priority::high");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn handles_empty_labels() {
|
||||
let mut issue = make_test_issue();
|
||||
issue.labels = vec![];
|
||||
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
assert!(result.label_names.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn parses_timestamps_to_ms_epoch() {
|
||||
let issue = make_test_issue();
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
|
||||
assert_eq!(result.issue.created_at, 1705312800000);
|
||||
assert_eq!(result.issue.updated_at, 1705764600000);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn handles_timezone_offset_timestamps() {
|
||||
let mut issue = make_test_issue();
|
||||
issue.created_at = "2024-01-15T05:00:00-05:00".to_string();
|
||||
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
assert_eq!(result.issue.created_at, 1705312800000);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extracts_assignee_usernames() {
|
||||
let mut issue = make_test_issue();
|
||||
issue.assignees = vec![
|
||||
GitLabAuthor {
|
||||
id: 2,
|
||||
username: "alice".to_string(),
|
||||
name: "Alice".to_string(),
|
||||
},
|
||||
GitLabAuthor {
|
||||
id: 3,
|
||||
username: "bob".to_string(),
|
||||
name: "Bob".to_string(),
|
||||
},
|
||||
];
|
||||
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
assert_eq!(result.assignee_usernames.len(), 2);
|
||||
assert_eq!(result.assignee_usernames[0], "alice");
|
||||
assert_eq!(result.assignee_usernames[1], "bob");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extracts_milestone_info() {
|
||||
let mut issue = make_test_issue();
|
||||
issue.milestone = Some(GitLabMilestone {
|
||||
id: 500,
|
||||
iid: 5,
|
||||
project_id: Some(100),
|
||||
title: "v1.0".to_string(),
|
||||
description: Some("First release".to_string()),
|
||||
state: Some("active".to_string()),
|
||||
due_date: Some("2024-02-01".to_string()),
|
||||
web_url: Some("https://gitlab.example.com/-/milestones/5".to_string()),
|
||||
});
|
||||
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
|
||||
assert_eq!(result.issue.milestone_title, Some("v1.0".to_string()));
|
||||
|
||||
let milestone = result.milestone.expect("should have milestone");
|
||||
assert_eq!(milestone.gitlab_id, 500);
|
||||
assert_eq!(milestone.iid, 5);
|
||||
assert_eq!(milestone.project_id, 100);
|
||||
assert_eq!(milestone.title, "v1.0");
|
||||
assert_eq!(milestone.description, Some("First release".to_string()));
|
||||
assert_eq!(milestone.state, Some("active".to_string()));
|
||||
assert_eq!(milestone.due_date, Some("2024-02-01".to_string()));
|
||||
assert_eq!(
|
||||
milestone.web_url,
|
||||
Some("https://gitlab.example.com/-/milestones/5".to_string())
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn handles_missing_milestone() {
|
||||
let issue = make_test_issue();
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
|
||||
assert!(result.issue.milestone_title.is_none());
|
||||
assert!(result.milestone.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn extracts_due_date() {
|
||||
let mut issue = make_test_issue();
|
||||
issue.due_date = Some("2024-02-15".to_string());
|
||||
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
assert_eq!(result.issue.due_date, Some("2024-02-15".to_string()));
|
||||
}
|
||||
}
|
||||
#[path = "issue_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
165
src/gitlab/transformers/issue_tests.rs
Normal file
165
src/gitlab/transformers/issue_tests.rs
Normal file
@@ -0,0 +1,165 @@
|
||||
use super::*;
|
||||
use crate::gitlab::types::{GitLabAuthor, GitLabMilestone};
|
||||
|
||||
fn make_test_issue() -> GitLabIssue {
|
||||
GitLabIssue {
|
||||
id: 12345,
|
||||
iid: 42,
|
||||
project_id: 100,
|
||||
title: "Test issue".to_string(),
|
||||
description: Some("Description here".to_string()),
|
||||
state: "opened".to_string(),
|
||||
created_at: "2024-01-15T10:00:00.000Z".to_string(),
|
||||
updated_at: "2024-01-20T15:30:00.000Z".to_string(),
|
||||
closed_at: None,
|
||||
author: GitLabAuthor {
|
||||
id: 1,
|
||||
username: "testuser".to_string(),
|
||||
name: "Test User".to_string(),
|
||||
},
|
||||
assignees: vec![],
|
||||
labels: vec!["bug".to_string(), "priority::high".to_string()],
|
||||
milestone: None,
|
||||
due_date: None,
|
||||
web_url: "https://gitlab.example.com/group/project/-/issues/42".to_string(),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn transforms_issue_with_all_fields() {
|
||||
let issue = make_test_issue();
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
|
||||
assert_eq!(result.issue.gitlab_id, 12345);
|
||||
assert_eq!(result.issue.iid, 42);
|
||||
assert_eq!(result.issue.project_id, 100);
|
||||
assert_eq!(result.issue.title, "Test issue");
|
||||
assert_eq!(
|
||||
result.issue.description,
|
||||
Some("Description here".to_string())
|
||||
);
|
||||
assert_eq!(result.issue.state, "opened");
|
||||
assert_eq!(result.issue.author_username, "testuser");
|
||||
assert_eq!(
|
||||
result.issue.web_url,
|
||||
"https://gitlab.example.com/group/project/-/issues/42"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn handles_missing_description() {
|
||||
let mut issue = make_test_issue();
|
||||
issue.description = None;
|
||||
|
||||
let result = transform_issue(&issue).unwrap();
|
||||
assert!(result.issue.description.is_none());
|
||||
}
|
||||
|
||||
/// Label names are carried over verbatim and in order.
#[test]
fn extracts_label_names() {
    let result = transform_issue(&make_test_issue()).unwrap();
    assert_eq!(result.label_names, ["bug", "priority::high"]);
}
|
||||
|
||||
/// No labels in → no label names out.
#[test]
fn handles_empty_labels() {
    let issue = GitLabIssue {
        labels: Vec::new(),
        ..make_test_issue()
    };
    assert!(transform_issue(&issue).unwrap().label_names.is_empty());
}
|
||||
|
||||
/// RFC 3339 UTC timestamps are converted to millisecond epochs.
#[test]
fn parses_timestamps_to_ms_epoch() {
    let out = transform_issue(&make_test_issue()).unwrap();
    // 2024-01-15T10:00:00Z and 2024-01-20T15:30:00Z, in ms since epoch.
    assert_eq!(out.issue.created_at, 1_705_312_800_000);
    assert_eq!(out.issue.updated_at, 1_705_764_600_000);
}
|
||||
|
||||
/// An offset timestamp normalizes to the same UTC instant.
#[test]
fn handles_timezone_offset_timestamps() {
    let issue = GitLabIssue {
        // Same instant as 2024-01-15T10:00:00Z.
        created_at: "2024-01-15T05:00:00-05:00".to_string(),
        ..make_test_issue()
    };
    assert_eq!(
        transform_issue(&issue).unwrap().issue.created_at,
        1_705_312_800_000
    );
}
|
||||
|
||||
/// Assignee usernames are extracted in the order GitLab returned them.
#[test]
fn extracts_assignee_usernames() {
    let mut issue = make_test_issue();
    issue.assignees = [(2, "alice", "Alice"), (3, "bob", "Bob")]
        .into_iter()
        .map(|(id, username, name)| GitLabAuthor {
            id,
            username: username.to_string(),
            name: name.to_string(),
        })
        .collect();

    let result = transform_issue(&issue).unwrap();
    assert_eq!(result.assignee_usernames, ["alice", "bob"]);
}
|
||||
|
||||
/// Milestone data is surfaced twice: as `milestone_title` on the issue
/// and as a standalone normalized milestone record.
#[test]
fn extracts_milestone_info() {
    let mut issue = make_test_issue();
    issue.milestone = Some(GitLabMilestone {
        id: 500,
        iid: 5,
        project_id: Some(100),
        title: "v1.0".to_string(),
        description: Some("First release".to_string()),
        state: Some("active".to_string()),
        due_date: Some("2024-02-01".to_string()),
        web_url: Some("https://gitlab.example.com/-/milestones/5".to_string()),
    });

    let result = transform_issue(&issue).unwrap();
    assert_eq!(result.issue.milestone_title.as_deref(), Some("v1.0"));

    let m = result.milestone.expect("should have milestone");
    assert_eq!((m.gitlab_id, m.iid, m.project_id), (500, 5, 100));
    assert_eq!(m.title, "v1.0");
    assert_eq!(m.description.as_deref(), Some("First release"));
    assert_eq!(m.state.as_deref(), Some("active"));
    assert_eq!(m.due_date.as_deref(), Some("2024-02-01"));
    assert_eq!(
        m.web_url.as_deref(),
        Some("https://gitlab.example.com/-/milestones/5")
    );
}
|
||||
|
||||
/// No milestone on the issue → neither output carries one.
#[test]
fn handles_missing_milestone() {
    let out = transform_issue(&make_test_issue()).unwrap();
    assert!(out.issue.milestone_title.is_none());
    assert!(out.milestone.is_none());
}
|
||||
|
||||
/// The due date string is copied through unchanged.
#[test]
fn extracts_due_date() {
    let issue = GitLabIssue {
        due_date: Some("2024-02-15".to_string()),
        ..make_test_issue()
    };
    assert_eq!(
        transform_issue(&issue).unwrap().issue.due_date.as_deref(),
        Some("2024-02-15")
    );
}
|
||||
@@ -124,158 +124,5 @@ pub fn record_dirty_error(
|
||||
}
|
||||
|
||||
#[cfg(test)]
mod tests {
    use super::*;

    /// In-memory database containing only the `dirty_sources` table
    /// (mirrors the production migration for the types tested here).
    fn setup_db() -> Connection {
        let conn = Connection::open_in_memory().unwrap();
        conn.execute_batch("
            CREATE TABLE dirty_sources (
                source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion')),
                source_id INTEGER NOT NULL,
                queued_at INTEGER NOT NULL,
                attempt_count INTEGER NOT NULL DEFAULT 0,
                last_attempt_at INTEGER,
                last_error TEXT,
                next_attempt_at INTEGER,
                PRIMARY KEY(source_type, source_id)
            );
            CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);
        ").unwrap();
        conn
    }

    /// Total number of queued rows.
    fn row_count(conn: &Connection) -> i64 {
        conn.query_row("SELECT COUNT(*) FROM dirty_sources", [], |r| r.get(0))
            .unwrap()
    }

    /// `attempt_count` of the row with source_id = 1.
    fn attempt_count_of_1(conn: &Connection) -> i64 {
        conn.query_row(
            "SELECT attempt_count FROM dirty_sources WHERE source_id = 1",
            [],
            |r| r.get(0),
        )
        .unwrap()
    }

    #[test]
    fn test_mark_dirty_inserts() {
        let conn = setup_db();
        mark_dirty(&conn, SourceType::Issue, 1).unwrap();
        assert_eq!(row_count(&conn), 1);
    }

    #[test]
    fn test_mark_dirty_tx_inserts() {
        let mut conn = setup_db();
        {
            let tx = conn.transaction().unwrap();
            mark_dirty_tx(&tx, SourceType::Issue, 1).unwrap();
            tx.commit().unwrap();
        }
        assert_eq!(row_count(&conn), 1);
    }

    #[test]
    fn test_requeue_resets_backoff() {
        let conn = setup_db();
        mark_dirty(&conn, SourceType::Issue, 1).unwrap();
        record_dirty_error(&conn, SourceType::Issue, 1, "test error").unwrap();
        assert_eq!(attempt_count_of_1(&conn), 1);

        // Re-queueing must zero the attempt counter and clear the backoff.
        mark_dirty(&conn, SourceType::Issue, 1).unwrap();
        assert_eq!(attempt_count_of_1(&conn), 0);

        let next_at: Option<i64> = conn
            .query_row(
                "SELECT next_attempt_at FROM dirty_sources WHERE source_id = 1",
                [],
                |r| r.get(0),
            )
            .unwrap();
        assert!(next_at.is_none());
    }

    #[test]
    fn test_get_respects_backoff() {
        let conn = setup_db();
        mark_dirty(&conn, SourceType::Issue, 1).unwrap();
        conn.execute(
            "UPDATE dirty_sources SET next_attempt_at = 9999999999999 WHERE source_id = 1",
            [],
        )
        .unwrap();

        // A far-future next_attempt_at hides the row from the drain query.
        assert!(get_dirty_sources(&conn).unwrap().is_empty());
    }

    #[test]
    fn test_get_orders_by_attempt_count() {
        let conn = setup_db();
        mark_dirty(&conn, SourceType::Issue, 1).unwrap();
        conn.execute(
            "UPDATE dirty_sources SET attempt_count = 2 WHERE source_id = 1",
            [],
        )
        .unwrap();
        mark_dirty(&conn, SourceType::Issue, 2).unwrap();

        // The fresh source (fewer attempts) drains first.
        let results = get_dirty_sources(&conn).unwrap();
        assert_eq!(results.len(), 2);
        assert_eq!(results[0].1, 2);
        assert_eq!(results[1].1, 1);
    }

    #[test]
    fn test_batch_size_500() {
        let conn = setup_db();
        for i in 0..600 {
            mark_dirty(&conn, SourceType::Issue, i).unwrap();
        }
        // The drain query caps each batch at 500 rows.
        assert_eq!(get_dirty_sources(&conn).unwrap().len(), 500);
    }

    #[test]
    fn test_clear_removes() {
        let conn = setup_db();
        mark_dirty(&conn, SourceType::Issue, 1).unwrap();
        clear_dirty(&conn, SourceType::Issue, 1).unwrap();
        assert_eq!(row_count(&conn), 0);
    }

    #[test]
    fn test_drain_loop() {
        let conn = setup_db();
        for i in 0..1200 {
            mark_dirty(&conn, SourceType::Issue, i).unwrap();
        }

        // Draining batch-by-batch must eventually visit every queued row.
        let mut total = 0;
        loop {
            let batch = get_dirty_sources(&conn).unwrap();
            if batch.is_empty() {
                break;
            }
            for (st, id) in &batch {
                clear_dirty(&conn, *st, *id).unwrap();
            }
            total += batch.len();
        }
        assert_eq!(total, 1200);
    }
}
|
||||
#[path = "dirty_tracker_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
168
src/ingestion/dirty_tracker_tests.rs
Normal file
168
src/ingestion/dirty_tracker_tests.rs
Normal file
@@ -0,0 +1,168 @@
|
||||
use super::*;
|
||||
|
||||
fn setup_db() -> Connection {
|
||||
let conn = Connection::open_in_memory().unwrap();
|
||||
conn.execute_batch("
|
||||
CREATE TABLE dirty_sources (
|
||||
source_type TEXT NOT NULL CHECK (source_type IN ('issue','merge_request','discussion','note')),
|
||||
source_id INTEGER NOT NULL,
|
||||
queued_at INTEGER NOT NULL,
|
||||
attempt_count INTEGER NOT NULL DEFAULT 0,
|
||||
last_attempt_at INTEGER,
|
||||
last_error TEXT,
|
||||
next_attempt_at INTEGER,
|
||||
PRIMARY KEY(source_type, source_id)
|
||||
);
|
||||
CREATE INDEX idx_dirty_sources_next_attempt ON dirty_sources(next_attempt_at);
|
||||
").unwrap();
|
||||
conn
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_mark_dirty_inserts() {
|
||||
let conn = setup_db();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM dirty_sources", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(count, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_mark_dirty_tx_inserts() {
|
||||
let mut conn = setup_db();
|
||||
{
|
||||
let tx = conn.transaction().unwrap();
|
||||
mark_dirty_tx(&tx, SourceType::Issue, 1).unwrap();
|
||||
tx.commit().unwrap();
|
||||
}
|
||||
let count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM dirty_sources", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(count, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_requeue_resets_backoff() {
|
||||
let conn = setup_db();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
record_dirty_error(&conn, SourceType::Issue, 1, "test error").unwrap();
|
||||
|
||||
let attempt: i64 = conn
|
||||
.query_row(
|
||||
"SELECT attempt_count FROM dirty_sources WHERE source_id = 1",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(attempt, 1);
|
||||
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
let attempt: i64 = conn
|
||||
.query_row(
|
||||
"SELECT attempt_count FROM dirty_sources WHERE source_id = 1",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(attempt, 0);
|
||||
|
||||
let next_at: Option<i64> = conn
|
||||
.query_row(
|
||||
"SELECT next_attempt_at FROM dirty_sources WHERE source_id = 1",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert!(next_at.is_none());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_get_respects_backoff() {
|
||||
let conn = setup_db();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
conn.execute(
|
||||
"UPDATE dirty_sources SET next_attempt_at = 9999999999999 WHERE source_id = 1",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let results = get_dirty_sources(&conn).unwrap();
|
||||
assert!(results.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_get_orders_by_attempt_count() {
|
||||
let conn = setup_db();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
conn.execute(
|
||||
"UPDATE dirty_sources SET attempt_count = 2 WHERE source_id = 1",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
mark_dirty(&conn, SourceType::Issue, 2).unwrap();
|
||||
|
||||
let results = get_dirty_sources(&conn).unwrap();
|
||||
assert_eq!(results.len(), 2);
|
||||
assert_eq!(results[0].1, 2);
|
||||
assert_eq!(results[1].1, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_batch_size_500() {
|
||||
let conn = setup_db();
|
||||
for i in 0..600 {
|
||||
mark_dirty(&conn, SourceType::Issue, i).unwrap();
|
||||
}
|
||||
let results = get_dirty_sources(&conn).unwrap();
|
||||
assert_eq!(results.len(), 500);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_clear_removes() {
|
||||
let conn = setup_db();
|
||||
mark_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
clear_dirty(&conn, SourceType::Issue, 1).unwrap();
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row("SELECT COUNT(*) FROM dirty_sources", [], |r| r.get(0))
|
||||
.unwrap();
|
||||
assert_eq!(count, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_mark_dirty_note_type() {
|
||||
let conn = setup_db();
|
||||
mark_dirty(&conn, SourceType::Note, 42).unwrap();
|
||||
|
||||
let results = get_dirty_sources(&conn).unwrap();
|
||||
assert_eq!(results.len(), 1);
|
||||
assert_eq!(results[0].0, SourceType::Note);
|
||||
assert_eq!(results[0].1, 42);
|
||||
|
||||
clear_dirty(&conn, SourceType::Note, 42).unwrap();
|
||||
let results = get_dirty_sources(&conn).unwrap();
|
||||
assert!(results.is_empty());
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_drain_loop() {
|
||||
let conn = setup_db();
|
||||
for i in 0..1200 {
|
||||
mark_dirty(&conn, SourceType::Issue, i).unwrap();
|
||||
}
|
||||
|
||||
let mut total = 0;
|
||||
loop {
|
||||
let batch = get_dirty_sources(&conn).unwrap();
|
||||
if batch.is_empty() {
|
||||
break;
|
||||
}
|
||||
for (st, id) in &batch {
|
||||
clear_dirty(&conn, *st, *id).unwrap();
|
||||
}
|
||||
total += batch.len();
|
||||
}
|
||||
assert_eq!(total, 1200);
|
||||
}
|
||||
@@ -1,17 +1,26 @@
|
||||
use futures::StreamExt;
|
||||
use rusqlite::Connection;
|
||||
use rusqlite::{Connection, params};
|
||||
use tracing::{debug, warn};
|
||||
|
||||
use crate::Config;
|
||||
use crate::core::error::Result;
|
||||
use crate::core::payloads::{StorePayloadOptions, store_payload};
|
||||
use crate::core::time::now_ms;
|
||||
use crate::documents::SourceType;
|
||||
use crate::gitlab::GitLabClient;
|
||||
use crate::gitlab::transformers::{NoteableRef, transform_discussion, transform_notes};
|
||||
use crate::gitlab::transformers::{
|
||||
NormalizedNote, NoteableRef, transform_discussion, transform_notes,
|
||||
};
|
||||
use crate::ingestion::dirty_tracker;
|
||||
|
||||
use super::issues::IssueForDiscussionSync;
|
||||
|
||||
/// Result of upserting a single note row.
#[derive(Debug)]
pub struct NoteUpsertOutcome {
    /// Local (database) id of the inserted or updated `notes` row.
    pub local_note_id: i64,
    /// True when the note is new or a semantically meaningful field
    /// (body, type, resolution, diff position) differed from the stored row.
    pub changed_semantics: bool,
}
|
||||
|
||||
#[derive(Debug, Default)]
|
||||
pub struct IngestDiscussionsResult {
|
||||
pub discussions_fetched: usize,
|
||||
@@ -80,6 +89,8 @@ async fn ingest_discussions_for_issue(
|
||||
let mut seen_discussion_ids: Vec<String> = Vec::new();
|
||||
let mut pagination_error: Option<crate::core::error::LoreError> = None;
|
||||
|
||||
let run_seen_at = now_ms();
|
||||
|
||||
while let Some(disc_result) = discussions_stream.next().await {
|
||||
let gitlab_discussion = match disc_result {
|
||||
Ok(d) => d,
|
||||
@@ -126,18 +137,29 @@ async fn ingest_discussions_for_issue(
|
||||
|
||||
dirty_tracker::mark_dirty_tx(&tx, SourceType::Discussion, local_discussion_id)?;
|
||||
|
||||
// Mark child note documents dirty (they inherit parent metadata)
|
||||
tx.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at)
|
||||
SELECT 'note', n.id, ?1
|
||||
FROM notes n
|
||||
WHERE n.discussion_id = ?2 AND n.is_system = 0
|
||||
ON CONFLICT(source_type, source_id) DO UPDATE SET queued_at = excluded.queued_at, attempt_count = 0",
|
||||
params![now_ms(), local_discussion_id],
|
||||
)?;
|
||||
|
||||
let notes = transform_notes(&gitlab_discussion, local_project_id);
|
||||
let notes_count = notes.len();
|
||||
|
||||
tx.execute(
|
||||
"DELETE FROM notes WHERE discussion_id = ?",
|
||||
[local_discussion_id],
|
||||
)?;
|
||||
|
||||
for note in notes {
|
||||
insert_note(&tx, local_discussion_id, ¬e, None)?;
|
||||
let outcome =
|
||||
upsert_note_for_issue(&tx, local_discussion_id, ¬e, run_seen_at, None)?;
|
||||
if !note.is_system && outcome.changed_semantics {
|
||||
dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?;
|
||||
}
|
||||
}
|
||||
|
||||
sweep_stale_issue_notes(&tx, local_discussion_id, run_seen_at)?;
|
||||
|
||||
tx.commit()?;
|
||||
|
||||
result.discussions_upserted += 1;
|
||||
@@ -198,38 +220,182 @@ fn upsert_discussion(
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn insert_note(
|
||||
fn upsert_note_for_issue(
|
||||
conn: &Connection,
|
||||
discussion_id: i64,
|
||||
note: &crate::gitlab::transformers::NormalizedNote,
|
||||
note: &NormalizedNote,
|
||||
last_seen_at: i64,
|
||||
payload_id: Option<i64>,
|
||||
) -> Result<()> {
|
||||
) -> Result<NoteUpsertOutcome> {
|
||||
// Pre-read for semantic change detection
|
||||
let existing = conn
|
||||
.query_row(
|
||||
"SELECT id, body, note_type, resolved, resolved_by,
|
||||
position_old_path, position_new_path, position_old_line, position_new_line,
|
||||
position_type, position_line_range_start, position_line_range_end,
|
||||
position_base_sha, position_start_sha, position_head_sha
|
||||
FROM notes WHERE gitlab_id = ?",
|
||||
params![note.gitlab_id],
|
||||
|row| {
|
||||
Ok((
|
||||
row.get::<_, i64>(0)?,
|
||||
row.get::<_, String>(1)?,
|
||||
row.get::<_, Option<String>>(2)?,
|
||||
row.get::<_, bool>(3)?,
|
||||
row.get::<_, Option<String>>(4)?,
|
||||
row.get::<_, Option<String>>(5)?,
|
||||
row.get::<_, Option<String>>(6)?,
|
||||
row.get::<_, Option<i32>>(7)?,
|
||||
row.get::<_, Option<i32>>(8)?,
|
||||
row.get::<_, Option<String>>(9)?,
|
||||
row.get::<_, Option<i32>>(10)?,
|
||||
row.get::<_, Option<i32>>(11)?,
|
||||
row.get::<_, Option<String>>(12)?,
|
||||
row.get::<_, Option<String>>(13)?,
|
||||
row.get::<_, Option<String>>(14)?,
|
||||
))
|
||||
},
|
||||
)
|
||||
.ok();
|
||||
|
||||
let changed_semantics = match &existing {
|
||||
Some((
|
||||
_id,
|
||||
body,
|
||||
note_type,
|
||||
resolved,
|
||||
resolved_by,
|
||||
pos_old_path,
|
||||
pos_new_path,
|
||||
pos_old_line,
|
||||
pos_new_line,
|
||||
pos_type,
|
||||
pos_range_start,
|
||||
pos_range_end,
|
||||
pos_base_sha,
|
||||
pos_start_sha,
|
||||
pos_head_sha,
|
||||
)) => {
|
||||
*body != note.body
|
||||
|| *note_type != note.note_type
|
||||
|| *resolved != note.resolved
|
||||
|| *resolved_by != note.resolved_by
|
||||
|| *pos_old_path != note.position_old_path
|
||||
|| *pos_new_path != note.position_new_path
|
||||
|| *pos_old_line != note.position_old_line
|
||||
|| *pos_new_line != note.position_new_line
|
||||
|| *pos_type != note.position_type
|
||||
|| *pos_range_start != note.position_line_range_start
|
||||
|| *pos_range_end != note.position_line_range_end
|
||||
|| *pos_base_sha != note.position_base_sha
|
||||
|| *pos_start_sha != note.position_start_sha
|
||||
|| *pos_head_sha != note.position_head_sha
|
||||
}
|
||||
None => true,
|
||||
};
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO notes (
|
||||
gitlab_id, discussion_id, project_id, note_type, is_system,
|
||||
author_username, body, created_at, updated_at, last_seen_at,
|
||||
position, resolvable, resolved, resolved_by, resolved_at, raw_payload_id
|
||||
) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16)",
|
||||
(
|
||||
author_id, author_username, body, created_at, updated_at, last_seen_at,
|
||||
position, resolvable, resolved, resolved_by, resolved_at,
|
||||
position_old_path, position_new_path, position_old_line, position_new_line,
|
||||
position_type, position_line_range_start, position_line_range_end,
|
||||
position_base_sha, position_start_sha, position_head_sha,
|
||||
raw_payload_id
|
||||
) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17, ?18, ?19, ?20, ?21, ?22, ?23, ?24, ?25, ?26, ?27)
|
||||
ON CONFLICT(gitlab_id) DO UPDATE SET
|
||||
body = excluded.body,
|
||||
note_type = excluded.note_type,
|
||||
author_id = excluded.author_id,
|
||||
updated_at = excluded.updated_at,
|
||||
last_seen_at = excluded.last_seen_at,
|
||||
resolvable = excluded.resolvable,
|
||||
resolved = excluded.resolved,
|
||||
resolved_by = excluded.resolved_by,
|
||||
resolved_at = excluded.resolved_at,
|
||||
position_old_path = excluded.position_old_path,
|
||||
position_new_path = excluded.position_new_path,
|
||||
position_old_line = excluded.position_old_line,
|
||||
position_new_line = excluded.position_new_line,
|
||||
position_type = excluded.position_type,
|
||||
position_line_range_start = excluded.position_line_range_start,
|
||||
position_line_range_end = excluded.position_line_range_end,
|
||||
position_base_sha = excluded.position_base_sha,
|
||||
position_start_sha = excluded.position_start_sha,
|
||||
position_head_sha = excluded.position_head_sha,
|
||||
raw_payload_id = COALESCE(excluded.raw_payload_id, raw_payload_id)",
|
||||
params![
|
||||
note.gitlab_id,
|
||||
discussion_id,
|
||||
note.project_id,
|
||||
¬e.note_type,
|
||||
note.is_system,
|
||||
note.author_id,
|
||||
¬e.author_username,
|
||||
¬e.body,
|
||||
note.created_at,
|
||||
note.updated_at,
|
||||
note.last_seen_at,
|
||||
last_seen_at,
|
||||
note.position,
|
||||
note.resolvable,
|
||||
note.resolved,
|
||||
¬e.resolved_by,
|
||||
note.resolved_at,
|
||||
¬e.position_old_path,
|
||||
¬e.position_new_path,
|
||||
note.position_old_line,
|
||||
note.position_new_line,
|
||||
¬e.position_type,
|
||||
note.position_line_range_start,
|
||||
note.position_line_range_end,
|
||||
¬e.position_base_sha,
|
||||
¬e.position_start_sha,
|
||||
¬e.position_head_sha,
|
||||
payload_id,
|
||||
),
|
||||
],
|
||||
)?;
|
||||
Ok(())
|
||||
|
||||
let local_note_id: i64 = conn.query_row(
|
||||
"SELECT id FROM notes WHERE gitlab_id = ?",
|
||||
params![note.gitlab_id],
|
||||
|row| row.get(0),
|
||||
)?;
|
||||
|
||||
Ok(NoteUpsertOutcome {
|
||||
local_note_id,
|
||||
changed_semantics,
|
||||
})
|
||||
}
|
||||
|
||||
/// Removes notes under `discussion_id` that were not touched by the current
/// sync run (their `last_seen_at` is older than `last_seen_at`), together
/// with their derived rows. Returns the number of notes deleted.
///
/// Statement order is load-bearing: `documents` and `dirty_sources` are
/// keyed by the note's local id, so they must be deleted while the stale
/// `notes` rows still exist for the subqueries to find.
fn sweep_stale_issue_notes(
    conn: &Connection,
    discussion_id: i64,
    last_seen_at: i64,
) -> Result<usize> {
    // Step 1: Delete note documents for stale notes
    // (filtered to is_system = 0 — only user notes get documents).
    conn.execute(
        "DELETE FROM documents WHERE source_type = 'note' AND source_id IN
         (SELECT id FROM notes WHERE discussion_id = ?1 AND last_seen_at < ?2 AND is_system = 0)",
        params![discussion_id, last_seen_at],
    )?;

    // Step 2: Delete dirty_sources entries for stale notes
    conn.execute(
        "DELETE FROM dirty_sources WHERE source_type = 'note' AND source_id IN
         (SELECT id FROM notes WHERE discussion_id = ?1 AND last_seen_at < ?2 AND is_system = 0)",
        params![discussion_id, last_seen_at],
    )?;

    // Step 3: Delete the stale notes themselves (system notes included —
    // this DELETE has no is_system filter).
    let deleted = conn.execute(
        "DELETE FROM notes WHERE discussion_id = ?1 AND last_seen_at < ?2",
        params![discussion_id, last_seen_at],
    )?;
    if deleted > 0 {
        debug!(discussion_id, deleted, "Swept stale issue notes");
    }
    Ok(deleted)
}
|
||||
|
||||
fn remove_stale_discussions(
|
||||
@@ -301,14 +467,5 @@ fn update_issue_sync_timestamp(conn: &Connection, issue_id: i64, updated_at: i64
|
||||
}
|
||||
|
||||
#[cfg(test)]
mod tests {
    use super::*;

    /// `Default` must start every counter at zero.
    #[test]
    fn result_default_has_zero_counts() {
        let r = IngestDiscussionsResult::default();
        assert_eq!(
            (r.discussions_fetched, r.discussions_upserted, r.notes_upserted),
            (0, 0, 0)
        );
    }
}
|
||||
#[path = "discussions_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
470
src/ingestion/discussions_tests.rs
Normal file
470
src/ingestion/discussions_tests.rs
Normal file
@@ -0,0 +1,470 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use crate::gitlab::transformers::NormalizedNote;
|
||||
use std::path::Path;
|
||||
|
||||
/// `Default` must start every counter at zero.
#[test]
fn result_default_has_zero_counts() {
    let r = IngestDiscussionsResult::default();
    assert_eq!(
        (r.discussions_fetched, r.discussions_upserted, r.notes_upserted),
        (0, 0, 0)
    );
}
||||
|
||||
fn setup() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) \
|
||||
VALUES (1, 'group/repo', 'https://gitlab.com/group/repo')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO issues (gitlab_id, iid, project_id, title, state, author_username, created_at, updated_at, last_seen_at) \
|
||||
VALUES (100, 1, 1, 'Test Issue', 'opened', 'testuser', 1000, 2000, 3000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (gitlab_discussion_id, project_id, issue_id, noteable_type, individual_note, last_seen_at, resolvable, resolved) \
|
||||
VALUES ('disc-1', 1, 1, 'Issue', 0, 3000, 0, 0)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn
|
||||
}
|
||||
|
||||
fn get_discussion_id(conn: &Connection) -> i64 {
|
||||
conn.query_row("SELECT id FROM discussions LIMIT 1", [], |row| row.get(0))
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn make_note(
|
||||
gitlab_id: i64,
|
||||
project_id: i64,
|
||||
body: &str,
|
||||
note_type: Option<&str>,
|
||||
created_at: i64,
|
||||
updated_at: i64,
|
||||
resolved: bool,
|
||||
resolved_by: Option<&str>,
|
||||
) -> NormalizedNote {
|
||||
NormalizedNote {
|
||||
gitlab_id,
|
||||
project_id,
|
||||
note_type: note_type.map(String::from),
|
||||
is_system: false,
|
||||
author_id: None,
|
||||
author_username: "testuser".to_string(),
|
||||
body: body.to_string(),
|
||||
created_at,
|
||||
updated_at,
|
||||
last_seen_at: updated_at,
|
||||
position: 0,
|
||||
resolvable: false,
|
||||
resolved,
|
||||
resolved_by: resolved_by.map(String::from),
|
||||
resolved_at: None,
|
||||
position_old_path: None,
|
||||
position_new_path: None,
|
||||
position_old_line: None,
|
||||
position_new_line: None,
|
||||
position_type: None,
|
||||
position_line_range_start: None,
|
||||
position_line_range_end: None,
|
||||
position_base_sha: None,
|
||||
position_start_sha: None,
|
||||
position_head_sha: None,
|
||||
}
|
||||
}
|
||||
|
||||
/// Re-syncing the same gitlab_ids must keep the same local row ids.
#[test]
fn test_issue_note_upsert_stable_id() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    let a = make_note(1001, 1, "First note", None, 1000, 2000, false, None);
    let b = make_note(1002, 1, "Second note", None, 1000, 2000, false, None);

    let first_pass = (
        upsert_note_for_issue(&conn, disc_id, &a, 5000, None)
            .unwrap()
            .local_note_id,
        upsert_note_for_issue(&conn, disc_id, &b, 5000, None)
            .unwrap()
            .local_note_id,
    );
    // Second sync of the same notes, one tick later.
    let second_pass = (
        upsert_note_for_issue(&conn, disc_id, &a, 5001, None)
            .unwrap()
            .local_note_id,
        upsert_note_for_issue(&conn, disc_id, &b, 5001, None)
            .unwrap()
            .local_note_id,
    );
    assert_eq!(first_pass, second_pass);
}
|
||||
|
||||
/// A changed body is a semantic change and must be reported.
#[test]
fn test_issue_note_upsert_detects_body_change() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    let note = make_note(2001, 1, "Original body", None, 1000, 2000, false, None);
    upsert_note_for_issue(&conn, disc_id, &note, 5000, None).unwrap();

    // Same gitlab_id, new body; make_note already sets updated_at = 3000,
    // so the previously redundant re-assignment is gone.
    let changed = make_note(2001, 1, "Updated body", None, 1000, 3000, false, None);
    let outcome = upsert_note_for_issue(&conn, disc_id, &changed, 5001, None).unwrap();
    assert!(outcome.changed_semantics);
}
|
||||
|
||||
/// A byte-identical re-sync is not a semantic change.
#[test]
fn test_issue_note_upsert_unchanged_returns_false() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    let note = make_note(3001, 1, "Same body", None, 1000, 2000, false, None);
    upsert_note_for_issue(&conn, disc_id, &note, 5000, None).unwrap();

    let resync = upsert_note_for_issue(&conn, disc_id, &note, 5001, None).unwrap();
    assert!(!resync.changed_semantics);
}
|
||||
|
||||
/// Bumping only `updated_at` (a non-semantic field) is not a change.
#[test]
fn test_issue_note_upsert_updated_at_only_does_not_mark_semantic_change() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    let note = make_note(4001, 1, "Body stays", None, 1000, 2000, false, None);
    upsert_note_for_issue(&conn, disc_id, &note, 5000, None).unwrap();

    // Identical content; only updated_at differs. make_note already sets
    // updated_at = 9999, so the previously redundant re-assignment is gone.
    let same = make_note(4001, 1, "Body stays", None, 1000, 9999, false, None);
    let outcome = upsert_note_for_issue(&conn, disc_id, &same, 5001, None).unwrap();
    assert!(!outcome.changed_semantics);
}
|
||||
|
||||
/// Notes missing from the latest sync (older last_seen_at) get swept.
#[test]
fn test_issue_note_sweep_removes_stale() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    let kept = make_note(5001, 1, "Keep me", None, 1000, 2000, false, None);
    let stale = make_note(5002, 1, "Stale me", None, 1000, 2000, false, None);
    upsert_note_for_issue(&conn, disc_id, &kept, 5000, None).unwrap();
    upsert_note_for_issue(&conn, disc_id, &stale, 5000, None).unwrap();

    // Only `kept` is seen again in the newer run.
    upsert_note_for_issue(&conn, disc_id, &kept, 6000, None).unwrap();

    // `stale` (last_seen_at = 5000 < 6000) must be removed.
    assert_eq!(sweep_stale_issue_notes(&conn, disc_id, 6000).unwrap(), 1);

    let remaining: i64 = conn
        .query_row(
            "SELECT COUNT(*) FROM notes WHERE discussion_id = ?",
            [disc_id],
            |r| r.get(0),
        )
        .unwrap();
    assert_eq!(remaining, 1);
}
|
||||
|
||||
/// The outcome's local id must match the row actually stored.
#[test]
fn test_issue_note_upsert_returns_local_id() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    let note = make_note(6001, 1, "Check my ID", None, 1000, 2000, false, None);
    let outcome = upsert_note_for_issue(&conn, disc_id, &note, 5000, None).unwrap();

    let db_id: i64 = conn
        .query_row(
            "SELECT id FROM notes WHERE gitlab_id = ?",
            [6001_i64],
            |r| r.get(0),
        )
        .unwrap();
    assert_eq!(outcome.local_note_id, db_id);
}
|
||||
|
||||
/// A populated author_id is persisted with the note.
#[test]
fn test_issue_note_upsert_captures_author_id() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    let note = NormalizedNote {
        author_id: Some(12345),
        ..make_note(7001, 1, "With author", None, 1000, 2000, false, None)
    };
    upsert_note_for_issue(&conn, disc_id, &note, 5000, None).unwrap();

    let stored: Option<i64> = conn
        .query_row(
            "SELECT author_id FROM notes WHERE gitlab_id = ?",
            [7001_i64],
            |r| r.get(0),
        )
        .unwrap();
    assert_eq!(stored, Some(12345));
}
|
||||
|
||||
/// author_id stays NULL when the note carries none.
#[test]
fn test_note_upsert_author_id_nullable() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    // make_note leaves author_id as None.
    let note = make_note(7002, 1, "No author id", None, 1000, 2000, false, None);
    upsert_note_for_issue(&conn, disc_id, &note, 5000, None).unwrap();

    let stored: Option<i64> = conn
        .query_row(
            "SELECT author_id FROM notes WHERE gitlab_id = ?",
            [7002_i64],
            |r| r.get(0),
        )
        .unwrap();
    assert_eq!(stored, None);
}
|
||||
|
||||
/// A re-sync that changes the username and body must not lose author_id.
#[test]
fn test_note_author_id_survives_username_change() {
    let conn = setup();
    let disc_id = get_discussion_id(&conn);

    let mut original = make_note(7003, 1, "Original body", None, 1000, 2000, false, None);
    original.author_id = Some(99999);
    original.author_username = "oldname".to_string();
    upsert_note_for_issue(&conn, disc_id, &original, 5000, None).unwrap();

    // Same author (by id) renamed, with an edited body.
    let mut renamed = make_note(7003, 1, "Updated body", None, 1000, 3000, false, None);
    renamed.author_id = Some(99999);
    renamed.author_username = "newname".to_string();
    upsert_note_for_issue(&conn, disc_id, &renamed, 5001, None).unwrap();

    let stored_id: Option<i64> = conn
        .query_row(
            "SELECT author_id FROM notes WHERE gitlab_id = ?",
            [7003_i64],
            |r| r.get(0),
        )
        .unwrap();
    assert_eq!(stored_id, Some(99999));
}
|
||||
|
||||
fn insert_note_document(conn: &Connection, note_local_id: i64) {
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) \
|
||||
VALUES ('note', ?1, 1, 'note content', 'hash123')",
|
||||
[note_local_id],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn insert_note_dirty_source(conn: &Connection, note_local_id: i64) {
|
||||
conn.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at) \
|
||||
VALUES ('note', ?1, 1000)",
|
||||
[note_local_id],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn count_note_documents(conn: &Connection, note_local_id: i64) -> i64 {
|
||||
conn.query_row(
|
||||
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?",
|
||||
[note_local_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
fn count_note_dirty_sources(conn: &Connection, note_local_id: i64) -> i64 {
|
||||
conn.query_row(
|
||||
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note' AND source_id = ?",
|
||||
[note_local_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_issue_note_sweep_deletes_note_documents_immediately() {
|
||||
let conn = setup();
|
||||
let disc_id = get_discussion_id(&conn);
|
||||
|
||||
// Insert 3 notes
|
||||
let note1 = make_note(9001, 1, "Keep me", None, 1000, 2000, false, None);
|
||||
let note2 = make_note(9002, 1, "Keep me too", None, 1000, 2000, false, None);
|
||||
let note3 = make_note(9003, 1, "Stale me", None, 1000, 2000, false, None);
|
||||
|
||||
let out1 = upsert_note_for_issue(&conn, disc_id, ¬e1, 5000, None).unwrap();
|
||||
let out2 = upsert_note_for_issue(&conn, disc_id, ¬e2, 5000, None).unwrap();
|
||||
let out3 = upsert_note_for_issue(&conn, disc_id, ¬e3, 5000, None).unwrap();
|
||||
|
||||
// Add documents for all 3
|
||||
insert_note_document(&conn, out1.local_note_id);
|
||||
insert_note_document(&conn, out2.local_note_id);
|
||||
insert_note_document(&conn, out3.local_note_id);
|
||||
|
||||
// Add dirty_sources for note3
|
||||
insert_note_dirty_source(&conn, out3.local_note_id);
|
||||
|
||||
// Re-sync only notes 1 and 2 with newer timestamp
|
||||
upsert_note_for_issue(&conn, disc_id, ¬e1, 6000, None).unwrap();
|
||||
upsert_note_for_issue(&conn, disc_id, ¬e2, 6000, None).unwrap();
|
||||
|
||||
// Sweep should remove note3 and its document + dirty_source
|
||||
sweep_stale_issue_notes(&conn, disc_id, 6000).unwrap();
|
||||
|
||||
// Stale note's document should be gone
|
||||
assert_eq!(count_note_documents(&conn, out3.local_note_id), 0);
|
||||
assert_eq!(count_note_dirty_sources(&conn, out3.local_note_id), 0);
|
||||
|
||||
// Kept notes' documents should survive
|
||||
assert_eq!(count_note_documents(&conn, out1.local_note_id), 1);
|
||||
assert_eq!(count_note_documents(&conn, out2.local_note_id), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_sweep_deletion_handles_note_without_document() {
|
||||
let conn = setup();
|
||||
let disc_id = get_discussion_id(&conn);
|
||||
|
||||
let note = make_note(9004, 1, "No doc", None, 1000, 2000, false, None);
|
||||
upsert_note_for_issue(&conn, disc_id, ¬e, 5000, None).unwrap();
|
||||
|
||||
// Don't insert any document -- sweep should still work without error
|
||||
let swept = sweep_stale_issue_notes(&conn, disc_id, 6000).unwrap();
|
||||
assert_eq!(swept, 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_set_based_deletion_atomicity() {
|
||||
let conn = setup();
|
||||
let disc_id = get_discussion_id(&conn);
|
||||
|
||||
// Insert a stale note with both document and dirty_source
|
||||
let note = make_note(9005, 1, "Stale with deps", None, 1000, 2000, false, None);
|
||||
let out = upsert_note_for_issue(&conn, disc_id, ¬e, 5000, None).unwrap();
|
||||
insert_note_document(&conn, out.local_note_id);
|
||||
insert_note_dirty_source(&conn, out.local_note_id);
|
||||
|
||||
// Verify they exist before sweep
|
||||
assert_eq!(count_note_documents(&conn, out.local_note_id), 1);
|
||||
assert_eq!(count_note_dirty_sources(&conn, out.local_note_id), 1);
|
||||
|
||||
// The sweep function already runs inside a transaction (called from
|
||||
// ingest_discussions_for_issue's tx). Simulate by wrapping in a transaction.
|
||||
let tx = conn.unchecked_transaction().unwrap();
|
||||
sweep_stale_issue_notes(&tx, disc_id, 6000).unwrap();
|
||||
tx.commit().unwrap();
|
||||
|
||||
// All three DELETEs must have happened
|
||||
assert_eq!(count_note_documents(&conn, out.local_note_id), 0);
|
||||
assert_eq!(count_note_dirty_sources(&conn, out.local_note_id), 0);
|
||||
|
||||
let note_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM notes WHERE gitlab_id = ?",
|
||||
[9005_i64],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(note_count, 0);
|
||||
}
|
||||
|
||||
fn count_dirty_notes(conn: &Connection) -> i64 {
|
||||
conn.query_row(
|
||||
"SELECT COUNT(*) FROM dirty_sources WHERE source_type = 'note'",
|
||||
[],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parent_title_change_marks_notes_dirty() {
|
||||
let conn = setup();
|
||||
let disc_id = get_discussion_id(&conn);
|
||||
|
||||
// Insert two user notes and one system note
|
||||
let note1 = make_note(10001, 1, "User note 1", None, 1000, 2000, false, None);
|
||||
let note2 = make_note(10002, 1, "User note 2", None, 1000, 2000, false, None);
|
||||
let mut sys_note = make_note(10003, 1, "System note", None, 1000, 2000, false, None);
|
||||
sys_note.is_system = true;
|
||||
|
||||
let out1 = upsert_note_for_issue(&conn, disc_id, ¬e1, 5000, None).unwrap();
|
||||
let out2 = upsert_note_for_issue(&conn, disc_id, ¬e2, 5000, None).unwrap();
|
||||
upsert_note_for_issue(&conn, disc_id, &sys_note, 5000, None).unwrap();
|
||||
|
||||
// Clear any dirty_sources from individual note upserts
|
||||
conn.execute("DELETE FROM dirty_sources WHERE source_type = 'note'", [])
|
||||
.unwrap();
|
||||
assert_eq!(count_dirty_notes(&conn), 0);
|
||||
|
||||
// Simulate parent title change triggering discussion re-ingest:
|
||||
// update the issue title, then run the propagation SQL
|
||||
conn.execute("UPDATE issues SET title = 'Changed Title' WHERE id = 1", [])
|
||||
.unwrap();
|
||||
|
||||
// Run the propagation query (same as in ingestion code)
|
||||
conn.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at)
|
||||
SELECT 'note', n.id, ?1
|
||||
FROM notes n
|
||||
WHERE n.discussion_id = ?2 AND n.is_system = 0
|
||||
ON CONFLICT(source_type, source_id) DO UPDATE SET queued_at = excluded.queued_at, attempt_count = 0",
|
||||
params![now_ms(), disc_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
// Both user notes should be dirty, system note should not
|
||||
assert_eq!(count_dirty_notes(&conn), 2);
|
||||
assert_eq!(count_note_dirty_sources(&conn, out1.local_note_id), 1);
|
||||
assert_eq!(count_note_dirty_sources(&conn, out2.local_note_id), 1);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_parent_label_change_marks_notes_dirty() {
|
||||
let conn = setup();
|
||||
let disc_id = get_discussion_id(&conn);
|
||||
|
||||
// Insert one user note
|
||||
let note = make_note(11001, 1, "User note", None, 1000, 2000, false, None);
|
||||
let out = upsert_note_for_issue(&conn, disc_id, ¬e, 5000, None).unwrap();
|
||||
|
||||
// Clear dirty_sources
|
||||
conn.execute("DELETE FROM dirty_sources WHERE source_type = 'note'", [])
|
||||
.unwrap();
|
||||
|
||||
// Simulate label change on parent issue (labels are part of issue metadata)
|
||||
conn.execute("UPDATE issues SET updated_at = 9999 WHERE id = 1", [])
|
||||
.unwrap();
|
||||
|
||||
// Run propagation query
|
||||
conn.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at)
|
||||
SELECT 'note', n.id, ?1
|
||||
FROM notes n
|
||||
WHERE n.discussion_id = ?2 AND n.is_system = 0
|
||||
ON CONFLICT(source_type, source_id) DO UPDATE SET queued_at = excluded.queued_at, attempt_count = 0",
|
||||
params![now_ms(), disc_id],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
assert_eq!(count_dirty_notes(&conn), 1);
|
||||
assert_eq!(count_note_dirty_sources(&conn, out.local_note_id), 1);
|
||||
}
|
||||
@@ -138,29 +138,6 @@ fn passes_cursor_filter_with_ts(gitlab_id: i64, issue_ts: i64, cursor: &SyncCurs
|
||||
true
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
fn passes_cursor_filter(issue: &GitLabIssue, cursor: &SyncCursor) -> Result<bool> {
|
||||
let Some(cursor_ts) = cursor.updated_at_cursor else {
|
||||
return Ok(true);
|
||||
};
|
||||
|
||||
let issue_ts = parse_timestamp(&issue.updated_at)?;
|
||||
|
||||
if issue_ts < cursor_ts {
|
||||
return Ok(false);
|
||||
}
|
||||
|
||||
if issue_ts == cursor_ts
|
||||
&& cursor
|
||||
.tie_breaker_id
|
||||
.is_some_and(|cursor_id| issue.id <= cursor_id)
|
||||
{
|
||||
return Ok(false);
|
||||
}
|
||||
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn process_single_issue(
|
||||
conn: &Connection,
|
||||
config: &Config,
|
||||
@@ -423,78 +400,5 @@ fn parse_timestamp(ts: &str) -> Result<i64> {
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::gitlab::types::GitLabAuthor;
|
||||
|
||||
fn make_test_issue(id: i64, updated_at: &str) -> GitLabIssue {
|
||||
GitLabIssue {
|
||||
id,
|
||||
iid: id,
|
||||
project_id: 100,
|
||||
title: format!("Issue {}", id),
|
||||
description: None,
|
||||
state: "opened".to_string(),
|
||||
created_at: "2024-01-01T00:00:00.000Z".to_string(),
|
||||
updated_at: updated_at.to_string(),
|
||||
closed_at: None,
|
||||
author: GitLabAuthor {
|
||||
id: 1,
|
||||
username: "test".to_string(),
|
||||
name: "Test".to_string(),
|
||||
},
|
||||
assignees: vec![],
|
||||
labels: vec![],
|
||||
milestone: None,
|
||||
due_date: None,
|
||||
web_url: "https://example.com".to_string(),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cursor_filter_allows_newer_issues() {
|
||||
let cursor = SyncCursor {
|
||||
updated_at_cursor: Some(1705312800000),
|
||||
tie_breaker_id: Some(100),
|
||||
};
|
||||
|
||||
let issue = make_test_issue(101, "2024-01-16T10:00:00.000Z");
|
||||
assert!(passes_cursor_filter(&issue, &cursor).unwrap_or(false));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cursor_filter_blocks_older_issues() {
|
||||
let cursor = SyncCursor {
|
||||
updated_at_cursor: Some(1705312800000),
|
||||
tie_breaker_id: Some(100),
|
||||
};
|
||||
|
||||
let issue = make_test_issue(99, "2024-01-14T10:00:00.000Z");
|
||||
assert!(!passes_cursor_filter(&issue, &cursor).unwrap_or(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cursor_filter_uses_tie_breaker_for_same_timestamp() {
|
||||
let cursor = SyncCursor {
|
||||
updated_at_cursor: Some(1705312800000),
|
||||
tie_breaker_id: Some(100),
|
||||
};
|
||||
|
||||
let issue1 = make_test_issue(101, "2024-01-15T10:00:00.000Z");
|
||||
assert!(passes_cursor_filter(&issue1, &cursor).unwrap_or(false));
|
||||
|
||||
let issue2 = make_test_issue(100, "2024-01-15T10:00:00.000Z");
|
||||
assert!(!passes_cursor_filter(&issue2, &cursor).unwrap_or(true));
|
||||
|
||||
let issue3 = make_test_issue(99, "2024-01-15T10:00:00.000Z");
|
||||
assert!(!passes_cursor_filter(&issue3, &cursor).unwrap_or(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cursor_filter_allows_all_when_no_cursor() {
|
||||
let cursor = SyncCursor::default();
|
||||
|
||||
let issue = make_test_issue(1, "2020-01-01T00:00:00.000Z");
|
||||
assert!(passes_cursor_filter(&issue, &cursor).unwrap_or(false));
|
||||
}
|
||||
}
|
||||
#[path = "issues_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
95
src/ingestion/issues_tests.rs
Normal file
95
src/ingestion/issues_tests.rs
Normal file
@@ -0,0 +1,95 @@
|
||||
use super::*;
|
||||
use crate::gitlab::types::GitLabAuthor;
|
||||
|
||||
fn passes_cursor_filter(issue: &GitLabIssue, cursor: &SyncCursor) -> Result<bool> {
|
||||
let Some(cursor_ts) = cursor.updated_at_cursor else {
|
||||
return Ok(true);
|
||||
};
|
||||
|
||||
let issue_ts = parse_timestamp(&issue.updated_at)?;
|
||||
|
||||
if issue_ts < cursor_ts {
|
||||
return Ok(false);
|
||||
}
|
||||
|
||||
if issue_ts == cursor_ts
|
||||
&& cursor
|
||||
.tie_breaker_id
|
||||
.is_some_and(|cursor_id| issue.id <= cursor_id)
|
||||
{
|
||||
return Ok(false);
|
||||
}
|
||||
|
||||
Ok(true)
|
||||
}
|
||||
|
||||
fn make_test_issue(id: i64, updated_at: &str) -> GitLabIssue {
|
||||
GitLabIssue {
|
||||
id,
|
||||
iid: id,
|
||||
project_id: 100,
|
||||
title: format!("Issue {}", id),
|
||||
description: None,
|
||||
state: "opened".to_string(),
|
||||
created_at: "2024-01-01T00:00:00.000Z".to_string(),
|
||||
updated_at: updated_at.to_string(),
|
||||
closed_at: None,
|
||||
author: GitLabAuthor {
|
||||
id: 1,
|
||||
username: "test".to_string(),
|
||||
name: "Test".to_string(),
|
||||
},
|
||||
assignees: vec![],
|
||||
labels: vec![],
|
||||
milestone: None,
|
||||
due_date: None,
|
||||
web_url: "https://example.com".to_string(),
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cursor_filter_allows_newer_issues() {
|
||||
let cursor = SyncCursor {
|
||||
updated_at_cursor: Some(1705312800000),
|
||||
tie_breaker_id: Some(100),
|
||||
};
|
||||
|
||||
let issue = make_test_issue(101, "2024-01-16T10:00:00.000Z");
|
||||
assert!(passes_cursor_filter(&issue, &cursor).unwrap_or(false));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cursor_filter_blocks_older_issues() {
|
||||
let cursor = SyncCursor {
|
||||
updated_at_cursor: Some(1705312800000),
|
||||
tie_breaker_id: Some(100),
|
||||
};
|
||||
|
||||
let issue = make_test_issue(99, "2024-01-14T10:00:00.000Z");
|
||||
assert!(!passes_cursor_filter(&issue, &cursor).unwrap_or(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cursor_filter_uses_tie_breaker_for_same_timestamp() {
|
||||
let cursor = SyncCursor {
|
||||
updated_at_cursor: Some(1705312800000),
|
||||
tie_breaker_id: Some(100),
|
||||
};
|
||||
|
||||
let issue1 = make_test_issue(101, "2024-01-15T10:00:00.000Z");
|
||||
assert!(passes_cursor_filter(&issue1, &cursor).unwrap_or(false));
|
||||
|
||||
let issue2 = make_test_issue(100, "2024-01-15T10:00:00.000Z");
|
||||
assert!(!passes_cursor_filter(&issue2, &cursor).unwrap_or(true));
|
||||
|
||||
let issue3 = make_test_issue(99, "2024-01-15T10:00:00.000Z");
|
||||
assert!(!passes_cursor_filter(&issue3, &cursor).unwrap_or(true));
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn cursor_filter_allows_all_when_no_cursor() {
|
||||
let cursor = SyncCursor::default();
|
||||
|
||||
let issue = make_test_issue(1, "2020-01-01T00:00:00.000Z");
|
||||
assert!(passes_cursor_filter(&issue, &cursor).unwrap_or(false));
|
||||
}
|
||||
@@ -66,207 +66,5 @@ pub fn upsert_mr_file_changes(
|
||||
}
|
||||
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
|
||||
// Insert a test project
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/repo', 'https://gitlab.com/group/repo')",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
// Insert a test MR
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, iid, project_id, title, state, draft, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at) \
|
||||
VALUES (100, 1, 1, 'Test MR', 'merged', 0, 'feature', 'main', 'testuser', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
conn
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_derive_change_type_added() {
|
||||
let diff = GitLabMrDiff {
|
||||
old_path: String::new(),
|
||||
new_path: "src/new.rs".to_string(),
|
||||
new_file: true,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
};
|
||||
assert_eq!(derive_change_type(&diff), "added");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_derive_change_type_renamed() {
|
||||
let diff = GitLabMrDiff {
|
||||
old_path: "src/old.rs".to_string(),
|
||||
new_path: "src/new.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: true,
|
||||
deleted_file: false,
|
||||
};
|
||||
assert_eq!(derive_change_type(&diff), "renamed");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_derive_change_type_deleted() {
|
||||
let diff = GitLabMrDiff {
|
||||
old_path: "src/gone.rs".to_string(),
|
||||
new_path: "src/gone.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: true,
|
||||
};
|
||||
assert_eq!(derive_change_type(&diff), "deleted");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_derive_change_type_modified() {
|
||||
let diff = GitLabMrDiff {
|
||||
old_path: "src/lib.rs".to_string(),
|
||||
new_path: "src/lib.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
};
|
||||
assert_eq!(derive_change_type(&diff), "modified");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_upsert_inserts_file_changes() {
|
||||
let conn = setup();
|
||||
let diffs = [
|
||||
GitLabMrDiff {
|
||||
old_path: String::new(),
|
||||
new_path: "src/new.rs".to_string(),
|
||||
new_file: true,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
},
|
||||
GitLabMrDiff {
|
||||
old_path: "src/lib.rs".to_string(),
|
||||
new_path: "src/lib.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
},
|
||||
];
|
||||
|
||||
let inserted = upsert_mr_file_changes(&conn, 1, 1, &diffs).unwrap();
|
||||
assert_eq!(inserted, 2);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM mr_file_changes WHERE merge_request_id = 1",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(count, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_upsert_replaces_existing() {
|
||||
let conn = setup();
|
||||
let diffs_v1 = [GitLabMrDiff {
|
||||
old_path: String::new(),
|
||||
new_path: "src/old.rs".to_string(),
|
||||
new_file: true,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
}];
|
||||
upsert_mr_file_changes(&conn, 1, 1, &diffs_v1).unwrap();
|
||||
|
||||
let diffs_v2 = [
|
||||
GitLabMrDiff {
|
||||
old_path: "src/a.rs".to_string(),
|
||||
new_path: "src/a.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
},
|
||||
GitLabMrDiff {
|
||||
old_path: "src/b.rs".to_string(),
|
||||
new_path: "src/b.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
},
|
||||
];
|
||||
let inserted = upsert_mr_file_changes(&conn, 1, 1, &diffs_v2).unwrap();
|
||||
assert_eq!(inserted, 2);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM mr_file_changes WHERE merge_request_id = 1",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(count, 2);
|
||||
|
||||
// The old "src/old.rs" should be gone
|
||||
let old_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM mr_file_changes WHERE new_path = 'src/old.rs'",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(old_count, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_renamed_stores_old_path() {
|
||||
let conn = setup();
|
||||
let diffs = [GitLabMrDiff {
|
||||
old_path: "src/old_name.rs".to_string(),
|
||||
new_path: "src/new_name.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: true,
|
||||
deleted_file: false,
|
||||
}];
|
||||
|
||||
upsert_mr_file_changes(&conn, 1, 1, &diffs).unwrap();
|
||||
|
||||
let (old_path, change_type): (Option<String>, String) = conn
|
||||
.query_row(
|
||||
"SELECT old_path, change_type FROM mr_file_changes WHERE new_path = 'src/new_name.rs'",
|
||||
[],
|
||||
|r| Ok((r.get(0)?, r.get(1)?)),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(old_path.as_deref(), Some("src/old_name.rs"));
|
||||
assert_eq!(change_type, "renamed");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_non_renamed_has_null_old_path() {
|
||||
let conn = setup();
|
||||
let diffs = [GitLabMrDiff {
|
||||
old_path: "src/lib.rs".to_string(),
|
||||
new_path: "src/lib.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
}];
|
||||
|
||||
upsert_mr_file_changes(&conn, 1, 1, &diffs).unwrap();
|
||||
|
||||
let old_path: Option<String> = conn
|
||||
.query_row(
|
||||
"SELECT old_path FROM mr_file_changes WHERE new_path = 'src/lib.rs'",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert!(old_path.is_none());
|
||||
}
|
||||
}
|
||||
#[path = "mr_diffs_tests.rs"]
|
||||
mod tests;
|
||||
|
||||
202
src/ingestion/mr_diffs_tests.rs
Normal file
202
src/ingestion/mr_diffs_tests.rs
Normal file
@@ -0,0 +1,202 @@
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
fn setup() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
|
||||
// Insert a test project
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) VALUES (1, 'group/repo', 'https://gitlab.com/group/repo')",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
// Insert a test MR
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, iid, project_id, title, state, draft, source_branch, target_branch, author_username, created_at, updated_at, last_seen_at) \
|
||||
VALUES (100, 1, 1, 'Test MR', 'merged', 0, 'feature', 'main', 'testuser', 1000, 2000, 3000)",
|
||||
[],
|
||||
).unwrap();
|
||||
|
||||
conn
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_derive_change_type_added() {
|
||||
let diff = GitLabMrDiff {
|
||||
old_path: String::new(),
|
||||
new_path: "src/new.rs".to_string(),
|
||||
new_file: true,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
};
|
||||
assert_eq!(derive_change_type(&diff), "added");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_derive_change_type_renamed() {
|
||||
let diff = GitLabMrDiff {
|
||||
old_path: "src/old.rs".to_string(),
|
||||
new_path: "src/new.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: true,
|
||||
deleted_file: false,
|
||||
};
|
||||
assert_eq!(derive_change_type(&diff), "renamed");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_derive_change_type_deleted() {
|
||||
let diff = GitLabMrDiff {
|
||||
old_path: "src/gone.rs".to_string(),
|
||||
new_path: "src/gone.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: true,
|
||||
};
|
||||
assert_eq!(derive_change_type(&diff), "deleted");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_derive_change_type_modified() {
|
||||
let diff = GitLabMrDiff {
|
||||
old_path: "src/lib.rs".to_string(),
|
||||
new_path: "src/lib.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
};
|
||||
assert_eq!(derive_change_type(&diff), "modified");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_upsert_inserts_file_changes() {
|
||||
let conn = setup();
|
||||
let diffs = [
|
||||
GitLabMrDiff {
|
||||
old_path: String::new(),
|
||||
new_path: "src/new.rs".to_string(),
|
||||
new_file: true,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
},
|
||||
GitLabMrDiff {
|
||||
old_path: "src/lib.rs".to_string(),
|
||||
new_path: "src/lib.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
},
|
||||
];
|
||||
|
||||
let inserted = upsert_mr_file_changes(&conn, 1, 1, &diffs).unwrap();
|
||||
assert_eq!(inserted, 2);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM mr_file_changes WHERE merge_request_id = 1",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(count, 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_upsert_replaces_existing() {
|
||||
let conn = setup();
|
||||
let diffs_v1 = [GitLabMrDiff {
|
||||
old_path: String::new(),
|
||||
new_path: "src/old.rs".to_string(),
|
||||
new_file: true,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
}];
|
||||
upsert_mr_file_changes(&conn, 1, 1, &diffs_v1).unwrap();
|
||||
|
||||
let diffs_v2 = [
|
||||
GitLabMrDiff {
|
||||
old_path: "src/a.rs".to_string(),
|
||||
new_path: "src/a.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
},
|
||||
GitLabMrDiff {
|
||||
old_path: "src/b.rs".to_string(),
|
||||
new_path: "src/b.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
},
|
||||
];
|
||||
let inserted = upsert_mr_file_changes(&conn, 1, 1, &diffs_v2).unwrap();
|
||||
assert_eq!(inserted, 2);
|
||||
|
||||
let count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM mr_file_changes WHERE merge_request_id = 1",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(count, 2);
|
||||
|
||||
// The old "src/old.rs" should be gone
|
||||
let old_count: i64 = conn
|
||||
.query_row(
|
||||
"SELECT COUNT(*) FROM mr_file_changes WHERE new_path = 'src/old.rs'",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(old_count, 0);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_renamed_stores_old_path() {
|
||||
let conn = setup();
|
||||
let diffs = [GitLabMrDiff {
|
||||
old_path: "src/old_name.rs".to_string(),
|
||||
new_path: "src/new_name.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: true,
|
||||
deleted_file: false,
|
||||
}];
|
||||
|
||||
upsert_mr_file_changes(&conn, 1, 1, &diffs).unwrap();
|
||||
|
||||
let (old_path, change_type): (Option<String>, String) = conn
|
||||
.query_row(
|
||||
"SELECT old_path, change_type FROM mr_file_changes WHERE new_path = 'src/new_name.rs'",
|
||||
[],
|
||||
|r| Ok((r.get(0)?, r.get(1)?)),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(old_path.as_deref(), Some("src/old_name.rs"));
|
||||
assert_eq!(change_type, "renamed");
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_non_renamed_has_null_old_path() {
|
||||
let conn = setup();
|
||||
let diffs = [GitLabMrDiff {
|
||||
old_path: "src/lib.rs".to_string(),
|
||||
new_path: "src/lib.rs".to_string(),
|
||||
new_file: false,
|
||||
renamed_file: false,
|
||||
deleted_file: false,
|
||||
}];
|
||||
|
||||
upsert_mr_file_changes(&conn, 1, 1, &diffs).unwrap();
|
||||
|
||||
let old_path: Option<String> = conn
|
||||
.query_row(
|
||||
"SELECT old_path FROM mr_file_changes WHERE new_path = 'src/lib.rs'",
|
||||
[],
|
||||
|r| r.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert!(old_path.is_none());
|
||||
}
|
||||
@@ -14,6 +14,7 @@ use crate::gitlab::transformers::{
|
||||
};
|
||||
use crate::gitlab::types::GitLabDiscussion;
|
||||
use crate::ingestion::dirty_tracker;
|
||||
use crate::ingestion::discussions::NoteUpsertOutcome;
|
||||
|
||||
use super::merge_requests::MrForDiscussionSync;
|
||||
|
||||
@@ -161,6 +162,16 @@ pub fn write_prefetched_mr_discussions(
|
||||
|
||||
dirty_tracker::mark_dirty_tx(&tx, SourceType::Discussion, local_discussion_id)?;
|
||||
|
||||
// Mark child note documents dirty (they inherit parent metadata)
|
||||
tx.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at)
|
||||
SELECT 'note', n.id, ?1
|
||||
FROM notes n
|
||||
WHERE n.discussion_id = ?2 AND n.is_system = 0
|
||||
ON CONFLICT(source_type, source_id) DO UPDATE SET queued_at = excluded.queued_at, attempt_count = 0",
|
||||
params![now_ms(), local_discussion_id],
|
||||
)?;
|
||||
|
||||
for note in &disc.notes {
|
||||
let should_store_payload = !note.is_system
|
||||
|| note.position_new_path.is_some()
|
||||
@@ -187,7 +198,11 @@ pub fn write_prefetched_mr_discussions(
|
||||
None
|
||||
};
|
||||
|
||||
upsert_note(&tx, local_discussion_id, note, run_seen_at, note_payload_id)?;
|
||||
let outcome =
|
||||
upsert_note(&tx, local_discussion_id, note, run_seen_at, note_payload_id)?;
|
||||
if !note.is_system && outcome.changed_semantics {
|
||||
dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?;
|
||||
}
|
||||
}
|
||||
|
||||
tx.commit()?;
|
||||
@@ -361,6 +376,16 @@ async fn ingest_discussions_for_mr(
|
||||
|
||||
dirty_tracker::mark_dirty_tx(&tx, SourceType::Discussion, local_discussion_id)?;
|
||||
|
||||
// Mark child note documents dirty (they inherit parent metadata)
|
||||
tx.execute(
|
||||
"INSERT INTO dirty_sources (source_type, source_id, queued_at)
|
||||
SELECT 'note', n.id, ?1
|
||||
FROM notes n
|
||||
WHERE n.discussion_id = ?2 AND n.is_system = 0
|
||||
ON CONFLICT(source_type, source_id) DO UPDATE SET queued_at = excluded.queued_at, attempt_count = 0",
|
||||
params![now_ms(), local_discussion_id],
|
||||
)?;
|
||||
|
||||
for note in ¬es {
|
||||
let should_store_payload = !note.is_system
|
||||
|| note.position_new_path.is_some()
|
||||
@@ -390,7 +415,11 @@ async fn ingest_discussions_for_mr(
|
||||
None
|
||||
};
|
||||
|
||||
upsert_note(&tx, local_discussion_id, note, run_seen_at, note_payload_id)?;
|
||||
let outcome =
|
||||
upsert_note(&tx, local_discussion_id, note, run_seen_at, note_payload_id)?;
|
||||
if !note.is_system && outcome.changed_semantics {
|
||||
dirty_tracker::mark_dirty_tx(&tx, SourceType::Note, outcome.local_note_id)?;
|
||||
}
|
||||
}
|
||||
|
||||
tx.commit()?;
|
||||
@@ -473,19 +502,87 @@ fn upsert_note(
|
||||
note: &NormalizedNote,
|
||||
last_seen_at: i64,
|
||||
payload_id: Option<i64>,
|
||||
) -> Result<()> {
|
||||
) -> Result<NoteUpsertOutcome> {
|
||||
// Pre-read for semantic change detection
|
||||
let existing = conn
|
||||
.query_row(
|
||||
"SELECT id, body, note_type, resolved, resolved_by,
|
||||
position_old_path, position_new_path, position_old_line, position_new_line,
|
||||
position_type, position_line_range_start, position_line_range_end,
|
||||
position_base_sha, position_start_sha, position_head_sha
|
||||
FROM notes WHERE gitlab_id = ?",
|
||||
params![note.gitlab_id],
|
||||
|row| {
|
||||
Ok((
|
||||
row.get::<_, i64>(0)?,
|
||||
row.get::<_, String>(1)?,
|
||||
row.get::<_, Option<String>>(2)?,
|
||||
row.get::<_, bool>(3)?,
|
||||
row.get::<_, Option<String>>(4)?,
|
||||
row.get::<_, Option<String>>(5)?,
|
||||
row.get::<_, Option<String>>(6)?,
|
||||
row.get::<_, Option<i32>>(7)?,
|
||||
row.get::<_, Option<i32>>(8)?,
|
||||
row.get::<_, Option<String>>(9)?,
|
||||
row.get::<_, Option<i32>>(10)?,
|
||||
row.get::<_, Option<i32>>(11)?,
|
||||
row.get::<_, Option<String>>(12)?,
|
||||
row.get::<_, Option<String>>(13)?,
|
||||
row.get::<_, Option<String>>(14)?,
|
||||
))
|
||||
},
|
||||
)
|
||||
.ok();
|
||||
|
||||
let changed_semantics = match &existing {
|
||||
Some((
|
||||
_id,
|
||||
body,
|
||||
note_type,
|
||||
resolved,
|
||||
resolved_by,
|
||||
pos_old_path,
|
||||
pos_new_path,
|
||||
pos_old_line,
|
||||
pos_new_line,
|
||||
pos_type,
|
||||
pos_range_start,
|
||||
pos_range_end,
|
||||
pos_base_sha,
|
||||
pos_start_sha,
|
||||
pos_head_sha,
|
||||
)) => {
|
||||
*body != note.body
|
||||
|| *note_type != note.note_type
|
||||
|| *resolved != note.resolved
|
||||
|| *resolved_by != note.resolved_by
|
||||
|| *pos_old_path != note.position_old_path
|
||||
|| *pos_new_path != note.position_new_path
|
||||
|| *pos_old_line != note.position_old_line
|
||||
|| *pos_new_line != note.position_new_line
|
||||
|| *pos_type != note.position_type
|
||||
|| *pos_range_start != note.position_line_range_start
|
||||
|| *pos_range_end != note.position_line_range_end
|
||||
|| *pos_base_sha != note.position_base_sha
|
||||
|| *pos_start_sha != note.position_start_sha
|
||||
|| *pos_head_sha != note.position_head_sha
|
||||
}
|
||||
None => true,
|
||||
};
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO notes (
|
||||
gitlab_id, discussion_id, project_id, note_type, is_system,
|
||||
author_username, body, created_at, updated_at, last_seen_at,
|
||||
author_id, author_username, body, created_at, updated_at, last_seen_at,
|
||||
position, resolvable, resolved, resolved_by, resolved_at,
|
||||
position_old_path, position_new_path, position_old_line, position_new_line,
|
||||
position_type, position_line_range_start, position_line_range_end,
|
||||
position_base_sha, position_start_sha, position_head_sha,
|
||||
raw_payload_id
|
||||
) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17, ?18, ?19, ?20, ?21, ?22, ?23, ?24, ?25, ?26)
|
||||
) VALUES (?1, ?2, ?3, ?4, ?5, ?6, ?7, ?8, ?9, ?10, ?11, ?12, ?13, ?14, ?15, ?16, ?17, ?18, ?19, ?20, ?21, ?22, ?23, ?24, ?25, ?26, ?27)
|
||||
ON CONFLICT(gitlab_id) DO UPDATE SET
|
||||
note_type = excluded.note_type,
|
||||
author_id = excluded.author_id,
|
||||
body = excluded.body,
|
||||
updated_at = excluded.updated_at,
|
||||
last_seen_at = excluded.last_seen_at,
|
||||
@@ -510,6 +607,7 @@ fn upsert_note(
|
||||
note.project_id,
|
||||
¬e.note_type,
|
||||
note.is_system,
|
||||
note.author_id,
|
||||
¬e.author_username,
|
||||
¬e.body,
|
||||
note.created_at,
|
||||
@@ -533,7 +631,17 @@ fn upsert_note(
|
||||
payload_id,
|
||||
],
|
||||
)?;
|
||||
Ok(())
|
||||
|
||||
let local_note_id: i64 = conn.query_row(
|
||||
"SELECT id FROM notes WHERE gitlab_id = ?",
|
||||
params![note.gitlab_id],
|
||||
|row| row.get(0),
|
||||
)?;
|
||||
|
||||
Ok(NoteUpsertOutcome {
|
||||
local_note_id,
|
||||
changed_semantics,
|
||||
})
|
||||
}
|
||||
|
||||
fn sweep_stale_discussions(conn: &Connection, local_mr_id: i64, run_seen_at: i64) -> Result<usize> {
|
||||
@@ -554,13 +662,36 @@ fn sweep_stale_notes(
|
||||
local_mr_id: i64,
|
||||
run_seen_at: i64,
|
||||
) -> Result<usize> {
|
||||
// Step 1: Delete note documents for stale notes
|
||||
conn.execute(
|
||||
"DELETE FROM documents WHERE source_type = 'note' AND source_id IN
|
||||
(SELECT id FROM notes
|
||||
WHERE project_id = ?1
|
||||
AND discussion_id IN (SELECT id FROM discussions WHERE merge_request_id = ?2)
|
||||
AND last_seen_at < ?3
|
||||
AND is_system = 0)",
|
||||
params![local_project_id, local_mr_id, run_seen_at],
|
||||
)?;
|
||||
|
||||
// Step 2: Delete dirty_sources entries for stale notes
|
||||
conn.execute(
|
||||
"DELETE FROM dirty_sources WHERE source_type = 'note' AND source_id IN
|
||||
(SELECT id FROM notes
|
||||
WHERE project_id = ?1
|
||||
AND discussion_id IN (SELECT id FROM discussions WHERE merge_request_id = ?2)
|
||||
AND last_seen_at < ?3
|
||||
AND is_system = 0)",
|
||||
params![local_project_id, local_mr_id, run_seen_at],
|
||||
)?;
|
||||
|
||||
// Step 3: Delete the stale notes themselves
|
||||
let deleted = conn.execute(
|
||||
"DELETE FROM notes
|
||||
WHERE project_id = ?
|
||||
WHERE project_id = ?1
|
||||
AND discussion_id IN (
|
||||
SELECT id FROM discussions WHERE merge_request_id = ?
|
||||
SELECT id FROM discussions WHERE merge_request_id = ?2
|
||||
)
|
||||
AND last_seen_at < ?",
|
||||
AND last_seen_at < ?3",
|
||||
params![local_project_id, local_mr_id, run_seen_at],
|
||||
)?;
|
||||
if deleted > 0 {
|
||||
@@ -604,6 +735,8 @@ fn clear_sync_health_error(conn: &Connection, local_mr_id: i64) -> Result<()> {
|
||||
#[cfg(test)]
|
||||
mod tests {
|
||||
use super::*;
|
||||
use crate::core::db::{create_connection, run_migrations};
|
||||
use std::path::Path;
|
||||
|
||||
#[test]
|
||||
fn result_default_has_zero_counts() {
|
||||
@@ -621,4 +754,153 @@ mod tests {
|
||||
let result = IngestMrDiscussionsResult::default();
|
||||
assert!(!result.pagination_succeeded);
|
||||
}
|
||||
|
||||
fn setup_mr() -> Connection {
|
||||
let conn = create_connection(Path::new(":memory:")).unwrap();
|
||||
run_migrations(&conn).unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO projects (gitlab_project_id, path_with_namespace, web_url) \
|
||||
VALUES (1, 'group/repo', 'https://gitlab.com/group/repo')",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO merge_requests (gitlab_id, iid, project_id, title, state, \
|
||||
author_username, source_branch, target_branch, created_at, updated_at, last_seen_at) \
|
||||
VALUES (200, 1, 1, 'Test MR', 'opened', 'testuser', 'feat', 'main', 1000, 2000, 3000)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn.execute(
|
||||
"INSERT INTO discussions (gitlab_discussion_id, project_id, merge_request_id, noteable_type, \
|
||||
individual_note, last_seen_at, resolvable, resolved) \
|
||||
VALUES ('mr-disc-1', 1, 1, 'MergeRequest', 0, 3000, 0, 0)",
|
||||
[],
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
conn
|
||||
}
|
||||
|
||||
fn get_mr_discussion_id(conn: &Connection) -> i64 {
|
||||
conn.query_row("SELECT id FROM discussions LIMIT 1", [], |row| row.get(0))
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
#[allow(clippy::too_many_arguments)]
|
||||
fn make_mr_note(
|
||||
gitlab_id: i64,
|
||||
project_id: i64,
|
||||
body: &str,
|
||||
note_type: Option<&str>,
|
||||
created_at: i64,
|
||||
updated_at: i64,
|
||||
resolved: bool,
|
||||
resolved_by: Option<&str>,
|
||||
) -> NormalizedNote {
|
||||
NormalizedNote {
|
||||
gitlab_id,
|
||||
project_id,
|
||||
note_type: note_type.map(String::from),
|
||||
is_system: false,
|
||||
author_id: None,
|
||||
author_username: "testuser".to_string(),
|
||||
body: body.to_string(),
|
||||
created_at,
|
||||
updated_at,
|
||||
last_seen_at: updated_at,
|
||||
position: 0,
|
||||
resolvable: false,
|
||||
resolved,
|
||||
resolved_by: resolved_by.map(String::from),
|
||||
resolved_at: None,
|
||||
position_old_path: None,
|
||||
position_new_path: None,
|
||||
position_old_line: None,
|
||||
position_new_line: None,
|
||||
position_type: None,
|
||||
position_line_range_start: None,
|
||||
position_line_range_end: None,
|
||||
position_base_sha: None,
|
||||
position_start_sha: None,
|
||||
position_head_sha: None,
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_mr_note_upsert_captures_author_id() {
|
||||
let conn = setup_mr();
|
||||
let disc_id = get_mr_discussion_id(&conn);
|
||||
|
||||
let mut note = make_mr_note(8001, 1, "MR note", None, 1000, 2000, false, None);
|
||||
note.author_id = Some(12345);
|
||||
|
||||
upsert_note(&conn, disc_id, ¬e, 5000, None).unwrap();
|
||||
|
||||
let stored: Option<i64> = conn
|
||||
.query_row(
|
||||
"SELECT author_id FROM notes WHERE gitlab_id = ?",
|
||||
[8001_i64],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap();
|
||||
assert_eq!(stored, Some(12345));
|
||||
}
|
||||
|
||||
fn insert_note_document(conn: &Connection, note_local_id: i64) {
|
||||
conn.execute(
|
||||
"INSERT INTO documents (source_type, source_id, project_id, content_text, content_hash) \
|
||||
VALUES ('note', ?1, 1, 'note content', 'hash123')",
|
||||
[note_local_id],
|
||||
)
|
||||
.unwrap();
|
||||
}
|
||||
|
||||
fn count_note_documents(conn: &Connection, note_local_id: i64) -> i64 {
|
||||
conn.query_row(
|
||||
"SELECT COUNT(*) FROM documents WHERE source_type = 'note' AND source_id = ?",
|
||||
[note_local_id],
|
||||
|row| row.get(0),
|
||||
)
|
||||
.unwrap()
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_mr_note_sweep_deletes_note_documents_immediately() {
|
||||
let conn = setup_mr();
|
||||
let disc_id = get_mr_discussion_id(&conn);
|
||||
let local_project_id = 1;
|
||||
let local_mr_id = 1;
|
||||
|
||||
// Insert 3 notes
|
||||
let note1 = make_mr_note(8101, 1, "Keep", None, 1000, 2000, false, None);
|
||||
let note2 = make_mr_note(8102, 1, "Keep too", None, 1000, 2000, false, None);
|
||||
let note3 = make_mr_note(8103, 1, "Stale", None, 1000, 2000, false, None);
|
||||
|
||||
let out1 = upsert_note(&conn, disc_id, ¬e1, 5000, None).unwrap();
|
||||
let out2 = upsert_note(&conn, disc_id, ¬e2, 5000, None).unwrap();
|
||||
let out3 = upsert_note(&conn, disc_id, ¬e3, 5000, None).unwrap();
|
||||
|
||||
// Add documents for all 3
|
||||
insert_note_document(&conn, out1.local_note_id);
|
||||
insert_note_document(&conn, out2.local_note_id);
|
||||
insert_note_document(&conn, out3.local_note_id);
|
||||
|
||||
// Re-sync only notes 1 and 2
|
||||
upsert_note(&conn, disc_id, ¬e1, 6000, None).unwrap();
|
||||
upsert_note(&conn, disc_id, ¬e2, 6000, None).unwrap();
|
||||
|
||||
// Sweep stale notes
|
||||
sweep_stale_notes(&conn, local_project_id, local_mr_id, 6000).unwrap();
|
||||
|
||||
// Stale note's document should be gone
|
||||
assert_eq!(count_note_documents(&conn, out3.local_note_id), 0);
|
||||
|
||||
// Kept notes' documents should survive
|
||||
assert_eq!(count_note_documents(&conn, out1.local_note_id), 1);
|
||||
assert_eq!(count_note_documents(&conn, out2.local_note_id), 1);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -640,6 +640,24 @@ pub async fn ingest_project_merge_requests_with_progress(
|
||||
);
|
||||
}
|
||||
|
||||
let desc_refs = crate::core::note_parser::extract_refs_from_descriptions(conn, project_id)?;
|
||||
if desc_refs.inserted > 0 || desc_refs.skipped_unresolvable > 0 {
|
||||
debug!(
|
||||
inserted = desc_refs.inserted,
|
||||
unresolvable = desc_refs.skipped_unresolvable,
|
||||
"Extracted cross-references from descriptions"
|
||||
);
|
||||
}
|
||||
|
||||
let user_note_refs = crate::core::note_parser::extract_refs_from_user_notes(conn, project_id)?;
|
||||
if user_note_refs.inserted > 0 || user_note_refs.skipped_unresolvable > 0 {
|
||||
debug!(
|
||||
inserted = user_note_refs.inserted,
|
||||
unresolvable = user_note_refs.skipped_unresolvable,
|
||||
"Extracted cross-references from user notes"
|
||||
);
|
||||
}
|
||||
|
||||
{
|
||||
let enqueued = enqueue_mr_closes_issues_jobs(conn, project_id)?;
|
||||
if enqueued > 0 {
|
||||
|
||||
261
src/main.rs
261
src/main.rs
@@ -11,23 +11,25 @@ use lore::Config;
|
||||
use lore::cli::autocorrect::{self, CorrectionResult};
|
||||
use lore::cli::commands::{
|
||||
IngestDisplay, InitInputs, InitOptions, InitResult, ListFilters, MrListFilters,
|
||||
SearchCliFilters, SyncOptions, TimelineParams, open_issue_in_browser, open_mr_in_browser,
|
||||
print_count, print_count_json, print_doctor_results, print_drift_human, print_drift_json,
|
||||
print_dry_run_preview, print_dry_run_preview_json, print_embed, print_embed_json,
|
||||
print_event_count, print_event_count_json, print_generate_docs, print_generate_docs_json,
|
||||
print_ingest_summary, print_ingest_summary_json, print_list_issues, print_list_issues_json,
|
||||
print_list_mrs, print_list_mrs_json, print_search_results, print_search_results_json,
|
||||
print_show_issue, print_show_issue_json, print_show_mr, print_show_mr_json, print_stats,
|
||||
print_stats_json, print_sync, print_sync_json, print_sync_status, print_sync_status_json,
|
||||
print_timeline, print_timeline_json_with_meta, print_who_human, print_who_json, run_auth_test,
|
||||
run_count, run_count_events, run_doctor, run_drift, run_embed, run_generate_docs, run_ingest,
|
||||
run_ingest_dry_run, run_init, run_list_issues, run_list_mrs, run_search, run_show_issue,
|
||||
run_show_mr, run_stats, run_sync, run_sync_status, run_timeline, run_who,
|
||||
NoteListFilters, SearchCliFilters, SyncOptions, TimelineParams, open_issue_in_browser,
|
||||
open_mr_in_browser, print_count, print_count_json, print_doctor_results, print_drift_human,
|
||||
print_drift_json, print_dry_run_preview, print_dry_run_preview_json, print_embed,
|
||||
print_embed_json, print_event_count, print_event_count_json, print_generate_docs,
|
||||
print_generate_docs_json, print_ingest_summary, print_ingest_summary_json, print_list_issues,
|
||||
print_list_issues_json, print_list_mrs, print_list_mrs_json, print_list_notes,
|
||||
print_list_notes_csv, print_list_notes_json, print_list_notes_jsonl, print_search_results,
|
||||
print_search_results_json, print_show_issue, print_show_issue_json, print_show_mr,
|
||||
print_show_mr_json, print_stats, print_stats_json, print_sync, print_sync_json,
|
||||
print_sync_status, print_sync_status_json, print_timeline, print_timeline_json_with_meta,
|
||||
print_who_human, print_who_json, query_notes, run_auth_test, run_count, run_count_events,
|
||||
run_doctor, run_drift, run_embed, run_generate_docs, run_ingest, run_ingest_dry_run, run_init,
|
||||
run_list_issues, run_list_mrs, run_search, run_show_issue, run_show_mr, run_stats, run_sync,
|
||||
run_sync_status, run_timeline, run_who,
|
||||
};
|
||||
use lore::cli::robot::{RobotMeta, strip_schemas};
|
||||
use lore::cli::{
|
||||
Cli, Commands, CountArgs, EmbedArgs, GenerateDocsArgs, IngestArgs, IssuesArgs, MrsArgs,
|
||||
SearchArgs, StatsArgs, SyncArgs, TimelineArgs, WhoArgs,
|
||||
NotesArgs, SearchArgs, StatsArgs, SyncArgs, TimelineArgs, WhoArgs,
|
||||
};
|
||||
use lore::core::db::{
|
||||
LATEST_SCHEMA_VERSION, create_connection, get_schema_version, run_migrations,
|
||||
@@ -173,10 +175,13 @@ async fn main() {
|
||||
}
|
||||
Some(Commands::Issues(args)) => handle_issues(cli.config.as_deref(), args, robot_mode),
|
||||
Some(Commands::Mrs(args)) => handle_mrs(cli.config.as_deref(), args, robot_mode),
|
||||
Some(Commands::Notes(args)) => handle_notes(cli.config.as_deref(), args, robot_mode),
|
||||
Some(Commands::Search(args)) => {
|
||||
handle_search(cli.config.as_deref(), args, robot_mode).await
|
||||
}
|
||||
Some(Commands::Timeline(args)) => handle_timeline(cli.config.as_deref(), args, robot_mode),
|
||||
Some(Commands::Timeline(args)) => {
|
||||
handle_timeline(cli.config.as_deref(), args, robot_mode).await
|
||||
}
|
||||
Some(Commands::Who(args)) => handle_who(cli.config.as_deref(), args, robot_mode),
|
||||
Some(Commands::Drift {
|
||||
entity_type,
|
||||
@@ -646,27 +651,37 @@ fn extract_invalid_value_context(e: &clap::Error) -> (Option<String>, Option<Vec
|
||||
|
||||
/// Phase 4: Suggest similar command using fuzzy matching
|
||||
fn suggest_similar_command(invalid: &str) -> String {
|
||||
const VALID_COMMANDS: &[&str] = &[
|
||||
"issues",
|
||||
"mrs",
|
||||
"search",
|
||||
"sync",
|
||||
"ingest",
|
||||
"count",
|
||||
"status",
|
||||
"auth",
|
||||
"doctor",
|
||||
"version",
|
||||
"init",
|
||||
"stats",
|
||||
"generate-docs",
|
||||
"embed",
|
||||
"migrate",
|
||||
"health",
|
||||
"robot-docs",
|
||||
"completions",
|
||||
"timeline",
|
||||
"who",
|
||||
// Primary commands + common aliases for fuzzy matching
|
||||
const VALID_COMMANDS: &[(&str, &str)] = &[
|
||||
("issues", "issues"),
|
||||
("issue", "issues"),
|
||||
("mrs", "mrs"),
|
||||
("mr", "mrs"),
|
||||
("merge-requests", "mrs"),
|
||||
("search", "search"),
|
||||
("find", "search"),
|
||||
("query", "search"),
|
||||
("sync", "sync"),
|
||||
("ingest", "ingest"),
|
||||
("count", "count"),
|
||||
("status", "status"),
|
||||
("auth", "auth"),
|
||||
("doctor", "doctor"),
|
||||
("version", "version"),
|
||||
("init", "init"),
|
||||
("stats", "stats"),
|
||||
("stat", "stats"),
|
||||
("generate-docs", "generate-docs"),
|
||||
("embed", "embed"),
|
||||
("migrate", "migrate"),
|
||||
("health", "health"),
|
||||
("robot-docs", "robot-docs"),
|
||||
("completions", "completions"),
|
||||
("timeline", "timeline"),
|
||||
("who", "who"),
|
||||
("notes", "notes"),
|
||||
("note", "notes"),
|
||||
("drift", "drift"),
|
||||
];
|
||||
|
||||
let invalid_lower = invalid.to_lowercase();
|
||||
@@ -674,19 +689,43 @@ fn suggest_similar_command(invalid: &str) -> String {
|
||||
// Find the best match using Jaro-Winkler similarity
|
||||
let best_match = VALID_COMMANDS
|
||||
.iter()
|
||||
.map(|cmd| (*cmd, jaro_winkler(&invalid_lower, cmd)))
|
||||
.map(|(alias, canonical)| (*canonical, jaro_winkler(&invalid_lower, alias)))
|
||||
.max_by(|a, b| a.1.partial_cmp(&b.1).unwrap_or(std::cmp::Ordering::Equal));
|
||||
|
||||
if let Some((cmd, score)) = best_match
|
||||
&& score > 0.7
|
||||
{
|
||||
let example = command_example(cmd);
|
||||
return format!(
|
||||
"Did you mean 'lore {}'? Run 'lore robot-docs' for all commands",
|
||||
cmd
|
||||
"Did you mean 'lore {cmd}'? Example: {example}. Run 'lore robot-docs' for all commands"
|
||||
);
|
||||
}
|
||||
|
||||
"Run 'lore robot-docs' for valid commands".to_string()
|
||||
"Run 'lore robot-docs' for valid commands. Common: issues, mrs, search, sync, timeline, who"
|
||||
.to_string()
|
||||
}
|
||||
|
||||
/// Return a contextual usage example for a command.
|
||||
fn command_example(cmd: &str) -> &'static str {
|
||||
match cmd {
|
||||
"issues" => "lore --robot issues -n 10",
|
||||
"mrs" => "lore --robot mrs -n 10",
|
||||
"search" => "lore --robot search \"auth bug\"",
|
||||
"sync" => "lore --robot sync",
|
||||
"ingest" => "lore --robot ingest issues",
|
||||
"notes" => "lore --robot notes --for-issue 123",
|
||||
"count" => "lore --robot count issues",
|
||||
"status" => "lore --robot status",
|
||||
"stats" => "lore --robot stats",
|
||||
"timeline" => "lore --robot timeline \"auth flow\"",
|
||||
"who" => "lore --robot who --path src/",
|
||||
"health" => "lore --robot health",
|
||||
"generate-docs" => "lore --robot generate-docs",
|
||||
"embed" => "lore --robot embed",
|
||||
"robot-docs" => "lore robot-docs",
|
||||
"init" => "lore init",
|
||||
_ => "lore --robot <command>",
|
||||
}
|
||||
}
|
||||
|
||||
fn handle_issues(
|
||||
@@ -801,6 +840,59 @@ fn handle_mrs(
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn handle_notes(
|
||||
config_override: Option<&str>,
|
||||
args: NotesArgs,
|
||||
robot_mode: bool,
|
||||
) -> Result<(), Box<dyn std::error::Error>> {
|
||||
let start = std::time::Instant::now();
|
||||
let config = Config::load(config_override)?;
|
||||
let db_path = get_db_path(config.storage.db_path.as_deref());
|
||||
let conn = create_connection(&db_path)?;
|
||||
|
||||
let order = if args.asc { "asc" } else { "desc" };
|
||||
let filters = NoteListFilters {
|
||||
limit: args.limit,
|
||||
project: args.project,
|
||||
author: args.author,
|
||||
note_type: args.note_type,
|
||||
include_system: args.include_system,
|
||||
for_issue_iid: args.for_issue,
|
||||
for_mr_iid: args.for_mr,
|
||||
note_id: args.note_id,
|
||||
gitlab_note_id: args.gitlab_note_id,
|
||||
discussion_id: args.discussion_id,
|
||||
since: args.since,
|
||||
until: args.until,
|
||||
path: args.path,
|
||||
contains: args.contains,
|
||||
resolution: args.resolution,
|
||||
sort: args.sort,
|
||||
order: order.to_string(),
|
||||
};
|
||||
|
||||
let result = query_notes(&conn, &filters, &config)?;
|
||||
|
||||
let format = if robot_mode && args.format == "table" {
|
||||
"json"
|
||||
} else {
|
||||
&args.format
|
||||
};
|
||||
|
||||
match format {
|
||||
"json" => print_list_notes_json(
|
||||
&result,
|
||||
start.elapsed().as_millis() as u64,
|
||||
args.fields.as_deref(),
|
||||
),
|
||||
"jsonl" => print_list_notes_jsonl(&result),
|
||||
"csv" => print_list_notes_csv(&result),
|
||||
_ => print_list_notes(&result),
|
||||
}
|
||||
|
||||
Ok(())
|
||||
}
|
||||
|
||||
async fn handle_ingest(
|
||||
config_override: Option<&str>,
|
||||
args: IngestArgs,
|
||||
@@ -1707,7 +1799,7 @@ async fn handle_stats(
|
||||
Ok(())
|
||||
}
|
||||
|
||||
fn handle_timeline(
|
||||
async fn handle_timeline(
|
||||
config_override: Option<&str>,
|
||||
args: TimelineArgs,
|
||||
robot_mode: bool,
|
||||
@@ -1726,9 +1818,10 @@ fn handle_timeline(
|
||||
max_seeds: args.max_seeds,
|
||||
max_entities: args.max_entities,
|
||||
max_evidence: args.max_evidence,
|
||||
robot_mode,
|
||||
};
|
||||
|
||||
let result = run_timeline(&config, ¶ms)?;
|
||||
let result = run_timeline(&config, ¶ms).await?;
|
||||
|
||||
if robot_mode {
|
||||
print_timeline_json_with_meta(
|
||||
@@ -1770,6 +1863,12 @@ async fn handle_search(
|
||||
limit: args.limit,
|
||||
};
|
||||
|
||||
let spinner = lore::cli::progress::stage_spinner(
|
||||
1,
|
||||
1,
|
||||
&format!("Searching ({})...", args.mode),
|
||||
robot_mode,
|
||||
);
|
||||
let start = std::time::Instant::now();
|
||||
let response = run_search(
|
||||
&config,
|
||||
@@ -1781,6 +1880,7 @@ async fn handle_search(
|
||||
)
|
||||
.await?;
|
||||
let elapsed_ms = start.elapsed().as_millis() as u64;
|
||||
spinner.finish_and_clear();
|
||||
|
||||
if robot_mode {
|
||||
print_search_results_json(&response, elapsed_ms, args.fields.as_deref());
|
||||
@@ -2069,6 +2169,8 @@ struct RobotDocsData {
|
||||
commands: serde_json::Value,
|
||||
/// Deprecated command aliases (old -> new)
|
||||
aliases: serde_json::Value,
|
||||
/// Pre-clap error tolerance: what the CLI auto-corrects
|
||||
error_tolerance: serde_json::Value,
|
||||
exit_codes: serde_json::Value,
|
||||
/// Error codes emitted by clap parse failures
|
||||
clap_error_codes: serde_json::Value,
|
||||
@@ -2279,13 +2381,17 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
|
||||
"example": "lore completions bash > ~/.local/share/bash-completion/completions/lore"
|
||||
},
|
||||
"timeline": {
|
||||
"description": "Chronological timeline of events matching a keyword query",
|
||||
"description": "Chronological timeline of events matching a keyword query or entity reference",
|
||||
"flags": ["<QUERY>", "-p/--project", "--since <duration>", "--depth <n>", "--expand-mentions", "-n/--limit", "--fields <list>", "--max-seeds", "--max-entities", "--max-evidence"],
|
||||
"example": "lore --robot timeline '<keyword>' --since 30d",
|
||||
"query_syntax": {
|
||||
"search": "Any text -> hybrid search seeding (FTS + vector)",
|
||||
"entity_direct": "issue:N, i:N, mr:N, m:N -> direct entity seeding (no search, no Ollama)"
|
||||
},
|
||||
"example": "lore --robot timeline issue:42",
|
||||
"response_schema": {
|
||||
"ok": "bool",
|
||||
"data": {"entities": "[{type:string, iid:int, title:string, project_path:string}]", "events": "[{timestamp:string, type:string, entity_type:string, entity_iid:int, detail:string}]", "total_events": "int"},
|
||||
"meta": {"elapsed_ms": "int"}
|
||||
"meta": {"elapsed_ms": "int", "search_mode": "string (hybrid|lexical|direct)"}
|
||||
},
|
||||
"fields_presets": {"minimal": ["timestamp", "type", "entity_iid", "detail"]}
|
||||
},
|
||||
@@ -2317,6 +2423,17 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
|
||||
"active_minimal": ["entity_type", "iid", "title", "participants"]
|
||||
}
|
||||
},
|
||||
"notes": {
|
||||
"description": "List notes from discussions with rich filtering",
|
||||
"flags": ["--limit/-n <N>", "--author/-a <username>", "--note-type <type>", "--contains <text>", "--for-issue <iid>", "--for-mr <iid>", "-p/--project <path>", "--since <period>", "--until <period>", "--path <filepath>", "--resolution <any|unresolved|resolved>", "--sort <created|updated>", "--asc", "--include-system", "--note-id <id>", "--gitlab-note-id <id>", "--discussion-id <id>", "--format <table|json|jsonl|csv>", "--fields <list|minimal>", "--open"],
|
||||
"robot_flags": ["--format json", "--fields minimal"],
|
||||
"example": "lore --robot notes --author jdefting --since 1y --format json --fields minimal",
|
||||
"response_schema": {
|
||||
"ok": "bool",
|
||||
"data": {"notes": "[NoteListRowJson]", "total_count": "int", "showing": "int"},
|
||||
"meta": {"elapsed_ms": "int"}
|
||||
}
|
||||
},
|
||||
"robot-docs": {
|
||||
"description": "This command (agent self-discovery manifest)",
|
||||
"flags": ["--brief"],
|
||||
@@ -2338,6 +2455,7 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
|
||||
"search: FTS5 + vector hybrid search across all entities",
|
||||
"who: Expert/workload/reviews analysis per file path or person",
|
||||
"timeline: Chronological event reconstruction across entities",
|
||||
"notes: Rich note listing with author, type, resolution, path, and discussion filters",
|
||||
"stats: Database statistics with document/note/discussion counts",
|
||||
"count: Entity counts with state breakdowns",
|
||||
"embed: Generate vector embeddings for semantic search via Ollama"
|
||||
@@ -2407,12 +2525,54 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
|
||||
|
||||
// Phase 3: Deprecated command aliases
|
||||
let aliases = serde_json::json!({
|
||||
"list issues": "issues",
|
||||
"list mrs": "mrs",
|
||||
"show issue <IID>": "issues <IID>",
|
||||
"show mr <IID>": "mrs <IID>",
|
||||
"auth-test": "auth",
|
||||
"sync-status": "status"
|
||||
"deprecated_commands": {
|
||||
"list issues": "issues",
|
||||
"list mrs": "mrs",
|
||||
"show issue <IID>": "issues <IID>",
|
||||
"show mr <IID>": "mrs <IID>",
|
||||
"auth-test": "auth",
|
||||
"sync-status": "status"
|
||||
},
|
||||
"command_aliases": {
|
||||
"issue": "issues",
|
||||
"mr": "mrs",
|
||||
"merge-requests": "mrs",
|
||||
"merge-request": "mrs",
|
||||
"note": "notes",
|
||||
"find": "search",
|
||||
"query": "search",
|
||||
"stat": "stats",
|
||||
"st": "status"
|
||||
},
|
||||
"pre_clap_aliases": {
|
||||
"note": "Underscore/no-separator forms auto-corrected before parsing",
|
||||
"merge_requests": "mrs",
|
||||
"merge_request": "mrs",
|
||||
"mergerequests": "mrs",
|
||||
"mergerequest": "mrs",
|
||||
"generate_docs": "generate-docs",
|
||||
"generatedocs": "generate-docs",
|
||||
"gendocs": "generate-docs",
|
||||
"gen-docs": "generate-docs",
|
||||
"robot_docs": "robot-docs",
|
||||
"robotdocs": "robot-docs"
|
||||
},
|
||||
"prefix_matching": "Enabled via infer_subcommands. Unambiguous prefixes work: 'iss' -> issues, 'time' -> timeline, 'sea' -> search"
|
||||
});
|
||||
|
||||
let error_tolerance = serde_json::json!({
|
||||
"note": "The CLI auto-corrects common mistakes before parsing. Corrections are applied silently with a teaching note on stderr.",
|
||||
"auto_corrections": [
|
||||
{"type": "single_dash_long_flag", "example": "-robot -> --robot", "mode": "all"},
|
||||
{"type": "case_normalization", "example": "--Robot -> --robot, --State -> --state", "mode": "all"},
|
||||
{"type": "flag_prefix", "example": "--proj -> --project (when unambiguous)", "mode": "all"},
|
||||
{"type": "fuzzy_flag", "example": "--projct -> --project", "mode": "all (threshold 0.9 in robot, 0.8 in human)"},
|
||||
{"type": "subcommand_alias", "example": "merge_requests -> mrs, robotdocs -> robot-docs", "mode": "all"},
|
||||
{"type": "value_normalization", "example": "--state Opened -> --state opened", "mode": "all"},
|
||||
{"type": "value_fuzzy", "example": "--state opend -> --state opened", "mode": "all"},
|
||||
{"type": "prefix_matching", "example": "lore iss -> lore issues, lore time -> lore timeline", "mode": "all (via clap infer_subcommands)"}
|
||||
],
|
||||
"teaching_notes": "Auto-corrections emit a JSON warning on stderr: {\"warning\":{\"type\":\"ARG_CORRECTED\",\"corrections\":[...],\"teaching\":[...]}}"
|
||||
});
|
||||
|
||||
// Phase 3: Clap error codes (emitted by handle_clap_error)
|
||||
@@ -2451,6 +2611,7 @@ fn handle_robot_docs(robot_mode: bool, brief: bool) -> Result<(), Box<dyn std::e
|
||||
quick_start,
|
||||
commands,
|
||||
aliases,
|
||||
error_tolerance,
|
||||
exit_codes,
|
||||
clap_error_codes,
|
||||
error_format: "stderr JSON: {\"error\":{\"code\":\"...\",\"message\":\"...\",\"suggestion\":\"...\",\"actions\":[\"...\"]}}".to_string(),
|
||||
|
||||
@@ -1,4 +1,5 @@
|
||||
use crate::core::error::Result;
|
||||
use crate::core::path_resolver::escape_like;
|
||||
use crate::documents::SourceType;
|
||||
use rusqlite::Connection;
|
||||
|
||||
@@ -43,12 +44,6 @@ impl SearchFilters {
|
||||
}
|
||||
}
|
||||
|
||||
fn escape_like(s: &str) -> String {
|
||||
s.replace('\\', "\\\\")
|
||||
.replace('%', "\\%")
|
||||
.replace('_', "\\_")
|
||||
}
|
||||
|
||||
pub fn apply_filters(
|
||||
conn: &Connection,
|
||||
document_ids: &[i64],
|
||||
|
||||
@@ -52,12 +52,18 @@ pub fn to_fts_query(raw: &str, mode: FtsQueryMode) -> String {
|
||||
return String::new();
|
||||
}
|
||||
|
||||
// FTS5 boolean operators are case-sensitive uppercase keywords.
|
||||
// Pass them through unquoted so users can write "switch AND health".
|
||||
const FTS5_OPERATORS: &[&str] = &["AND", "OR", "NOT", "NEAR"];
|
||||
|
||||
let mut result = String::with_capacity(trimmed.len() + 20);
|
||||
for (i, token) in trimmed.split_whitespace().enumerate() {
|
||||
if i > 0 {
|
||||
result.push(' ');
|
||||
}
|
||||
if let Some(stem) = token.strip_suffix('*')
|
||||
if FTS5_OPERATORS.contains(&token) {
|
||||
result.push_str(token);
|
||||
} else if let Some(stem) = token.strip_suffix('*')
|
||||
&& !stem.is_empty()
|
||||
&& stem.chars().all(|c| c.is_alphanumeric() || c == '_')
|
||||
{
|
||||
|
||||
@@ -40,6 +40,17 @@ fn max_chunks_per_document(conn: &Connection) -> Result<i64> {
|
||||
.unwrap_or(1))
|
||||
}
|
||||
|
||||
/// sqlite-vec hard limit for KNN `k` parameter.
|
||||
const SQLITE_VEC_KNN_MAX: usize = 4_096;
|
||||
|
||||
/// Compute the KNN k value from the requested limit and the max chunks per
|
||||
/// document. The result is guaranteed to never exceed [`SQLITE_VEC_KNN_MAX`].
|
||||
fn compute_knn_k(limit: usize, max_chunks_per_doc: i64) -> usize {
|
||||
let max_chunks = max_chunks_per_doc.unsigned_abs().max(1) as usize;
|
||||
let multiplier = (max_chunks * 3 / 2 + 1).clamp(8, 200);
|
||||
(limit * multiplier).min(SQLITE_VEC_KNN_MAX)
|
||||
}
|
||||
|
||||
pub fn search_vector(
|
||||
conn: &Connection,
|
||||
query_embedding: &[f32],
|
||||
@@ -55,8 +66,7 @@ pub fn search_vector(
|
||||
.collect();
|
||||
|
||||
let max_chunks = max_chunks_per_document(conn)?.max(1);
|
||||
let multiplier = ((max_chunks.unsigned_abs() as usize * 3 / 2) + 1).clamp(8, 200);
|
||||
let k = (limit * multiplier).min(10_000);
|
||||
let k = compute_knn_k(limit, max_chunks);
|
||||
|
||||
let mut stmt = conn.prepare(
|
||||
"SELECT rowid, distance
|
||||
@@ -124,6 +134,52 @@ mod tests {
|
||||
assert_eq!(results.len(), 2);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_knn_k_never_exceeds_sqlite_vec_limit() {
|
||||
for limit in [1, 10, 50, 100, 500, 1000, 1500, 2000, 5000] {
|
||||
for max_chunks in [1, 2, 5, 10, 50, 100, 200, 500, 1000] {
|
||||
let k = compute_knn_k(limit, max_chunks);
|
||||
assert!(
|
||||
k <= SQLITE_VEC_KNN_MAX,
|
||||
"k={k} exceeded limit for limit={limit}, max_chunks={max_chunks}"
|
||||
);
|
||||
}
|
||||
}
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_knn_k_reproduces_original_bug_scenario() {
|
||||
let k = compute_knn_k(1500, 1);
|
||||
assert!(
|
||||
k <= SQLITE_VEC_KNN_MAX,
|
||||
"k={k} exceeded 4096 at RECALL_CAP with 1 chunk"
|
||||
);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_knn_k_small_limit_uses_minimum_multiplier() {
|
||||
let k = compute_knn_k(10, 1);
|
||||
assert_eq!(k, 80);
|
||||
}
|
||||
|
||||
#[test]
|
||||
fn test_knn_k_high_chunks_caps_multiplier() {
|
||||
let k = compute_knn_k(10, 200);
|
||||
assert_eq!(k, 2000);
|
||||
}
|
||||
|
||||
/// A zero chunk count falls back to 1, producing the same k as the
/// single-chunk case.
#[test]
fn test_knn_k_zero_max_chunks_treated_as_one() {
    let k = compute_knn_k(10, 0);
    assert_eq!(k, 80);
}
|
||||
|
||||
/// The sign of the chunk count is ignored: -5 must behave exactly like 5.
#[test]
fn test_knn_k_negative_max_chunks_uses_absolute() {
    let negative = compute_knn_k(10, -5);
    let positive = compute_knn_k(10, 5);
    assert_eq!(negative, positive);
}
|
||||
|
||||
fn search_vector_dedup(rows: Vec<(i64, f64)>, limit: usize) -> Vec<VectorResult> {
|
||||
let mut best: HashMap<i64, f64> = HashMap::new();
|
||||
for (rowid, distance) in rows {
|
||||
|
||||
@@ -108,8 +108,8 @@ fn insert_label_event(
|
||||
|
||||
/// Full pipeline: seed -> expand -> collect for a scenario with an issue
|
||||
/// that has a closing MR, state changes, and label events.
|
||||
#[test]
|
||||
fn pipeline_seed_expand_collect_end_to_end() {
|
||||
#[tokio::test]
|
||||
async fn pipeline_seed_expand_collect_end_to_end() {
|
||||
let conn = setup_db();
|
||||
let project_id = insert_project(&conn, "group/project");
|
||||
|
||||
@@ -149,7 +149,9 @@ fn pipeline_seed_expand_collect_end_to_end() {
|
||||
insert_label_event(&conn, project_id, Some(issue_id), "bug", 1500);
|
||||
|
||||
// SEED: find entities matching "authentication"
|
||||
let seed_result = seed_timeline(&conn, "authentication", None, None, 50, 10).unwrap();
|
||||
let seed_result = seed_timeline(&conn, None, "authentication", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(
|
||||
!seed_result.seed_entities.is_empty(),
|
||||
"Seed should find at least one entity"
|
||||
@@ -175,6 +177,7 @@ fn pipeline_seed_expand_collect_end_to_end() {
|
||||
&seed_result.seed_entities,
|
||||
&expand_result.expanded_entities,
|
||||
&seed_result.evidence_notes,
|
||||
&seed_result.matched_discussions,
|
||||
None,
|
||||
1000,
|
||||
)
|
||||
@@ -213,12 +216,14 @@ fn pipeline_seed_expand_collect_end_to_end() {
|
||||
}
|
||||
|
||||
/// Verify the pipeline handles an empty FTS result gracefully.
|
||||
#[test]
|
||||
fn pipeline_empty_query_produces_empty_result() {
|
||||
#[tokio::test]
|
||||
async fn pipeline_empty_query_produces_empty_result() {
|
||||
let conn = setup_db();
|
||||
let _project_id = insert_project(&conn, "group/project");
|
||||
|
||||
let seed_result = seed_timeline(&conn, "", None, None, 50, 10).unwrap();
|
||||
let seed_result = seed_timeline(&conn, None, "", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
assert!(seed_result.seed_entities.is_empty());
|
||||
|
||||
let expand_result = expand_timeline(&conn, &seed_result.seed_entities, 1, false, 100).unwrap();
|
||||
@@ -229,6 +234,7 @@ fn pipeline_empty_query_produces_empty_result() {
|
||||
&seed_result.seed_entities,
|
||||
&expand_result.expanded_entities,
|
||||
&seed_result.evidence_notes,
|
||||
&seed_result.matched_discussions,
|
||||
None,
|
||||
1000,
|
||||
)
|
||||
@@ -237,8 +243,8 @@ fn pipeline_empty_query_produces_empty_result() {
|
||||
}
|
||||
|
||||
/// Verify since filter propagates through the full pipeline.
|
||||
#[test]
|
||||
fn pipeline_since_filter_excludes_old_events() {
|
||||
#[tokio::test]
|
||||
async fn pipeline_since_filter_excludes_old_events() {
|
||||
let conn = setup_db();
|
||||
let project_id = insert_project(&conn, "group/project");
|
||||
|
||||
@@ -255,7 +261,9 @@ fn pipeline_since_filter_excludes_old_events() {
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "closed", 2000);
|
||||
insert_state_event(&conn, project_id, Some(issue_id), None, "reopened", 8000);
|
||||
|
||||
let seed_result = seed_timeline(&conn, "deploy", None, None, 50, 10).unwrap();
|
||||
let seed_result = seed_timeline(&conn, None, "deploy", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
let expand_result = expand_timeline(&conn, &seed_result.seed_entities, 0, false, 100).unwrap();
|
||||
|
||||
// Collect with since=5000: should exclude Created(1000) and closed(2000)
|
||||
@@ -264,6 +272,7 @@ fn pipeline_since_filter_excludes_old_events() {
|
||||
&seed_result.seed_entities,
|
||||
&expand_result.expanded_entities,
|
||||
&seed_result.evidence_notes,
|
||||
&seed_result.matched_discussions,
|
||||
Some(5000),
|
||||
1000,
|
||||
)
|
||||
@@ -274,8 +283,8 @@ fn pipeline_since_filter_excludes_old_events() {
|
||||
}
|
||||
|
||||
/// Verify unresolved references use Option<i64> for target_iid.
|
||||
#[test]
|
||||
fn pipeline_unresolved_refs_have_optional_iid() {
|
||||
#[tokio::test]
|
||||
async fn pipeline_unresolved_refs_have_optional_iid() {
|
||||
let conn = setup_db();
|
||||
let project_id = insert_project(&conn, "group/project");
|
||||
|
||||
@@ -302,7 +311,9 @@ fn pipeline_unresolved_refs_have_optional_iid() {
|
||||
)
|
||||
.unwrap();
|
||||
|
||||
let seed_result = seed_timeline(&conn, "cross project", None, None, 50, 10).unwrap();
|
||||
let seed_result = seed_timeline(&conn, None, "cross project", None, None, 50, 10)
|
||||
.await
|
||||
.unwrap();
|
||||
let expand_result = expand_timeline(&conn, &seed_result.seed_entities, 1, false, 100).unwrap();
|
||||
|
||||
assert_eq!(expand_result.unresolved_references.len(), 2);
|
||||
|
||||
Reference in New Issue
Block a user