Three issues in categoryAliasList/categoryMatcher:
1. categoryAliasList appended raw synonyms without deduplication. The
addAlias helper already handles lowercasing and dedup, so route
synonyms through it instead of appending directly.
2. categoryMatcher.matches had a fast path that returned false when the
input contained no separators (hyphen, underscore, or space), skipping
the normalization
step entirely. This caused legitimate matches like "frozen foods" vs
"frozen" to fail when the input was a simple word that needed plural
stripping to match.
3. normalizeCategory unconditionally replaced underscores/hyphens and
re-joined fields even for inputs without separators. Gate the
separator logic behind a ContainsAny check, and use direct slice
indexing instead of TrimSuffix for the plural stripping.
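
A minimal sketch of the gated normalization described in item 3. The
function body is an assumption reconstructed from this message, not the
committed code, and the single trailing-"s" rule is deliberately naive:

```go
package main

import (
	"fmt"
	"strings"
)

// normalizeCategory is a hypothetical reconstruction: it lowercases the
// input, collapses separators only when one is present, and strips a
// single trailing "s" by slice indexing rather than strings.TrimSuffix.
func normalizeCategory(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	// Gate the separator work so plain single words skip it entirely.
	if strings.ContainsAny(s, "-_ ") {
		s = strings.NewReplacer("-", " ", "_", " ").Replace(s)
		s = strings.Join(strings.Fields(s), " ")
	}
	// Naive plural stripping: drop one trailing 's' if anything remains.
	if len(s) > 1 && s[len(s)-1] == 's' {
		s = s[:len(s)-1]
	}
	return s
}

func main() {
	fmt.Println(normalizeCategory("Frozen_Foods")) // frozen food
	fmt.Println(normalizeCategory("snacks"))       // snack
}
```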
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
- extend filter.Options with sort mode support and keep Apply as a single-pass pipeline with limit behavior preserved for unsorted flows
- add sort normalization and two ordering strategies:
  * savings: rank by computed DealScore with deterministic title tie-break
  * ending: rank by earliest parsed end date, then DealScore fallback
- introduce DealScore heuristics that combine BOGO weighting, dollar-off extraction, and percentage extraction from savings/deal-info text
- add category synonym matcher that supports:
  * direct case-insensitive matches
  * canonical group synonym expansion (e.g. veggies -> produce)
  * normalized fallback for hyphen/underscore/plural variants without breaking exact unknown-category matching
- include explicit tests for synonym matching, hyphenated category handling, unknown plural exact matching, and sort ordering behavior
- keep allocation-sensitive behavior intact while adding matcher precomputation and fast-path checks
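
A rough sketch of the savings ordering listed above, assuming the score
has already been computed per item. The scoredItem type and
sortBySavings name are hypothetical and only illustrate the
"score descending, title ascending" tie-break:

```go
package main

import (
	"fmt"
	"sort"
)

// scoredItem is a hypothetical stand-in for an item paired with its
// precomputed DealScore.
type scoredItem struct {
	Title string
	Score float64
}

// sortBySavings orders by score, highest first, and breaks ties by
// title so the result is deterministic for equal scores.
func sortBySavings(items []scoredItem) {
	sort.SliceStable(items, func(i, j int) bool {
		if items[i].Score != items[j].Score {
			return items[i].Score > items[j].Score
		}
		return items[i].Title < items[j].Title
	})
}

func main() {
	items := []scoredItem{
		{Title: "Yogurt", Score: 2},
		{Title: "Apples", Score: 2},
		{Title: "Chips", Score: 5},
	}
	sortBySavings(items)
	for _, it := range items {
		fmt.Println(it.Title) // Chips, Apples, Yogurt
	}
}
```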
Replace the multi-pass where() chain in Apply() with a single loop that
evaluates all filter predicates per item and skips immediately on first
mismatch. This eliminates N intermediate slice allocations (one per
active filter) and avoids re-scanning the full dataset for each filter
dimension.
Key changes in filter.go:
- Single loop with continue-on-mismatch for BOGO, category, department,
and query filters; the combined categories check scans
item.Categories once for both BOGO and category instead of twice
- Pre-allocate result slice capped at min(len(items), opts.Limit) to
avoid grow-and-copy churn
- Fast-path bypass when no filters are active (just apply limit)
- Break early once limit is reached instead of filtering everything
and truncating after
- Remove the now-unused where() helper function
- Add early-return fast paths to CleanText() for the common case where
input contains no HTML entities or newlines, avoiding unnecessary
html.UnescapeString and ReplaceAll calls
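
The loop shape described in the list above could look roughly like
this. It is a sketch under assumptions: item and options are
hypothetical stand-ins for SavingItem and filter.Options, the "bogo"
category name is assumed, and the query filter is omitted for brevity:

```go
package main

import (
	"fmt"
	"strings"
)

// item and options are hypothetical stand-ins, not the real types.
type item struct {
	Title      string
	Department string
	Categories []string
}

type options struct {
	BOGO       bool
	Category   string
	Department string
	Limit      int // 0 means no limit
}

func apply(items []item, opts options) []item {
	// Pre-allocate at most min(len(items), Limit) slots.
	capHint := len(items)
	if opts.Limit > 0 && opts.Limit < capHint {
		capHint = opts.Limit
	}
	out := make([]item, 0, capHint)
	dept := strings.ToLower(opts.Department)
	for _, it := range items {
		if opts.BOGO || opts.Category != "" {
			// One scan of Categories serves both checks.
			hasBOGO, hasCat := false, false
			for _, c := range it.Categories {
				if strings.EqualFold(c, "bogo") { // assumed category name
					hasBOGO = true
				}
				if strings.EqualFold(c, opts.Category) {
					hasCat = true
				}
			}
			if opts.BOGO && !hasBOGO {
				continue
			}
			if opts.Category != "" && !hasCat {
				continue
			}
		}
		if dept != "" && !strings.Contains(strings.ToLower(it.Department), dept) {
			continue
		}
		out = append(out, it)
		// Stop as soon as the limit is reached.
		if opts.Limit > 0 && len(out) == opts.Limit {
			break
		}
	}
	return out
}

func main() {
	items := []item{
		{"Chips", "Grocery", []string{"bogo", "snacks"}},
		{"Apples", "Produce", []string{"fruit"}},
		{"Soda", "Grocery", []string{"BOGO", "beverages"}},
	}
	for _, it := range apply(items, options{BOGO: true, Limit: 1}) {
		fmt.Println(it.Title) // Chips
	}
}
```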
Test coverage:
- filter_equivalence_test.go (new): Reference implementation of the
original multi-pass algorithm with 500 randomized test cases verifying
behavioral equivalence. Includes allocation budget guardrail (<=80
allocs/op for 1k items) to catch accidental regression to multi-pass.
Benchmarks for new vs legacy reference on identical workload.
- filter_test.go: Benchmark comparisons for CleanText on plain text
(fast path) vs escaped HTML (full path), new vs legacy.
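
For reference, the CleanText fast path being benchmarked might be
sketched as follows. The exact cleanup steps are an assumption (here
newlines become spaces and the result is trimmed); only the
early-return gate on entities and newlines comes from this message:

```go
package main

import (
	"fmt"
	"html"
	"strings"
)

// cleanText is a hypothetical reconstruction of CleanText.
func cleanText(s string) string {
	// Fast path: no '&' means no HTML entities, and no '\n' means
	// nothing to replace, so skip UnescapeString and ReplaceAll.
	if !strings.ContainsAny(s, "&\n") {
		return strings.TrimSpace(s)
	}
	s = html.UnescapeString(s)
	s = strings.ReplaceAll(s, "\n", " ") // assumed newline handling
	return strings.TrimSpace(s)
}

func main() {
	fmt.Println(cleanText("Fresh Produce"))            // fast path
	fmt.Println(cleanText("Buy 1 &amp; Get 1\nFree")) // full path
}
```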
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Composable filter pipeline that processes SavingItem slices through
chained predicates: BOGO detection (category match), exact category
match, substring department match, and keyword search across title
and description fields. All text matching is case-insensitive.
Includes utility functions for HTML entity unescaping (CleanText),
nil-safe string pointer dereferencing (Deref), and case-insensitive
slice membership (ContainsIgnoreCase). An optional limit truncates
results after all filters are applied. Tests cover each filter in
isolation, combined filters, nil field safety, and the Categories
aggregation helper.
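
As an illustration of the two smaller helpers named above, a minimal
sketch follows. The signatures are assumptions; only the names Deref
and ContainsIgnoreCase and their described behavior come from this
message:

```go
package main

import (
	"fmt"
	"strings"
)

// Deref returns the pointed-to string, or "" for a nil pointer.
// Signature assumed for illustration.
func Deref(p *string) string {
	if p == nil {
		return ""
	}
	return *p
}

// ContainsIgnoreCase reports whether target appears in list under
// case-insensitive comparison. Signature assumed for illustration.
func ContainsIgnoreCase(list []string, target string) bool {
	for _, s := range list {
		if strings.EqualFold(s, target) {
			return true
		}
	}
	return false
}

func main() {
	var missing *string
	title := "Fresh Salmon"
	fmt.Println(Deref(missing) == "", Deref(&title))
	fmt.Println(ContainsIgnoreCase([]string{"BOGO", "Meat"}, "bogo"))
}
```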