pubcli

Author	SHA1	Message	Date
teernisse	28479071ae	Fix category synonym matching: deduplicate aliases, remove broken fast-path Three issues in categoryAliasList/categoryMatcher: 1. categoryAliasList appended raw synonyms without deduplication—the addAlias helper already handles lowering and dedup, so route synonyms through it instead of direct append. 2. categoryMatcher.matches had a fast-path that returned false when the input contained no separators (-_ space), skipping the normalization step entirely. This caused legitimate matches like "frozen foods" vs "frozen" to fail when the input was a simple word that needed plural stripping to match. 3. normalizeCategory unconditionally replaced underscores/hyphens and re-joined fields even for inputs without separators. Gate the separator logic behind a ContainsAny check, and use direct slice indexing instead of TrimSuffix for the plural stripping. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 01:31:50 -05:00
teernisse	eb2328b768	Enhance filter pipeline with synonym-aware categories and deal sorting - extend filter.Options with sort mode support and keep Apply as a single-pass pipeline with limit behavior preserved for unsorted flows - add sort normalization and two ordering strategies: * savings: rank by computed DealScore with deterministic title tie-break * ending: rank by earliest parsed end date, then DealScore fallback - introduce DealScore heuristics that combine BOGO weighting, dollar-off extraction, and percentage extraction from savings/deal-info text - add category synonym matcher that supports: * direct case-insensitive matches * canonical group synonym expansion (e.g. veggies -> produce) * normalized fallback for hyphen/underscore/plural variants without breaking exact unknown-category matching - include explicit tests for synonym matching, hyphenated category handling, unknown plural exact matching, and sort ordering behavior - keep allocation-sensitive behavior intact while adding matcher precomputation and fast-path checks	2026-02-23 00:26:55 -05:00
teernisse	df0af4a5f8	Rewrite filter.Apply as single-pass with early-exit and pre-allocation Replace the multi-pass where() chain in Apply() with a single loop that evaluates all filter predicates per item and skips immediately on first mismatch. This eliminates N intermediate slice allocations (one per active filter) and avoids re-scanning the full dataset for each filter dimension. Key changes in filter.go: - Single loop with continue-on-mismatch for BOGO, category, department, and query filters — combined categories check scans item.Categories once for both BOGO and category instead of twice - Pre-allocate result slice capped at min(len(items), opts.Limit) to avoid grow-and-copy churn - Fast-path bypass when no filters are active (just apply limit) - Break early once limit is reached instead of filtering everything and truncating after - Remove the now-unused where() helper function - Add early-return fast paths to CleanText() for the common case where input contains no HTML entities or newlines, avoiding unnecessary html.UnescapeString and ReplaceAll calls Test coverage: - filter_equivalence_test.go (new): Reference implementation of the original multi-pass algorithm with 500 randomized test cases verifying behavioral equivalence. Includes allocation budget guardrail (<=80 allocs/op for 1k items) to catch accidental regression to multi-pass. Benchmarks for new vs legacy reference on identical workload. - filter_test.go: Benchmark comparisons for CleanText on plain text (fast path) vs escaped HTML (full path), new vs legacy. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>	2026-02-23 00:11:38 -05:00
teernisse	12eb55f4b8	Add deal filtering engine with BOGO, category, department, and keyword support Composable filter pipeline that processes SavingItem slices through chained predicates: BOGO detection (category match), exact category match, substring department match, and keyword search across title and description fields. All text matching is case-insensitive. Includes utility functions for HTML entity unescaping (CleanText), nil-safe string pointer dereferencing (Deref), and case-insensitive slice membership (ContainsIgnoreCase). An optional limit truncates results after all filters are applied. Tests cover each filter in isolation, combined filters, nil field safety, and the Categories aggregation helper.	2026-02-22 21:41:46 -05:00
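
The single-pass rewrite described in df0af4a5f8 (one loop, continue on first mismatch, pre-allocated result, break once the limit is reached) can be sketched roughly as follows. This is a minimal illustration, not the repository's code: the field names on `SavingItem` and `Options`, the `ptr` helper, and the literal `"bogo"` category used for BOGO detection are assumptions. Only the names `Apply`, `Options`, `SavingItem`, `Deref`, `Categories`, and `Limit` come from the log.

```go
package main

import (
	"fmt"
	"strings"
)

// SavingItem is a hypothetical stand-in for the repository's type;
// the field names are assumed for illustration.
type SavingItem struct {
	Title       *string
	Description *string
	Department  *string
	Categories  []string
}

// Options lists the filter dimensions named in the log; field names are assumed.
type Options struct {
	BOGO       bool
	Category   string
	Department string
	Query      string
	Limit      int // 0 means "no limit" in this sketch
}

// Deref returns the pointed-to string, or "" for a nil pointer.
func Deref(s *string) string {
	if s == nil {
		return ""
	}
	return *s
}

// ptr is a hypothetical helper for building sample data.
func ptr(s string) *string { return &s }

// Apply evaluates every active predicate per item in one loop and skips
// the item on the first mismatch, so no intermediate slices are built.
func Apply(items []SavingItem, opts Options) []SavingItem {
	// Pre-allocate at most min(len(items), Limit) slots.
	capHint := len(items)
	if opts.Limit > 0 && opts.Limit < capHint {
		capHint = opts.Limit
	}
	out := make([]SavingItem, 0, capHint)
	dept := strings.ToLower(opts.Department)
	query := strings.ToLower(opts.Query)

	for _, it := range items {
		if opts.Limit > 0 && len(out) >= opts.Limit {
			break // limit reached: stop instead of filtering the rest
		}
		if opts.BOGO || opts.Category != "" {
			// Scan Categories once for both the BOGO and category checks.
			hasBOGO, hasCat := false, false
			for _, c := range it.Categories {
				if strings.EqualFold(c, "bogo") { // assumed BOGO marker
					hasBOGO = true
				}
				if opts.Category != "" && strings.EqualFold(c, opts.Category) {
					hasCat = true
				}
			}
			if (opts.BOGO && !hasBOGO) || (opts.Category != "" && !hasCat) {
				continue
			}
		}
		if dept != "" && !strings.Contains(strings.ToLower(Deref(it.Department)), dept) {
			continue
		}
		if query != "" {
			title := strings.ToLower(Deref(it.Title))
			desc := strings.ToLower(Deref(it.Description))
			if !strings.Contains(title, query) && !strings.Contains(desc, query) {
				continue
			}
		}
		out = append(out, it)
	}
	return out
}

func main() {
	items := []SavingItem{
		{Title: ptr("Potato Chips"), Department: ptr("Grocery"), Categories: []string{"bogo", "snacks"}},
		{Title: ptr("Tortilla Chips"), Department: ptr("Grocery"), Categories: []string{"snacks"}},
		{Title: ptr("Ice Cream"), Department: ptr("Frozen"), Categories: []string{"BOGO", "frozen"}},
		{Categories: nil}, // nil fields must not panic
	}
	for _, it := range Apply(items, Options{BOGO: true, Query: "chips"}) {
		fmt.Println(Deref(it.Title))
	}
}
```

Checking the limit at the top of the loop means the sketch never examines more items than it needs, which is the behavior the commit contrasts with "filtering everything and truncating after".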

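The category matching described in eb2328b768 and corrected in 28479071ae (aliases routed through a deduplicating `addAlias`, no separator-based fast path in `matches`, separator handling gated behind `ContainsAny`, plural stripping by slice indexing) might look something like the sketch below. The function names come from the log; the bodies, the `synonymGroups` table, the `newCategoryMatcher` constructor, and the guard against stripping a trailing "ss" are assumptions. Only the `veggies -> produce` mapping is taken from the commit message.

```go
package main

import (
	"fmt"
	"strings"
)

// synonymGroups maps a canonical category to its synonyms. Only
// "veggies -> produce" appears in the log; the other entries are illustrative.
var synonymGroups = map[string][]string{
	"produce": {"veggies", "vegetables", "Veggies"},
}

// normalizeCategory lowercases, unifies separators only when some are
// present, and strips a single trailing "s" via slice indexing.
func normalizeCategory(s string) string {
	s = strings.ToLower(strings.TrimSpace(s))
	if strings.ContainsAny(s, "-_ ") {
		s = strings.NewReplacer("-", " ", "_", " ").Replace(s)
		s = strings.Join(strings.Fields(s), " ")
	}
	if n := len(s); n > 1 && s[n-1] == 's' && s[n-2] != 's' {
		s = s[:n-1]
	}
	return s
}

// categoryAliasList returns the wanted category plus its synonym group,
// with every entry lowered and deduplicated by addAlias.
func categoryAliasList(wanted string) []string {
	seen := map[string]struct{}{}
	var aliases []string
	addAlias := func(a string) {
		a = strings.ToLower(strings.TrimSpace(a))
		if a == "" {
			return
		}
		if _, dup := seen[a]; dup {
			return
		}
		seen[a] = struct{}{}
		aliases = append(aliases, a)
	}

	addAlias(wanted)
	w := strings.ToLower(strings.TrimSpace(wanted))
	for canonical, syns := range synonymGroups {
		inGroup := canonical == w
		for _, s := range syns {
			if strings.ToLower(s) == w {
				inGroup = true
			}
		}
		if !inGroup {
			continue
		}
		addAlias(canonical)
		for _, s := range syns {
			addAlias(s) // synonyms go through addAlias, never a raw append
		}
	}
	return aliases
}

// categoryMatcher precomputes exact and normalized alias sets.
type categoryMatcher struct {
	exact      map[string]struct{}
	normalized map[string]struct{}
}

// newCategoryMatcher is a hypothetical constructor for this sketch.
func newCategoryMatcher(wanted string) categoryMatcher {
	m := categoryMatcher{exact: map[string]struct{}{}, normalized: map[string]struct{}{}}
	for _, a := range categoryAliasList(wanted) {
		m.exact[a] = struct{}{}
		m.normalized[normalizeCategory(a)] = struct{}{}
	}
	return m
}

// matches tries an exact case-insensitive hit first and always falls
// through to the normalized comparison, even for single-word input.
func (m categoryMatcher) matches(category string) bool {
	if _, ok := m.exact[strings.ToLower(strings.TrimSpace(category))]; ok {
		return true
	}
	_, ok := m.normalized[normalizeCategory(category)]
	return ok
}

func main() {
	m := newCategoryMatcher("veggies")
	fmt.Println(categoryAliasList("veggies"))
	fmt.Println(m.matches("Produce"), m.matches("vegetable"), m.matches("dairy"))
}
```

Because both the stored aliases and the candidate pass through the same `normalizeCategory`, an unknown plural such as "snacks" still matches itself exactly, which is the property the tests in eb2328b768 are said to cover.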
4 Commits
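
For the `savings` sort mode in eb2328b768 (rank by a computed DealScore, break ties deterministically by title), a rough sketch is shown below. The log only says that the score combines BOGO weighting, dollar-off extraction, and percentage extraction; the `Deal` type, the regular expressions, the specific weights, and `SortBySavings` are all assumptions made for illustration.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strconv"
	"strings"
)

// Deal is a hypothetical, flattened view of a savings item.
type Deal struct {
	Title   string
	Savings string // free-form savings / deal-info text
	BOGO    bool
}

var (
	dollarRe  = regexp.MustCompile(`\$(\d+(?:\.\d+)?)`)    // e.g. "$2.50"
	percentRe = regexp.MustCompile(`(\d+(?:\.\d+)?)\s*%`) // e.g. "25%"
)

// DealScore combines the three signals named in the log.
// The weights (8 for BOGO, percent/10) are assumed, not the author's.
func DealScore(d Deal) float64 {
	score := 0.0
	if d.BOGO {
		score += 8
	}
	if m := dollarRe.FindStringSubmatch(d.Savings); m != nil {
		if v, err := strconv.ParseFloat(m[1], 64); err == nil {
			score += v
		}
	}
	if m := percentRe.FindStringSubmatch(d.Savings); m != nil {
		if v, err := strconv.ParseFloat(m[1], 64); err == nil {
			score += v / 10
		}
	}
	return score
}

// SortBySavings orders deals by descending score, then by
// case-insensitive title so that equal scores sort deterministically.
func SortBySavings(deals []Deal) {
	sort.SliceStable(deals, func(i, j int) bool {
		si, sj := DealScore(deals[i]), DealScore(deals[j])
		if si != sj {
			return si > sj
		}
		return strings.ToLower(deals[i].Title) < strings.ToLower(deals[j].Title)
	})
}

func main() {
	deals := []Deal{
		{Title: "Chips", Savings: "Save $2.00"},
		{Title: "Soda", Savings: "Buy 1 Get 1 Free", BOGO: true},
		{Title: "Bread", Savings: "Save $2.00"},
		{Title: "Yogurt", Savings: "25% off"},
	}
	SortBySavings(deals)
	for _, d := range deals {
		fmt.Printf("%-8s %.1f\n", d.Title, DealScore(d))
	}
}
```

The `ending` mode described in the same commit would use the parsed end date as the primary key and fall back to this score. It is omitted here because the log does not state the date format.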