Documentation
¶
Overview ¶
Package meta implements the meta-engine orchestrator that automatically selects the optimal regex execution strategy.
The meta-engine coordinates three engines:
- Prefilter: fast literal-based candidate finding (optional)
- Lazy DFA: deterministic finite automaton with on-demand state construction
- NFA (PikeVM): nondeterministic fallback for complex patterns or DFA cache overflow
Strategy selection is based on:
- Pattern complexity (NFA size, literal quality)
- Prefilter availability (good literals enable fast filtering)
- DFA suitability (patterns without alternations benefit most)
The meta-engine provides the public API for pattern compilation and matching, hiding the complexity of multi-engine coordination from users.
Index ¶
- Variables
- func StrategyReason(strategy Strategy, n *nfa.NFA, literals *literal.Seq, config Config) string
- type CompileError
- type Config
- type ConfigError
- type Engine
- func (e *Engine) Count(haystack []byte, n int) int
- func (e *Engine) Find(haystack []byte) *Match
- func (e *Engine) FindAllSubmatch(haystack []byte, n int) []*MatchWithCaptures
- func (e *Engine) FindSubmatch(haystack []byte) *MatchWithCaptures
- func (e *Engine) IsMatch(haystack []byte) bool
- func (e *Engine) NumCaptures() int
- func (e *Engine) ResetStats()
- func (e *Engine) Stats() Stats
- func (e *Engine) Strategy() Strategy
- func (e *Engine) SubexpNames() []string
- type Match
- type MatchWithCaptures
- func (m *MatchWithCaptures) AllGroupStrings() []string
- func (m *MatchWithCaptures) AllGroups() [][]byte
- func (m *MatchWithCaptures) Bytes() []byte
- func (m *MatchWithCaptures) End() int
- func (m *MatchWithCaptures) Group(index int) []byte
- func (m *MatchWithCaptures) GroupIndex(index int) []int
- func (m *MatchWithCaptures) GroupString(index int) string
- func (m *MatchWithCaptures) NumCaptures() int
- func (m *MatchWithCaptures) Start() int
- func (m *MatchWithCaptures) String() string
- type ReverseAnchoredSearcher
- type ReverseInnerSearcher
- type ReverseSuffixSearcher
- type Stats
- type Strategy
Constants ¶
This section is empty.
Variables ¶
var ErrNoInnerLiterals = errors.New("no inner literals available for ReverseInner strategy")
ErrNoInnerLiterals indicates that no inner literals could be extracted for ReverseInner strategy. This is not a fatal error - it just means ReverseInner optimization cannot be used.
var ErrNoPrefilter = errors.New("no prefilter available for suffix literals")
ErrNoPrefilter indicates that no prefilter could be built for suffix literals. This is not a fatal error - it just means ReverseSuffix optimization cannot be used.
Functions ¶
func StrategyReason ¶
StrategyReason provides a human-readable explanation for strategy selection.
This is useful for debugging and performance tuning.
Example:
strategy := meta.SelectStrategy(nfa, literals, config)
reason := meta.StrategyReason(strategy, nfa, literals, config)
log.Printf("Using %s: %s", strategy, reason)
Types ¶
type CompileError ¶
CompileError represents a pattern compilation error.
func (*CompileError) Error ¶
func (e *CompileError) Error() string
Error implements the error interface.
func (*CompileError) Unwrap ¶
func (e *CompileError) Unwrap() error
Unwrap returns the underlying error.
type Config ¶
type Config struct {
// EnableDFA enables the Lazy DFA engine.
// When false, only NFA (PikeVM) is used.
// Default: true
EnableDFA bool
// EnablePrefilter enables literal-based prefiltering.
// When false, no prefilter is used even if literals are available.
// Default: true
EnablePrefilter bool
// MaxDFAStates sets the maximum number of DFA states to cache.
// Larger values use more memory but reduce NFA fallback frequency.
// Default: 10000
MaxDFAStates uint32
// DeterminizationLimit caps the number of NFA states per DFA state.
// This prevents exponential blowup in patterns like (a*)*b.
// Default: 1000
DeterminizationLimit int
// MinLiteralLen is the minimum length for prefilter literals.
// Shorter literals may have too many false positives.
// Default: 2
MinLiteralLen int
// MaxLiterals limits the number of literals to extract for prefiltering.
// Default: 64
MaxLiterals int
// MaxRecursionDepth limits recursion during NFA compilation.
// Default: 100
MaxRecursionDepth int
}
Config controls meta-engine behavior and performance characteristics.
Configuration options affect:
- Strategy selection (which engine to use)
- Cache sizes (DFA state cache)
- Limits (determinization, recursion)
- Prefilter enablement
Example:
config := meta.DefaultConfig() config.EnableDFA = false // Force NFA-only execution engine := meta.NewEngine(nfa, config)
func DefaultConfig ¶
func DefaultConfig() Config
DefaultConfig returns a configuration with sensible defaults.
Defaults are tuned for typical regex patterns with a balance between performance and memory usage:
- DFA enabled with 10K state cache (moderate memory usage)
- Prefilter enabled (5-50x speedup on patterns with literals)
- Conservative determinization limit (prevents exponential blowup)
- Reasonable recursion depth (handles nested patterns)
Example:
config := meta.DefaultConfig() // Use as-is or customize specific options config.MaxDFAStates = 50000 // Increase cache for better hit rate
func (Config) Validate ¶
Validate checks if the configuration is valid. Returns an error if any parameter is out of range.
Valid ranges:
- MaxDFAStates: 1 to 1,000,000
- DeterminizationLimit: 10 to 100,000
- MinLiteralLen: 1 to 64
- MaxLiterals: 1 to 1,000
- MaxRecursionDepth: 10 to 1,000
Example:
config := meta.Config{MaxDFAStates: 0} // Invalid!
if err := config.Validate(); err != nil {
log.Fatal(err)
}
type ConfigError ¶
ConfigError represents an invalid configuration parameter.
func (*ConfigError) Error ¶
func (e *ConfigError) Error() string
Error implements the error interface.
type Engine ¶
type Engine struct {
// contains filtered or unexported fields
}
Engine is the meta-engine that orchestrates all regex execution strategies.
The Engine:
- Analyzes the pattern and extracts literals
- Selects the optimal strategy (NFA, DFA, or both)
- Builds prefilter (if literals available)
- Coordinates search across engines
Thread safety: Not thread-safe. Each goroutine should use its own Engine instance. The underlying NFA is immutable and can be shared, but Engine state is mutable.
Example:
// Compile pattern
engine, err := meta.Compile("(foo|bar)\\d+")
if err != nil {
return err
}
// Search
haystack := []byte("test foo123 end")
match := engine.Find(haystack)
if match != nil {
println(match.String()) // "foo123"
}
func Compile ¶
Compile compiles a regex pattern string into an executable Engine.
Steps:
- Parse pattern using regexp/syntax
- Compile to NFA
- Extract literals (prefixes, suffixes)
- Build prefilter (if good literals exist)
- Select strategy
- Build DFA (if strategy requires it)
Returns an error if:
- Pattern syntax is invalid
- Pattern is too complex (recursion limit exceeded)
- Configuration is invalid
Example:
engine, err := meta.Compile("hello.*world")
if err != nil {
log.Fatal(err)
}
func CompileRegexp ¶
CompileRegexp compiles a parsed syntax.Regexp with default configuration.
This is useful when you already have a parsed regexp from another source.
Example:
re, _ := syntax.Parse("hello", syntax.Perl)
engine, err := meta.CompileRegexp(re, meta.DefaultConfig())
func CompileWithConfig ¶
CompileWithConfig compiles a pattern with custom configuration.
Example:
config := meta.DefaultConfig()
config.MaxDFAStates = 50000 // Increase cache
engine, err := meta.CompileWithConfig("(a|b|c)*", config)
func (*Engine) Count ¶ added in v0.4.0
Count returns the number of non-overlapping matches in the haystack.
This is optimized for counting without allocating result slices. Uses early termination for boolean checks at each step. If n > 0, counts at most n matches. If n <= 0, counts all matches.
Example:
engine, _ := meta.Compile(`\d+`)
count := engine.Count([]byte("1 2 3 4 5"), -1)
// count == 5
func (*Engine) Find ¶
Find returns the first match in the haystack, or nil if no match.
The search algorithm depends on the selected strategy:
UseNFA: PikeVM search directly UseDFA: Prefilter (if available) → DFA → NFA fallback UseBoth: Try DFA, fallback to NFA on cache full
Example:
engine, _ := meta.Compile("hello")
match := engine.Find([]byte("say hello world"))
if match != nil {
println(match.String()) // "hello"
}
func (*Engine) FindAllSubmatch ¶ added in v0.4.0
func (e *Engine) FindAllSubmatch(haystack []byte, n int) []*MatchWithCaptures
FindAllSubmatch returns all successive matches with capture group information. If n > 0, returns at most n matches. If n <= 0, returns all matches.
Example:
engine, _ := meta.Compile(`(\w+)@(\w+)\.(\w+)`)
matches := engine.FindAllSubmatch([]byte("[email protected] [email protected]"), -1)
// len(matches) == 2
func (*Engine) FindSubmatch ¶ added in v0.2.0
func (e *Engine) FindSubmatch(haystack []byte) *MatchWithCaptures
FindSubmatch returns the first match with capture group information. Returns nil if no match is found.
Group 0 is always the entire match. Groups 1+ are explicit capture groups. Unmatched optional groups will have nil values.
When a one-pass DFA is available (for anchored patterns), this method is 10-20x faster than PikeVM for capture group extraction.
Example:
engine, _ := meta.Compile(`(\w+)@(\w+)\.(\w+)`)
match := engine.FindSubmatch([]byte("[email protected]"))
if match != nil {
fmt.Println(match.Group(0)) // "[email protected]"
fmt.Println(match.Group(1)) // "user"
fmt.Println(match.Group(2)) // "example"
fmt.Println(match.Group(3)) // "com"
}
func (*Engine) IsMatch ¶
IsMatch returns true if the pattern matches anywhere in the haystack.
This is optimized for boolean matching:
- Uses early termination (returns immediately on first match)
- Avoids Match object creation
- Uses DFA.IsMatch when available (2-10x faster than Find)
Example:
engine, _ := meta.Compile("hello")
if engine.IsMatch([]byte("say hello world")) {
println("matches!")
}
func (*Engine) NumCaptures ¶ added in v0.2.0
NumCaptures returns the number of capture groups in the pattern. Group 0 is the entire match, groups 1+ are explicit captures.
func (*Engine) ResetStats ¶
func (e *Engine) ResetStats()
ResetStats resets execution statistics to zero.
func (*Engine) Stats ¶
Stats returns execution statistics.
Useful for performance analysis and debugging.
Example:
stats := engine.Stats()
println("NFA searches:", stats.NFASearches)
println("DFA searches:", stats.DFASearches)
func (*Engine) Strategy ¶
Strategy returns the execution strategy selected for this engine.
Example:
strategy := engine.Strategy() println(strategy.String()) // "UseDFA"
func (*Engine) SubexpNames ¶ added in v0.5.0
SubexpNames returns the names of capture groups in the pattern. Index 0 is always "" (entire match). Named groups return their names, unnamed groups return "". This matches stdlib regexp.Regexp.SubexpNames() behavior.
type Match ¶
type Match struct {
// contains filtered or unexported fields
}
Match represents a successful regex match with position information.
A Match contains:
- Start position (inclusive)
- End position (exclusive)
- Reference to the original haystack
Note: This is a simple match without capture group support. Capture groups will be added in a future version.
Example:
match := &Match{start: 5, end: 11, haystack: []byte("test foo123 end")}
println(match.String()) // "foo123"
println(match.Start(), match.End()) // 5, 11
func NewMatch ¶
NewMatch creates a new Match from start and end positions.
Parameters:
- start: inclusive start position in haystack
- end: exclusive end position in haystack
- haystack: the original byte buffer that was searched
The haystack is stored by reference (not copied) for efficiency. Callers must ensure the haystack remains valid for the lifetime of the Match.
Example:
haystack := []byte("hello world")
match := meta.NewMatch(0, 5, haystack) // "hello"
func (*Match) Bytes ¶
Bytes returns the matched bytes as a slice.
The returned slice is a view into the original haystack (not a copy). Callers should copy the bytes if they need to retain them after the haystack is modified or deallocated.
Example:
match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(string(match.Bytes())) // "foo123"
func (*Match) Contains ¶
Contains returns true if the given position is within the match range.
Parameters:
- pos: position to check (must be >= 0)
Returns true if start <= pos < end.
Example:
match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Contains(7)) // true
println(match.Contains(11)) // false (exclusive end)
func (*Match) End ¶
End returns the exclusive end position of the match.
Example:
match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.End()) // 11
func (*Match) IsEmpty ¶
IsEmpty returns true if the match has zero length.
Empty matches can occur with patterns like "" or "(?:)" that match without consuming input.
Example:
match := meta.NewMatch(5, 5, []byte("test"))
println(match.IsEmpty()) // true
func (*Match) Len ¶
Len returns the length of the match in bytes.
Example:
match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Len()) // 6 (11 - 5)
func (*Match) Start ¶
Start returns the inclusive start position of the match.
Example:
match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Start()) // 5
type MatchWithCaptures ¶ added in v0.2.0
type MatchWithCaptures struct {
// contains filtered or unexported fields
}
MatchWithCaptures represents a match with capture group information. Group 0 is always the entire match, groups 1+ are explicit capture groups.
func NewMatchWithCaptures ¶ added in v0.2.0
func NewMatchWithCaptures(haystack []byte, captures [][]int) *MatchWithCaptures
NewMatchWithCaptures creates a MatchWithCaptures from capture positions.
func (*MatchWithCaptures) AllGroupStrings ¶ added in v0.2.0
func (m *MatchWithCaptures) AllGroupStrings() []string
AllGroupStrings returns all captured groups as strings.
func (*MatchWithCaptures) AllGroups ¶ added in v0.2.0
func (m *MatchWithCaptures) AllGroups() [][]byte
AllGroups returns all captured groups as byte slices. Result[0] is the entire match, result[i] may be nil if group i was not captured.
func (*MatchWithCaptures) Bytes ¶ added in v0.2.0
func (m *MatchWithCaptures) Bytes() []byte
Bytes returns the matched bytes for group 0 (entire match).
func (*MatchWithCaptures) End ¶ added in v0.2.0
func (m *MatchWithCaptures) End() int
End returns the end position of the entire match (group 0).
func (*MatchWithCaptures) Group ¶ added in v0.2.0
func (m *MatchWithCaptures) Group(index int) []byte
Group returns the captured text for the specified group. Group 0 is the entire match. Returns nil if group was not captured.
func (*MatchWithCaptures) GroupIndex ¶ added in v0.2.0
func (m *MatchWithCaptures) GroupIndex(index int) []int
GroupIndex returns [start, end] positions for the specified group. Returns nil if group was not captured.
func (*MatchWithCaptures) GroupString ¶ added in v0.2.0
func (m *MatchWithCaptures) GroupString(index int) string
GroupString returns the captured text as string for the specified group.
func (*MatchWithCaptures) NumCaptures ¶ added in v0.2.0
func (m *MatchWithCaptures) NumCaptures() int
NumCaptures returns the number of capture groups (including group 0).
func (*MatchWithCaptures) Start ¶ added in v0.2.0
func (m *MatchWithCaptures) Start() int
Start returns the start position of the entire match (group 0).
func (*MatchWithCaptures) String ¶ added in v0.2.0
func (m *MatchWithCaptures) String() string
String returns the matched text for group 0 (entire match).
type ReverseAnchoredSearcher ¶ added in v0.4.0
type ReverseAnchoredSearcher struct {
// contains filtered or unexported fields
}
ReverseAnchoredSearcher performs reverse search for patterns anchored at end.
This strategy is used for patterns like "abc$" or "pattern.*suffix$" where the pattern must match at the end of the haystack. Instead of trying to match from every position in the haystack (O(n) attempts), we search backward from the end of the haystack (O(1) attempt).
Algorithm:
- Build reverse NFA from forward NFA
- Build reverse DFA from reverse NFA
- Search backward from end of haystack using reverse DFA
- If match found, convert reverse positions to forward positions
Performance:
- Forward search (naive): O(n*m) where n=haystack length, m=pattern length
- Reverse search: O(m) - only one search attempt from the end
- Speedup: ~n/m (e.g., 1000x for 1MB haystack and 1KB pattern)
Example:
// Pattern "Easy1$" on 1MB data // Forward: 340 seconds (tries match at every position) // Reverse: ~1 millisecond (one match attempt from end)
func NewReverseAnchoredSearcher ¶ added in v0.4.0
func NewReverseAnchoredSearcher(forwardNFA *nfa.NFA, config lazy.Config) (*ReverseAnchoredSearcher, error)
NewReverseAnchoredSearcher creates a reverse searcher from forward NFA.
Parameters:
- forwardNFA: the compiled forward NFA
- config: DFA configuration for reverse DFA cache
Returns nil if reverse DFA cannot be built (falls back to forward search).
func (*ReverseAnchoredSearcher) Find ¶ added in v0.4.0
func (s *ReverseAnchoredSearcher) Find(haystack []byte) *Match
Find searches backward from end of haystack and returns the match.
Algorithm:
- Quick check with reverse DFA (zero-allocation backward scan)
- If DFA confirms match, reverse bytes for PikeVM to get exact bounds
- Convert reverse positions back to forward positions
For a pattern ending with $, the reverse NFA is anchored at start (^). PikeVM on reverse NFA requires reversed bytes to find exact match bounds.
Example:
Forward pattern: "abc$" Forward haystack: "xxxabc" Reverse haystack: "cbaxxx" Reverse pattern: "^cba" Match in reverse: [0:3] = "cba" Convert to forward: [3:6] = "abc"
func (*ReverseAnchoredSearcher) IsMatch ¶ added in v0.4.0
func (s *ReverseAnchoredSearcher) IsMatch(haystack []byte) bool
IsMatch checks if the pattern matches at the end of haystack.
This is optimized for boolean matching:
- Uses reverse DFA for fast rejection
- ZERO-ALLOCATION: backward scan without byte reversal
- No Match object allocation
- Early termination
type ReverseInnerSearcher ¶ added in v0.8.0
type ReverseInnerSearcher struct {
// contains filtered or unexported fields
}
ReverseInnerSearcher performs inner literal prefilter + bidirectional DFA search.
This strategy is used for patterns with inner literals like `prefix.*inner.*suffix` where:
- The pattern has a good inner literal for prefiltering
- Has wildcards/repetitions both BEFORE and AFTER the inner literal
- Can use bidirectional DFA to find exact match bounds
Algorithm:
- Extract inner literals from pattern
- Build prefilter for inner literals
- Search algorithm: a. Prefilter finds inner literal candidates in haystack b. For each candidate at position P: - Reverse DFA scans backward from P to find match START - Forward DFA scans forward from P+innerLen to find match END c. Return leftmost-longest match
Performance:
- Forward search: O(n*m) where n=haystack length, m=pattern length
- ReverseInner: O(k*(m1+m2)) where k=number of inner candidates, m1=prefix length, m2=suffix length
- Speedup: 10-100x for patterns like `ERROR.*connection.*timeout` on large haystacks
Example:
// Pattern `ERROR.*connection.*timeout` on 1MB log with 5 "connection" occurrences // Forward: tries pattern match at every position (~1M attempts) // ReverseInner: prefilter finds 5 "connection" positions, bidirectional DFA verifies (~5 attempts) // Speedup: ~200,000x
func NewReverseInnerSearcher ¶ added in v0.8.0
func NewReverseInnerSearcher( fullNFA *nfa.NFA, innerInfo *literal.InnerLiteralInfo, config lazy.Config, ) (*ReverseInnerSearcher, error)
NewReverseInnerSearcher creates a reverse inner searcher using AST splitting.
The key optimization (from rust-regex):
- Build reverse NFA from PREFIX AST only (not full pattern)
- Build forward NFA from SUFFIX AST only (not full pattern)
- This enables true bidirectional search with 10-100x speedup
Requirements:
- InnerLiteralInfo must have PrefixAST and SuffixAST populated
- Inner literal must have wildcards both before and after
- Prefilter must be available
Parameters:
- fullNFA: the compiled NFA for the full pattern (for fallback)
- innerInfo: extracted inner literal info with split AST
- config: DFA configuration for reverse/forward DFA cache
Returns error if reverse inner optimization cannot be applied:
- ErrNoInnerLiterals: no inner literals available
- ErrNoPrefilter: prefilter could not be built
- DFA compilation errors
func (*ReverseInnerSearcher) Find ¶ added in v0.8.0
func (s *ReverseInnerSearcher) Find(haystack []byte) *Match
Find searches using inner literal prefilter + bidirectional DFA and returns the match.
Algorithm (leftmost-longest/greedy semantics):
- Use prefilter to find ALL inner literal candidates
- For each candidate at position P: a. Reverse DFA scans backward from P to find match START b. Forward DFA scans forward from P+innerLen to find match END
- Track leftmost-longest match: - Leftmost: earliest start position - Longest: if same start, choose longest end
- Return the best match found
Performance:
- ZERO PikeVM calls - uses DFA exclusively
- Bidirectional DFA scan finds both match start and end efficiently
- Prefilter reduces search space dramatically
Example (bidirectional matching):
Pattern: `ERROR.*connection.*timeout` Haystack: "ERROR: connection lost due to connection timeout" Inner literal: "connection" 1. Prefilter finds "connection" at position 7, then at position 32 2. Candidate 1 (pos=7): a. Reverse DFA from pos=7 backward → finds start=0 (ERROR) b. Forward DFA from pos=17 forward → finds end=24 (lost, no timeout) c. No valid match (pattern requires "timeout" after) 3. Candidate 2 (pos=32): a. Reverse DFA from pos=32 backward → finds start=0 (ERROR) b. Forward DFA from pos=42 forward → finds end=49 (timeout) c. Valid match [0:49] 4. Return [0:49] = "ERROR: connection lost due to connection timeout"
func (*ReverseInnerSearcher) IsMatch ¶ added in v0.8.0
func (s *ReverseInnerSearcher) IsMatch(haystack []byte) bool
IsMatch checks if the pattern matches using inner prefilter + bidirectional DFA.
This is optimized for boolean matching:
- Uses prefilter for fast candidate finding
- Uses bidirectional DFA for fast verification
- No Match object allocation
- Early termination on first match
- ZERO PikeVM calls - DFA confirmation is sufficient
Algorithm:
- Prefilter finds inner literal candidates
- For each candidate: a. Reverse DFA checks if we can reach inner from start b. Forward DFA checks if we can reach end from inner
- Return true on first valid match
type ReverseSuffixSearcher ¶ added in v0.6.0
type ReverseSuffixSearcher struct {
// contains filtered or unexported fields
}
ReverseSuffixSearcher performs suffix literal prefilter + reverse DFA search.
This strategy is used for patterns with literal suffixes like `.*\.txt` where:
- The pattern is NOT anchored at start (^)
- Has a good suffix literal for prefiltering
- Can use reverse DFA to verify the prefix pattern
Algorithm:
- Extract suffix literals from pattern
- Build prefilter for suffix literals
- Search algorithm: a. Prefilter finds suffix candidates in haystack b. For each candidate: - Build reverse search from haystack start to suffix end - Use reverse DFA to verify prefix pattern - If match, use forward DFA to find full match end c. Return first match
Performance:
- Forward naive search: O(n*m) where n=haystack length, m=pattern length
- ReverseSuffix: O(k*m) where k=number of suffix candidates (usually k << n)
- Speedup: 10-100x for patterns like `.*\.txt` on large haystacks
Example:
// Pattern `.*\.txt` on 1MB data with 10 `.txt` occurrences // Forward: tries pattern match at every position (~1M attempts) // ReverseSuffix: prefilter finds 10 `.txt` positions, reverse DFA verifies (~10 attempts) // Speedup: ~100,000x
func NewReverseSuffixSearcher ¶ added in v0.6.0
func NewReverseSuffixSearcher( forwardNFA *nfa.NFA, suffixLiterals *literal.Seq, config lazy.Config, ) (*ReverseSuffixSearcher, error)
NewReverseSuffixSearcher creates a reverse suffix searcher from forward NFA.
Requirements:
- Pattern must have good suffix literals
- Pattern must NOT be start-anchored (^)
- Prefilter must be available
Parameters:
- forwardNFA: the compiled forward NFA
- suffixLiterals: extracted suffix literals from pattern
- config: DFA configuration for reverse DFA cache
Returns nil if reverse suffix optimization cannot be applied.
func (*ReverseSuffixSearcher) Find ¶ added in v0.6.0
func (s *ReverseSuffixSearcher) Find(haystack []byte) *Match
Find searches using suffix literal prefilter + reverse DFA and returns the match.
Algorithm (leftmost-longest/greedy semantics):
- Use prefilter to find ALL suffix literal candidates
- For the FIRST candidate that produces a valid match: a. Use reverse DFA SearchReverse to find match START (leftmost) b. Record this as the leftmost match start
- Continue scanning for more candidates with the SAME match start a. Keep track of the longest match end (greedy)
- Return the match with leftmost start and longest end
Performance:
- ZERO PikeVM calls - uses DFA exclusively
- Single reverse DFA scan finds both match validity AND start position
- Multiple candidates scanned for greedy semantics
Example (greedy matching):
Pattern: `.*\.txt` Haystack: "a.txt.txt" Suffix literal: `.txt` 1. Prefilter finds `.txt` at position 1, then at position 5 2. Candidate 1 (pos=1): SearchReverse returns 0, match = [0:5] 3. Candidate 2 (pos=5): SearchReverse returns 0, match = [0:9] (longer!) 4. Return [0:9] = "a.txt.txt" (greedy)
func (*ReverseSuffixSearcher) IsMatch ¶ added in v0.6.0
func (s *ReverseSuffixSearcher) IsMatch(haystack []byte) bool
IsMatch checks if the pattern matches using suffix prefilter + reverse DFA.
This is optimized for boolean matching:
- Uses prefilter for fast candidate finding
- Uses reverse DFA for fast prefix verification
- No Match object allocation
- Early termination on first match
- ZERO PikeVM calls - reverse DFA confirmation is sufficient
type Stats ¶
type Stats struct {
// NFASearches counts NFA (PikeVM) searches
NFASearches uint64
// DFASearches counts DFA searches
DFASearches uint64
// OnePassSearches counts OnePass DFA searches (for FindSubmatch)
OnePassSearches uint64
// PrefilterHits counts successful prefilter matches
PrefilterHits uint64
// PrefilterMisses counts prefilter candidates that didn't match
PrefilterMisses uint64
// DFACacheFull counts times DFA fell back to NFA due to cache full
DFACacheFull uint64
}
Stats tracks execution statistics for performance analysis.
type Strategy ¶
type Strategy int
Strategy represents the execution strategy for regex matching.
The meta-engine chooses between:
- UseNFA: use PikeVM exclusively (simple patterns, no cache needed)
- UseDFA: use Lazy DFA with NFA fallback (complex patterns, good literals)
- UseBoth: adaptive strategy (try DFA first, fallback to NFA on cache full)
Strategy selection is automatic based on pattern analysis.
const ( // UseNFA uses only the NFA (PikeVM) engine. // Selected for: // - Very small NFAs (< 20 states) where DFA overhead isn't worth it // - Patterns without literals where DFA has no advantage // - When EnableDFA is false in config UseNFA Strategy = iota // UseDFA uses Lazy DFA with NFA fallback on cache overflow. // Selected for: // - Large NFAs (> 100 states) where DFA is essential // - Patterns with good literals (prefilter + DFA is fastest) // - Simple patterns (no alternations) where DFA doesn't blow up UseDFA // UseBoth uses adaptive strategy: try DFA, fallback to NFA if cache fills. // Selected for: // - Medium-sized NFAs (20-100 states) // - Patterns with some literals but complex structure // - Default when pattern characteristics are unclear UseBoth // UseReverseAnchored uses reverse DFA search for patterns anchored at end. // Selected for: // - Patterns with $ or \z anchor (end of text) // - NOT also anchored at start (^) // - Searches backward from end of haystack // - Converts O(n*m) to O(m) for end-anchored patterns UseReverseAnchored // UseReverseSuffix uses suffix literal prefilter + reverse DFA search. // Selected for: // - Patterns with literal suffix (e.g., `.*\.txt`) // - NOT start-anchored (^) // - Has good suffix literal for prefiltering // - Speedup: 10-100x for patterns like `.*\.txt` UseReverseSuffix // UseOnePass uses one-pass DFA for anchored patterns with capture groups. // Selected for: // - Pattern is always anchored (^ or implicit anchor) // - Pattern is "one-pass" (no ambiguity in matching paths) // - Pattern has capture groups (otherwise lazy DFA is faster) // - Speedup: 10-20x over PikeVM for capture group extraction // - Only used for FindSubmatch, not Find UseOnePass // UseReverseInner uses inner literal prefilter + bidirectional DFA search. // Selected for: // - Patterns with inner literal (e.g., `prefix.*inner.*suffix`) // - NOT start-anchored (^) or end-anchored ($) // - Has good inner literal for prefiltering // - NO good prefix or suffix literals (otherwise prefer UseDFA/UseReverseSuffix) // - Has wildcards both before AND after inner literal // - Speedup: 10-100x for patterns like `ERROR.*connection.*timeout` UseReverseInner )
func SelectStrategy ¶
SelectStrategy analyzes the NFA and literals to choose the best execution strategy.
Algorithm:
- If end-anchored ($ or \z) and not start-anchored → UseReverseAnchored
- If DFA disabled in config → UseNFA
- If NFA is tiny (< 20 states) → UseNFA (DFA overhead not worth it)
- If good literals exist → UseDFA (prefilter + DFA is fastest)
- If NFA is large (> 100 states) → UseDFA (essential for performance)
- Otherwise → UseBoth (adaptive)
"Good literals" means:
- At least one literal exists
- Longest common prefix (LCP) length >= MinLiteralLen
- This enables effective prefiltering
Parameters:
- n: the compiled NFA to analyze
- re: the parsed regexp (for anchor detection, can be nil)
- literals: extracted prefix literals (can be nil)
- config: meta-engine configuration
Example:
strategy := meta.SelectStrategy(nfa, re, literals, config)
switch strategy {
case meta.UseNFA:
// Use PikeVM only
case meta.UseDFA:
// Use Lazy DFA
case meta.UseReverseAnchored:
// Use reverse search
case meta.UseBoth:
// Adaptive
}