meta

package
v0.8.3 Latest Latest
Warning

This package is not in the latest version of its module.

Go to latest
Published: Dec 4, 2025 License: MIT Imports: 7 Imported by: 0

Documentation

Overview

Package meta implements the meta-engine orchestrator that automatically selects the optimal regex execution strategy.

The meta-engine coordinates three engines:

  • Prefilter: fast literal-based candidate finding (optional)
  • Lazy DFA: deterministic finite automaton with on-demand state construction
  • NFA (PikeVM): nondeterministic fallback for complex patterns or DFA cache overflow

Strategy selection is based on:

  • Pattern complexity (NFA size, literal quality)
  • Prefilter availability (good literals enable fast filtering)
  • DFA suitability (patterns without alternations benefit most)

The meta-engine provides the public API for pattern compilation and matching, hiding the complexity of multi-engine coordination from users.

Index

Constants

This section is empty.

Variables

View Source
var ErrNoInnerLiterals = errors.New("no inner literals available for ReverseInner strategy")

ErrNoInnerLiterals indicates that no inner literals could be extracted for ReverseInner strategy. This is not a fatal error - it just means ReverseInner optimization cannot be used.

View Source
var ErrNoPrefilter = errors.New("no prefilter available for suffix literals")

ErrNoPrefilter indicates that no prefilter could be built for suffix literals. This is not a fatal error - it just means ReverseSuffix optimization cannot be used.

Functions

func StrategyReason

func StrategyReason(strategy Strategy, n *nfa.NFA, literals *literal.Seq, config Config) string

StrategyReason provides a human-readable explanation for strategy selection.

This is useful for debugging and performance tuning.

Example:

strategy := meta.SelectStrategy(nfa, literals, config)
reason := meta.StrategyReason(strategy, nfa, literals, config)
log.Printf("Using %s: %s", strategy, reason)

Types

type CompileError

type CompileError struct {
	Pattern string
	Err     error
}

CompileError represents a pattern compilation error.

func (*CompileError) Error

func (e *CompileError) Error() string

Error implements the error interface.

func (*CompileError) Unwrap

func (e *CompileError) Unwrap() error

Unwrap returns the underlying error.

type Config

type Config struct {
	// EnableDFA enables the Lazy DFA engine.
	// When false, only NFA (PikeVM) is used.
	// Default: true
	EnableDFA bool

	// EnablePrefilter enables literal-based prefiltering.
	// When false, no prefilter is used even if literals are available.
	// Default: true
	EnablePrefilter bool

	// MaxDFAStates sets the maximum number of DFA states to cache.
	// Larger values use more memory but reduce NFA fallback frequency.
	// Default: 10000
	MaxDFAStates uint32

	// DeterminizationLimit caps the number of NFA states per DFA state.
	// This prevents exponential blowup in patterns like (a*)*b.
	// Default: 1000
	DeterminizationLimit int

	// MinLiteralLen is the minimum length for prefilter literals.
	// Shorter literals may have too many false positives.
	// Default: 2
	MinLiteralLen int

	// MaxLiterals limits the number of literals to extract for prefiltering.
	// Default: 64
	MaxLiterals int

	// MaxRecursionDepth limits recursion during NFA compilation.
	// Default: 100
	MaxRecursionDepth int
}

Config controls meta-engine behavior and performance characteristics.

Configuration options affect:

  • Strategy selection (which engine to use)
  • Cache sizes (DFA state cache)
  • Limits (determinization, recursion)
  • Prefilter enablement

Example:

config := meta.DefaultConfig()
config.EnableDFA = false // Force NFA-only execution
engine := meta.NewEngine(nfa, config)

func DefaultConfig

func DefaultConfig() Config

DefaultConfig returns a configuration with sensible defaults.

Defaults are tuned for typical regex patterns with a balance between performance and memory usage:

  • DFA enabled with 10K state cache (moderate memory usage)
  • Prefilter enabled (5-50x speedup on patterns with literals)
  • Conservative determinization limit (prevents exponential blowup)
  • Reasonable recursion depth (handles nested patterns)

Example:

config := meta.DefaultConfig()
// Use as-is or customize specific options
config.MaxDFAStates = 50000 // Increase cache for better hit rate

func (Config) Validate

func (c Config) Validate() error

Validate checks if the configuration is valid. Returns an error if any parameter is out of range.

Valid ranges:

  • MaxDFAStates: 1 to 1,000,000
  • DeterminizationLimit: 10 to 100,000
  • MinLiteralLen: 1 to 64
  • MaxLiterals: 1 to 1,000
  • MaxRecursionDepth: 10 to 1,000

Example:

config := meta.Config{MaxDFAStates: 0} // Invalid!
if err := config.Validate(); err != nil {
    log.Fatal(err)
}

type ConfigError

type ConfigError struct {
	Field   string
	Message string
}

ConfigError represents an invalid configuration parameter.

func (*ConfigError) Error

func (e *ConfigError) Error() string

Error implements the error interface.

type Engine

type Engine struct {
	// contains filtered or unexported fields
}

Engine is the meta-engine that orchestrates all regex execution strategies.

The Engine:

  1. Analyzes the pattern and extracts literals
  2. Selects the optimal strategy (NFA, DFA, or both)
  3. Builds prefilter (if literals available)
  4. Coordinates search across engines

Thread safety: Not thread-safe. Each goroutine should use its own Engine instance. The underlying NFA is immutable and can be shared, but Engine state is mutable.

Example:

// Compile pattern
engine, err := meta.Compile("(foo|bar)\\d+")
if err != nil {
    return err
}

// Search
haystack := []byte("test foo123 end")
match := engine.Find(haystack)
if match != nil {
    println(match.String()) // "foo123"
}

func Compile

func Compile(pattern string) (*Engine, error)

Compile compiles a regex pattern string into an executable Engine.

Steps:

  1. Parse pattern using regexp/syntax
  2. Compile to NFA
  3. Extract literals (prefixes, suffixes)
  4. Build prefilter (if good literals exist)
  5. Select strategy
  6. Build DFA (if strategy requires it)

Returns an error if:

  • Pattern syntax is invalid
  • Pattern is too complex (recursion limit exceeded)
  • Configuration is invalid

Example:

engine, err := meta.Compile("hello.*world")
if err != nil {
    log.Fatal(err)
}

func CompileRegexp

func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error)

CompileRegexp compiles a parsed syntax.Regexp with default configuration.

This is useful when you already have a parsed regexp from another source.

Example:

re, _ := syntax.Parse("hello", syntax.Perl)
engine, err := meta.CompileRegexp(re, meta.DefaultConfig())

func CompileWithConfig

func CompileWithConfig(pattern string, config Config) (*Engine, error)

CompileWithConfig compiles a pattern with custom configuration.

Example:

config := meta.DefaultConfig()
config.MaxDFAStates = 50000 // Increase cache
engine, err := meta.CompileWithConfig("(a|b|c)*", config)

func (*Engine) Count added in v0.4.0

func (e *Engine) Count(haystack []byte, n int) int

Count returns the number of non-overlapping matches in the haystack.

This is optimized for counting without allocating result slices. Uses early termination for boolean checks at each step. If n > 0, counts at most n matches. If n <= 0, counts all matches.

Example:

engine, _ := meta.Compile(`\d+`)
count := engine.Count([]byte("1 2 3 4 5"), -1)
// count == 5

func (*Engine) Find

func (e *Engine) Find(haystack []byte) *Match

Find returns the first match in the haystack, or nil if no match.

The search algorithm depends on the selected strategy:

UseNFA:   PikeVM search directly
UseDFA:   Prefilter (if available) → DFA → NFA fallback
UseBoth:  Try DFA, fallback to NFA on cache full

Example:

engine, _ := meta.Compile("hello")
match := engine.Find([]byte("say hello world"))
if match != nil {
    println(match.String()) // "hello"
}

func (*Engine) FindAllSubmatch added in v0.4.0

func (e *Engine) FindAllSubmatch(haystack []byte, n int) []*MatchWithCaptures

FindAllSubmatch returns all successive matches with capture group information. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

engine, _ := meta.Compile(`(\w+)@(\w+)\.(\w+)`)
matches := engine.FindAllSubmatch([]byte("[email protected] [email protected]"), -1)
// len(matches) == 2

func (*Engine) FindSubmatch added in v0.2.0

func (e *Engine) FindSubmatch(haystack []byte) *MatchWithCaptures

FindSubmatch returns the first match with capture group information. Returns nil if no match is found.

Group 0 is always the entire match. Groups 1+ are explicit capture groups. Unmatched optional groups will have nil values.

When a one-pass DFA is available (for anchored patterns), this method is 10-20x faster than PikeVM for capture group extraction.

Example:

engine, _ := meta.Compile(`(\w+)@(\w+)\.(\w+)`)
match := engine.FindSubmatch([]byte("[email protected]"))
if match != nil {
    fmt.Println(match.Group(0)) // "[email protected]"
    fmt.Println(match.Group(1)) // "user"
    fmt.Println(match.Group(2)) // "example"
    fmt.Println(match.Group(3)) // "com"
}

func (*Engine) IsMatch

func (e *Engine) IsMatch(haystack []byte) bool

IsMatch returns true if the pattern matches anywhere in the haystack.

This is optimized for boolean matching:

  • Uses early termination (returns immediately on first match)
  • Avoids Match object creation
  • Uses DFA.IsMatch when available (2-10x faster than Find)

Example:

engine, _ := meta.Compile("hello")
if engine.IsMatch([]byte("say hello world")) {
    println("matches!")
}

func (*Engine) NumCaptures added in v0.2.0

func (e *Engine) NumCaptures() int

NumCaptures returns the number of capture groups in the pattern. Group 0 is the entire match, groups 1+ are explicit captures.

func (*Engine) ResetStats

func (e *Engine) ResetStats()

ResetStats resets execution statistics to zero.

func (*Engine) Stats

func (e *Engine) Stats() Stats

Stats returns execution statistics.

Useful for performance analysis and debugging.

Example:

stats := engine.Stats()
println("NFA searches:", stats.NFASearches)
println("DFA searches:", stats.DFASearches)

func (*Engine) Strategy

func (e *Engine) Strategy() Strategy

Strategy returns the execution strategy selected for this engine.

Example:

strategy := engine.Strategy()
println(strategy.String()) // "UseDFA"

func (*Engine) SubexpNames added in v0.5.0

func (e *Engine) SubexpNames() []string

SubexpNames returns the names of capture groups in the pattern. Index 0 is always "" (entire match). Named groups return their names, unnamed groups return "". This matches stdlib regexp.Regexp.SubexpNames() behavior.

type Match

type Match struct {
	// contains filtered or unexported fields
}

Match represents a successful regex match with position information.

A Match contains:

  • Start position (inclusive)
  • End position (exclusive)
  • Reference to the original haystack

Note: This is a simple match without capture group support. Capture groups will be added in a future version.

Example:

match := &Match{start: 5, end: 11, haystack: []byte("test foo123 end")}
println(match.String()) // "foo123"
println(match.Start(), match.End()) // 5, 11

func NewMatch

func NewMatch(start, end int, haystack []byte) *Match

NewMatch creates a new Match from start and end positions.

Parameters:

  • start: inclusive start position in haystack
  • end: exclusive end position in haystack
  • haystack: the original byte buffer that was searched

The haystack is stored by reference (not copied) for efficiency. Callers must ensure the haystack remains valid for the lifetime of the Match.

Example:

haystack := []byte("hello world")
match := meta.NewMatch(0, 5, haystack) // "hello"

func (*Match) Bytes

func (m *Match) Bytes() []byte

Bytes returns the matched bytes as a slice.

The returned slice is a view into the original haystack (not a copy). Callers should copy the bytes if they need to retain them after the haystack is modified or deallocated.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(string(match.Bytes())) // "foo123"

func (*Match) Contains

func (m *Match) Contains(pos int) bool

Contains returns true if the given position is within the match range.

Parameters:

  • pos: position to check (must be >= 0)

Returns true if start <= pos < end.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Contains(7))  // true
println(match.Contains(11)) // false (exclusive end)

func (*Match) End

func (m *Match) End() int

End returns the exclusive end position of the match.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.End()) // 11

func (*Match) IsEmpty

func (m *Match) IsEmpty() bool

IsEmpty returns true if the match has zero length.

Empty matches can occur with patterns like "" or "(?:)" that match without consuming input.

Example:

match := meta.NewMatch(5, 5, []byte("test"))
println(match.IsEmpty()) // true

func (*Match) Len

func (m *Match) Len() int

Len returns the length of the match in bytes.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Len()) // 6 (11 - 5)

func (*Match) Start

func (m *Match) Start() int

Start returns the inclusive start position of the match.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Start()) // 5

func (*Match) String

func (m *Match) String() string

String returns the matched text as a string.

This allocates a new string by copying the matched bytes. For zero-allocation access, use Bytes() instead.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.String()) // "foo123"

type MatchWithCaptures added in v0.2.0

type MatchWithCaptures struct {
	// contains filtered or unexported fields
}

MatchWithCaptures represents a match with capture group information. Group 0 is always the entire match, groups 1+ are explicit capture groups.

func NewMatchWithCaptures added in v0.2.0

func NewMatchWithCaptures(haystack []byte, captures [][]int) *MatchWithCaptures

NewMatchWithCaptures creates a MatchWithCaptures from capture positions.

func (*MatchWithCaptures) AllGroupStrings added in v0.2.0

func (m *MatchWithCaptures) AllGroupStrings() []string

AllGroupStrings returns all captured groups as strings.

func (*MatchWithCaptures) AllGroups added in v0.2.0

func (m *MatchWithCaptures) AllGroups() [][]byte

AllGroups returns all captured groups as byte slices. Result[0] is the entire match, result[i] may be nil if group i was not captured.

func (*MatchWithCaptures) Bytes added in v0.2.0

func (m *MatchWithCaptures) Bytes() []byte

Bytes returns the matched bytes for group 0 (entire match).

func (*MatchWithCaptures) End added in v0.2.0

func (m *MatchWithCaptures) End() int

End returns the end position of the entire match (group 0).

func (*MatchWithCaptures) Group added in v0.2.0

func (m *MatchWithCaptures) Group(index int) []byte

Group returns the captured text for the specified group. Group 0 is the entire match. Returns nil if group was not captured.

func (*MatchWithCaptures) GroupIndex added in v0.2.0

func (m *MatchWithCaptures) GroupIndex(index int) []int

GroupIndex returns [start, end] positions for the specified group. Returns nil if group was not captured.

func (*MatchWithCaptures) GroupString added in v0.2.0

func (m *MatchWithCaptures) GroupString(index int) string

GroupString returns the captured text as string for the specified group.

func (*MatchWithCaptures) NumCaptures added in v0.2.0

func (m *MatchWithCaptures) NumCaptures() int

NumCaptures returns the number of capture groups (including group 0).

func (*MatchWithCaptures) Start added in v0.2.0

func (m *MatchWithCaptures) Start() int

Start returns the start position of the entire match (group 0).

func (*MatchWithCaptures) String added in v0.2.0

func (m *MatchWithCaptures) String() string

String returns the matched text for group 0 (entire match).

type ReverseAnchoredSearcher added in v0.4.0

type ReverseAnchoredSearcher struct {
	// contains filtered or unexported fields
}

ReverseAnchoredSearcher performs reverse search for patterns anchored at end.

This strategy is used for patterns like "abc$" or "pattern.*suffix$" where the pattern must match at the end of the haystack. Instead of trying to match from every position in the haystack (O(n) attempts), we search backward from the end of the haystack (O(1) attempt).

Algorithm:

  1. Build reverse NFA from forward NFA
  2. Build reverse DFA from reverse NFA
  3. Search backward from end of haystack using reverse DFA
  4. If match found, convert reverse positions to forward positions

Performance:

  • Forward search (naive): O(n*m) where n=haystack length, m=pattern length
  • Reverse search: O(m) - only one search attempt from the end
  • Speedup: ~n/m (e.g., 1000x for 1MB haystack and 1KB pattern)

Example:

// Pattern "Easy1$" on 1MB data
// Forward: 340 seconds (tries match at every position)
// Reverse: ~1 millisecond (one match attempt from end)

func NewReverseAnchoredSearcher added in v0.4.0

func NewReverseAnchoredSearcher(forwardNFA *nfa.NFA, config lazy.Config) (*ReverseAnchoredSearcher, error)

NewReverseAnchoredSearcher creates a reverse searcher from forward NFA.

Parameters:

  • forwardNFA: the compiled forward NFA
  • config: DFA configuration for reverse DFA cache

Returns nil if reverse DFA cannot be built (falls back to forward search).

func (*ReverseAnchoredSearcher) Find added in v0.4.0

func (s *ReverseAnchoredSearcher) Find(haystack []byte) *Match

Find searches backward from end of haystack and returns the match.

Algorithm:

  1. Quick check with reverse DFA (zero-allocation backward scan)
  2. If DFA confirms match, reverse bytes for PikeVM to get exact bounds
  3. Convert reverse positions back to forward positions

For a pattern ending with $, the reverse NFA is anchored at start (^). PikeVM on reverse NFA requires reversed bytes to find exact match bounds.

Example:

Forward pattern: "abc$"
Forward haystack: "xxxabc"
Reverse haystack: "cbaxxx"
Reverse pattern: "^cba"
Match in reverse: [0:3] = "cba"
Convert to forward: [3:6] = "abc"

func (*ReverseAnchoredSearcher) IsMatch added in v0.4.0

func (s *ReverseAnchoredSearcher) IsMatch(haystack []byte) bool

IsMatch checks if the pattern matches at the end of haystack.

This is optimized for boolean matching:

  • Uses reverse DFA for fast rejection
  • ZERO-ALLOCATION: backward scan without byte reversal
  • No Match object allocation
  • Early termination

type ReverseInnerSearcher added in v0.8.0

type ReverseInnerSearcher struct {
	// contains filtered or unexported fields
}

ReverseInnerSearcher performs inner literal prefilter + bidirectional DFA search.

This strategy is used for patterns with inner literals like `prefix.*inner.*suffix` where:

  • The pattern has a good inner literal for prefiltering
  • Has wildcards/repetitions both BEFORE and AFTER the inner literal
  • Can use bidirectional DFA to find exact match bounds

Algorithm:

  1. Extract inner literals from pattern
  2. Build prefilter for inner literals
  3. Search algorithm: a. Prefilter finds inner literal candidates in haystack b. For each candidate at position P: - Reverse DFA scans backward from P to find match START - Forward DFA scans forward from P+innerLen to find match END c. Return leftmost-longest match

Performance:

  • Forward search: O(n*m) where n=haystack length, m=pattern length
  • ReverseInner: O(k*(m1+m2)) where k=number of inner candidates, m1=prefix length, m2=suffix length
  • Speedup: 10-100x for patterns like `ERROR.*connection.*timeout` on large haystacks

Example:

// Pattern `ERROR.*connection.*timeout` on 1MB log with 5 "connection" occurrences
// Forward: tries pattern match at every position (~1M attempts)
// ReverseInner: prefilter finds 5 "connection" positions, bidirectional DFA verifies (~5 attempts)
// Speedup: ~200,000x

func NewReverseInnerSearcher added in v0.8.0

func NewReverseInnerSearcher(
	fullNFA *nfa.NFA,
	innerInfo *literal.InnerLiteralInfo,
	config lazy.Config,
) (*ReverseInnerSearcher, error)

NewReverseInnerSearcher creates a reverse inner searcher using AST splitting.

The key optimization (from rust-regex):

  • Build reverse NFA from PREFIX AST only (not full pattern)
  • Build forward NFA from SUFFIX AST only (not full pattern)
  • This enables true bidirectional search with 10-100x speedup

Requirements:

  • InnerLiteralInfo must have PrefixAST and SuffixAST populated
  • Inner literal must have wildcards both before and after
  • Prefilter must be available

Parameters:

  • fullNFA: the compiled NFA for the full pattern (for fallback)
  • innerInfo: extracted inner literal info with split AST
  • config: DFA configuration for reverse/forward DFA cache

Returns error if reverse inner optimization cannot be applied:

  • ErrNoInnerLiterals: no inner literals available
  • ErrNoPrefilter: prefilter could not be built
  • DFA compilation errors

func (*ReverseInnerSearcher) Find added in v0.8.0

func (s *ReverseInnerSearcher) Find(haystack []byte) *Match

Find searches using inner literal prefilter + bidirectional DFA and returns the match.

Algorithm (leftmost-longest/greedy semantics):

  1. Use prefilter to find ALL inner literal candidates
  2. For each candidate at position P: a. Reverse DFA scans backward from P to find match START b. Forward DFA scans forward from P+innerLen to find match END
  3. Track leftmost-longest match: - Leftmost: earliest start position - Longest: if same start, choose longest end
  4. Return the best match found

Performance:

  • ZERO PikeVM calls - uses DFA exclusively
  • Bidirectional DFA scan finds both match start and end efficiently
  • Prefilter reduces search space dramatically

Example (bidirectional matching):

Pattern: `ERROR.*connection.*timeout`
Haystack: "ERROR: connection lost due to connection timeout"
Inner literal: "connection"

1. Prefilter finds "connection" at position 7, then at position 32
2. Candidate 1 (pos=7):
   a. Reverse DFA from pos=7 backward → finds start=0 (ERROR)
   b. Forward DFA from pos=17 forward → finds end=24 (lost, no timeout)
   c. No valid match (pattern requires "timeout" after)
3. Candidate 2 (pos=32):
   a. Reverse DFA from pos=32 backward → finds start=0 (ERROR)
   b. Forward DFA from pos=42 forward → finds end=49 (timeout)
   c. Valid match [0:49]
4. Return [0:49] = "ERROR: connection lost due to connection timeout"

func (*ReverseInnerSearcher) IsMatch added in v0.8.0

func (s *ReverseInnerSearcher) IsMatch(haystack []byte) bool

IsMatch checks if the pattern matches using inner prefilter + bidirectional DFA.

This is optimized for boolean matching:

  • Uses prefilter for fast candidate finding
  • Uses bidirectional DFA for fast verification
  • No Match object allocation
  • Early termination on first match
  • ZERO PikeVM calls - DFA confirmation is sufficient

Algorithm:

  1. Prefilter finds inner literal candidates
  2. For each candidate: a. Reverse DFA checks if we can reach inner from start b. Forward DFA checks if we can reach end from inner
  3. Return true on first valid match

type ReverseSuffixSearcher added in v0.6.0

type ReverseSuffixSearcher struct {
	// contains filtered or unexported fields
}

ReverseSuffixSearcher performs suffix literal prefilter + reverse DFA search.

This strategy is used for patterns with literal suffixes like `.*\.txt` where:

  • The pattern is NOT anchored at start (^)
  • Has a good suffix literal for prefiltering
  • Can use reverse DFA to verify the prefix pattern

Algorithm:

  1. Extract suffix literals from pattern
  2. Build prefilter for suffix literals
  3. Search algorithm: a. Prefilter finds suffix candidates in haystack b. For each candidate: - Build reverse search from haystack start to suffix end - Use reverse DFA to verify prefix pattern - If match, use forward DFA to find full match end c. Return first match

Performance:

  • Forward naive search: O(n*m) where n=haystack length, m=pattern length
  • ReverseSuffix: O(k*m) where k=number of suffix candidates (usually k << n)
  • Speedup: 10-100x for patterns like `.*\.txt` on large haystacks

Example:

// Pattern `.*\.txt` on 1MB data with 10 `.txt` occurrences
// Forward: tries pattern match at every position (~1M attempts)
// ReverseSuffix: prefilter finds 10 `.txt` positions, reverse DFA verifies (~10 attempts)
// Speedup: ~100,000x

func NewReverseSuffixSearcher added in v0.6.0

func NewReverseSuffixSearcher(
	forwardNFA *nfa.NFA,
	suffixLiterals *literal.Seq,
	config lazy.Config,
) (*ReverseSuffixSearcher, error)

NewReverseSuffixSearcher creates a reverse suffix searcher from forward NFA.

Requirements:

  • Pattern must have good suffix literals
  • Pattern must NOT be start-anchored (^)
  • Prefilter must be available

Parameters:

  • forwardNFA: the compiled forward NFA
  • suffixLiterals: extracted suffix literals from pattern
  • config: DFA configuration for reverse DFA cache

Returns nil if reverse suffix optimization cannot be applied.

func (*ReverseSuffixSearcher) Find added in v0.6.0

func (s *ReverseSuffixSearcher) Find(haystack []byte) *Match

Find searches using suffix literal prefilter + reverse DFA and returns the match.

Algorithm (leftmost-longest/greedy semantics):

  1. Use prefilter to find ALL suffix literal candidates
  2. For the FIRST candidate that produces a valid match: a. Use reverse DFA SearchReverse to find match START (leftmost) b. Record this as the leftmost match start
  3. Continue scanning for more candidates with the SAME match start a. Keep track of the longest match end (greedy)
  4. Return the match with leftmost start and longest end

Performance:

  • ZERO PikeVM calls - uses DFA exclusively
  • Single reverse DFA scan finds both match validity AND start position
  • Multiple candidates scanned for greedy semantics

Example (greedy matching):

Pattern: `.*\.txt`
Haystack: "a.txt.txt"
Suffix literal: `.txt`

1. Prefilter finds `.txt` at position 1, then at position 5
2. Candidate 1 (pos=1): SearchReverse returns 0, match = [0:5]
3. Candidate 2 (pos=5): SearchReverse returns 0, match = [0:9] (longer!)
4. Return [0:9] = "a.txt.txt" (greedy)

func (*ReverseSuffixSearcher) IsMatch added in v0.6.0

func (s *ReverseSuffixSearcher) IsMatch(haystack []byte) bool

IsMatch checks if the pattern matches using suffix prefilter + reverse DFA.

This is optimized for boolean matching:

  • Uses prefilter for fast candidate finding
  • Uses reverse DFA for fast prefix verification
  • No Match object allocation
  • Early termination on first match
  • ZERO PikeVM calls - reverse DFA confirmation is sufficient

type Stats

type Stats struct {
	// NFASearches counts NFA (PikeVM) searches
	NFASearches uint64

	// DFASearches counts DFA searches
	DFASearches uint64

	// OnePassSearches counts OnePass DFA searches (for FindSubmatch)
	OnePassSearches uint64

	// PrefilterHits counts successful prefilter matches
	PrefilterHits uint64

	// PrefilterMisses counts prefilter candidates that didn't match
	PrefilterMisses uint64

	// DFACacheFull counts times DFA fell back to NFA due to cache full
	DFACacheFull uint64
}

Stats tracks execution statistics for performance analysis.

type Strategy

type Strategy int

Strategy represents the execution strategy for regex matching.

The meta-engine chooses between:

  • UseNFA: use PikeVM exclusively (simple patterns, no cache needed)
  • UseDFA: use Lazy DFA with NFA fallback (complex patterns, good literals)
  • UseBoth: adaptive strategy (try DFA first, fallback to NFA on cache full)

Strategy selection is automatic based on pattern analysis.

const (
	// UseNFA uses only the NFA (PikeVM) engine.
	// Selected for:
	//   - Very small NFAs (< 20 states) where DFA overhead isn't worth it
	//   - Patterns without literals where DFA has no advantage
	//   - When EnableDFA is false in config
	UseNFA Strategy = iota

	// UseDFA uses Lazy DFA with NFA fallback on cache overflow.
	// Selected for:
	//   - Large NFAs (> 100 states) where DFA is essential
	//   - Patterns with good literals (prefilter + DFA is fastest)
	//   - Simple patterns (no alternations) where DFA doesn't blow up
	UseDFA

	// UseBoth uses adaptive strategy: try DFA, fallback to NFA if cache fills.
	// Selected for:
	//   - Medium-sized NFAs (20-100 states)
	//   - Patterns with some literals but complex structure
	//   - Default when pattern characteristics are unclear
	UseBoth

	// UseReverseAnchored uses reverse DFA search for patterns anchored at end.
	// Selected for:
	//   - Patterns with $ or \z anchor (end of text)
	//   - NOT also anchored at start (^)
	//   - Searches backward from end of haystack
	//   - Converts O(n*m) to O(m) for end-anchored patterns
	UseReverseAnchored

	// UseReverseSuffix uses suffix literal prefilter + reverse DFA search.
	// Selected for:
	//   - Patterns with literal suffix (e.g., `.*\.txt`)
	//   - NOT start-anchored (^)
	//   - Has good suffix literal for prefiltering
	//   - Speedup: 10-100x for patterns like `.*\.txt`
	UseReverseSuffix

	// UseOnePass uses one-pass DFA for anchored patterns with capture groups.
	// Selected for:
	//   - Pattern is always anchored (^ or implicit anchor)
	//   - Pattern is "one-pass" (no ambiguity in matching paths)
	//   - Pattern has capture groups (otherwise lazy DFA is faster)
	//   - Speedup: 10-20x over PikeVM for capture group extraction
	//   - Only used for FindSubmatch, not Find
	UseOnePass

	// UseReverseInner uses inner literal prefilter + bidirectional DFA search.
	// Selected for:
	//   - Patterns with inner literal (e.g., `prefix.*inner.*suffix`)
	//   - NOT start-anchored (^) or end-anchored ($)
	//   - Has good inner literal for prefiltering
	//   - NO good prefix or suffix literals (otherwise prefer UseDFA/UseReverseSuffix)
	//   - Has wildcards both before AND after inner literal
	//   - Speedup: 10-100x for patterns like `ERROR.*connection.*timeout`
	UseReverseInner
)

func SelectStrategy

func SelectStrategy(n *nfa.NFA, re *syntax.Regexp, literals *literal.Seq, config Config) Strategy

SelectStrategy analyzes the NFA and literals to choose the best execution strategy.

Algorithm:

  1. If end-anchored ($ or \z) and not start-anchored → UseReverseAnchored
  2. If DFA disabled in config → UseNFA
  3. If NFA is tiny (< 20 states) → UseNFA (DFA overhead not worth it)
  4. If good literals exist → UseDFA (prefilter + DFA is fastest)
  5. If NFA is large (> 100 states) → UseDFA (essential for performance)
  6. Otherwise → UseBoth (adaptive)

"Good literals" means:

  • At least one literal exists
  • Longest common prefix (LCP) length >= MinLiteralLen
  • This enables effective prefiltering

Parameters:

  • n: the compiled NFA to analyze
  • re: the parsed regexp (for anchor detection, can be nil)
  • literals: extracted prefix literals (can be nil)
  • config: meta-engine configuration

Example:

strategy := meta.SelectStrategy(nfa, re, literals, config)
switch strategy {
case meta.UseNFA:
    // Use PikeVM only
case meta.UseDFA:
    // Use Lazy DFA
case meta.UseReverseAnchored:
    // Use reverse search
case meta.UseBoth:
    // Adaptive
}

func (Strategy) String

func (s Strategy) String() string

String returns a human-readable representation of the Strategy.

Jump to

Keyboard shortcuts

? : This menu
/ : Search site
f or F : Jump to
y or Y : Canonical URL