meta

package

v0.8.3 Latest Latest Go to latest Published: Dec 4, 2025 License: MIT Imports: 7 Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version
Learn more about best practices

Repository

github.com/coregx/coregex

Links

Open Source Insights

Documentation ¶

Overview ¶

Package meta implements the meta-engine orchestrator that automatically selects the optimal regex execution strategy.

The meta-engine coordinates three engines:

Prefilter: fast literal-based candidate finding (optional)
Lazy DFA: deterministic finite automaton with on-demand state construction
NFA (PikeVM): nondeterministic fallback for complex patterns or DFA cache overflow

Strategy selection is based on:

Pattern complexity (NFA size, literal quality)
Prefilter availability (good literals enable fast filtering)
DFA suitability (patterns without alternations benefit most)

The meta-engine provides the public API for pattern compilation and matching, hiding the complexity of multi-engine coordination from users.

Index ¶

Variables
func StrategyReason(strategy Strategy, n *nfa.NFA, literals *literal.Seq, config Config) string
type CompileError
- func (e *CompileError) Error() string
- func (e *CompileError) Unwrap() error
type Config
- func DefaultConfig() Config
- func (c Config) Validate() error
type ConfigError
- func (e *ConfigError) Error() string
type Engine
- func Compile(pattern string) (*Engine, error)
- func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error)
- func CompileWithConfig(pattern string, config Config) (*Engine, error)
- func (e *Engine) Count(haystack []byte, n int) int
- func (e *Engine) Find(haystack []byte) *Match
- func (e *Engine) FindAllSubmatch(haystack []byte, n int) []*MatchWithCaptures
- func (e *Engine) FindSubmatch(haystack []byte) *MatchWithCaptures
- func (e *Engine) IsMatch(haystack []byte) bool
- func (e *Engine) NumCaptures() int
- func (e *Engine) ResetStats()
- func (e *Engine) Stats() Stats
- func (e *Engine) Strategy() Strategy
- func (e *Engine) SubexpNames() []string
type Match
- func NewMatch(start, end int, haystack []byte) *Match
- func (m *Match) Bytes() []byte
- func (m *Match) Contains(pos int) bool
- func (m *Match) End() int
- func (m *Match) IsEmpty() bool
- func (m *Match) Len() int
- func (m *Match) Start() int
- func (m *Match) String() string
type MatchWithCaptures
- func NewMatchWithCaptures(haystack []byte, captures [][]int) *MatchWithCaptures
- func (m *MatchWithCaptures) AllGroupStrings() []string
- func (m *MatchWithCaptures) AllGroups() [][]byte
- func (m *MatchWithCaptures) Bytes() []byte
- func (m *MatchWithCaptures) End() int
- func (m *MatchWithCaptures) Group(index int) []byte
- func (m *MatchWithCaptures) GroupIndex(index int) []int
- func (m *MatchWithCaptures) GroupString(index int) string
- func (m *MatchWithCaptures) NumCaptures() int
- func (m *MatchWithCaptures) Start() int
- func (m *MatchWithCaptures) String() string
type ReverseAnchoredSearcher
- func NewReverseAnchoredSearcher(forwardNFA *nfa.NFA, config lazy.Config) (*ReverseAnchoredSearcher, error)
- func (s *ReverseAnchoredSearcher) Find(haystack []byte) *Match
- func (s *ReverseAnchoredSearcher) IsMatch(haystack []byte) bool
type ReverseInnerSearcher
- func NewReverseInnerSearcher(fullNFA *nfa.NFA, innerInfo *literal.InnerLiteralInfo, config lazy.Config) (*ReverseInnerSearcher, error)
- func (s *ReverseInnerSearcher) Find(haystack []byte) *Match
- func (s *ReverseInnerSearcher) IsMatch(haystack []byte) bool
type ReverseSuffixSearcher
- func NewReverseSuffixSearcher(forwardNFA *nfa.NFA, suffixLiterals *literal.Seq, config lazy.Config) (*ReverseSuffixSearcher, error)
- func (s *ReverseSuffixSearcher) Find(haystack []byte) *Match
- func (s *ReverseSuffixSearcher) IsMatch(haystack []byte) bool
type Stats
type Strategy
- func SelectStrategy(n *nfa.NFA, re *syntax.Regexp, literals *literal.Seq, config Config) Strategy
- func (s Strategy) String() string

Constants ¶

This section is empty.

Variables ¶

View Source

var ErrNoInnerLiterals = errors.New("no inner literals available for ReverseInner strategy")

ErrNoInnerLiterals indicates that no inner literals could be extracted for ReverseInner strategy. This is not a fatal error - it just means ReverseInner optimization cannot be used.

View Source

var ErrNoPrefilter = errors.New("no prefilter available for suffix literals")

ErrNoPrefilter indicates that no prefilter could be built for suffix literals. This is not a fatal error - it just means ReverseSuffix optimization cannot be used.

Functions ¶

func StrategyReason ¶

func StrategyReason(strategy Strategy, n *nfa.NFA, literals *literal.Seq, config Config) string

StrategyReason provides a human-readable explanation for strategy selection.

This is useful for debugging and performance tuning.

Example:

strategy := meta.SelectStrategy(nfa, literals, config)
reason := meta.StrategyReason(strategy, nfa, literals, config)
log.Printf("Using %s: %s", strategy, reason)

Types ¶

type CompileError ¶

type CompileError struct {
	Pattern string
	Err     error
}

CompileError represents a pattern compilation error.

func (*CompileError) Error ¶

func (e *CompileError) Error() string

Error implements the error interface.

func (*CompileError) Unwrap ¶

func (e *CompileError) Unwrap() error

Unwrap returns the underlying error.

type Config ¶

type Config struct {
	// EnableDFA enables the Lazy DFA engine.
	// When false, only NFA (PikeVM) is used.
	// Default: true
	EnableDFA bool

	// EnablePrefilter enables literal-based prefiltering.
	// When false, no prefilter is used even if literals are available.
	// Default: true
	EnablePrefilter bool

	// MaxDFAStates sets the maximum number of DFA states to cache.
	// Larger values use more memory but reduce NFA fallback frequency.
	// Default: 10000
	MaxDFAStates uint32

	// DeterminizationLimit caps the number of NFA states per DFA state.
	// This prevents exponential blowup in patterns like (a*)*b.
	// Default: 1000
	DeterminizationLimit int

	// MinLiteralLen is the minimum length for prefilter literals.
	// Shorter literals may have too many false positives.
	// Default: 2
	MinLiteralLen int

	// MaxLiterals limits the number of literals to extract for prefiltering.
	// Default: 64
	MaxLiterals int

	// MaxRecursionDepth limits recursion during NFA compilation.
	// Default: 100
	MaxRecursionDepth int
}

Config controls meta-engine behavior and performance characteristics.

Configuration options affect:

Strategy selection (which engine to use)
Cache sizes (DFA state cache)
Limits (determinization, recursion)
Prefilter enablement

Example:

config := meta.DefaultConfig()
config.EnableDFA = false // Force NFA-only execution
engine := meta.NewEngine(nfa, config)

func DefaultConfig ¶

func DefaultConfig() Config

DefaultConfig returns a configuration with sensible defaults.

Defaults are tuned for typical regex patterns with a balance between performance and memory usage:

DFA enabled with 10K state cache (moderate memory usage)
Prefilter enabled (5-50x speedup on patterns with literals)
Conservative determinization limit (prevents exponential blowup)
Reasonable recursion depth (handles nested patterns)

Example:

config := meta.DefaultConfig()
// Use as-is or customize specific options
config.MaxDFAStates = 50000 // Increase cache for better hit rate

func (Config) Validate ¶

func (c Config) Validate() error

Validate checks if the configuration is valid. Returns an error if any parameter is out of range.

Valid ranges:

MaxDFAStates: 1 to 1,000,000
DeterminizationLimit: 10 to 100,000
MinLiteralLen: 1 to 64
MaxLiterals: 1 to 1,000
MaxRecursionDepth: 10 to 1,000

Example:

config := meta.Config{MaxDFAStates: 0} // Invalid!
if err := config.Validate(); err != nil {
    log.Fatal(err)
}

type ConfigError ¶

type ConfigError struct {
	Field   string
	Message string
}

ConfigError represents an invalid configuration parameter.

func (*ConfigError) Error ¶

func (e *ConfigError) Error() string

Error implements the error interface.

type Engine ¶

type Engine struct {
	// contains filtered or unexported fields
}

Engine is the meta-engine that orchestrates all regex execution strategies.

The Engine:

Analyzes the pattern and extracts literals
Selects the optimal strategy (NFA, DFA, or both)
Builds prefilter (if literals available)
Coordinates search across engines

Thread safety: Not thread-safe. Each goroutine should use its own Engine instance. The underlying NFA is immutable and can be shared, but Engine state is mutable.

Example:

// Compile pattern
engine, err := meta.Compile("(foo|bar)\\d+")
if err != nil {
    return err
}

// Search
haystack := []byte("test foo123 end")
match := engine.Find(haystack)
if match != nil {
    println(match.String()) // "foo123"
}

func Compile ¶

func Compile(pattern string) (*Engine, error)

Compile compiles a regex pattern string into an executable Engine.

Steps:

Parse pattern using regexp/syntax
Compile to NFA
Extract literals (prefixes, suffixes)
Build prefilter (if good literals exist)
Select strategy
Build DFA (if strategy requires it)

Returns an error if:

Pattern syntax is invalid
Pattern is too complex (recursion limit exceeded)
Configuration is invalid

Example:

engine, err := meta.Compile("hello.*world")
if err != nil {
    log.Fatal(err)
}

func CompileRegexp ¶

func CompileRegexp(re *syntax.Regexp, config Config) (*Engine, error)

CompileRegexp compiles a parsed syntax.Regexp with default configuration.

This is useful when you already have a parsed regexp from another source.

Example:

re, _ := syntax.Parse("hello", syntax.Perl)
engine, err := meta.CompileRegexp(re, meta.DefaultConfig())

func CompileWithConfig ¶

func CompileWithConfig(pattern string, config Config) (*Engine, error)

CompileWithConfig compiles a pattern with custom configuration.

Example:

config := meta.DefaultConfig()
config.MaxDFAStates = 50000 // Increase cache
engine, err := meta.CompileWithConfig("(a|b|c)*", config)

func (*Engine) Count ¶ added in v0.4.0

func (e *Engine) Count(haystack []byte, n int) int

Count returns the number of non-overlapping matches in the haystack.

This is optimized for counting without allocating result slices. Uses early termination for boolean checks at each step. If n > 0, counts at most n matches. If n <= 0, counts all matches.

Example:

engine, _ := meta.Compile(`\d+`)
count := engine.Count([]byte("1 2 3 4 5"), -1)
// count == 5

func (*Engine) Find ¶

func (e *Engine) Find(haystack []byte) *Match

Find returns the first match in the haystack, or nil if no match.

The search algorithm depends on the selected strategy:

UseNFA:   PikeVM search directly
UseDFA:   Prefilter (if available) → DFA → NFA fallback
UseBoth:  Try DFA, fallback to NFA on cache full

Example:

engine, _ := meta.Compile("hello")
match := engine.Find([]byte("say hello world"))
if match != nil {
    println(match.String()) // "hello"
}

func (*Engine) FindAllSubmatch ¶ added in v0.4.0

func (e *Engine) FindAllSubmatch(haystack []byte, n int) []*MatchWithCaptures

FindAllSubmatch returns all successive matches with capture group information. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

engine, _ := meta.Compile(`(\w+)@(\w+)\.(\w+)`)
matches := engine.FindAllSubmatch([]byte("[email protected] [email protected]"), -1)
// len(matches) == 2

func (*Engine) FindSubmatch ¶ added in v0.2.0

func (e *Engine) FindSubmatch(haystack []byte) *MatchWithCaptures

FindSubmatch returns the first match with capture group information. Returns nil if no match is found.

Group 0 is always the entire match. Groups 1+ are explicit capture groups. Unmatched optional groups will have nil values.

When a one-pass DFA is available (for anchored patterns), this method is 10-20x faster than PikeVM for capture group extraction.

Example:

engine, _ := meta.Compile(`(\w+)@(\w+)\.(\w+)`)
match := engine.FindSubmatch([]byte("[email protected]"))
if match != nil {
    fmt.Println(match.Group(0)) // "[email protected]"
    fmt.Println(match.Group(1)) // "user"
    fmt.Println(match.Group(2)) // "example"
    fmt.Println(match.Group(3)) // "com"
}

func (*Engine) IsMatch ¶

func (e *Engine) IsMatch(haystack []byte) bool

IsMatch returns true if the pattern matches anywhere in the haystack.

This is optimized for boolean matching:

Uses early termination (returns immediately on first match)
Avoids Match object creation
Uses DFA.IsMatch when available (2-10x faster than Find)

Example:

engine, _ := meta.Compile("hello")
if engine.IsMatch([]byte("say hello world")) {
    println("matches!")
}

func (*Engine) NumCaptures ¶ added in v0.2.0

func (e *Engine) NumCaptures() int

NumCaptures returns the number of capture groups in the pattern. Group 0 is the entire match, groups 1+ are explicit captures.

func (*Engine) ResetStats ¶

func (e *Engine) ResetStats()

ResetStats resets execution statistics to zero.

func (*Engine) Stats ¶

func (e *Engine) Stats() Stats

Stats returns execution statistics.

Useful for performance analysis and debugging.

Example:

stats := engine.Stats()
println("NFA searches:", stats.NFASearches)
println("DFA searches:", stats.DFASearches)

func (*Engine) Strategy ¶

func (e *Engine) Strategy() Strategy

Strategy returns the execution strategy selected for this engine.

Example:

strategy := engine.Strategy()
println(strategy.String()) // "UseDFA"

func (*Engine) SubexpNames ¶ added in v0.5.0

func (e *Engine) SubexpNames() []string

SubexpNames returns the names of capture groups in the pattern. Index 0 is always "" (entire match). Named groups return their names, unnamed groups return "". This matches stdlib regexp.Regexp.SubexpNames() behavior.

type Match ¶

type Match struct {
	// contains filtered or unexported fields
}

Match represents a successful regex match with position information.

A Match contains:

Start position (inclusive)
End position (exclusive)
Reference to the original haystack

Note: This is a simple match without capture group support. Capture groups will be added in a future version.

Example:

match := &Match{start: 5, end: 11, haystack: []byte("test foo123 end")}
println(match.String()) // "foo123"
println(match.Start(), match.End()) // 5, 11

func NewMatch ¶

func NewMatch(start, end int, haystack []byte) *Match

NewMatch creates a new Match from start and end positions.

Parameters:

start: inclusive start position in haystack
end: exclusive end position in haystack
haystack: the original byte buffer that was searched

The haystack is stored by reference (not copied) for efficiency. Callers must ensure the haystack remains valid for the lifetime of the Match.

Example:

haystack := []byte("hello world")
match := meta.NewMatch(0, 5, haystack) // "hello"

func (*Match) Bytes ¶

func (m *Match) Bytes() []byte

Bytes returns the matched bytes as a slice.

The returned slice is a view into the original haystack (not a copy). Callers should copy the bytes if they need to retain them after the haystack is modified or deallocated.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(string(match.Bytes())) // "foo123"

func (*Match) Contains ¶

func (m *Match) Contains(pos int) bool

Contains returns true if the given position is within the match range.

Parameters:

pos: position to check (must be >= 0)

Returns true if start <= pos < end.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Contains(7))  // true
println(match.Contains(11)) // false (exclusive end)

func (*Match) End ¶

func (m *Match) End() int

End returns the exclusive end position of the match.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.End()) // 11

func (*Match) IsEmpty ¶

func (m *Match) IsEmpty() bool

IsEmpty returns true if the match has zero length.

Empty matches can occur with patterns like "" or "(?:)" that match without consuming input.

Example:

match := meta.NewMatch(5, 5, []byte("test"))
println(match.IsEmpty()) // true

func (*Match) Len ¶

func (m *Match) Len() int

Len returns the length of the match in bytes.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Len()) // 6 (11 - 5)

func (*Match) Start ¶

func (m *Match) Start() int

Start returns the inclusive start position of the match.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.Start()) // 5

func (*Match) String ¶

func (m *Match) String() string

String returns the matched text as a string.

This allocates a new string by copying the matched bytes. For zero-allocation access, use Bytes() instead.

Example:

match := meta.NewMatch(5, 11, []byte("test foo123 end"))
println(match.String()) // "foo123"

type MatchWithCaptures ¶ added in v0.2.0

type MatchWithCaptures struct {
	// contains filtered or unexported fields
}

MatchWithCaptures represents a match with capture group information. Group 0 is always the entire match, groups 1+ are explicit capture groups.

func NewMatchWithCaptures ¶ added in v0.2.0

func NewMatchWithCaptures(haystack []byte, captures [][]int) *MatchWithCaptures

NewMatchWithCaptures creates a MatchWithCaptures from capture positions.

func (*MatchWithCaptures) AllGroupStrings ¶ added in v0.2.0

func (m *MatchWithCaptures) AllGroupStrings() []string

AllGroupStrings returns all captured groups as strings.

func (*MatchWithCaptures) AllGroups ¶ added in v0.2.0

func (m *MatchWithCaptures) AllGroups() [][]byte

AllGroups returns all captured groups as byte slices. Result[0] is the entire match, result[i] may be nil if group i was not captured.

func (*MatchWithCaptures) Bytes ¶ added in v0.2.0

func (m *MatchWithCaptures) Bytes() []byte

Bytes returns the matched bytes for group 0 (entire match).

func (*MatchWithCaptures) End ¶ added in v0.2.0

func (m *MatchWithCaptures) End() int

End returns the end position of the entire match (group 0).

func (*MatchWithCaptures) Group ¶ added in v0.2.0

func (m *MatchWithCaptures) Group(index int) []byte

Group returns the captured text for the specified group. Group 0 is the entire match. Returns nil if group was not captured.

func (*MatchWithCaptures) GroupIndex ¶ added in v0.2.0

func (m *MatchWithCaptures) GroupIndex(index int) []int

GroupIndex returns [start, end] positions for the specified group. Returns nil if group was not captured.

func (*MatchWithCaptures) GroupString ¶ added in v0.2.0

func (m *MatchWithCaptures) GroupString(index int) string

GroupString returns the captured text as string for the specified group.

func (*MatchWithCaptures) NumCaptures ¶ added in v0.2.0

func (m *MatchWithCaptures) NumCaptures() int

NumCaptures returns the number of capture groups (including group 0).

func (*MatchWithCaptures) Start ¶ added in v0.2.0

func (m *MatchWithCaptures) Start() int

Start returns the start position of the entire match (group 0).

func (*MatchWithCaptures) String ¶ added in v0.2.0

func (m *MatchWithCaptures) String() string

String returns the matched text for group 0 (entire match).

type ReverseAnchoredSearcher ¶ added in v0.4.0

type ReverseAnchoredSearcher struct {
	// contains filtered or unexported fields
}

ReverseAnchoredSearcher performs reverse search for patterns anchored at end.

This strategy is used for patterns like "abc$" or "pattern.*suffix$" where the pattern must match at the end of the haystack. Instead of trying to match from every position in the haystack (O(n) attempts), we search backward from the end of the haystack (O(1) attempt).

Algorithm:

Build reverse NFA from forward NFA
Build reverse DFA from reverse NFA
Search backward from end of haystack using reverse DFA
If match found, convert reverse positions to forward positions

Performance:

Forward search (naive): O(n*m) where n=haystack length, m=pattern length
Reverse search: O(m) - only one search attempt from the end
Speedup: ~n/m (e.g., 1000x for 1MB haystack and 1KB pattern)

Example:

// Pattern "Easy1$" on 1MB data
// Forward: 340 seconds (tries match at every position)
// Reverse: ~1 millisecond (one match attempt from end)

func NewReverseAnchoredSearcher ¶ added in v0.4.0

func NewReverseAnchoredSearcher(forwardNFA *nfa.NFA, config lazy.Config) (*ReverseAnchoredSearcher, error)

NewReverseAnchoredSearcher creates a reverse searcher from forward NFA.

Parameters:

forwardNFA: the compiled forward NFA
config: DFA configuration for reverse DFA cache

Returns nil if reverse DFA cannot be built (falls back to forward search).

func (*ReverseAnchoredSearcher) Find ¶ added in v0.4.0

func (s *ReverseAnchoredSearcher) Find(haystack []byte) *Match

Find searches backward from end of haystack and returns the match.

Algorithm:

Quick check with reverse DFA (zero-allocation backward scan)
If DFA confirms match, reverse bytes for PikeVM to get exact bounds
Convert reverse positions back to forward positions

For a pattern ending with $, the reverse NFA is anchored at start (^). PikeVM on reverse NFA requires reversed bytes to find exact match bounds.

Example:

Forward pattern: "abc$"
Forward haystack: "xxxabc"
Reverse haystack: "cbaxxx"
Reverse pattern: "^cba"
Match in reverse: [0:3] = "cba"
Convert to forward: [3:6] = "abc"

func (*ReverseAnchoredSearcher) IsMatch ¶ added in v0.4.0

func (s *ReverseAnchoredSearcher) IsMatch(haystack []byte) bool

IsMatch checks if the pattern matches at the end of haystack.

This is optimized for boolean matching:

Uses reverse DFA for fast rejection
ZERO-ALLOCATION: backward scan without byte reversal
No Match object allocation
Early termination

type ReverseInnerSearcher ¶ added in v0.8.0

type ReverseInnerSearcher struct {
	// contains filtered or unexported fields
}

ReverseInnerSearcher performs inner literal prefilter + bidirectional DFA search.

This strategy is used for patterns with inner literals like `prefix.*inner.*suffix` where:

The pattern has a good inner literal for prefiltering
Has wildcards/repetitions both BEFORE and AFTER the inner literal
Can use bidirectional DFA to find exact match bounds

Algorithm:

Extract inner literals from pattern
Build prefilter for inner literals
Search algorithm: a. Prefilter finds inner literal candidates in haystack b. For each candidate at position P: - Reverse DFA scans backward from P to find match START - Forward DFA scans forward from P+innerLen to find match END c. Return leftmost-longest match

Performance:

Forward search: O(n*m) where n=haystack length, m=pattern length
ReverseInner: O(k*(m1+m2)) where k=number of inner candidates, m1=prefix length, m2=suffix length
Speedup: 10-100x for patterns like `ERROR.*connection.*timeout` on large haystacks

Example:

// Pattern `ERROR.*connection.*timeout` on 1MB log with 5 "connection" occurrences
// Forward: tries pattern match at every position (~1M attempts)
// ReverseInner: prefilter finds 5 "connection" positions, bidirectional DFA verifies (~5 attempts)
// Speedup: ~200,000x

func NewReverseInnerSearcher ¶ added in v0.8.0

func NewReverseInnerSearcher(
	fullNFA *nfa.NFA,
	innerInfo *literal.InnerLiteralInfo,
	config lazy.Config,
) (*ReverseInnerSearcher, error)

NewReverseInnerSearcher creates a reverse inner searcher using AST splitting.

The key optimization (from rust-regex):

Build reverse NFA from PREFIX AST only (not full pattern)
Build forward NFA from SUFFIX AST only (not full pattern)
This enables true bidirectional search with 10-100x speedup

Requirements:

InnerLiteralInfo must have PrefixAST and SuffixAST populated
Inner literal must have wildcards both before and after
Prefilter must be available

Parameters:

fullNFA: the compiled NFA for the full pattern (for fallback)
innerInfo: extracted inner literal info with split AST
config: DFA configuration for reverse/forward DFA cache

Returns error if reverse inner optimization cannot be applied:

ErrNoInnerLiterals: no inner literals available
ErrNoPrefilter: prefilter could not be built
DFA compilation errors

func (*ReverseInnerSearcher) Find ¶ added in v0.8.0

func (s *ReverseInnerSearcher) Find(haystack []byte) *Match

Find searches using inner literal prefilter + bidirectional DFA and returns the match.

Algorithm (leftmost-longest/greedy semantics):

Use prefilter to find ALL inner literal candidates
For each candidate at position P: a. Reverse DFA scans backward from P to find match START b. Forward DFA scans forward from P+innerLen to find match END
Track leftmost-longest match: - Leftmost: earliest start position - Longest: if same start, choose longest end
Return the best match found

Performance:

ZERO PikeVM calls - uses DFA exclusively
Bidirectional DFA scan finds both match start and end efficiently
Prefilter reduces search space dramatically

Example (bidirectional matching):

Pattern: `ERROR.*connection.*timeout`
Haystack: "ERROR: connection lost due to connection timeout"
Inner literal: "connection"

1. Prefilter finds "connection" at position 7, then at position 32
2. Candidate 1 (pos=7):
   a. Reverse DFA from pos=7 backward → finds start=0 (ERROR)
   b. Forward DFA from pos=17 forward → finds end=24 (lost, no timeout)
   c. No valid match (pattern requires "timeout" after)
3. Candidate 2 (pos=32):
   a. Reverse DFA from pos=32 backward → finds start=0 (ERROR)
   b. Forward DFA from pos=42 forward → finds end=49 (timeout)
   c. Valid match [0:49]
4. Return [0:49] = "ERROR: connection lost due to connection timeout"

func (*ReverseInnerSearcher) IsMatch ¶ added in v0.8.0

func (s *ReverseInnerSearcher) IsMatch(haystack []byte) bool

IsMatch checks if the pattern matches using inner prefilter + bidirectional DFA.

This is optimized for boolean matching:

Uses prefilter for fast candidate finding
Uses bidirectional DFA for fast verification
No Match object allocation
Early termination on first match
ZERO PikeVM calls - DFA confirmation is sufficient

Algorithm:

Prefilter finds inner literal candidates
For each candidate: a. Reverse DFA checks if we can reach inner from start b. Forward DFA checks if we can reach end from inner
Return true on first valid match

type ReverseSuffixSearcher ¶ added in v0.6.0

type ReverseSuffixSearcher struct {
	// contains filtered or unexported fields
}

ReverseSuffixSearcher performs suffix literal prefilter + reverse DFA search.

This strategy is used for patterns with literal suffixes like `.*\.txt` where:

The pattern is NOT anchored at start (^)
Has a good suffix literal for prefiltering
Can use reverse DFA to verify the prefix pattern

Algorithm:

Extract suffix literals from pattern
Build prefilter for suffix literals
Search algorithm: a. Prefilter finds suffix candidates in haystack b. For each candidate: - Build reverse search from haystack start to suffix end - Use reverse DFA to verify prefix pattern - If match, use forward DFA to find full match end c. Return first match

Performance:

Forward naive search: O(n*m) where n=haystack length, m=pattern length
ReverseSuffix: O(k*m) where k=number of suffix candidates (usually k << n)
Speedup: 10-100x for patterns like `.*\.txt` on large haystacks

Example:

// Pattern `.*\.txt` on 1MB data with 10 `.txt` occurrences
// Forward: tries pattern match at every position (~1M attempts)
// ReverseSuffix: prefilter finds 10 `.txt` positions, reverse DFA verifies (~10 attempts)
// Speedup: ~100,000x

func NewReverseSuffixSearcher ¶ added in v0.6.0

func NewReverseSuffixSearcher(
	forwardNFA *nfa.NFA,
	suffixLiterals *literal.Seq,
	config lazy.Config,
) (*ReverseSuffixSearcher, error)

NewReverseSuffixSearcher creates a reverse suffix searcher from forward NFA.

Requirements:

Pattern must have good suffix literals
Pattern must NOT be start-anchored (^)
Prefilter must be available

Parameters:

forwardNFA: the compiled forward NFA
suffixLiterals: extracted suffix literals from pattern
config: DFA configuration for reverse DFA cache

Returns nil if reverse suffix optimization cannot be applied.

func (*ReverseSuffixSearcher) Find ¶ added in v0.6.0

func (s *ReverseSuffixSearcher) Find(haystack []byte) *Match

Find searches using suffix literal prefilter + reverse DFA and returns the match.

Algorithm (leftmost-longest/greedy semantics):

Use prefilter to find ALL suffix literal candidates
For the FIRST candidate that produces a valid match: a. Use reverse DFA SearchReverse to find match START (leftmost) b. Record this as the leftmost match start
Continue scanning for more candidates with the SAME match start a. Keep track of the longest match end (greedy)
Return the match with leftmost start and longest end

Performance:

ZERO PikeVM calls - uses DFA exclusively
Single reverse DFA scan finds both match validity AND start position
Multiple candidates scanned for greedy semantics

Example (greedy matching):

Pattern: `.*\.txt`
Haystack: "a.txt.txt"
Suffix literal: `.txt`

1. Prefilter finds `.txt` at position 1, then at position 5
2. Candidate 1 (pos=1): SearchReverse returns 0, match = [0:5]
3. Candidate 2 (pos=5): SearchReverse returns 0, match = [0:9] (longer!)
4. Return [0:9] = "a.txt.txt" (greedy)

func (*ReverseSuffixSearcher) IsMatch ¶ added in v0.6.0

func (s *ReverseSuffixSearcher) IsMatch(haystack []byte) bool

IsMatch checks if the pattern matches using suffix prefilter + reverse DFA.

This is optimized for boolean matching:

Uses prefilter for fast candidate finding
Uses reverse DFA for fast prefix verification
No Match object allocation
Early termination on first match
ZERO PikeVM calls - reverse DFA confirmation is sufficient

type Stats ¶

type Stats struct {
	// NFASearches counts NFA (PikeVM) searches
	NFASearches uint64

	// DFASearches counts DFA searches
	DFASearches uint64

	// OnePassSearches counts OnePass DFA searches (for FindSubmatch)
	OnePassSearches uint64

	// PrefilterHits counts successful prefilter matches
	PrefilterHits uint64

	// PrefilterMisses counts prefilter candidates that didn't match
	PrefilterMisses uint64

	// DFACacheFull counts times DFA fell back to NFA due to cache full
	DFACacheFull uint64
}

Stats tracks execution statistics for performance analysis.

type Strategy ¶

type Strategy int

Strategy represents the execution strategy for regex matching.

The meta-engine chooses between:

UseNFA: use PikeVM exclusively (simple patterns, no cache needed)
UseDFA: use Lazy DFA with NFA fallback (complex patterns, good literals)
UseBoth: adaptive strategy (try DFA first, fallback to NFA on cache full)

Strategy selection is automatic based on pattern analysis.

const (
	// UseNFA uses only the NFA (PikeVM) engine.
	// Selected for:
	//   - Very small NFAs (< 20 states) where DFA overhead isn't worth it
	//   - Patterns without literals where DFA has no advantage
	//   - When EnableDFA is false in config
	UseNFA Strategy = iota

	// UseDFA uses Lazy DFA with NFA fallback on cache overflow.
	// Selected for:
	//   - Large NFAs (> 100 states) where DFA is essential
	//   - Patterns with good literals (prefilter + DFA is fastest)
	//   - Simple patterns (no alternations) where DFA doesn't blow up
	UseDFA

	// UseBoth uses adaptive strategy: try DFA, fallback to NFA if cache fills.
	// Selected for:
	//   - Medium-sized NFAs (20-100 states)
	//   - Patterns with some literals but complex structure
	//   - Default when pattern characteristics are unclear
	UseBoth

	// UseReverseAnchored uses reverse DFA search for patterns anchored at end.
	// Selected for:
	//   - Patterns with $ or \z anchor (end of text)
	//   - NOT also anchored at start (^)
	//   - Searches backward from end of haystack
	//   - Converts O(n*m) to O(m) for end-anchored patterns
	UseReverseAnchored

	// UseReverseSuffix uses suffix literal prefilter + reverse DFA search.
	// Selected for:
	//   - Patterns with literal suffix (e.g., `.*\.txt`)
	//   - NOT start-anchored (^)
	//   - Has good suffix literal for prefiltering
	//   - Speedup: 10-100x for patterns like `.*\.txt`
	UseReverseSuffix

	// UseOnePass uses one-pass DFA for anchored patterns with capture groups.
	// Selected for:
	//   - Pattern is always anchored (^ or implicit anchor)
	//   - Pattern is "one-pass" (no ambiguity in matching paths)
	//   - Pattern has capture groups (otherwise lazy DFA is faster)
	//   - Speedup: 10-20x over PikeVM for capture group extraction
	//   - Only used for FindSubmatch, not Find
	UseOnePass

	// UseReverseInner uses inner literal prefilter + bidirectional DFA search.
	// Selected for:
	//   - Patterns with inner literal (e.g., `prefix.*inner.*suffix`)
	//   - NOT start-anchored (^) or end-anchored ($)
	//   - Has good inner literal for prefiltering
	//   - NO good prefix or suffix literals (otherwise prefer UseDFA/UseReverseSuffix)
	//   - Has wildcards both before AND after inner literal
	//   - Speedup: 10-100x for patterns like `ERROR.*connection.*timeout`
	UseReverseInner
)

func SelectStrategy ¶

func SelectStrategy(n *nfa.NFA, re *syntax.Regexp, literals *literal.Seq, config Config) Strategy

SelectStrategy analyzes the NFA and literals to choose the best execution strategy.

Algorithm:

If end-anchored ($ or \z) and not start-anchored → UseReverseAnchored
If DFA disabled in config → UseNFA
If NFA is tiny (< 20 states) → UseNFA (DFA overhead not worth it)
If good literals exist → UseDFA (prefilter + DFA is fastest)
If NFA is large (> 100 states) → UseDFA (essential for performance)
Otherwise → UseBoth (adaptive)

"Good literals" means:

At least one literal exists
Longest common prefix (LCP) length >= MinLiteralLen
This enables effective prefiltering

Parameters:

n: the compiled NFA to analyze
re: the parsed regexp (for anchor detection, can be nil)
literals: extracted prefix literals (can be nil)
config: meta-engine configuration

Example:

strategy := meta.SelectStrategy(nfa, re, literals, config)
switch strategy {
case meta.UseNFA:
    // Use PikeVM only
case meta.UseDFA:
    // Use Lazy DFA
case meta.UseReverseAnchored:
    // Use reverse search
case meta.UseBoth:
    // Adaptive
}

func (Strategy) String ¶

func (s Strategy) String() string

String returns a human-readable representation of the Strategy.

Source Files ¶

View all Source files

?	: This menu
/	: Search site
f or F	: Jump to
y or Y	: Canonical URL