coregex

package module

v0.8.2 (latest) | Published: Dec 3, 2025 | License: MIT | Imports: 1 | Imported by: 0

Details

Valid go.mod file
Redistributable license
Tagged version
Stable version

Repository

github.com/coregx/coregex


README

coregex - Production-Grade Regex Engine for Go

3-3000x+ faster than stdlib through multi-engine architecture and SIMD optimizations

A production-grade regex engine for Go with dramatic performance improvements over the standard library. Inspired by Rust's regex crate, coregex uses a multi-engine architecture with SIMD-accelerated prefilters to achieve 3-3000x+ speedup depending on pattern type (especially suffix patterns like .*\.txt and inner literal patterns like .*keyword.*).

Features

⚡ Performance

🚀 Up to 263x faster than Go's regexp package (case-insensitive patterns)
🎯 SIMD-accelerated search with AVX2/SSSE3 assembly (10-15x faster substring search)
📊 Multi-pattern search (Teddy SIMD algorithm for 2-8 literals)
💾 Zero allocations in hot paths through object pooling

🏗️ Architecture

🧠 Meta-engine orchestrates strategy selection (DFA/NFA/ReverseAnchored/ReverseInner)
⚡ Lazy DFA with configurable caching (on-demand state construction)
🔄 Pike VM (Thompson's NFA) for guaranteed O(n×m) performance
🔙 Reverse Search for $ anchor and suffix patterns (1000x+ speedup)
🎯 ReverseInner for .*keyword.* patterns with bidirectional DFA (3000x+ speedup)
⚡ OnePass DFA for simple anchored patterns (10x faster captures, 0 allocs)
📌 Prefilter coordination (memchr/memmem/teddy)

🎯 API Design

Simple, drop-in replacement for regexp package
Configuration system for performance tuning
Thread-safe with concurrent compilation support
Comprehensive error handling

Installation

go get github.com/coregx/coregex

Requirements:

Go 1.25 or later
No external dependencies other than golang.org/x/sys (used for CPU feature detection)

Quick Start

Basic Usage

package main

import (
	"fmt"
	"log"

	"github.com/coregx/coregex"
)

func main() {
	// Compile a regex pattern
	re, err := coregex.Compile(`\b\w+@\w+\.\w+\b`)
	if err != nil {
		log.Fatal(err)
	}

	// Find first match
	text := []byte("Contact us at support@example.com for help")
	if match := re.Find(text); match != nil {
		fmt.Printf("Found email: %s\n", match)
	}

	// Find all matches
	matches := re.FindAll(text, -1)
	for _, m := range matches {
		fmt.Printf("Match: %s\n", m)
	}
}

Advanced Configuration

package main

import (
	"log"

	"github.com/coregx/coregex"
)

func main() {
	// Create custom configuration for performance tuning
	config := coregex.DefaultConfig()
	config.MaxDFAStates = 10000   // Limit DFA cache size
	config.EnablePrefilter = true // Use SIMD prefilters (default)
	config.UseObjectPools = true  // Zero-allocation mode (default)

	// Compile with custom config
	re, err := coregex.CompileWithConfig(`pattern`, config)
	if err != nil {
		log.Fatal(err)
	}

	// Use regex...
	text := []byte("search this text")
	match := re.Find(text)
	if match != nil {
		log.Printf("Found: %s", match)
	}
}

Performance Example

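package main

import (
	"bytes"
	"fmt"
	"regexp"
	"time"

	"github.com/coregx/coregex"
)

// benchmarkSearch runs 10,000 Find calls with each engine and prints the ratio.
// Both patterns are compiled before the timer starts, so only search time is measured.
func benchmarkSearch(pattern string, text []byte) {
	const iterations = 10000

	// stdlib regexp
	reStdlib := regexp.MustCompile(pattern)
	start := time.Now()
	for i := 0; i < iterations; i++ {
		reStdlib.Find(text)
	}
	stdlibTime := time.Since(start)

	// coregex
	reCoregex := coregex.MustCompile(pattern)
	start = time.Now()
	for i := 0; i < iterations; i++ {
		reCoregex.Find(text)
	}
	coregexTime := time.Since(start)

	speedup := float64(stdlibTime) / float64(coregexTime)
	fmt.Printf("%s: stdlib %v, coregex %v (%.1fx faster)\n", pattern, stdlibTime, coregexTime, speedup)
}

func main() {
	// About 32KB of filler text ending in a file name, so the suffix pattern matches late.
	text := append(bytes.Repeat([]byte("lorem ipsum dolor sit amet "), 1200), "report.txt"...)
	benchmarkSearch(`.*\.txt`, text)
}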

Performance Benchmarks

SIMD Primitives (vs stdlib):

memchr (single byte): 12.3x faster (64KB input)
memmem (substring): 14.2x faster (64KB input, short needle)
teddy (multi-pattern): 8.5x faster (2-8 patterns)

Regex Search (vs regexp):

Pattern Type	Input Size	stdlib	coregex	Speedup
Case-sensitive	1KB	688 ns	196 ns	3.5x faster
Case-sensitive	32KB	9,715 ns	8,367 ns	1.2x faster
Case-insensitive	1KB	24,110 ns	262 ns	92x faster
Case-insensitive	32KB	1,229,521 ns	4,669 ns	263x faster
.*\.txt IsMatch	32KB	1.3 ms	855 ns	1,549x faster
.*\.txt IsMatch	1MB	27 ms	21 µs	1,314x faster
.*keyword.* IsMatch	250KB	12.6 ms	4 µs	3,154x faster
.*keyword.* Find	250KB	15.2 ms	8 µs	1,894x faster

Key insights:

Inner literal patterns (.*keyword.*) see the largest speedups (roughly 1,900-3,150x in the table above) through the ReverseInner optimization (v0.8.0)
Suffix patterns (.*\.txt) see 1,300x+ speedups through the ReverseSuffix optimization
Case-insensitive patterns ((?i)...) also benefit strongly (92-263x above): stdlib's literal-prefix shortcut does not apply to case-folded literals, so it falls back to NFA-based matching, while coregex's lazy DFA stays fast
Simple patterns see 1-5x improvement depending on literals

See benchmark/ for detailed comparisons.

Supported Features

Current Features

Feature	Status	Notes
SIMD Primitives	✅	memchr, memchr2/3, memmem, teddy
Literal Extraction	✅	Prefix/suffix/inner literals
Prefilter System	✅	Automatic strategy selection
Meta-Engine	✅	DFA/NFA/ReverseAnchored orchestration
Lazy DFA	✅	On-demand state construction
Pike VM (NFA)	✅	Thompson's construction
Reverse Search	✅	ReverseAnchored (v0.4.0), ReverseSuffix (v0.6.0), ReverseInner (v0.8.0)
OnePass DFA	✅	NEW in v0.7.0 - 10x faster captures, 0 allocs
Unicode support	✅	Via `regexp/syntax`
Capture groups	✅	FindSubmatch, FindSubmatchIndex
Replace/Split	✅	ReplaceAll, ReplaceAllFunc, Split
Named captures	✅	NEW in v0.5.0 - SubexpNames() API
Look-around	📅	Planned
Backreferences	❌	Incompatible with O(n) guarantee

Regex Syntax

coregex uses Go's regexp/syntax for pattern parsing, supporting:

✅ Character classes [a-z], \d, \w, \s
✅ Quantifiers *, +, ?, {n,m}
✅ Anchors ^, $, \b, \B
✅ Groups (...) and alternation |
✅ Unicode categories \p{L}, \P{N}
✅ Case-insensitive matching (?i)
✅ Non-capturing groups (?:...)
❌ Backreferences (not supported - O(n) performance guarantee)

Known Limitations

What Works:

✅ All syntax accepted by Go's regexp/syntax (which excludes backreferences and look-around)
✅ Unicode support via regexp/syntax
✅ SIMD acceleration on AMD64 (AVX2/SSSE3)
✅ Cross-platform (fallback to pure Go on other architectures)
✅ Thread-safe compilation and execution
✅ Minimal dependencies (only golang.org/x/sys, for CPU feature detection)
✅ Capture groups with FindSubmatch API
✅ Named capture groups with SubexpNames() API
✅ Replace/Split with $0-$9 template expansion

Current Limitations:

⚠️ Experimental API - May change before v1.0
⚠️ No look-around assertions yet (planned)
⚠️ SIMD only on AMD64 (ARM NEON planned)

Performance Notes:

🚀 Best speedup on patterns with literal prefixes/suffixes
🚀 Excellent for log parsing, email/URL extraction
⚡ May be slower than stdlib on trivial patterns (overhead)
⚡ Compilation has an up-front cost; compile once and reuse the Regex so repeated searches amortize it

See CHANGELOG.md for detailed version history.

Documentation

Getting Started - Usage examples and tutorials
API Reference - Full API documentation
CHANGELOG.md - Version history
ROADMAP.md - Future plans and development timeline
SECURITY.md - Security policy and ReDoS prevention

Development

Building

# Clone repository
git clone https://github.com/coregx/coregex.git
cd coregex

# Build all packages
go build ./...

# Run tests
go test ./...

# Run tests with race detector
go test -race ./...

# Run benchmarks
go test -bench=. -benchmem ./simd/
go test -bench=. -benchmem ./prefilter/

Testing

# Run all tests
go test ./...

# Run specific package tests
go test ./simd/ -v
go test ./meta/ -v

# Run with coverage
go test -cover ./...

# Run linter (golangci-lint required)
golangci-lint run

Pre-release Check

Before creating a release, run the comprehensive validation script:

bash scripts/pre-release-check.sh

This checks:

✅ Go version (1.25+)
✅ Code formatting (gofmt)
✅ go vet passes
✅ All tests pass (with race detector)
✅ Test coverage >70%
✅ golangci-lint passes
✅ Documentation present

Contributing

Contributions are welcome! This is an experimental project and we'd love your help.

Before contributing:

Read CONTRIBUTING.md - Git Flow workflow and guidelines
Check open issues
Join GitHub Discussions

Ways to contribute:

🐛 Report bugs and edge cases
💡 Suggest features
📝 Improve documentation
🔧 Submit pull requests
⭐ Star the project
🧪 Benchmark against stdlib and report results

Priority areas:

Look-around assertions
ARM NEON SIMD implementation
More comprehensive benchmarks
Performance profiling and optimization

Comparison with Other Libraries

Feature	coregex	stdlib `regexp`	regexp2
Performance	🚀 3-3000x faster	Baseline	Slower (backtracking)
SIMD acceleration	✅ AVX2/SSSE3	❌ No	❌ No
Prefilters	✅ Automatic	❌ No	❌ No
Multi-engine	✅ Lazy DFA/PikeVM/OnePass/Reverse	⚠️ Partial (NFA, one-pass, bounded backtracker; no DFA)	❌ Backtracking only
O(n) guarantee	✅ Yes	✅ Yes	❌ No (exponential worst-case)
Backreferences	❌ Not supported	❌ Not supported	✅ Supported
Capture groups	✅ Supported	✅ Supported	✅ Supported
Named captures	✅ Supported	✅ Supported	✅ Supported
Look-around	📅 Planned	❌ Not supported	✅ Supported
API compatibility	✅ Drop-in replacement	-	Different
Maintained	✅ Active	✅ Stdlib	✅ Active

Note on Backreferences: Both coregex and stdlib regexp do NOT support backreferences (like \1, \2) because they are fundamentally incompatible with guaranteed O(n) linear time complexity. Backreferences require backtracking which can lead to exponential worst-case performance (ReDoS vulnerability). If you absolutely need backreferences, use regexp2, but be aware of the performance trade-offs.

When to use coregex:

✅ Performance-critical applications (log parsing, text processing)
✅ Patterns with literal prefixes/suffixes
✅ Multi-pattern search (email/URL extraction)
✅ When you need O(n) performance guarantee

When to use stdlib regexp:

✅ Simple patterns where performance doesn't matter
✅ Maximum stability and API compatibility

When to use regexp2:

✅ You need backreferences (not supported by coregex)
✅ You need look-around assertions (planned, but not yet available in coregex)
⚠️ Accept exponential worst-case performance

Architecture Overview

┌─────────────────────────────────────────────────────────────────┐
│                        Meta-Engine                              │
│  (Strategy: DFA/NFA/ReverseAnchored/ReverseInner/OnePass)       │
└────────────┬────────────────────────────────────────────────────┘
             │
     ┌───────┴───────┐
     │  Prefilter    │ ──► memchr (single byte)
     │  Coordinator  │ ──► memmem (substring)
     └───────┬───────┘ ──► teddy (2-8 patterns, SIMD)
             │         ──► aho-corasick (many patterns)
             │
┌────────────┼─────────────────────────────────────────────────────┐
│            │                                                     │
│  ┌─────────┴─────────┬──────────┬──────────┬──────────┬────────┐│
│  │                   │          │          │          │        ││
│  ▼                   ▼          ▼          ▼          ▼        ││
│ ┌─────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────┐│
│ │  Lazy   │  │ Pike VM  │  │ Reverse  │  │ Reverse  │  │OnePass│
│ │  DFA    │  │  (NFA)   │  │ Anchored │  │  Inner   │  │ DFA  ││
│ │         │  │          │  │ (v0.4.0) │  │ (v0.8.0) │  │(v0.7)││
│ └─────────┘  └──────────┘  └──────────┘  └──────────┘  └──────┘│
│      │            │               │            │            │   │
│      │            │               └────────────┴────────────┘   │
│      │            │                    ReverseSuffix (v0.6.0)   │
└──────┴────────────┴─────────────────────────────────────────────┘
                       │
              ┌────────┴────────┐
              │ SIMD Primitives │
              │ (AVX2/SSSE3)    │
              └─────────────────┘

Key components:

Meta-Engine - Intelligent strategy selection based on pattern analysis
Prefilter System - Fast rejection of non-matching candidates
Multi-Engine Execution - DFA for speed, NFA for correctness
ReverseAnchored - For $ anchor patterns (v0.4.0)
ReverseSuffix - 1000x+ speedup for .*\.txt suffix patterns (v0.6.0)
OnePass DFA - 10x faster captures with 0 allocations (v0.7.0)
ReverseInner - 3000x+ speedup for .*keyword.* patterns (v0.8.0)
SIMD Primitives - 10-15x faster byte/substring search

See package documentation on pkg.go.dev for API details.

Part of the CoreGX (Core Go eXtensions) ecosystem:

More projects coming soon!

Community:

golang/go#26623 - Go stdlib regexp performance discussion (we posted there!)

Inspired by:

Rust regex crate - Architecture and design
RE2 - O(n) performance guarantees
Hyperscan - SIMD multi-pattern matching

License

This project is licensed under the MIT License - see the LICENSE file for details.

Acknowledgments

Rust regex crate team for architectural inspiration
Russ Cox for Thompson's NFA articles and RE2
Intel for Hyperscan and Teddy algorithm
Go team for regexp/syntax parser
All contributors to this project

Support

📖 API Reference - Full documentation
🐛 Issue Tracker - Report bugs
💬 Discussions - Ask questions

Status: ⚠️ Pre-1.0 - API may change before v1.0.0

Ready for: Testing, benchmarking, feedback, and experimental use

See Releases for the latest version and Discussions for roadmap.

Built with performance and correctness in mind by the coregex community

Documentation

Overview

Package coregex provides a high-performance regex engine for Go.

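coregex is substantially faster than Go's stdlib regexp on many patterns (the README benchmarks range from about 1.2x to over 3,000x, depending on pattern type) through: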

Multi-engine architecture (NFA, Lazy DFA, prefilters)
SIMD-accelerated primitives (memchr, memmem, teddy)
Literal extraction and prefiltering
Automatic strategy selection

The public API is compatible with stdlib regexp where possible, making it easy to migrate existing code.

Basic usage:

// Compile a pattern
re, err := coregex.Compile(`\d+`)
if err != nil {
    log.Fatal(err)
}

// Find first match
match := re.Find([]byte("hello 123 world"))
fmt.Println(string(match)) // "123"

// Check if matches
if re.Match([]byte("hello 123")) {
    fmt.Println("matched!")
}

Advanced usage:

// Custom configuration
config := coregex.DefaultConfig()
config.MaxDFAStates = 50000
re, err := coregex.CompileWithConfig("(a|b|c)*", config)

Performance characteristics:

Patterns with literals: 5-50x faster (prefilter optimization)
Simple patterns: comparable to stdlib
Complex patterns: 2-10x faster (the lazy DFA caches state transitions instead of re-simulating the NFA on every byte)
Worst case: guaranteed O(m*n) (ReDoS safe)

Limitations:

No look-around assertions (planned)
No backreferences (incompatible with the O(n) guarantee)

Examples

Compile
CompileWithConfig
MustCompile
Regex.Find
Regex.FindAll
Regex.FindAllString
Regex.FindIndex
Regex.FindString
Regex.FindStringSubmatch
Regex.FindSubmatch
Regex.NumSubexp
Regex.SubexpNames
Regex.SubexpNames (Matching)

Constants

This section is empty.

Variables

This section is empty.

Functions

func DefaultConfig

func DefaultConfig() meta.Config

DefaultConfig returns the default configuration for compilation.

Users can customize the returned value and pass it to CompileWithConfig.

Example:

config := coregex.DefaultConfig()
config.EnableDFA = false // Use NFA only
re, _ := coregex.CompileWithConfig("pattern", config)

func QuoteMeta (added in v0.8.2)

func QuoteMeta(s string) string

QuoteMeta returns a string that escapes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text.

Example:

escaped := coregex.QuoteMeta("hello.world")
// escaped = "hello\\.world"
re := coregex.MustCompile(escaped)
re.MatchString("hello.world") // true

Types

type Regex

type Regex struct {
	// contains filtered or unexported fields
}

Regex represents a compiled regular expression.

A Regex is safe to use concurrently from multiple goroutines, except for methods that modify internal state (like ResetStats).

Example:

re := coregex.MustCompile(`hello`)
if re.Match([]byte("hello world")) {
    println("matched!")
}

func Compile

func Compile(pattern string) (*Regex, error)

Compile compiles a regular expression pattern.

Syntax is the RE2 syntax accepted by Go's stdlib regexp (Perl-like, but without backreferences or look-around). Returns an error if the pattern is invalid.

Example:

re, err := coregex.Compile(`\d{3}-\d{4}`)
if err != nil {
    log.Fatal(err)
}

Example

ExampleCompile demonstrates basic pattern compilation and matching.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re, err := coregex.Compile(`\d+`)
	if err != nil {
		panic(err)
	}

	fmt.Println(re.Match([]byte("hello 123")))
}

Output:

true

func CompileWithConfig

func CompileWithConfig(pattern string, config meta.Config) (*Regex, error)

CompileWithConfig compiles a pattern with custom configuration.

This allows fine-tuning of performance characteristics.

Example:

config := coregex.DefaultConfig()
config.MaxDFAStates = 100000 // Larger cache
re, err := coregex.CompileWithConfig("(a|b|c)*", config)

Example

ExampleCompileWithConfig demonstrates custom configuration.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	config := coregex.DefaultConfig()
	config.MaxDFAStates = 50000 // Increase cache size

	re, err := coregex.CompileWithConfig("(a|b|c)*", config)
	if err != nil {
		panic(err)
	}

	fmt.Println(re.MatchString("abcabc"))
}

Output:

true

func MustCompile

func MustCompile(pattern string) *Regex

MustCompile compiles a regular expression pattern and panics if it fails.

This is useful for patterns known to be valid at compile time.

Example:

var emailRegex = coregex.MustCompile(`[a-z]+@[a-z]+\.[a-z]+`)

Example

ExampleMustCompile demonstrates panic-on-error compilation.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`hello`)
	fmt.Println(re.MatchString("hello world"))
}

Output:

true

func (*Regex) Count (added in v0.4.0)

func (r *Regex) Count(b []byte, n int) int

Count returns the number of non-overlapping matches of the pattern in b. If n > 0, counts at most n matches. If n <= 0, counts all matches.

This is optimized for counting without building result slices.

Example:

re := coregex.MustCompile(`\d+`)
count := re.Count([]byte("1 2 3 4 5"), -1)
// count == 5

func (*Regex) CountString (added in v0.4.0)

func (r *Regex) CountString(s string, n int) int

CountString returns the number of non-overlapping matches of the pattern in s. If n > 0, counts at most n matches. If n <= 0, counts all matches.

Example:

re := coregex.MustCompile(`\d+`)
count := re.CountString("1 2 3 4 5", -1)
// count == 5

func (*Regex) Find

func (r *Regex) Find(b []byte) []byte

Find returns a slice holding the text of the leftmost match in b. Returns nil if no match is found.

Example:

re := coregex.MustCompile(`\d+`)
match := re.Find([]byte("age: 42"))
println(string(match)) // "42"

Example

ExampleRegex_Find demonstrates finding the first match.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\d+`)
	match := re.Find([]byte("age: 42 years"))
	fmt.Println(string(match))
}

Output:

42

func (*Regex) FindAll

func (r *Regex) FindAll(b []byte, n int) [][]byte

FindAll returns a slice of all successive matches of the pattern in b. If n > 0, it returns at most n matches. If n <= 0, it returns all matches.

Example:

re := coregex.MustCompile(`\d+`)
matches := re.FindAll([]byte("1 2 3"), -1)
// matches = [[]byte("1"), []byte("2"), []byte("3")]

Example

ExampleRegex_FindAll demonstrates finding all matches.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\d`)
	matches := re.FindAll([]byte("a1b2c3"), -1)
	for _, m := range matches {
		fmt.Print(string(m), " ")
	}
	fmt.Println()
}

Output:

1 2 3

func (*Regex) FindAllIndex (added in v0.3.0)

func (r *Regex) FindAllIndex(b []byte, n int) [][]int

FindAllIndex returns a slice of all successive matches of the pattern in b, as index pairs [start, end]. If n > 0, it returns at most n matches. If n <= 0, it returns all matches.

Example:

re := coregex.MustCompile(`\d+`)
indices := re.FindAllIndex([]byte("1 2 3"), -1)
// indices = [[0,1], [2,3], [4,5]]

func (*Regex) FindAllString

func (r *Regex) FindAllString(s string, n int) []string

FindAllString returns a slice of all successive matches of the pattern in s. If n > 0, it returns at most n matches. If n <= 0, it returns all matches.

Example:

re := coregex.MustCompile(`\d+`)
matches := re.FindAllString("1 2 3", -1)
// matches = ["1", "2", "3"]

Example

ExampleRegex_FindAllString demonstrates finding all string matches.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\w+`)
	words := re.FindAllString("hello world test", -1)
	for _, word := range words {
		fmt.Print(word, " ")
	}
	fmt.Println()
}

Output:

hello world test

func (*Regex) FindAllStringIndex (added in v0.3.0)

func (r *Regex) FindAllStringIndex(s string, n int) [][]int

FindAllStringIndex returns a slice of all successive matches of the pattern in s, as index pairs [start, end]. If n > 0, it returns at most n matches. If n <= 0, it returns all matches.

Example:

re := coregex.MustCompile(`\d+`)
indices := re.FindAllStringIndex("1 2 3", -1)
// indices = [[0,1], [2,3], [4,5]]

func (*Regex) FindAllStringSubmatch (added in v0.4.0)

func (r *Regex) FindAllStringSubmatch(s string, n int) [][]string

FindAllStringSubmatch returns a slice of all successive matches of the pattern in s, where each match includes all capture groups as strings. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
matches := re.FindAllStringSubmatch("a@b.c x@y.z", -1)
// len(matches) == 2
// matches[0][0] = "a@b.c"
// matches[0][1] = "a"

func (*Regex) FindAllStringSubmatchIndex (added in v0.4.0)

func (r *Regex) FindAllStringSubmatchIndex(s string, n int) [][]int

FindAllStringSubmatchIndex returns a slice of all successive matches of the pattern in s, where each match includes index pairs for all capture groups. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
indices := re.FindAllStringSubmatchIndex("a@b.c x@y.z", -1)
// len(indices) == 2
// indices[0] = [0 5 0 1 2 3 4 5]
// indices[1] = [6 11 6 7 8 9 10 11]

func (*Regex) FindAllSubmatch (added in v0.4.0)

func (r *Regex) FindAllSubmatch(b []byte, n int) [][][]byte

FindAllSubmatch returns a slice of all successive matches of the pattern in b, where each match includes all capture groups. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
matches := re.FindAllSubmatch([]byte("a@b.c x@y.z"), -1)
// len(matches) == 2
// matches[0][0] = "a@b.c"
// matches[0][1] = "a"

func (*Regex) FindAllSubmatchIndex (added in v0.4.0)

func (r *Regex) FindAllSubmatchIndex(b []byte, n int) [][]int

FindAllSubmatchIndex returns a slice of all successive matches of the pattern in b, where each match includes index pairs for all capture groups. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
indices := re.FindAllSubmatchIndex([]byte("a@b.c x@y.z"), -1)
// len(indices) == 2
// indices[0] contains start/end pairs for each group

func (*Regex) FindIndex

func (r *Regex) FindIndex(b []byte) []int

FindIndex returns a two-element slice of integers defining the location of the leftmost match in b. The match is at b[loc[0]:loc[1]]. Returns nil if no match is found.

Example:

re := coregex.MustCompile(`\d+`)
loc := re.FindIndex([]byte("age: 42"))
println(loc[0], loc[1]) // 5 7

Example

ExampleRegex_FindIndex demonstrates finding match positions.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\d+`)
	loc := re.FindIndex([]byte("age: 42"))
	fmt.Printf("Match at [%d:%d]\n", loc[0], loc[1])
}

Output:

Match at [5:7]

func (*Regex) FindString

func (r *Regex) FindString(s string) string

FindString returns a string holding the text of the leftmost match in s. Returns empty string if no match is found.

Example:

re := coregex.MustCompile(`\d+`)
match := re.FindString("age: 42")
println(match) // "42"

Example

ExampleRegex_FindString demonstrates finding a match in a string.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\w+@\w+\.\w+`)
	email := re.FindString("Contact: user@example.com")
	fmt.Println(email)
}

Output:

user@example.com

func (*Regex) FindStringIndex

func (r *Regex) FindStringIndex(s string) []int

FindStringIndex returns a two-element slice of integers defining the location of the leftmost match in s. The match is at s[loc[0]:loc[1]]. Returns nil if no match is found.

Example:

re := coregex.MustCompile(`\d+`)
loc := re.FindStringIndex("age: 42")
println(loc[0], loc[1]) // 5 7

func (*Regex) FindStringSubmatch (added in v0.2.0)

func (r *Regex) FindStringSubmatch(s string) []string

FindStringSubmatch returns a slice of strings holding the text of the leftmost match and the matches of all capture groups.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
match := re.FindStringSubmatch("user@example.com")
// match[0] = "user@example.com"
// match[1] = "user"

Example

ExampleRegex_FindStringSubmatch demonstrates capture groups with strings.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
	match := re.FindStringSubmatch("Date: 2024-12-25")
	if match != nil {
		fmt.Printf("Year: %s, Month: %s, Day: %s\n", match[1], match[2], match[3])
	}
}

Output:

Year: 2024, Month: 12, Day: 25

func (*Regex) FindStringSubmatchIndex (added in v0.2.0)

func (r *Regex) FindStringSubmatchIndex(s string) []int

FindStringSubmatchIndex returns the index pairs for the leftmost match and capture groups. Same as FindSubmatchIndex but for strings.

func (*Regex) FindSubmatch (added in v0.2.0)

func (r *Regex) FindSubmatch(b []byte) [][]byte

FindSubmatch returns a slice holding the text of the leftmost match and the matches of all capture groups.

A return value of nil indicates no match. Result[0] is the entire match, result[i] is the ith capture group. Unmatched groups will be nil.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
match := re.FindSubmatch([]byte("user@example.com"))
// match[0] = "user@example.com"
// match[1] = "user"
// match[2] = "example"
// match[3] = "com"

Example

ExampleRegex_FindSubmatch demonstrates capture group extraction.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
	match := re.FindSubmatch([]byte("Contact: user@example.com"))
	if match != nil {
		fmt.Println("Full match:", string(match[0]))
		fmt.Println("User:", string(match[1]))
		fmt.Println("Domain:", string(match[2]))
		fmt.Println("TLD:", string(match[3]))
	}
}

Output:

Full match: user@example.com
User: user
Domain: example
TLD: com

func (*Regex) FindSubmatchIndex (added in v0.2.0)

func (r *Regex) FindSubmatchIndex(b []byte) []int

FindSubmatchIndex returns a slice holding the index pairs for the leftmost match and the matches of all capture groups.

A return value of nil indicates no match. Result[2*i:2*i+2] holds the indices for the ith group. Unmatched groups have -1 indices.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
idx := re.FindSubmatchIndex([]byte("user@example.com"))
// idx[0:2] = indices for entire match
// idx[2:4] = indices for first capture group

func (*Regex) Longest (added in v0.8.2)

func (r *Regex) Longest()

Longest makes future searches prefer the longest match.

Note: coregex already uses leftmost-longest semantics by default for DFA-based matching, so this method is provided for API compatibility with stdlib regexp. Like the stdlib method, it has no return value, so calls cannot be chained.

Example:

re := coregex.MustCompile(`a+`)
re.Longest()

func (*Regex) Match

func (r *Regex) Match(b []byte) bool

Match reports whether the byte slice b contains any match of the pattern.

Example:

re := coregex.MustCompile(`\d+`)
if re.Match([]byte("hello 123")) {
    println("contains digits")
}

func (*Regex) MatchString

func (r *Regex) MatchString(s string) bool

MatchString reports whether the string s contains any match of the pattern.

Example:

re := coregex.MustCompile(`hello`)
if re.MatchString("hello world") {
    println("matched!")
}

func (*Regex) NumSubexp (added in v0.2.0)

func (r *Regex) NumSubexp() int

NumSubexp returns the number of capture groups counting group 0 (the entire match), which is the number of parenthesized capturing subexpressions plus 1. This differs from stdlib regexp, whose NumSubexp excludes the entire match and returns 3 for the pattern in the example below.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
println(re.NumSubexp()) // 4 (entire match + 3 groups)

Example

ExampleRegex_NumSubexp demonstrates counting capture groups.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
	fmt.Println("Number of groups:", re.NumSubexp())
}

Output:

Number of groups: 4

func (*Regex) ReplaceAll (added in v0.3.0)

func (r *Regex) ReplaceAll(src, repl []byte) []byte

ReplaceAll returns a copy of src, replacing matches of the pattern with the replacement bytes repl. Inside repl, $ signs are interpreted as in the stdlib regexp.Regexp.Expand method: $0 is the entire match, $1 is the first capture group, etc.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
result := re.ReplaceAll([]byte("user@example.com"), []byte("$1 at $2 dot $3"))
// result = []byte("user at example dot com")

func (*Regex) ReplaceAllFunc (added in v0.3.0)

func (r *Regex) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte

ReplaceAllFunc returns a copy of src in which all matches of the pattern have been replaced by the return value of function repl applied to the matched byte slice. The replacement returned by repl is substituted directly, without using Expand.

Example:

re := coregex.MustCompile(`\d+`)
result := re.ReplaceAllFunc([]byte("1 2 3"), func(s []byte) []byte {
    n, _ := strconv.Atoi(string(s))
    return []byte(strconv.Itoa(n * 2))
})
// result = []byte("2 4 6")

func (*Regex) ReplaceAllLiteral (added in v0.3.0)

func (r *Regex) ReplaceAllLiteral(src, repl []byte) []byte

ReplaceAllLiteral returns a copy of src, replacing matches of the pattern with the replacement bytes repl. The replacement is substituted directly, without expanding $ variables.

Example:

re := coregex.MustCompile(`\d+`)
result := re.ReplaceAllLiteral([]byte("age: 42"), []byte("XX"))
// result = []byte("age: XX")

func (*Regex) ReplaceAllLiteralString (added in v0.3.0)

func (r *Regex) ReplaceAllLiteralString(src, repl string) string

ReplaceAllLiteralString returns a copy of src, replacing matches of the pattern with the replacement string repl. The replacement is substituted directly, without expanding $ variables.

Example:

re := coregex.MustCompile(`\d+`)
result := re.ReplaceAllLiteralString("age: 42", "XX")
// result = "age: XX"

func (*Regex) ReplaceAllString (added in v0.3.0)

func (r *Regex) ReplaceAllString(src, repl string) string

ReplaceAllString returns a copy of src, replacing matches of the pattern with the replacement string repl. Inside repl, $ signs are interpreted as in the stdlib regexp.Regexp.Expand method: $0 is the entire match, $1 is the first capture group, etc.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
result := re.ReplaceAllString("user@example.com", "$1 at $2 dot $3")
// result = "user at example dot com"

func (*Regex) ReplaceAllStringFunc (added in v0.3.0)

func (r *Regex) ReplaceAllStringFunc(src string, repl func(string) string) string

ReplaceAllStringFunc returns a copy of src in which all matches of the pattern have been replaced by the return value of function repl applied to the matched string. The replacement returned by repl is substituted directly, without using Expand.

Example:

re := coregex.MustCompile(`\d+`)
result := re.ReplaceAllStringFunc("1 2 3", func(s string) string {
    n, _ := strconv.Atoi(s)
    return strconv.Itoa(n * 2)
})
// result = "2 4 6"
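ReplaceAllStringFunc passes only the matched text to repl, not its capture groups. One common workaround is to re-run FindStringSubmatch on the match inside the callback. The sketch below uses the standard library regexp package as a stand-in for coregex, which is an assumption. The helper upperValues is hypothetical.

```go
package main

import (
	"fmt"
	"regexp" // stand-in for regexp "github.com/coregx/coregex" (assumption)
	"strings"
)

var pair = regexp.MustCompile(`(\w+)=(\w+)`)

// upperValues (hypothetical helper) upper-cases the value of every key=value
// pair. The callback only receives the matched text, so it re-runs
// FindStringSubmatch on that text to recover the groups. This is safe here
// because the pattern has no anchors or \b assertions whose outcome could
// change once the match is taken out of its surrounding context.
func upperValues(s string) string {
	return pair.ReplaceAllStringFunc(s, func(m string) string {
		g := pair.FindStringSubmatch(m)
		return g[1] + "=" + strings.ToUpper(g[2])
	})
}

func main() {
	fmt.Println(upperValues("a=x, b=yz")) // a=X, b=YZ
}
```

When no per-match computation is needed, ReplaceAllString with $1-style references avoids the second match entirely.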

func (*Regex) Split ¶ added in v0.3.0

func (r *Regex) Split(s string, n int) []string

Split slices s into substrings separated by the expression and returns a slice of the substrings between those expression matches.

The slice returned by this method consists of all the substrings of s not contained in the slice returned by FindAllString. When called on an expression that contains no metacharacters, it is equivalent to strings.SplitN.

The count determines the number of substrings to return:

n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings

Example:

re := coregex.MustCompile(`,`)
parts := re.Split("a,b,c", -1)
// parts = ["a", "b", "c"]

parts = re.Split("a,b,c", 2)
// parts = ["a", "b,c"]

func (*Regex) String ¶

func (r *Regex) String() string

String returns the source text used to compile the regular expression.

Example:

re := coregex.MustCompile(`\d+`)
fmt.Println(re.String()) // prints \d+

func (*Regex) SubexpNames ¶ added in v0.5.0

func (r *Regex) SubexpNames() []string

SubexpNames returns the names of the parenthesized subexpressions in this Regex. The name for the first sub-expression is names[1], so that if m is a match slice, the name for m[i] is SubexpNames()[i]. Since the Regex as a whole cannot be named, names[0] is always the empty string. The slice returned is shared and must not be modified.

Example:

re := coregex.MustCompile(`(?P<year>\d+)-(?P<month>\d+)`)
names := re.SubexpNames()
// names[0] = ""
// names[1] = "year"
// names[2] = "month"

Example ¶

ExampleRegex_SubexpNames demonstrates named capture groups

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	// Pattern with named and unnamed captures
	re := coregex.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})`)

	// Get capture group names
	names := re.SubexpNames()
	fmt.Printf("Capture groups: %d\n", re.NumSubexp())
	fmt.Printf("Group 0 (full match): %q\n", names[0])
	fmt.Printf("Group 1 (year): %q\n", names[1])
	fmt.Printf("Group 2 (month): %q\n", names[2])
	fmt.Printf("Group 3 (day, unnamed): %q\n", names[3])

}

Output:

Capture groups: 4
Group 0 (full match): ""
Group 1 (year): "year"
Group 2 (month): "month"
Group 3 (day, unnamed): ""

Example (Matching) ¶

ExampleRegex_SubexpNames_matching shows using SubexpNames with matches

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	// Compile pattern with named captures
	re := coregex.MustCompile(`(?P<protocol>https?)://(?P<domain>\w+)`)

	// Find match and get submatch values
	match := re.FindStringSubmatch("Visit https://example for more")
	names := re.SubexpNames()

	// Print matches with their names
	for i, name := range names {
		if i < len(match) && match[i] != "" {
			if name != "" {
				fmt.Printf("%s: %s\n", name, match[i])
			} else if i == 0 {
				fmt.Printf("Full match: %s\n", match[i])
			}
		}
	}

}

Output:

Full match: https://example
protocol: https
domain: example

type Regexp ¶ added in v0.8.1

type Regexp = Regex

Regexp is an alias for Regex to provide drop-in compatibility with stdlib regexp. This allows replacing `import "regexp"` with `import regexp "github.com/coregx/coregex"` without changing type names in existing code.

Example:

import regexp "github.com/coregx/coregex"

var re *regexp.Regexp = regexp.MustCompile(`\d+`)
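The drop-in compatibility relies on Go's alias declaration. With `=`, Regexp and Regex are the same type, so values pass between them with no conversion. The self-contained sketch below does not use the library. It uses hypothetical local types named after the coregex ones purely to illustrate the language rule.

```go
package main

import "fmt"

// Hypothetical stand-in for coregex.Regex; only the declarations matter.
type Regex struct{ expr string }

// Alias declaration (note the "="): Regexp and Regex name the same type.
type Regexp = Regex

// A defined type (no "=") is a distinct type with the same underlying struct.
type Distinct Regex

func describe(r *Regex) string { return r.expr }

func main() {
	var re *Regexp = &Regex{expr: `\d+`}
	fmt.Println(describe(re)) // *Regexp is usable as *Regex with no conversion

	d := &Distinct{expr: `\w+`}
	fmt.Println(describe((*Regex)(d))) // a defined type needs an explicit conversion
}
```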

Source Files ¶


regex.go

Directories ¶

Path	Synopsis
dfa
lazy	Package lazy implements a Lazy DFA (Deterministic Finite Automaton) engine for regex matching.
onepass	Package onepass implements a one-pass DFA for regex patterns that have no ambiguity in their matching paths.
internal
sparse	Package sparse provides a sparse set data structure for efficient membership testing.
literal	Package literal provides types and operations for extracting literal sequences from regex patterns for prefilter optimization.
meta	Package meta implements the meta-engine orchestrator that automatically selects the optimal regex execution strategy.
nfa	Package nfa provides a Thompson NFA (Non-deterministic Finite Automaton) implementation for regex matching.
prefilter	Package prefilter provides fast candidate filtering for regex search using extracted literal sequences.
simd	Package simd provides SIMD-accelerated string operations for high-performance byte searching.
