coregex

Version: v0.10.6
Published: Jan 14, 2026 License: MIT Imports: 2 Imported by: 0

README

coregex

High-performance regex engine for Go. Drop-in replacement for regexp with 3-3000x speedup.*

* Typical speedup 15-240x on real-world patterns. 1000x+ achieved on specific edge cases where prefilters skip entire input (e.g., IP pattern on text with no digits).

Why coregex?

Go's stdlib regexp is intentionally simple — single NFA engine, no optimizations. This guarantees O(n) time but leaves performance on the table.

coregex brings Rust regex-crate architecture to Go:

  • Multi-engine: Lazy DFA, PikeVM, OnePass, BoundedBacktracker
  • SIMD prefilters: AVX2/SSSE3 for fast candidate rejection
  • Reverse search: Suffix/inner literal patterns run 1000x+ faster
  • O(n) guarantee: No backtracking, no ReDoS vulnerabilities
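
The prefilter idea can be sketched with the standard library alone. This is not coregex's implementation (which, per the list above, uses SIMD); it assumes a pattern with one required literal and uses bytes.Contains as a stand-in for the accelerated search. The names required, emailRe, and findWithPrefilter are hypothetical.

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
)

// required is a literal that every match of emailRe must contain.
// Both names are hypothetical and exist only for this sketch.
var (
	required = []byte("@example.com")
	emailRe  = regexp.MustCompile(`\w+@example\.com`)
)

// findWithPrefilter rejects inputs that lack the required literal before
// running the full regex engine.
func findWithPrefilter(text []byte) []byte {
	if !bytes.Contains(text, required) {
		return nil // fast rejection: the regex cannot match
	}
	return emailRe.Find(text)
}

func main() {
	fmt.Printf("%q\n", findWithPrefilter([]byte("no address here")))
	fmt.Printf("%q\n", findWithPrefilter([]byte("mail bob@example.com now")))
}
```

The gain comes from inputs where the literal is absent: the substring scan is far cheaper per byte than stepping an automaton.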

Installation

go get github.com/coregx/coregex

Requires Go 1.25+. Minimal dependencies (golang.org/x/sys, github.com/coregx/ahocorasick).

Quick Start

package main

import (
    "fmt"
    "github.com/coregx/coregex"
)

func main() {
    re := coregex.MustCompile(`\w+@\w+\.\w+`)

    text := []byte("Contact support@example.com for help")

    // Find first match
    fmt.Printf("Found: %s\n", re.Find(text))

    // Check if matches (zero allocation)
    if re.MatchString("test@email.com") {
        fmt.Println("Valid email format")
    }
}

Performance

Cross-language benchmarks on 6MB input (source):

Pattern               Go stdlib   coregex   vs stdlib
IP validation         600 ms      5 ms      120x
Inner .*keyword.*     408 ms      3 ms      136x
Suffix .*\.txt        441 ms      2 ms      220x
Literal alternation   435 ms      29 ms     15x
Email validation      352 ms      2 ms      176x
URL extraction        319 ms      2 ms      160x
Char class [\w]+      932 ms      113 ms    8x

Where coregex excels:

  • IP/phone patterns (\d+\.\d+\.\d+\.\d+) — SIMD digit prefilter skips non-digit regions
  • Suffix patterns (.*\.log, .*\.txt) — reverse search optimization (1000x+)
  • Inner literals (.*error.*, .*@example\.com) — bidirectional DFA (900x+)
  • Multi-pattern (foo|bar|baz|...) — Slim Teddy (≤32), Fat Teddy (33-64), or Aho-Corasick (>64)
  • Anchored alternations (^(\d+|UUID|hex32)) — O(1) branch dispatch (5-20x)
  • Concatenated char classes ([a-zA-Z]+[0-9]+) — DFA with byte classes (25x)

Features

Engine Selection

coregex automatically selects the optimal engine:

Strategy               Pattern Type                  Speedup
ReverseInner           .*keyword.*                   100-900x
ReverseSuffix          .*\.txt                       100-1100x
BranchDispatch         ^(\d+|UUID|hex32)             5-20x
CompositeSequenceDFA   [a-zA-Z]+[0-9]+               25x
LazyDFA                IP, complex patterns          10-150x
AhoCorasick            a|b|c|...|z (>64 patterns)    75-113x
CharClassSearcher      [\w]+, \d+                    4-25x
Slim Teddy             foo|bar|baz (2-32 patterns)   15-240x
Fat Teddy              33-64 patterns                60-73x
OnePass                Anchored captures             10x
BoundedBacktracker     Small patterns                2-5x

API Compatibility

Drop-in replacement for regexp.Regexp:

// stdlib
re := regexp.MustCompile(pattern)

// coregex — same API
re := coregex.MustCompile(pattern)

Supported methods:

  • Match, MatchString, MatchReader
  • Find, FindString, FindAll, FindAllString
  • FindIndex, FindStringIndex, FindAllIndex
  • FindSubmatch, FindStringSubmatch, FindAllSubmatch
  • ReplaceAll, ReplaceAllString, ReplaceAllFunc
  • Split, SubexpNames, NumSubexp
  • Longest, Copy, String

Zero-Allocation APIs

// Zero allocations: returns bool
matched := re.IsMatch(text)

// Zero allocations: returns (start, end, found)
start, end, found := re.FindIndices(text)

Configuration

config := coregex.DefaultConfig()
config.MaxDFAStates = 10000      // Limit DFA cache
config.EnablePrefilter = true    // SIMD acceleration

re, err := coregex.CompileWithConfig(pattern, config)

Thread Safety

A compiled *Regexp is safe for concurrent use by multiple goroutines:

re := coregex.MustCompile(`\d+`)

// Safe: multiple goroutines sharing one compiled pattern
var wg sync.WaitGroup
for i := 0; i < 100; i++ {
    wg.Add(1)
    go func() {
        defer wg.Done()
        re.FindString("test 123 data")  // thread-safe
    }()
}
wg.Wait()

Internally uses sync.Pool (same pattern as Go stdlib regexp) for per-search state management.

Syntax Support

Uses Go's regexp/syntax parser:

Feature             Support
Character classes   [a-z], \d, \w, \s
Quantifiers         *, +, ?, {n,m}
Anchors             ^, $, \b, \B
Groups              (...), (?:...), (?P<name>...)
Unicode             \p{L}, \P{N}
Flags               (?i), (?m), (?s)
Backreferences      Not supported (O(n) guarantee)
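
Because parsing is delegated to regexp/syntax, the table can be checked against that parser directly. The sketch below assumes coregex parses with the same syntax.Perl flag set that stdlib regexp.Compile uses; that is an assumption, and the helper name parses is hypothetical.

```go
package main

import (
	"fmt"
	"regexp/syntax"
)

// parses reports whether regexp/syntax accepts the pattern with Perl flags,
// the flag set stdlib regexp.Compile uses.
func parses(pattern string) bool {
	_, err := syntax.Parse(pattern, syntax.Perl)
	return err == nil
}

func main() {
	for _, p := range []string{`(?i)\bfoo\b`, `(?P<year>\d{4})`, `\p{L}+`, `(a)\1`} {
		fmt.Printf("%-20q accepted=%v\n", p, parses(p))
	}
}
```

The last pattern contains a backreference and is rejected at parse time, so such patterns fail at compile time and never reach an engine.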

Architecture

Pattern → Parse → NFA → Literal Extract → Strategy Select
                                               ↓
                         ┌─────────────────────────────────┐
                         │ Engines (13 strategies):        │
                         │  LazyDFA, PikeVM, OnePass,      │
                         │  BoundedBacktracker,            │
                         │  ReverseInner, ReverseSuffix,   │
                         │  ReverseSuffixSet,              │
                         │  CharClassSearcher, Teddy,      │
                         │  DigitPrefilter, AhoCorasick    │
                         └─────────────────────────────────┘
                                               ↓
Input → Prefilter (SIMD) → Engine → Match Result

SIMD Primitives (AMD64):

  • memchr — single byte search (AVX2)
  • memmem — substring search (SSSE3)
  • Slim Teddy — multi-pattern search, 2-32 patterns (SSSE3, 9+ GB/s)
  • Fat Teddy — multi-pattern search, 33-64 patterns (AVX2, 9+ GB/s)

Pure Go fallback on other architectures.

Battle-Tested

coregex was tested in GoAWK. This real-world testing uncovered 15+ edge cases that synthetic benchmarks missed.

Powered by coregex: uawk

uawk is a modern AWK interpreter built on coregex:

Benchmark (10MB)    GoAWK    uawk     Speedup
Regex alternation   1.85s    97ms     19x
IP matching         290ms    99ms     2.9x
General regex       320ms    100ms    3.2x

go install github.com/kolkov/uawk/cmd/uawk@latest
uawk '/error/ { print $0 }' server.log

We need more testers! If you have a project using regexp, try coregex and report issues.

Comparison

Feature          coregex          stdlib     regexp2
Performance      3-3000x faster   Baseline   Slower
SIMD             AVX2/SSSE3       No         No
O(n) guarantee   Yes              Yes        No
Backreferences   No               No         Yes
API              Drop-in          -          Different

Use coregex for performance-critical code with O(n) guarantee. Use stdlib for simple cases where performance doesn't matter. Use regexp2 if you need backreferences (accept exponential worst-case).

License

MIT — see LICENSE.


Status: Pre-1.0 (API may change). Ready for testing and feedback.

Documentation

Overview

Package coregex provides a high-performance regex engine for Go.

coregex achieves 5-50x speedup over Go's stdlib regexp through:

  • Multi-engine architecture (NFA, Lazy DFA, prefilters)
  • SIMD-accelerated primitives (memchr, memmem, teddy)
  • Literal extraction and prefiltering
  • Automatic strategy selection

The public API is compatible with stdlib regexp where possible, making it easy to migrate existing code.

Basic usage:

// Compile a pattern
re, err := coregex.Compile(`\d+`)
if err != nil {
    log.Fatal(err)
}

// Find first match
match := re.Find([]byte("hello 123 world"))
fmt.Println(string(match)) // "123"

// Check if matches
if re.Match([]byte("hello 123")) {
    fmt.Println("matched!")
}

Advanced usage:

// Custom configuration
config := coregex.DefaultConfig()
config.MaxDFAStates = 50000
re, err := coregex.CompileWithConfig("(a|b|c)*", config)

Performance characteristics:

  • Patterns with literals: 5-50x faster (prefilter optimization)
  • Simple patterns: comparable to stdlib
  • Complex patterns: 2-10x faster (DFA avoids backtracking)
  • Worst case: guaranteed O(m*n) (ReDoS safe)
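
The worst-case claim can be illustrated with a pattern that is a well-known problem for backtracking engines. The sketch uses stdlib regexp, which is also linear-time, as a stand-in so that it runs without extra dependencies; swapping in coregex is assumed to behave the same way but is not verified here. The names nestedRe and matchHostile are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
	"time"
)

// nestedRe is a classic pattern that makes backtracking engines take
// exponential time on inputs such as "aaaa...ab".
var nestedRe = regexp.MustCompile(`^(a+)+$`)

// matchHostile runs nestedRe against n 'a' bytes followed by one 'b'.
func matchHostile(n int) bool {
	return nestedRe.MatchString(strings.Repeat("a", n) + "b")
}

func main() {
	start := time.Now()
	fmt.Println(matchHostile(50000), time.Since(start))
}
```

An automaton-based engine reports the non-match after a single pass over the input; a backtracking engine would not finish on an input of this size.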

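Limitations:

  • No backreferences (they are incompatible with the linear-time guarantee)

Capture groups (FindSubmatch and related methods, added in v0.2.0), replace functions (ReplaceAll and related methods, added in v0.3.0), and the (?i), (?m), and (?s) flags are all available in this version.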

Functions

func DefaultConfig

func DefaultConfig() meta.Config

DefaultConfig returns the default configuration for compilation.

Users can customize this and pass to CompileWithConfig.

Example:

config := coregex.DefaultConfig()
config.EnableDFA = false // Use NFA only
re, _ := coregex.CompileWithConfig("pattern", config)

func QuoteMeta added in v0.8.2

func QuoteMeta(s string) string

QuoteMeta returns a string that escapes all regular expression metacharacters inside the argument text; the returned string is a regular expression matching the literal text.

Example:

escaped := coregex.QuoteMeta("hello.world")
// escaped = "hello\\.world"
re := coregex.MustCompile(escaped)
re.MatchString("hello.world") // true

Types

type Regex

type Regex struct {
	// contains filtered or unexported fields
}

Regex represents a compiled regular expression.

A Regex is safe to use concurrently from multiple goroutines, except for methods that modify internal state (like ResetStats).

Example:

re := coregex.MustCompile(`hello`)
if re.Match([]byte("hello world")) {
    println("matched!")
}

func Compile

func Compile(pattern string) (*Regex, error)

Compile compiles a regular expression pattern.

Syntax is the same as Go's stdlib regexp (RE2-style, parsed with regexp/syntax), which is Perl-like but omits features such as backreferences. Returns an error if the pattern is invalid.

Example:

re, err := coregex.Compile(`\d{3}-\d{4}`)
if err != nil {
    log.Fatal(err)
}
Example

ExampleCompile demonstrates basic pattern compilation and matching.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re, err := coregex.Compile(`\d+`)
	if err != nil {
		panic(err)
	}

	fmt.Println(re.Match([]byte("hello 123")))
}
Output:

true

func CompileWithConfig

func CompileWithConfig(pattern string, config meta.Config) (*Regex, error)

CompileWithConfig compiles a pattern with custom configuration.

This allows fine-tuning of performance characteristics.

Example:

config := coregex.DefaultConfig()
config.MaxDFAStates = 100000 // Larger cache
re, err := coregex.CompileWithConfig("(a|b|c)*", config)
Example

ExampleCompileWithConfig demonstrates custom configuration.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	config := coregex.DefaultConfig()
	config.MaxDFAStates = 50000 // Increase cache size

	re, err := coregex.CompileWithConfig("(a|b|c)*", config)
	if err != nil {
		panic(err)
	}

	fmt.Println(re.MatchString("abcabc"))
}
Output:

true

func MustCompile

func MustCompile(pattern string) *Regex

MustCompile compiles a regular expression pattern and panics if it fails.

This is useful for patterns known to be valid at compile time.

Example:

var emailRegex = coregex.MustCompile(`[a-z]+@[a-z]+\.[a-z]+`)
Example

ExampleMustCompile demonstrates panic-on-error compilation.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`hello`)
	fmt.Println(re.MatchString("hello world"))
}
Output:

true

func (*Regex) Count added in v0.4.0

func (r *Regex) Count(b []byte, n int) int

Count returns the number of non-overlapping matches of the pattern in b. If n > 0, counts at most n matches. If n <= 0, counts all matches.

This is optimized for counting without building result slices.

Example:

re := coregex.MustCompile(`\d+`)
count := re.Count([]byte("1 2 3 4 5"), -1)
// count == 5

func (*Regex) CountString added in v0.4.0

func (r *Regex) CountString(s string, n int) int

CountString returns the number of non-overlapping matches of the pattern in s. If n > 0, counts at most n matches. If n <= 0, counts all matches.

Example:

re := coregex.MustCompile(`\d+`)
count := re.CountString("1 2 3 4 5", -1)
// count == 5

func (*Regex) Find

func (r *Regex) Find(b []byte) []byte

Find returns a slice holding the text of the leftmost match in b. Returns nil if no match is found.

Example:

re := coregex.MustCompile(`\d+`)
match := re.Find([]byte("age: 42"))
println(string(match)) // "42"
Example

ExampleRegex_Find demonstrates finding the first match.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\d+`)
	match := re.Find([]byte("age: 42 years"))
	fmt.Println(string(match))
}
Output:

42

func (*Regex) FindAll

func (r *Regex) FindAll(b []byte, n int) [][]byte

FindAll returns a slice of all successive matches of the pattern in b. If n > 0, it returns at most n matches. If n <= 0, it returns all matches.

Example:

re := coregex.MustCompile(`\d+`)
matches := re.FindAll([]byte("1 2 3"), -1)
// matches = [[]byte("1"), []byte("2"), []byte("3")]
Example

ExampleRegex_FindAll demonstrates finding all matches.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\d`)
	matches := re.FindAll([]byte("a1b2c3"), -1)
	for _, m := range matches {
		fmt.Print(string(m), " ")
	}
	fmt.Println()
}
Output:

1 2 3

func (*Regex) FindAllIndex added in v0.3.0

func (r *Regex) FindAllIndex(b []byte, n int) [][]int

FindAllIndex returns a slice of all successive matches of the pattern in b, as index pairs [start, end]. If n > 0, it returns at most n matches. If n <= 0, it returns all matches.

Example:

re := coregex.MustCompile(`\d+`)
indices := re.FindAllIndex([]byte("1 2 3"), -1)
// indices = [[0,1], [2,3], [4,5]]

func (*Regex) FindAllIndexCompact added in v0.10.6

func (r *Regex) FindAllIndexCompact(b []byte, n int, results [][2]int) [][2]int

FindAllIndexCompact returns all successive matches as a compact [][2]int slice. This is a low-allocation API: it makes a single allocation for the result slice. Unlike FindAllIndex, which returns [][]int (N allocations for N matches), this method allocates the entire result in one contiguous block.

Performance: ~2x fewer allocations than FindAllIndex for high match counts.

If n > 0, it returns at most n matches. If n <= 0, it returns all matches. The optional 'results' slice can be provided for reuse (set to nil for fresh allocation).

Example:

re := coregex.MustCompile(`\d+`)
indices := re.FindAllIndexCompact([]byte("a1b2c3"), -1, nil)
// indices = [[1,2], [3,4], [5,6]]

func (*Regex) FindAllString

func (r *Regex) FindAllString(s string, n int) []string

FindAllString returns a slice of all successive matches of the pattern in s. If n > 0, it returns at most n matches. If n <= 0, it returns all matches.

Example:

re := coregex.MustCompile(`\d+`)
matches := re.FindAllString("1 2 3", -1)
// matches = ["1", "2", "3"]
Example

ExampleRegex_FindAllString demonstrates finding all string matches.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\w+`)
	words := re.FindAllString("hello world test", -1)
	for _, word := range words {
		fmt.Print(word, " ")
	}
	fmt.Println()
}
Output:

hello world test

func (*Regex) FindAllStringIndex added in v0.3.0

func (r *Regex) FindAllStringIndex(s string, n int) [][]int

FindAllStringIndex returns a slice of all successive matches of the pattern in s, as index pairs [start, end]. If n > 0, it returns at most n matches. If n <= 0, it returns all matches.

Example:

re := coregex.MustCompile(`\d+`)
indices := re.FindAllStringIndex("1 2 3", -1)
// indices = [[0,1], [2,3], [4,5]]

func (*Regex) FindAllStringIndexCompact added in v0.10.6

func (r *Regex) FindAllStringIndexCompact(s string, n int, results [][2]int) [][2]int

FindAllStringIndexCompact returns all successive matches as a compact [][2]int slice. This is the string version of FindAllIndexCompact.

func (*Regex) FindAllStringSubmatch added in v0.4.0

func (r *Regex) FindAllStringSubmatch(s string, n int) [][]string

FindAllStringSubmatch returns a slice of all successive matches of the pattern in s, where each match includes all capture groups as strings. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
matches := re.FindAllStringSubmatch("a@b.c x@y.z", -1)
// len(matches) == 2
// matches[0][0] = "a@b.c"
// matches[0][1] = "a"

func (*Regex) FindAllStringSubmatchIndex added in v0.4.0

func (r *Regex) FindAllStringSubmatchIndex(s string, n int) [][]int

FindAllStringSubmatchIndex returns a slice of all successive matches of the pattern in s, where each match includes index pairs for all capture groups. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
indices := re.FindAllStringSubmatchIndex("a@b.c x@y.z", -1)

func (*Regex) FindAllSubmatch added in v0.4.0

func (r *Regex) FindAllSubmatch(b []byte, n int) [][][]byte

FindAllSubmatch returns a slice of all successive matches of the pattern in b, where each match includes all capture groups. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
matches := re.FindAllSubmatch([]byte("a@b.c x@y.z"), -1)
// len(matches) == 2
// matches[0][0] = "a@b.c"
// matches[0][1] = "a"

func (*Regex) FindAllSubmatchIndex added in v0.4.0

func (r *Regex) FindAllSubmatchIndex(b []byte, n int) [][]int

FindAllSubmatchIndex returns a slice of all successive matches of the pattern in b, where each match includes index pairs for all capture groups. If n > 0, returns at most n matches. If n <= 0, returns all matches.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
indices := re.FindAllSubmatchIndex([]byte("a@b.c x@y.z"), -1)
// len(indices) == 2
// indices[0] contains start/end pairs for each group

func (*Regex) FindIndex

func (r *Regex) FindIndex(b []byte) []int

FindIndex returns a two-element slice of integers defining the location of the leftmost match in b. The match is at b[loc[0]:loc[1]]. Returns nil if no match is found.

Example:

re := coregex.MustCompile(`\d+`)
loc := re.FindIndex([]byte("age: 42"))
println(loc[0], loc[1]) // 5, 7
Example

ExampleRegex_FindIndex demonstrates finding match positions.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\d+`)
	loc := re.FindIndex([]byte("age: 42"))
	fmt.Printf("Match at [%d:%d]\n", loc[0], loc[1])
}
Output:

Match at [5:7]

func (*Regex) FindString

func (r *Regex) FindString(s string) string

FindString returns a string holding the text of the leftmost match in s. Returns empty string if no match is found.

Example:

re := coregex.MustCompile(`\d+`)
match := re.FindString("age: 42")
println(match) // "42"
Example

ExampleRegex_FindString demonstrates finding a match in a string.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`\w+@\w+\.\w+`)
	email := re.FindString("Contact: user@example.com")
	fmt.Println(email)
}
Output:

user@example.com

func (*Regex) FindStringIndex

func (r *Regex) FindStringIndex(s string) []int

FindStringIndex returns a two-element slice of integers defining the location of the leftmost match in s. The match is at s[loc[0]:loc[1]]. Returns nil if no match is found.

Example:

re := coregex.MustCompile(`\d+`)
loc := re.FindStringIndex("age: 42")
println(loc[0], loc[1]) // 5, 7

func (*Regex) FindStringSubmatch added in v0.2.0

func (r *Regex) FindStringSubmatch(s string) []string

FindStringSubmatch returns a slice of strings holding the text of the leftmost match and the matches of all capture groups.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
match := re.FindStringSubmatch("user@example.com")
// match[0] = "user@example.com"
// match[1] = "user"
Example

ExampleRegex_FindStringSubmatch demonstrates capture groups with strings.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`(\d{4})-(\d{2})-(\d{2})`)
	match := re.FindStringSubmatch("Date: 2024-12-25")
	if match != nil {
		fmt.Printf("Year: %s, Month: %s, Day: %s\n", match[1], match[2], match[3])
	}
}
Output:

Year: 2024, Month: 12, Day: 25

func (*Regex) FindStringSubmatchIndex added in v0.2.0

func (r *Regex) FindStringSubmatchIndex(s string) []int

FindStringSubmatchIndex returns the index pairs for the leftmost match and capture groups. Same as FindSubmatchIndex but for strings.

func (*Regex) FindSubmatch added in v0.2.0

func (r *Regex) FindSubmatch(b []byte) [][]byte

FindSubmatch returns a slice holding the text of the leftmost match and the matches of all capture groups.

A return value of nil indicates no match. Result[0] is the entire match, result[i] is the ith capture group. Unmatched groups will be nil.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
match := re.FindSubmatch([]byte("user@example.com"))
// match[0] = "user@example.com"
// match[1] = "user"
// match[2] = "example"
// match[3] = "com"
Example

ExampleRegex_FindSubmatch demonstrates capture group extraction.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
	match := re.FindSubmatch([]byte("Contact: user@example.com"))
	if match != nil {
		fmt.Println("Full match:", string(match[0]))
		fmt.Println("User:", string(match[1]))
		fmt.Println("Domain:", string(match[2]))
		fmt.Println("TLD:", string(match[3]))
	}
}
Output:

Full match: user@example.com
User: user
Domain: example
TLD: com

func (*Regex) FindSubmatchIndex added in v0.2.0

func (r *Regex) FindSubmatchIndex(b []byte) []int

FindSubmatchIndex returns a slice holding the index pairs for the leftmost match and the matches of all capture groups.

A return value of nil indicates no match. Result[2*i:2*i+2] is the indices for the ith group. Unmatched groups have -1 indices.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
idx := re.FindSubmatchIndex([]byte("user@example.com"))
// idx[0:2] = indices for entire match
// idx[2:4] = indices for first capture group

func (*Regex) Longest added in v0.8.2

func (r *Regex) Longest()

Longest makes future searches prefer the leftmost-longest match.

By default, coregex uses leftmost-first (Perl) semantics where the first alternative in an alternation wins. After calling Longest(), coregex uses leftmost-longest (POSIX) semantics where the longest match wins.

Example:

re := coregex.MustCompile(`(a|ab)`)
re.FindString("ab")    // returns "a" (leftmost-first: first branch wins)

re.Longest()
re.FindString("ab")    // returns "ab" (leftmost-longest: longest wins)

Note: Longest() modifies the regex state and must not be called concurrently with search methods. Stdlib regexp documents the same restriction for its Longest method.

func (*Regex) Match

func (r *Regex) Match(b []byte) bool

Match reports whether the byte slice b contains any match of the pattern.

Example:

re := coregex.MustCompile(`\d+`)
if re.Match([]byte("hello 123")) {
    println("contains digits")
}

func (*Regex) MatchString

func (r *Regex) MatchString(s string) bool

MatchString reports whether the string s contains any match of the pattern. This is a zero-allocation operation (like Rust's is_match).

Example:

re := coregex.MustCompile(`hello`)
if re.MatchString("hello world") {
    println("matched!")
}

func (*Regex) NumSubexp added in v0.2.0

func (r *Regex) NumSubexp() int

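NumSubexp returns the number of capture groups including group 0, the entire match, so the returned value equals the number of explicit capture groups plus 1. This differs from stdlib regexp, whose NumSubexp counts only the parenthesized subexpressions and would return 3 for the pattern below.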

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
println(re.NumSubexp()) // 4 (entire match + 3 groups)
Example

ExampleRegex_NumSubexp demonstrates counting capture groups.

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
	fmt.Println("Number of groups:", re.NumSubexp())
}
Output:

Number of groups: 4
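
For code being ported from stdlib, note that stdlib's NumSubexp excludes group 0, so it returns 3 for this pattern where the example above prints 4. The sketch below runs against stdlib and shows a count that does not depend on that convention: the length of SubexpNames(). That the length is the same under coregex is inferred from the SubexpNames examples in this documentation, not verified. The names addrRe and groupSlots are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var addrRe = regexp.MustCompile(`(\w+)@(\w+)\.(\w+)`)

// groupSlots returns the length of a submatch slice for re: one slot for the
// whole match plus one per capture group.
func groupSlots(re *regexp.Regexp) int {
	return len(re.SubexpNames())
}

func main() {
	fmt.Println(addrRe.NumSubexp()) // stdlib excludes group 0
	fmt.Println(groupSlots(addrRe)) // includes group 0
}
```

Sizing buffers with groupSlots avoids an off-by-one error when the same code is compiled against either package.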

func (*Regex) ReplaceAll added in v0.3.0

func (r *Regex) ReplaceAll(src, repl []byte) []byte

ReplaceAll returns a copy of src, replacing matches of the pattern with the replacement bytes repl. Inside repl, $ signs are interpreted as in Regexp.Expand: $0 is the entire match, $1 is the first capture group, etc.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
result := re.ReplaceAll([]byte("user@example.com"), []byte("$1 at $2 dot $3"))
// result = []byte("user at example dot com")
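
Since $ references follow the Regexp.Expand rules, one pitfall carries over from stdlib: a group number followed directly by letters, digits, or underscores is read as part of the name. The sketch demonstrates this with stdlib regexp; that coregex behaves identically is assumed from the description above. The names userHostRe and expand are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
)

var userHostRe = regexp.MustCompile(`(\w+)@(\w+)`)

// expand applies a replacement template to every match in src.
func expand(src, tmpl string) string {
	return userHostRe.ReplaceAllString(src, tmpl)
}

func main() {
	// "$1_at_" is read as a reference to a group named "1_at_", which does
	// not exist and expands to the empty string.
	fmt.Println(expand("user@example", "$1_at_$2"))
	// Braces end the group name explicitly.
	fmt.Println(expand("user@example", "${1}_at_${2}"))
}
```

Writing ${1} instead of $1 is the safer habit whenever a reference is followed by text.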

func (*Regex) ReplaceAllFunc added in v0.3.0

func (r *Regex) ReplaceAllFunc(src []byte, repl func([]byte) []byte) []byte

ReplaceAllFunc returns a copy of src in which all matches of the pattern have been replaced by the return value of function repl applied to the matched byte slice. The replacement returned by repl is substituted directly, without using Expand.

Example:

re := coregex.MustCompile(`\d+`)
result := re.ReplaceAllFunc([]byte("1 2 3"), func(s []byte) []byte {
    n, _ := strconv.Atoi(string(s))
    return []byte(strconv.Itoa(n * 2))
})
// result = []byte("2 4 6")

func (*Regex) ReplaceAllLiteral added in v0.3.0

func (r *Regex) ReplaceAllLiteral(src, repl []byte) []byte

ReplaceAllLiteral returns a copy of src, replacing matches of the pattern with the replacement bytes repl. The replacement is substituted directly, without expanding $ variables.

Example:

re := coregex.MustCompile(`\d+`)
result := re.ReplaceAllLiteral([]byte("age: 42"), []byte("XX"))
// result = []byte("age: XX")

func (*Regex) ReplaceAllLiteralString added in v0.3.0

func (r *Regex) ReplaceAllLiteralString(src, repl string) string

ReplaceAllLiteralString returns a copy of src, replacing matches of the pattern with the replacement string repl. The replacement is substituted directly, without expanding $ variables.

Example:

re := coregex.MustCompile(`\d+`)
result := re.ReplaceAllLiteralString("age: 42", "XX")
// result = "age: XX"

func (*Regex) ReplaceAllString added in v0.3.0

func (r *Regex) ReplaceAllString(src, repl string) string

ReplaceAllString returns a copy of src, replacing matches of the pattern with the replacement string repl. Inside repl, $ signs are interpreted as in Regexp.Expand: $0 is the entire match, $1 is the first capture group, etc.

Example:

re := coregex.MustCompile(`(\w+)@(\w+)\.(\w+)`)
result := re.ReplaceAllString("user@example.com", "$1 at $2 dot $3")
// result = "user at example dot com"

func (*Regex) ReplaceAllStringFunc added in v0.3.0

func (r *Regex) ReplaceAllStringFunc(src string, repl func(string) string) string

ReplaceAllStringFunc returns a copy of src in which all matches of the pattern have been replaced by the return value of function repl applied to the matched string. The replacement returned by repl is substituted directly, without using Expand.

Example:

re := coregex.MustCompile(`\d+`)
result := re.ReplaceAllStringFunc("1 2 3", func(s string) string {
    n, _ := strconv.Atoi(s)
    return strconv.Itoa(n * 2)
})
// result = "2 4 6"

func (*Regex) Split added in v0.3.0

func (r *Regex) Split(s string, n int) []string

Split slices s into substrings separated by the expression and returns a slice of the substrings between those expression matches.

The slice returned by this method consists of all the substrings of s not contained in the slice returned by FindAllString. When called on an expression that contains no metacharacters, it is equivalent to strings.SplitN.

The count determines the number of substrings to return:

n > 0: at most n substrings; the last substring will be the unsplit remainder.
n == 0: the result is nil (zero substrings)
n < 0: all substrings

Example:

re := coregex.MustCompile(`,`)
parts := re.Split("a,b,c", -1)
// parts = ["a", "b", "c"]

parts = re.Split("a,b,c", 2)
// parts = ["a", "b,c"]

func (*Regex) String

func (r *Regex) String() string

String returns the source text used to compile the regular expression.

Example:

re := coregex.MustCompile(`\d+`)
println(re.String()) // `\d+`

func (*Regex) SubexpNames added in v0.5.0

func (r *Regex) SubexpNames() []string

SubexpNames returns the names of the parenthesized subexpressions in this Regex. The name for the first sub-expression is names[1], so that if m is a match slice, the name for m[i] is SubexpNames()[i]. Since the Regexp as a whole cannot be named, names[0] is always the empty string. The slice returned is shared and must not be modified.

Example:

re := coregex.MustCompile(`(?P<year>\d+)-(?P<month>\d+)`)
names := re.SubexpNames()
// names[0] = ""
// names[1] = "year"
// names[2] = "month"
Example

ExampleRegex_SubexpNames demonstrates named capture groups

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	// Pattern with named and unnamed captures
	re := coregex.MustCompile(`(?P<year>\d{4})-(?P<month>\d{2})-(\d{2})`)

	// Get capture group names
	names := re.SubexpNames()
	fmt.Printf("Capture groups: %d\n", re.NumSubexp())
	fmt.Printf("Group 0 (full match): %q\n", names[0])
	fmt.Printf("Group 1 (year): %q\n", names[1])
	fmt.Printf("Group 2 (month): %q\n", names[2])
	fmt.Printf("Group 3 (day, unnamed): %q\n", names[3])

}
Output:

Capture groups: 4
Group 0 (full match): ""
Group 1 (year): "year"
Group 2 (month): "month"
Group 3 (day, unnamed): ""
Example (Matching)

ExampleRegex_SubexpNames_matching shows using SubexpNames with matches

package main

import (
	"fmt"

	"github.com/coregx/coregex"
)

func main() {
	// Compile pattern with named captures
	re := coregex.MustCompile(`(?P<protocol>https?)://(?P<domain>\w+)`)

	// Find match and get submatch values
	match := re.FindStringSubmatch("Visit https://example for more")
	names := re.SubexpNames()

	// Print matches with their names
	for i, name := range names {
		if i < len(match) && match[i] != "" {
			if name != "" {
				fmt.Printf("%s: %s\n", name, match[i])
			} else if i == 0 {
				fmt.Printf("Full match: %s\n", match[i])
			}
		}
	}

}
Output:

Full match: https://example
protocol: https
domain: example

type Regexp added in v0.8.1

type Regexp = Regex

Regexp is an alias for Regex to provide drop-in compatibility with stdlib regexp. This allows replacing `import "regexp"` with `import regexp "github.com/coregx/coregex"` without changing type names in existing code.

Example:

import regexp "github.com/coregx/coregex"

var re *regexp.Regexp = regexp.MustCompile(`\d+`)

Directories

Path              Synopsis
dfa/lazy          Package lazy implements a Lazy DFA (Deterministic Finite Automaton) engine for regex matching.
dfa/onepass       Package onepass implements a one-pass DFA for regex patterns that have no ambiguity in their matching paths.
internal/conv     Package conv provides safe integer conversion helpers for the regex engine.
internal/sparse   Package sparse provides a sparse set data structure for efficient state tracking.
literal           Package literal provides types and operations for extracting literal sequences from regex patterns for prefilter optimization.
meta              Package meta implements the meta-engine orchestrator that automatically selects the optimal regex execution strategy.
nfa               Package nfa provides a Thompson NFA (Non-deterministic Finite Automaton) implementation for regex matching.
prefilter         Package prefilter provides fast candidate filtering for regex search using extracted literal sequences.
simd              Package simd provides SIMD-accelerated string operations for high-performance byte searching.
