@Mzack9999 Mzack9999 commented May 14, 2025

@Mzack9999 Mzack9999 self-assigned this May 14, 2025
@ehsandeep ehsandeep requested a review from dwisiswant0 May 14, 2025 21:18
@dwisiswant0

afaik, combining regexp2 with the std regexp pkg should be sufficient, since regexp2 already supports Perl5 features (lookarounds & backreferences), so there's really no need for the go-re2 engine. also, the current detectEngine implementation doesn't seem comprehensive enough to handle non-std-regexp features properly.
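
to illustrate the point about engine selection, here is a minimal sketch (not the PR's detectEngine and not pcregexp's wrapper) that avoids feature detection altogether: it lets the std regexp parser decide, and only hands the pattern to a backtracking engine when std rejects it. `Matcher`, `FallbackCompiler`, `Compile` and `stubMatcher` are hypothetical names made up for this example. in real use the fallback would be an adapter around regexp2 (whose `MatchString` returns `(bool, error)`, so it needs a thin wrapper) or a similar engine.

```go
package main

import (
	"fmt"
	"regexp"
)

// Matcher is the small surface this sketch needs from any engine.
type Matcher interface {
	MatchString(s string) bool
}

// FallbackCompiler is a hypothetical hook for a backtracking engine
// (for example an adapter around regexp2 or pcregexp).
type FallbackCompiler func(pattern string) (Matcher, error)

// Compile tries the std regexp package first and only consults the
// fallback when the std parser rejects the pattern.
// The returned string names the engine that accepted the pattern.
func Compile(pattern string, fallback FallbackCompiler) (Matcher, string, error) {
	re, stdErr := regexp.Compile(pattern)
	if stdErr == nil {
		return re, "std", nil
	}
	if fallback == nil {
		return nil, "", stdErr
	}
	m, err := fallback(pattern)
	if err != nil {
		return nil, "", fmt.Errorf("std: %v; fallback: %w", stdErr, err)
	}
	return m, "fallback", nil
}

// stubMatcher stands in for a real backtracking engine in this demo.
type stubMatcher struct{}

func (stubMatcher) MatchString(string) bool { return false }

func main() {
	stub := func(string) (Matcher, error) { return stubMatcher{}, nil }
	// Same patterns as the simple, backreference and lookaround benchmark cases.
	for _, p := range []string{`p([a-z]+)ch`, `(\w+)\s+\1`, `(?<=foo)bar`} {
		_, engine, err := Compile(p, stub)
		fmt.Println(p, engine, err)
	}
}
```

the trade-off is that a pattern which is valid in both syntaxes but behaves differently between engines would always go to std, so this only covers the "std can't parse it" case.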

tip: i actually built a package called pcregexp that supports lookarounds & backreferences while aiming for compatibility with the std regexp, it uses libpcre2 API bindings under the hood with purego (so there’s no Cgo). it also has a wrapper that automatically selects the appropriate engine based on the regex features being used. it’s NOT fully stable yet, but i’m actively looking for adopters to support its continuous development and testing.

go test -benchmem -run=^$ -bench . benchmark
goos: linux
goarch: amd64
pkg: benchmark
cpu: 11th Gen Intel(R) Core(TM) i9-11900H @ 2.50GHz
BenchmarkMatch/pcregexp/simple-16         	77527322	        15.09 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/dlclark-regexp2/simple-16  	 2509765	       496.2 ns/op	      80 B/op	       1 allocs/op
BenchmarkMatch/AspieSoft-regex/simple-16  	 2244966	       552.1 ns/op	      28 B/op	       2 allocs/op
BenchmarkMatch/scorpionknifes-pcre/simple-16         	 4953229	       234.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/GRbit-pcre/simple-16                  	 2469645	       522.0 ns/op	      28 B/op	       2 allocs/op
BenchmarkMatch/wasilibs-re2/simple-16                	 1229862	       961.8 ns/op	     248 B/op	       7 allocs/op
BenchmarkMatch/pcregexp/email-16                     	81896185	        14.94 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/dlclark-regexp2/email-16              	  578250	      1836 ns/op	      64 B/op	       1 allocs/op
BenchmarkMatch/AspieSoft-regex/email-16              	 2268996	       516.1 ns/op	      16 B/op	       2 allocs/op
BenchmarkMatch/scorpionknifes-pcre/email-16          	 4037431	       287.0 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/GRbit-pcre/email-16                   	 2544838	       489.9 ns/op	      16 B/op	       2 allocs/op
BenchmarkMatch/wasilibs-re2/email-16                 	 1000000	      1082 ns/op	     240 B/op	       7 allocs/op
BenchmarkMatch/pcregexp/backreference-16             	76438894	        15.10 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/dlclark-regexp2/backreference-16      	 1294224	      1021 ns/op	      80 B/op	       1 allocs/op
BenchmarkMatch/AspieSoft-regex/backreference-16      	 2315284	       486.8 ns/op	      28 B/op	       2 allocs/op
BenchmarkMatch/scorpionknifes-pcre/backreference-16  	 5510665	       226.4 ns/op	       0 B/op	       0 allocs/op
BenchmarkMatch/GRbit-pcre/backreference-16           	 2588995	       460.9 ns/op	      28 B/op	       2 allocs/op
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1747392575.123307  145936 re2.cc:237] Error parsing '(\w+)\s+\1': invalid escape sequence: \1
PASS
ok  	benchmark	27.466s
package benchmark_test

import (
	"testing"

	AspieSoft "github.com/AspieSoft/go-regex/v8"
	GRbit "github.com/GRbit/go-pcre"
	dlclark "github.com/dlclark/regexp2"
	scorpionknifes "github.com/scorpionknifes/go-pcre"
	wasilibs "github.com/wasilibs/go-re2"

	"github.com/dwisiswant0/pcregexp"
)

func BenchmarkMatch(b *testing.B) {
	tests := []struct {
		name    string
		pattern string
		text    []byte
	}{
		{"simple", `p([a-z]+)ch`, []byte("peach punch pinch")},
		// The address below is a placeholder; the scraped page redacted the original.
		{"email", `\b\w+@\w+\.\w+\b`, []byte("user@example.com")},
		{"backreference", `(\w+)\s+\1`, []byte("hello hello world")},
		{"lookaround", `(?<=foo)bar`, []byte("foobar")},
	}

	for _, tt := range tests {
		r1 := pcregexp.MustCompile(tt.pattern)

		b.ResetTimer()
		b.Run("pcregexp/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r1.Match(tt.text)
			}
		})

		r2 := dlclark.MustCompile(tt.pattern, 0)

		b.ResetTimer()
		b.Run("dlclark-regexp2/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r2.MatchString(string(tt.text))
			}
		})

		r3, err := AspieSoft.CompTry(tt.pattern)
		if err != nil {
			b.Fatalf("r3: failed to compile pattern: %v", err)
		}

		b.ResetTimer()
		b.Run("AspieSoft-regex/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r3.Match(tt.text)
			}
		})

		r4, err := scorpionknifes.Compile(tt.pattern, 0)
		if err != nil {
			b.Fatalf("r4: failed to compile pattern: %v", err)
		}
		r4Matcher := r4.NewMatcher()

		b.ResetTimer()
		b.Run("scorpionknifes-pcre/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r4Matcher.Match(tt.text, 0)
			}
		})

		r5, err := GRbit.Compile(tt.pattern, 0)
		if err != nil {
			b.Fatalf("r5: failed to compile pattern: %v", err)
		}

		b.ResetTimer()
		b.Run("GRbit-pcre/"+tt.name, func(b *testing.B) {
			for i := 0; i < b.N; i++ {
				r5.MatchWFlags(tt.text, 0)
			}
		})

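		b.Run("wasilibs-re2/"+tt.name, func(b *testing.B) {
			// Compile inside the sub-benchmark so that an unsupported pattern
			// (RE2 has no backreferences or lookarounds) skips only this case.
			// Calling b.Skipf on the parent would stop every remaining case.
			r6, err := wasilibs.Compile(tt.pattern)
			if err != nil {
				b.Skipf("r6: failed to compile pattern: %v", err)
			}

			b.ResetTimer()
			for i := 0; i < b.N; i++ {
				r6.MatchString(string(tt.text))
			}
		})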
	}
}

latest benchstat of pcregexp vs. std regexp: dwisiswant0/pcregexp#5 (comment).

@dogancanbakir dogancanbakir requested a review from ehsandeep June 20, 2025 12:54
@ehsandeep ehsandeep left a comment

merge conflict ^

@dogancanbakir dogancanbakir requested a review from ehsandeep June 20, 2025 13:23
@ehsandeep ehsandeep merged commit 91461fd into main Jun 20, 2025
7 checks passed
@ehsandeep ehsandeep deleted the feat-regexp branch June 20, 2025 15:40