Commit e5ca180

Change incomplete DFA to a complete one
Previously, each DFA state stored its transitions in a hash map (key: symbol, value: next state), because a state held only the transitions it actually needed rather than one for every symbol of the alphabet. The DFA is now complete: every state has a transition for each of the 256 symbol codes, and any transition that is not otherwise defined leads to a real dead state (state 0). The DFA therefore meets the strict definition of a complete DFA. This also allowed the transition hash map to be replaced with a fixed-size array indexed by symbol code, which speeds up matching because no hashes need to be computed.
1 parent 5ed82cf commit e5ca180
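
The array-based lookup described in the commit message can be illustrated with a minimal, self-contained sketch. It is not the code from this commit: `DfaState`, `#DeadState`, `MatchLength` and the hand-built table for `ab*` are hypothetical names and data chosen for illustration, and the sketch assumes a PureBasic version that provides `Ascii()` (5.50 or later).

```purebasic
; Hypothetical sketch of a complete DFA with an array-based transition table.
EnableExplicit

#DeadState = 0 ; Plays the role of #State_DfaDeadState in the commit

Structure DfaState
  symbols.i[256] ; Index is the byte value, value is the next state (0 = dead state)
  isFinalState.i ; #True if the state is a final state
EndStructure

; Returns the length of the longest prefix of the string accepted by the DFA
Procedure.i MatchLength(Array pool.DfaState(1), *string.Ascii)
  Protected state = 1, length, lastFinalLength

  Repeat
    ; A plain array read replaces the former map lookup.
    ; The terminating null byte also leads to the dead state.
    state = pool(state)\symbols[*string\a]
    If state = #DeadState
      Break
    EndIf
    length + 1
    *string + SizeOf(Ascii)
    If pool(state)\isFinalState
      lastFinalLength = length
    EndIf
  ForEver

  ProcedureReturn lastFinalLength
EndProcedure

; Hand-built table for "ab*": state 1 is the initial state, state 2 is final.
; All entries that are not set stay 0 and therefore point to the dead state.
Dim pool.DfaState(2)
pool(1)\symbols['a'] = 2
pool(2)\symbols['b'] = 2
pool(2)\isFinalState = #True

Define *text = Ascii("abbbc")
Debug MatchLength(pool(), *text)
FreeMemory(*text)
```

Because array elements are zero-initialised, reserving state 0 as the dead state makes every unset transition a dead-state transition without any extra initialisation. That appears to be why the commit starts real states at index 1.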

3 files changed: 37 additions and 21 deletions
README.md

Lines changed: 5 additions & 3 deletions
@@ -41,14 +41,16 @@ EndIf
 ```
 More code examples can be found in the [`examples`](examples) directory.

-## Public Enumerations
+## Public Constants

 ```purebasic
 Enumeration NfaSpecialSymbols 256
   #Symbol_Move ; Used for NFA epsilon moves
   #Symbol_Split ; Used for NFA unions
   #Symbol_Final ; Used for NFA final state
 EndEnumeration
+
+#State_DfaDeadState = 0
 ```

 ## Public Structures
@@ -63,8 +65,8 @@ EndStructure

 ```purebasic
 Structure DfaStateStruc
-  Map symbols.i() ; Key is the symbol and the value is the next DFA state
-  isFinalState.i ; `#True` if the DFA state is a final state, otherwise `#False`
+  symbols.i[256] ; Index is the symbol code and the value is the next DFA state
+  isFinalState.i ; `#True` if the DFA state is a final state, otherwise `#False`
 EndStructure
 ```


RegExEngine.pbi

Lines changed: 23 additions & 11 deletions
@@ -9,15 +9,17 @@ DeclareModule RegEx
     #Symbol_Final ; Used for NFA final state
   EndEnumeration

+  #State_DfaDeadState = 0
+
   Structure NfaStateStruc
     symbol.i ; Unicode number or special symbol number
     *nextState1 ; Pointer to the first next NFA state
     *nextState2 ; Pointer to the second next NFA state
   EndStructure

   Structure DfaStateStruc
-    Map symbols.i() ; Key is the symbol and the value is the next DFA state
-    isFinalState.i ; `#True` if the DFA state is a final state, otherwise `#False`
+    symbols.i[256] ; Index is the symbol code and the value is the next DFA state
+    isFinalState.i ; `#True` if the DFA state is a final state, otherwise `#False`
   EndStructure

   Structure RegExEngineStruc
@@ -440,7 +442,9 @@ Module RegEx
     sizeOfArray = ArraySize(eClosures())
     countOfStates = ListSize(*states())

-    For dfaState = 0 To sizeOfArray
+    ; dfaState '0' is the dead state, so it will be skipped.
+
+    For dfaState = 1 To sizeOfArray

       isFound = #True

@@ -468,13 +472,18 @@ Module RegEx
   EndProcedure

   Procedure CreateDfa(*regExEngine.RegExEngineStruc, clearNfa = #True)
-    Protected.EClosureStruc Dim eClosures(0), NewMap symbols()
+    Protected.EClosureStruc Dim eClosures(1), NewMap symbols()
     Protected.NfaStateStruc *state
     Protected sizeOfArray, dfaState, result

+    dfaState = 1
+
+    ; dfaState '0' is the dead state, so it will be skipped.
+    ; eClosures(0) is then always unused, but it is easier that way.
+
     AddState(*regExEngine\initialNfaState, eClosures(dfaState)\nfaStates())

-    For dfaState = 0 To ArraySize(eClosures())
+    For dfaState = 1 To ArraySize(eClosures())

       ClearMap(symbols())

@@ -491,13 +500,13 @@ Module RegEx
       ForEach symbols()
         result = FindStatesSet(eClosures(), symbols()\nfaStates())
         If result
-          *regExEngine\dfaStatesPool(dfaState)\symbols(MapKey(symbols())) = result
+          *regExEngine\dfaStatesPool(dfaState)\symbols[Asc(MapKey(symbols()))] = result
         Else
           sizeOfArray = ArraySize(eClosures())
           ReDim eClosures(sizeOfArray + 1)
           ReDim *regExEngine\dfaStatesPool(sizeOfArray + 1)
           CopyList(symbols()\nfaStates(), eClosures(sizeOfArray + 1)\nfaStates())
-          *regExEngine\dfaStatesPool(dfaState)\symbols(MapKey(symbols())) = sizeOfArray + 1
+          *regExEngine\dfaStatesPool(dfaState)\symbols[Asc(MapKey(symbols()))] = sizeOfArray + 1
         EndIf
       Next

@@ -547,12 +556,15 @@ Module RegEx
   Procedure DfaMatch(*regExEngine.RegExEngineStruc, *string.Ascii)
     Protected dfaState, matchLength, lastFinalStateMatchLength

+    dfaState = 1
+
+    ; dfaState '0' is the dead state, so it will be skipped.
+
     Repeat
-
-      If Not FindMapElement(*regExEngine\dfaStatesPool(dfaState)\symbols(), Chr(*string\a))
+      dfaState = *regExEngine\dfaStatesPool(dfaState)\symbols[*string\a]
+      If dfaState = #State_DfaDeadState
         Break
       EndIf
-      dfaState = *regExEngine\dfaStatesPool(dfaState)\symbols()

       matchLength + 1
       *string + SizeOf(Ascii)
@@ -566,7 +578,7 @@ Module RegEx
   EndProcedure

   Procedure Match(*regExEngine.RegExEngineStruc, *string.Character)
-    If MapSize(*regExEngine\dfaStatesPool(0)\symbols())
+    If ArraySize(*regExEngine\dfaStatesPool())
       ProcedureReturn DfaMatch(*regExEngine, *string)
     Else
       ProcedureReturn NfaMatch(*regExEngine, *string)

examples/Debug_DFA_table.pb

Lines changed: 9 additions & 7 deletions
@@ -3,7 +3,7 @@ IncludePath ".."
 IncludeFile "RegExEngine.pbi"

 Define.RegEx::RegExEngineStruc *regEx
-Define sizeOfArray, i
+Define sizeOfArray, i, i2
 Define hex$

 *regEx = RegEx::Create("ab*")
@@ -17,19 +17,21 @@ RegEx::CreateDfa(*regEx)
 Debug "| State | Symbol | Next state |"
 Debug "| =================== | ====== | =================== |"
 sizeOfArray = ArraySize(*regEx\dfaStatesPool())
-For i = 0 To sizeOfArray
+For i = 1 To sizeOfArray
   If *regEx\dfaStatesPool(i)\isFinalState
     Debug "| " + LSet(Str(i) + " (final)", 19) + " | " + Space(6) + " | " +
           Space(19) + " |"
   Else
     Debug "| " + LSet(Str(i), 19) + " | " + Space(6) + " | " + Space(19) + " |"
   EndIf

-  ForEach *regEx\dfaStatesPool(i)\symbols()
-    hex$ = RSet(Hex(Asc(MapKey(*regEx\dfaStatesPool(i)\symbols()))), 2, "0")
-    Debug "| " + Space(19) +
-          " | " + LSet(hex$, 6) +
-          " | " + LSet(Str(*regEx\dfaStatesPool(i)\symbols()), 19) + " |"
+  For i2 = 0 To 255
+    If *regEx\dfaStatesPool(i)\symbols[i2] <> RegEx::#State_DfaDeadState
+      hex$ = RSet(Hex(i2), 2, "0")
+      Debug "| " + Space(19) +
+            " | " + LSet(hex$, 6) +
+            " | " + LSet(Str(*regEx\dfaStatesPool(i)\symbols[i2]), 19) + " |"
+    EndIf
   Next
   Debug "| ------------------- | ------ | ------------------- |"
 Next
