Replies: 2 comments 3 replies
-
Is not possible with my current DFA data structure (at least not easily) because it is an array of maps and maps cannot be static. I use a map because my regex engine generates an incomplete DFA. A complete DFA could be implemented with a static array of a static array. But since a complete DFA must have transitions for all the symbols of the alphabet of the language in each state, the memory consumption gets very high very quickly. Example: For a language that supports all characters of the character set that PureBasic uses, 65536 symbol transitions must be in each state.
Only the NFA data structure uses memory pointers to the next states, whereas the DFA data structure uses index numbers.
Is planned, but in a different way.
|
Beta Was this translation helpful? Give feedback.
-
The RegEx engine can now use a DFA directly from a memory, i.e. also from a PureBasic |
Beta Was this translation helpful? Give feedback.
Uh oh!
There was an error while loading. Please reload this page.
Uh oh!
There was an error while loading. Please reload this page.
-
@SciroAtGit, I think that to improve performance of the RegEx engine you could benefit from pre-compiled RegExs (DFAs) being stored in Data sections (i.e. the read only
.rodata
for constants), which should make memory access faster.I'm trying this approach in the Lemon PB project, as you can see in the PoC code in
lemon-pb/poc/calc/calc.pbi
:Since these C-style array are read-only data about the parse table, I'm planning to make the parser generator emit them as PB code in
DataSection
with labels, and then have the template create structured arrays that point to the label. Preliminary tests showed that this work.Probably you'll have to fiddle with pointers though, since from what I understood so far the data structures here contain pointers to other sub-elements, which means you'll need to do some pointers arithmetic to get this right.
Also, you'd probably need to add a tool that generates a
.pbi
include file with all the requiredDataSection
s, which end users then would have to include using an alternative module of this engine, designed specifically for working with pre-compiled RegExs. But I thinks it's worth it, especially for projects that don't need to define RegEx at run-time (e.g. from user input) but rely on a predefined set of RegExs.I'm not sure how faster accessing data stored in data sections is going to be, compared to allocating the same data at run-time, but I think that it could introduce a significant improvement since the DFAs won't need to be memory managed on the heap, and because data sections are globally reachable in terms of memory management. Of course, it's also true that for each type of data section in the binary, the OS will allocated memory chunks which are 4Kb minimum (or multiples thereof), so if the RegEx is very small chances are that at run-time you could find a small memory bloat (but probably using heap allocated memory is worst, for the OS also use fixed sized chunks to allocate memory for structures that could grow).
Beta Was this translation helpful? Give feedback.
All reactions