Description
The Tokenizer appears to perform very poorly when you have a large number of replacement token instances.
For example, this line:
```powershell
$matches = select-string -Path $tempFile -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value }
```
returns 8600 instances on a file I am running the Tokenizer against.
Based on the logic of this loop:
```powershell
ForEach ($match in $matches) {
```
This will attempt to perform the replacement operation 8600 times. If you look at the code, it loops through the file row by row, attempting a replacement of every variable that is found.
This is inefficient; instead, the line above should gather only distinct values, like so (the following tries to follow PowerShell idioms and is not 100% efficient):
```powershell
$matches = select-string -Path $tempFile -Pattern $regex -AllMatches | % { $_.Matches } | % { $_.Value } | Sort-Object | Get-Unique
```
Note that in order to use Get-Unique, the documentation states that the list must be ordered, which is why Sort-Object is called beforehand.
Running this on that same file returns a mere 38 distinct values to replace, which is more than two orders of magnitude fewer than before.
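A quick way to see the difference on a given file is to count both lists. A minimal sketch, reusing the same pattern and temp-file variables as above (the $allValues / $distinctValues names are just for illustration):

```powershell
# Count total occurrences versus distinct tokens in the file.
$allValues      = Select-String -Path $tempFile -Pattern $regex -AllMatches |
                  ForEach-Object { $_.Matches } |
                  ForEach-Object { $_.Value }
$distinctValues = $allValues | Sort-Object | Get-Unique

"{0} total matches, {1} distinct tokens" -f $allValues.Count, $distinctValues.Count
```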
There are still other performance issues, for example the row-by-row replacement per variable, as seen here:
Extension-UtilitiesPack/Utilites/Tokenizer/tokenize-ps3.ps1
Lines 145 to 149 in 4747cae
```powershell
(Get-Content $tempFile -Encoding $encoding) |
    Foreach-Object {
        $_ -replace $match, $variableValue
    } |
    Set-Content $tempFile -Encoding $encoding -Force
```
This becomes painful as the number of lines in the file increases. However, the simple fix above would resolve the most obvious performance issue.
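Beyond the dedup fix, the per-token full-file rewrite could also be collapsed into a single read and a single write. A rough sketch, assuming the distinct token list from above and using a hypothetical $variableValues hashtable in place of however the script actually resolves each token's value:

```powershell
# Read the whole file once as a single string (PowerShell 3+).
$content = Get-Content $tempFile -Raw -Encoding $encoding

# Apply every replacement in memory; $variableValues is a hypothetical
# token -> value lookup standing in for the script's own resolution logic.
ForEach ($match in $matches) {
    $content = $content -replace $match, $variableValues[$match]
}

# Write the result back in one pass instead of once per token.
Set-Content $tempFile -Value $content -Encoding $encoding -Force
```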