-
Notifications
You must be signed in to change notification settings - Fork 40
Open
Description
Data.List.Utils.split if you profile it, does a lot of allocation.
With a 600 mb "data" file containing short words, I profiled this program:
import Data.List.Utils
main = do
s <- readFile "data"
let l = split " " s
print (length l)
Tue Jan 31 19:49 2017 Time and Allocation Profiling Report (Final)
foo +RTS -p -RTS
total time = 117.11 secs (117106 ticks @ 1000 us, 1 processor)
total alloc = 139,419,374,656 bytes (excludes profiling overheads)
COST CENTRE MODULE %time %alloc
spanList Data.List.Utils 46.2 59.3
startswith Data.List.Utils 28.4 14.1
MAIN MAIN 14.7 17.8
split Data.List.Utils 8.3 6.4
breakList Data.List.Utils 2.5 2.4
Compare with this program using a splitc specialized to a single character delimiter:
main = do
s <- readFile "data"
let l = splitc ' ' s
print (length l)
splitc :: Char -> String -> [String]
splitc _ [] = []
splitc c s = case break (== c) s of
(i, d:rest) -> i : splitc c rest
(i, []) -> i : []
Tue Jan 31 19:54 2017 Time and Allocation Profiling Report (Final)
foo +RTS -p -RTS
total time = 54.44 secs (54435 ticks @ 1000 us, 1 processor)
total alloc = 99,505,766,960 bytes (excludes profiling overheads)
So over twice as fast!
A simple improvement then would be to make split use splitc when the delimiter is a single character. But, it might be better to investigate why the helper functions are doing so much allocation.
Metadata
Metadata
Assignees
Labels
No labels