WIP: Constant space sort #61
Open
bgamari wants to merge 7 commits into haskell:master from bgamari:constant-space-sort
1b1728a Fix string encodings. (bgamari)
ded621b Fix cabal file formatting (bgamari)
300f6b4 Add test to verify round-trip-ability (bgamari)
6dd43c1 Add GHC.RTS.Events.Sort (bgamari)
63be78b Sort: Make sorting parameters configurable (bgamari)
fcc02be Ensure that temporary files are deleted (bgamari)
3f7eca6 Add testcase for merge sort (bgamari)
@@ -0,0 +1,188 @@
-- | Constant-space sorting.
--
-- This module provides a routine for sorting events in constant space via
-- on-disk merge sort.
module GHC.RTS.Events.Sort
  ( sortEvents
  , sortEvents'
  , SortParams(..)
  , defaultSortParams
  ) where

import Data.Traversable
import Data.Coerce
import Data.Function (on)
import Data.List (sortBy, minimumBy)
import Data.Maybe
import Data.Foldable (toList)
import System.IO
import System.IO.Temp
import System.Directory
import Prelude

import Data.Binary.Put as P
import qualified Data.ByteString.Lazy as BSL
import qualified Data.Sequence as S

import GHC.RTS.Events hiding (sortEvents)
import GHC.RTS.Events.Binary (putEventLog)

-- | Path of a temporary file holding a sorted run of events.
type SortedChunk = FilePath

-- | Wrapper whose 'Eq' and 'Ord' instances compare events by timestamp only.
newtype OnTime = OnTime Event

instance Ord OnTime where
  compare = coerce (compare `on` evTime)

instance Eq OnTime where
  (==) = coerce ((==) `on` evTime)

-- | Parameters which determine the behavior of the merge sort.
data SortParams = SortParams
  { -- | The size (in events) of the chunks into which the input eventlog is
    -- broken. This determines the upper bound on memory usage during the
    -- sorting process.
    --
    -- The default value (see 'defaultSortParams') is a reasonable trade-off
    -- between memory and computation, requiring approximately 100 MBytes
    -- while sorting a "typical" eventlog.
    chunkSize :: !Int

    -- | Maximum number of chunks to merge at once. Determined by the largest
    -- number of file descriptors we can safely open at once.
  , maxFanIn :: !Int
  }

-- | A reasonable set of sorting parameters.
defaultSortParams :: SortParams
defaultSortParams =
  SortParams { chunkSize = 500*1000
             , maxFanIn = 256
             }

-- | @sortEvents outPath eventlog@ sorts @eventlog@ via on-disk merge
-- sort. The sorted eventlog is written to @outPath@. The system's temporary
-- directory is used for temporary data. See 'sortEvents\'' for more control.
sortEvents
  :: FilePath  -- ^ output eventlog file path
  -> EventLog  -- ^ eventlog to sort
  -> IO ()
sortEvents outPath eventLog =
  withSystemTempDirectory "sort-events" $ \tmpDir ->
    sortEvents' defaultSortParams tmpDir outPath eventLog

-- | @sortEvents' params tmpDir outPath eventlog@ sorts
-- @eventlog@ via on-disk merge sort, using @tmpDir@ for
-- intermediate data. The caller is responsible for deleting @tmpDir@ upon
-- return.
--
-- The sorted eventlog is written to @outPath@.
sortEvents'
  :: SortParams
  -> FilePath  -- ^ temporary directory
  -> FilePath  -- ^ output eventlog file path
  -> EventLog  -- ^ eventlog to sort
  -> IO ()
sortEvents' _params _tmpDir _outPath (EventLog _ (Data [])) = fail "sortEvents: no events"
sortEvents' params tmpDir outPath (EventLog hdr (Data events0)) = do
    chunks <- toSortedChunks events0
    hdl <- openBinaryFile outPath WriteMode
    mergeChunks' hdl chunks
    hClose hdl
    return ()
  where
    SortParams chunkSize fanIn = params

    -- Sort each chunk in memory and write it to its own temporary file.
    toSortedChunks :: [Event] -> IO (S.Seq SortedChunk)
    toSortedChunks =
        fmap S.fromList
      . mapM (writeTempChunk . sortEventsInMem)
      . chunksOf chunkSize

    -- Merge the chunks into destFile. If there are more than fanIn chunks,
    -- groups of them are first merged into temporary chunks, which are then
    -- merged recursively. destFile is closed once the merge completes.
    mergeChunks' :: Handle -> S.Seq SortedChunk -> IO ()
    mergeChunks' destFile chunks
      | S.null chunks =
        fail "sortEvents: this can't happen"
      | S.length chunks <= fanIn = do
        events <- mapM readChunk chunks
        let sorted = mergeSort $ toList (coerce events :: S.Seq [OnTime])
        writeChunk destFile (coerce sorted)
        mapM_ removeFile chunks
        hClose destFile
      | otherwise = do
        chunksss <- flip mapM (nChunks fanIn chunks) $ \fps -> do
          (fp, hdl) <- createTempChunk
          mergeChunks' hdl fps
          return fp
        mergeChunks' destFile (S.fromList chunksss)

    readChunk :: SortedChunk -> IO [Event]
    readChunk fp = do
      result <- readEventLogFromFile fp
      case result of
        Left err -> fail $ "sortEvents: error reading chunk: " ++ fp ++ ": " ++ err
        Right (EventLog _ (Data events)) -> return events

    createTempChunk :: IO (FilePath, Handle)
    createTempChunk =
      openBinaryTempFile tmpDir "chunk"

    writeTempChunk :: [Event] -> IO FilePath
    writeTempChunk evs = do
      (fp, hdl) <- createTempChunk
      writeChunk hdl evs
      hClose hdl
      return fp

    -- Every chunk is written as a complete eventlog that reuses the input's header.
    writeChunk :: Handle -> [Event] -> IO ()
    writeChunk hdl events =
        BSL.hPutStr hdl
      $ P.runPut
      $ putEventLog
      $ EventLog hdr
      $ Data events

-- | An unordered collection (duplicates allowed).
type Bag a = [a]

-- | Break a list into chunks of the given size.
chunksOf :: Int -> [a] -> [[a]]
chunksOf _ [] = []
chunksOf n xs =
  let (ys, rest) = splitAt n xs
  in ys : chunksOf n rest

-- | Break a 'S.Seq' into at most \(n\) roughly-even chunks.
nChunks :: Int -> S.Seq a -> [S.Seq a]
nChunks n xs0 = go xs0
  where
    go :: S.Seq a -> [S.Seq a]
    go xs
      | S.null xs = []
      | otherwise = let (x,y) = S.splitAt len xs in x : go y
    len = S.length xs0 `div` n + 1

-- | Merge the given sorted lists into a single sorted list.
mergeSort :: Ord a => Bag [a] -> [a]
mergeSort = go
  where
    go [] = []
    go xss =
      case catMaybes $ mapZipper f xss of
        [] -> []
        xs -> minimumBy (compare `on` head) xs

    f :: Ord a => Bag [a] -> [a] -> Maybe [a]
    f _ [] = Nothing
    f rest (x:xs) = Just $ x : go (xs : rest)

-- | Apply a function to each element of a bag, together with the bag of all
-- the other elements.
mapZipper :: (Bag a -> a -> b) -> Bag a -> [b]
mapZipper f = go []
  where
    --go :: Bag a -> Bag [a] -> [b]
    go _prevs [] = []
    go prevs (x:nexts) =
      f (prevs ++ nexts) x : go (x : prevs) nexts

-- | Sort a single chunk of events in memory by timestamp.
sortEventsInMem :: [Event] -> [Event]
sortEventsInMem =
  sortBy (compare `on` evTime)

@@ -0,0 +1,22 @@
import Control.Monad
import System.Exit

import GHC.RTS.Events
import GHC.RTS.Events.Incremental
import Utils (files, diffLines)

-- | Check that an eventlog round-trips through encoding/decoding.
checkRoundtrip :: FilePath -> IO Bool
checkRoundtrip logFile = do
  putStrLn logFile
  Right eventlog <- readEventLogFromFile logFile
  let Right (roundtripped, _) = readEventLog $ serialiseEventLog eventlog
  let getEvents = sortEvents . events . dat
  if show roundtripped == show eventlog
    then return True
    else putStrLn "bad" >> return False

main :: IO ()
main = do
  successes <- mapM checkRoundtrip files
  unless (and successes) exitFailure
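The round-trip test above compares `show` output of the decoded and re-encoded logs. A test of the sorter itself would presumably also check that the output is ordered by time and contains the same events as the input. The following is a rough sketch of such a check under stated assumptions: `Ev` is a hypothetical stand-in for the library's `Event` type (a timestamp paired with a payload), and `isSortedOn`, `sameElements` and `sortedCorrectly` are illustrative names, not functions from this PR or from ghc-events.

```haskell
import Data.List (sort)

-- Hypothetical stand-in for an event: (timestamp, payload).
type Ev = (Int, String)

-- | True when the keys extracted by @f@ never decrease along the list.
isSortedOn :: Ord b => (a -> b) -> [a] -> Bool
isSortedOn f xs = and (zipWith (\a b -> f a <= f b) xs (drop 1 xs))

-- | True when both lists contain the same elements with the same
-- multiplicities, regardless of order.
sameElements :: Ord a => [a] -> [a] -> Bool
sameElements xs ys = sort xs == sort ys

-- | The property a sorter should satisfy for a given input and output:
-- the output is ordered by timestamp and is a permutation of the input.
sortedCorrectly :: [Ev] -> [Ev] -> Bool
sortedCorrectly input output =
  isSortedOn fst output && sameElements input output
```

Checking only the timestamp order mirrors the `OnTime` comparison in the sort module, which ignores everything except the event time.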