Commit c0e8c7d

Add comments explaining the splitting traversal
Why it's a good idea, how it works, and what the benchmarks say.
1 parent 7e6d75f commit c0e8c7d

1 file changed: +57 -1 lines changed

Data/Sequence.hs

Lines changed: 57 additions & 1 deletion
@@ -128,6 +128,7 @@ module Data.Sequence (
     foldlWithIndex, -- :: (b -> Int -> a -> b) -> b -> Seq a -> b
     foldrWithIndex, -- :: (Int -> a -> b -> b) -> b -> Seq a -> b
     -- * Transformations
+    genSplitTraverseSeq,
     mapWithIndex, -- :: (Int -> a -> b) -> Seq a -> Seq b
     reverse, -- :: Seq a -> Seq a
     -- ** Zips
@@ -1709,14 +1710,59 @@ reverseNode f (Node3 s a b c) = Node3 s (f c) (f b) (f a)
 -- For zipping, and probably also for (<*>), it is useful to build a result by
 -- traversing a sequence while splitting up something else. For zipping, we
 -- traverse the first sequence while splitting up the second [and third [and
--- fourth]]. For fs <*> xs, we expect soon to traverse
+-- fourth]]. For fs <*> xs, we hope to traverse
 --
 -- > replicate (length fs * length xs) ()
 --
 -- while splitting something essentially equivalent to
 --
 -- > fmap (\f -> fmap f xs) fs
 --
+-- What makes all this crazy code a good idea:
+--
+-- Suppose we zip together two sequences of the same length:
+--
+-- zs = zip xs ys
+--
+-- We want to get reasonably fast indexing into zs immediately, rather than
+-- needing to construct the entire thing first, as the previous implementation
+-- required. The first aspect is that we build the result "outside-in" or
+-- "top-down", rather than left to right. That gives us access to both ends
+-- quickly. But that's not enough, by itself, to give immediate access to the
+-- center of zs. For that, we need to be able to skip over larger segments of
+-- zs, delaying their construction until we actually need them. The way we do
+-- this is to traverse xs, while splitting up ys according to the structure of
+-- xs. If we have a Deep _ pr m sf, we split ys into three pieces, and hand off
+-- one piece to the prefix, one to the middle, and one to the suffix of the
+-- result. The key point is that we don't need to actually do anything further
+-- with those pieces until we actually need them; the computations to split
+-- them up further and zip them with their matching pieces can be delayed until
+-- they're actually needed. We do the same thing for Digits (splitting into
+-- between one and four pieces) and Nodes (splitting into two or three). The
+-- ultimate result is that we can index, or split at, any location in zs in
+-- O(log(min{i,n-i})) time *immediately*, with only a constant-factor slowdown
+-- as thunks are forced along the path.
+--
+-- Benchmark info, and alternatives:
+--
+-- The old zipping code used mapAccumL to traverse the first sequence while
+-- cutting down the second sequence one piece at a time.
+--
+-- An alternative way to express that basic idea is to convert both sequences
+-- to lists, zip the lists, and then convert the result back to a sequence.
+-- I'll call this the "listy" implementation.
+--
+-- I benchmarked two operations: Each started by zipping two sequences
+-- constructed with replicate and/or fromList. The first would then immediately
+-- index into the result. The second would apply deepseq to force the entire
+-- result. The new implementation worked much better than either of the others
+-- on the immediate indexing test, as expected. It also worked better than the
+-- old implementation for all the deepseq tests. For short sequences, the listy
+-- implementation outperformed all the others on the deepseq test. However, the
+-- splitting implementation caught up and surpassed it once the sequences grew
+-- long enough. It seems likely that by avoiding rebuilding, it interacts
+-- better with the cache hierarchy.
+--
 -- David Feuer, with excellent guidance from Carter Schonwald, December 2014
 
 class Splittable s where
@@ -1731,6 +1777,16 @@ instance (Splittable a, Splittable b) => Splittable (a, b) where
       (al, ar) = splitState i a
       (bl, br) = splitState i b
 
+data GenSplittable s = GenSplittable s (Int -> s -> (s,s))
+instance Splittable (GenSplittable s) where
+  splitState i (GenSplittable s spl) = (GenSplittable l spl, GenSplittable r spl)
+    where
+      (l,r) = spl i s
+
+{-# INLINE genSplitTraverseSeq #-}
+genSplitTraverseSeq :: (Int -> s -> (s, s)) -> (s -> a -> b) -> s -> Seq a -> Seq b
+genSplitTraverseSeq spl f s = splitTraverseSeq (\(GenSplittable s _) -> f s) (GenSplittable s spl)
+
 {-# SPECIALIZE splitTraverseSeq :: (Seq x -> a -> b) -> Seq x -> Seq a -> Seq b #-}
 {-# SPECIALIZE splitTraverseSeq :: ((Seq x, Seq y) -> a -> b) -> (Seq x, Seq y) -> Seq a -> Seq b #-}
 splitTraverseSeq :: (Splittable s) => (s -> a -> b) -> s -> Seq a -> Seq b
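
The splitting traversal described in the comments can be illustrated on a much simpler structure. The sketch below is not the code from this commit: it swaps the finger tree for a hypothetical size-annotated binary tree (`Tree`, `fromListT`, `splitTraverseTree`, `zipTreeWith`, `indexT`, and `toListT` are all names invented for illustration) and only borrows the argument order of `genSplitTraverseSeq`. It assumes the sequence being split is at least as long as the tree being traversed.

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence (Seq)

-- Toy stand-in for a finger tree: a binary tree that caches its size.
-- The subtree fields are lazy, so an unvisited subtree stays a thunk.
data Tree a = Leaf a | Node Int (Tree a) (Tree a)

size :: Tree a -> Int
size (Leaf _)     = 1
size (Node n _ _) = n

-- Build a balanced tree from a non-empty list.
fromListT :: [a] -> Tree a
fromListT []  = error "fromListT: empty list"
fromListT [x] = Leaf x
fromListT xs  = Node (length xs) (fromListT l) (fromListT r)
  where (l, r) = splitAt (length xs `div` 2) xs

-- Traverse the tree top-down while splitting a state along its structure.
-- Each child receives its own piece of the state; nothing further happens
-- to that piece until the child is actually demanded.
splitTraverseTree :: (Int -> s -> (s, s)) -> (s -> a -> b) -> s -> Tree a -> Tree b
splitTraverseTree _   f s (Leaf a)     = Leaf (f s a)
splitTraverseTree spl f s (Node n l r) =
    Node n (splitTraverseTree spl f sl l) (splitTraverseTree spl f sr r)
  where (sl, sr) = spl (size l) s

-- Zipping is the special case where the state is a Seq split with splitAt.
-- Assumes ys has at least as many elements as xs.
zipTreeWith :: (a -> b -> c) -> Tree a -> Seq b -> Tree c
zipTreeWith f xs ys =
  splitTraverseTree Seq.splitAt (\s a -> f a (Seq.index s 0)) ys xs

-- Indexing forces only the thunks along one root-to-leaf path.
indexT :: Tree a -> Int -> a
indexT (Leaf a) _ = a
indexT (Node _ l r) i
  | i < size l = indexT l i
  | otherwise  = indexT r (i - size l)

toListT :: Tree a -> [a]
toListT (Leaf a)     = [a]
toListT (Node _ l r) = toListT l ++ toListT r
```

With these definitions, `indexT (zipTreeWith (,) xs ys) i` splits `ys` only along the path to position `i`; the other subtrees of the result remain unevaluated until something demands them.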

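The two alternatives mentioned in the benchmark notes might look roughly like the following. This is a sketch under assumptions, not the previous library code: `zipWithListy` and `zipWithAccum` are hypothetical names, and the `mapAccumL` version only follows the description "cutting down the second sequence one piece at a time".

```haskell
import qualified Data.Sequence as Seq
import Data.Sequence (Seq, ViewL(..), viewl)
import Data.Foldable (toList)
import Data.Traversable (mapAccumL)

-- "Listy" version: convert to lists, zip, convert back.
zipWithListy :: (a -> b -> c) -> Seq a -> Seq b -> Seq c
zipWithListy f xs ys = Seq.fromList (zipWith f (toList xs) (toList ys))

-- mapAccumL version: walk the first sequence left to right, peeling one
-- element off the front of the second sequence at each step.
zipWithAccum :: (a -> b -> c) -> Seq a -> Seq b -> Seq c
zipWithAccum f xs ys = snd (mapAccumL step ys' xs')
  where
    -- Trim both inputs to the shorter length so the state never runs dry.
    n   = min (Seq.length xs) (Seq.length ys)
    xs' = Seq.take n xs
    ys' = Seq.take n ys
    step s a = case viewl s of
      b :< s' -> (s', f a b)
      EmptyL  -> error "zipWithAccum: unreachable, lengths were equalized"
```

Both versions consume the second sequence strictly from left to right, so reaching an element in the middle of the result means first peeling off everything before it. That is the cost the splitting traversal is meant to avoid.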