Commit 1373ce3

meooow25 authored and Lysxia committed
Change utf8LengthByLeader to a branching impl
The simple branching implementation is more efficient in the majority of use cases where the result is branched on anyway. In other cases, branch prediction should do a decent job on typical text.
1 parent 1994b13 commit 1373ce3

1 file changed: +5 -14 lines changed
src/Data/Text/Internal/Encoding/Utf8.hs

Lines changed: 5 additions & 14 deletions
@@ -80,22 +80,13 @@ utf8Length :: Char -> Int
 utf8Length (C# c) = I# ((1# +# geChar# c (chr# 0x80#)) +# (geChar# c (chr# 0x800#) +# geChar# c (chr# 0x10000#)))
 {-# INLINE utf8Length #-}
 
--- This is a branchless version of
--- utf8LengthByLeader w
---   | w < 0x80 = 1
---   | w < 0xE0 = 2
---   | w < 0xF0 = 3
---   | otherwise = 4
---
--- c `xor` I# (c# <=# 0#) is a branchless equivalent of c `max` 1.
--- It is crucial to write c# <=# 0# and not c# ==# 0#, otherwise
--- GHC is tempted to "optimize" by introduction of branches.
-
 -- | @since 2.0
 utf8LengthByLeader :: Word8 -> Int
-utf8LengthByLeader w = c `xor` I# (c# <=# 0#)
-  where
-    !c@(I# c#) = countLeadingZeros (complement w)
+utf8LengthByLeader w
+  | w < 0x80 = 1
+  | w < 0xE0 = 2
+  | w < 0xF0 = 3
+  | otherwise = 4
 {-# INLINE utf8LengthByLeader #-}
 
 ord2 ::
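
To see how the two versions relate, here is a minimal standalone sketch, not code from the text package. The names lengthBranchless, lengthBranching and agreeOnLeaders are hypothetical, and the branchless variant uses fromEnum in place of the unboxed I# (c# <=# 0#) purely for readability, so it makes no claim about avoiding branches in the generated code. It compares the results on the byte ranges that can start a 1- to 4-byte sequence (0x00..0x7F and 0xC0..0xF7).

```haskell
import Data.Bits (complement, countLeadingZeros, xor)
import Data.Word (Word8)

-- Hypothetical stand-in for the removed implementation: count the leading
-- one bits of the byte, then map a count of 0 to 1 (the ASCII case).
-- fromEnum (c <= 0) is 1 when c is 0 and 0 otherwise.
lengthBranchless :: Word8 -> Int
lengthBranchless w = c `xor` fromEnum (c <= 0)
  where
    c = countLeadingZeros (complement w)

-- Same guards as the new utf8LengthByLeader in the diff above.
lengthBranching :: Word8 -> Int
lengthBranching w
  | w < 0x80 = 1
  | w < 0xE0 = 2
  | w < 0xF0 = 3
  | otherwise = 4

-- Compare both versions on bytes that can lead a 1- to 4-byte sequence.
agreeOnLeaders :: Bool
agreeOnLeaders =
  all (\w -> lengthBranchless w == lengthBranching w)
      ([0x00 .. 0x7F] ++ [0xC0 .. 0xF7])

main :: IO ()
main = print agreeOnLeaders
```

In this sketch the two functions return different values for bytes outside those ranges, that is, continuation bytes 0x80..0xBF and bytes 0xF8..0xFF. None of these is a valid leader byte, so a caller that only passes leader bytes of well-formed UTF-8 would not observe the difference.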
