|
| 1 | +.. _cardano regression case study: |
| 2 | + |
| 3 | +.. |
| 4 | + Local Variables |
| 5 | +.. |c-l| replace:: `cardano-ledger <https://github.com/input-output-hk/cardano-ledger/>`__ |
| 6 | +.. |new| replace:: GHC-9.2.8 |
| 7 | +.. |old| replace:: GHC-8.10.7 |
| 8 | +.. |inline| replace:: ``INLINE`` |
| 9 | +.. |inlineable| replace:: ``INLINEABLE`` |
| 10 | +.. |spec| replace:: ``SPECIALIZE`` |
| 11 | + |
| 12 | + |
| 13 | +`Cardano-Ledger: Performance Regression Updating from GHC-8.10.7 to GHC-9.2.8` |
| 14 | +============================================================================== |
| 15 | + |
| 16 | +This chapter is a case study on a performance regression in the |c-l| code base |
| 17 | +that IOG observed when upgrading the code base from |old| to |new|. To find the |
| 18 | +root cause of the regression, this case study directly inspects the |
| 19 | +:ref:`Core <Reading Core>` and uses the GHC :ref:`Profiler <GHC Flags>`. After |
| 20 | +reading this chapter, one should be able to spot inefficient Core and understand |
| 21 | +the differences between, and use cases for, the |inline|, |inlineable| and |spec| pragmas. |
| 22 | + |
| 23 | +The rest of the chapter is structured as follows. We introduce evidence of the |
| 24 | +performance regression. From this information we choose candidates to inspect as |
| 25 | +leads in our investigation. TODO :math:`\ldots{}` |
| 26 | + |
| 27 | + |
| 28 | +Evidence of a Regression |
| 29 | +------------------------ |
| 30 | + |
| 31 | +The regression was first observed in an integration test performed by the |
| 32 | +Cardano Benchmark team which resulted in two GHC Profiles: |
| 33 | + |
| 34 | +One for |old|: |
| 35 | + |
| 36 | +.. image:: /_static/cardano-regression/8107_perf.png |
| 37 | + :width: 800 |
| 38 | + |
| 39 | +And one for |new|: |
| 40 | + |
| 41 | +.. image:: /_static/cardano-regression/927_perf.png |
| 42 | + :width: 800 |
| 43 | + |
| 44 | +First, notice the difference in ``total alloc`` at the top of the report |
| 45 | +summaries. |old| shows total allocations of ~157GB, while |new| shows total |
| 46 | +allocations of ~220GB, an increase of roughly 40%. |
| 47 | + |
| 48 | +Next, observe that two :term:`CAF`'s have changed position in the summary: |
| 49 | +``size`` from ``Cardano.Ledger.UMap`` and ``updateStakeDistribution`` from |
| 50 | +``Cardano.Ledger.Shelley.LedgerState.IncrementalStake``. These two functions |
| 51 | +will be our guides to understanding the regression. In the spirit of :ref:`Don't |
| 52 | +think, look <Don't think, look>`, we'll compare the Core output between |old| |
| 53 | +and |new|. |
| 54 | + |
| 55 | +Understanding the Cardano.Ledger.UMap.size regression |
| 56 | +----------------------------------------------------- |
| 57 | + |
| 58 | +Here is the Core output on |new|: |
| 59 | + |
| 60 | +.. code-block:: haskell |
| 61 | +
|
| 62 | + -- RHS size: {terms: 22, types: 63, coercions: 0, joins: 0/0} |
| 63 | + size :: forall c k v. UView c k v -> Int |
| 64 | + [GblId, |
| 65 | + Arity=1, |
| 66 | + Str=<1L>, |
| 67 | + Unf=Unf{Src=InlineStable, TopLvl=True, Value=True, ConLike=True, |
| 68 | + WorkFree=True, Expandable=True, |
| 69 | + Guidance=ALWAYS_IF(arity=1,unsat_ok=True,boring_ok=False) |
| 70 | + ... |
| 71 | + size |
| 72 | + = \ (@c_aviN) |
| 73 | + (@k_aviO) |
| 74 | + (@v_aviP) |
| 75 | + (ds_dAfr :: UView c_aviN k_aviO v_aviP) -> |
| 76 | + case ds_dAfr of wild_Xe { |
| 77 | + __DEFAULT -> |
| 78 | + Cardano.Ledger.UMap.$fFoldableUView_$cfoldl' |
| 79 | + @c_aviN |
| 80 | + @k_aviO |
| 81 | + @Int |
| 82 | + @v_aviP |
| 83 | + (Cardano.Ledger.UMap.size2 @v_aviP) |
| 84 | + Cardano.Ledger.UMap.size1 |
| 85 | + wild_Xe; |
| 86 | + PtrUView co_aviQ [Dmd=A] co1_aviR [Dmd=A] ds1_dAiu -> |
| 87 | + case ds1_dAiu of { UMap ds2_sJNa ds3_sJNb -> |
| 88 | + case ds3_sJNb of { |
| 89 | + Data.Map.Internal.Bin dt_iAio ds4_iAip ds5_iAiq ds6_iAir |
| 90 | + ds7_iAis -> |
| 91 | + ghc-prim:GHC.Types.I# dt_iAio; |
| 92 | + Data.Map.Internal.Tip -> Cardano.Ledger.UMap.size1 |
| 93 | + } |
| 94 | + } |
| 95 | + } |
| 96 | +
|
| 97 | +.. note:: |
| 98 | + |
| 99 | + I've elided the :term:`Unfolding` for ``size`` and only present the |
| 100 | + ``IdInfo`` for the term. Unfoldings are important to inspect and understand, |
| 101 | + but for our purposes the unfoldings are simply copies of the function body. |
| 102 | + See :ref:`Unfoldings <Reading Core>` in the Reading Core chapter. For our |
| 103 | + purposes, unless stated otherwise all Core will be generated with |
| 104 | + ``-ddump-simpl`` and no suppression flags. This is purposefully done to show |
| 105 | + what Core in a real project can look like. |
| 106 | + |
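As a minimal sketch of how a dump like the ones in this chapter can be
produced, the module below enables ``-ddump-simpl`` through an ``OPTIONS_GHC``
pragma. The ``View`` type, its constructors and this ``size`` are hypothetical
stand-ins, not the |c-l| definitions. ``-O2`` and ``-ddump-to-file`` are
assumptions about a convenient setup: the latter only redirects the dump into a
``.dump-simpl`` file instead of printing it to the terminal.

.. code-block:: haskell

   {-# OPTIONS_GHC -O2 -ddump-simpl -ddump-to-file #-}
   -- Hypothetical stand-in module; not part of cardano-ledger.
   module Main where

   import Data.Foldable (foldl')
   import qualified Data.Map.Strict as Map

   -- A toy view over a map with one cheap case and one general case.
   data View k v
     = PtrView (Map.Map k v)   -- the size can be read off the map directly
     | OtherView (Map.Map k v) -- pretend this view needs a full traversal

   -- Only foldr is defined; foldl' comes from the class default.
   instance Foldable (View k) where
     foldr f z (PtrView m) = foldr f z m
     foldr f z (OtherView m) = foldr f z m

   -- Fast path for one constructor, generic strict fold for the rest.
   size :: View k v -> Int
   size (PtrView m) = Map.size m
   size v = foldl' (\n _ -> n + 1) 0 v

   main :: IO ()
   main = do
     let m = Map.fromList (zip [1 :: Int ..] "abc")
     print (size (PtrView m), size (OtherView m))

Compiling this module with a plain ``ghc`` invocation should leave the Core for
``size`` next to the object file, which can then be compared across compiler
versions in the same way as the dumps in this chapter.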
| 107 | + |
| 108 | +On |old| the Core is slightly different: |
| 109 | + |
| 110 | + |
| 111 | +.. code-block:: haskell |
| 112 | +
|
| 113 | + size :: forall c k v. UView c k v -> Int |
| 114 | + [GblId, |
| 115 | + Arity=1, |
| 116 | + Caf=NoCafRefs, |
| 117 | + Str=<S,1*U>, |
| 118 | + Unf=Unf{Src=<vanilla>, TopLvl=True, Value=True, ConLike=True, |
| 119 | + WorkFree=True, Expandable=True, Guidance=IF_ARGS [70] 100 20}] |
| 120 | + size |
| 121 | + = \ (@ c_a7SFB) |
| 122 | + (@ k_a7SFC) |
| 123 | + (@ v_a7SFD) |
| 124 | + (ds_d7UZd :: UView c_a7SFB k_a7SFC v_a7SFD) -> |
| 125 | + case ds_d7UZd of wild_Xfk { |
| 126 | + __DEFAULT -> |
| 127 | + Cardano.Ledger.UMap.size_$cfoldl' |
| 128 | + @ c_a7SFB |
| 129 | + @ k_a7SFC |
| 130 | + @ Int |
| 131 | + @ v_a7SFD |
| 132 | + (Cardano.Ledger.UMap.size2 @ v_a7SFD) |
| 133 | + Cardano.Ledger.UMap.size1 |
| 134 | + wild_Xfk; |
| 135 | + PtrUView co_a7SFF [Dmd=<L,A>] co1_a7SFG [Dmd=<L,A>] ds1_d7Vel -> |
| 136 | + case ds1_d7Vel of { UMap ds2_s90fe ds3_s90ff -> |
| 137 | + case ds3_s90ff of { |
| 138 | + Data.Map.Internal.Bin dt_a7UZH ds4_a7UZI ds5_a7UZJ ds6_a7UZK |
| 139 | + ds7_a7UZL -> |
| 140 | + ghc-prim-0.6.1:GHC.Types.I# dt_a7UZH; |
| 141 | + Data.Map.Internal.Tip -> Cardano.Ledger.UMap.size1 |
| 142 | + } |
| 143 | + } |
| 144 | + } |
| 145 | +
|
| 146 | +Notice that on |new| the ``__DEFAULT`` case calls |
| 147 | +``Cardano.Ledger.UMap.$fFoldableUView_$cfoldl'`` whereas on |old| this call is |
| 148 | +``Cardano.Ledger.UMap.size_$cfoldl'``. Let's check these functions: |
| 149 | + |
| 150 | +|new|: |
| 151 | + |
| 152 | +.. code-block:: haskell |
| 153 | +
|
| 154 | + -- RHS size: {terms: 215, types: 375, coercions: 57, joins: 0/4} |
| 155 | + Cardano.Ledger.UMap.$fFoldableUView_$cfoldl' |
| 156 | + :: forall c k b a. (b -> a -> b) -> b -> UView c k a -> b |
| 157 | + [GblId, Arity=3, Str=<LCL(C1(L))><1L><1L>, Unf=OtherCon []] |
| 158 | + Cardano.Ledger.UMap.$fFoldableUView_$cfoldl' |
| 159 | + = \ (@c_a2svV) |
| 160 | + (@k_a2svW) |
| 161 | + (@b_a2szt) |
| 162 | + (@a_a2szu) |
| 163 | + (accum_a2plt :: b_a2szt -> a_a2szu -> b_a2szt) |
| 164 | + (ans0_a2plu :: b_a2szt) |
| 165 | + (ds_d2xJs :: UView c_a2svV k_a2svW a_a2szu) -> |
| 166 | + case ds_d2xJs of { |
| 167 | + RewDepUView co_a2szv [Dmd=A] co1_a2szw ds1_d2xTB -> |
| 168 | + case ds1_d2xTB of { UMap ds2_s2BfK ds3_s2BfL -> |
| 169 | + letrec { |
| 170 | + go15_s2zs0 [Occ=LoopBreaker, Dmd=SCS(C1(L))] |
| 171 | + :: b_a2szt |
| 172 | + -> Map (Credential 'Staking c_a2svV) (UMElem c_a2svV) -> b_a2szt |
| 173 | + [LclId, Arity=2, Str=<1L><1L>, Unf=OtherCon []] |
| 174 | + go15_s2zs0 |
| 175 | + = \ (z'_i2wnP :: b_a2szt) |
| 176 | + (ds4_i2wnQ |
| 177 | + :: Map (Credential 'Staking c_a2svV) (UMElem c_a2svV)) -> |
| 178 | + case ds4_i2wnQ of { |
| 179 | + Data.Map.Internal.Bin ipv_i2wnT ipv1_i2wnU ipv2_i2wnV ipv3_i2wnW |
| 180 | + ipv4_i2wnX -> |
| 181 | + case go15_s2zs0 z'_i2wnP ipv3_i2wnW of z''_i2wnZ { __DEFAULT -> |
| 182 | + case (umElemRDPair @c_a2svV ipv2_i2wnV) |
| 183 | + ... |
| 184 | +
|
| 185 | +|old|: |
| 186 | + |
| 187 | +.. code-block:: haskell |
| 188 | +
|
| 189 | + -- RHS size: {terms: 272, types: 431, coercions: 77, joins: 0/4} |
| 190 | + Cardano.Ledger.UMap.size_$cfoldl' |
| 191 | + :: forall c k b a. (b -> a -> b) -> b -> UView c k a -> b |
| 192 | + [GblId, |
| 193 | + Arity=3, |
| 194 | + Caf=NoCafRefs, |
| 195 | + Str=<L,C(C1(U))><S,1*U><S,1*U>, |
| 196 | + Unf=OtherCon []] |
| 197 | + Cardano.Ledger.UMap.size_$cfoldl' |
| 198 | + = \ (@ c_a7TVW) |
| 199 | + (@ k_a7TVX) |
| 200 | + (@ b_a7TZI) |
| 201 | + (@ a_a7TZJ) |
| 202 | + (accum_a7RPi :: b_a7TZI -> a_a7TZJ -> b_a7TZI) |
| 203 | + (ans0_a7RPj :: b_a7TZI) |
| 204 | + (ds_d8v9s :: UView c_a7TVW k_a7TVX a_a7TZJ) -> |
| 205 | + case ds_d8v9s of { |
| 206 | + RewDepUView co_a7TZL [Dmd=<L,A>] co1_a7TZM ds1_d8wpq -> |
| 207 | + case ds1_d8wpq of { UMap ds2_s90eY ds3_s90eZ -> |
| 208 | + letrec { |
| 209 | + go15_s8G6Q [Occ=LoopBreaker] |
| 210 | + :: b_a7TZI |
| 211 | + -> Map (Credential 'Staking c_a7TVW) (UMElem c_a7TVW) -> b_a7TZI |
| 212 | + [LclId, Arity=2, Str=<S,1*U><S,1*U>, Unf=OtherCon []] |
| 213 | + go15_s8G6Q |
| 214 | + = \ (z'_a8iQB :: b_a7TZI) |
| 215 | + (ds4_a8iQC |
| 216 | + :: Map (Credential 'Staking c_a7TVW) (UMElem c_a7TVW)) -> |
| 217 | + case ds4_a8iQC of { |
| 218 | + Data.Map.Internal.Bin ipv_a8iQF ipv1_a8iQG ipv2_a8iQH ipv3_a8iQI |
| 219 | + ipv4_a8iQJ -> |
| 220 | + case go15_s8G6Q z'_a8iQB ipv3_a8iQI of z''_a8iQL { __DEFAULT -> |
| 221 | + case ipv2_a8iQH of { |
| 222 | + __DEFAULT -> go15_s8G6Q z''_a8iQL ipv4_a8iQJ; |
| 223 | + TFEEE dt_d8BOJ dt1_d8BOK -> |
| 224 | +
|
| 225 | +
|
| 226 | +These functions are again nearly identical. Both take four type variables and |
| 227 | +three term variables, and then define a local recursive function with a |
| 228 | +``letrec``. For example, on |old| we have ``c_a7TVW``, |
| 229 | +``k_a7TVX``, ``b_a7TZI``, and ``a_a7TZJ`` for type variables, ``accum_a7RPi``, |
| 230 | +``ans0_a7RPj``, and ``ds_d8v9s`` for term variables, and ``go15_s8G6Q`` for the |
| 231 | +local recursive function. |
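At the source level, this shape (an outer function that binds its arguments and
a local recursive worker) looks roughly like the sketch below. The ``Tree`` type
and ``foldlTree'`` are hypothetical and only mimic the ``Bin``/``Tip`` structure
visible in the Core above; they are not the |c-l| or ``containers``
definitions.

.. code-block:: haskell

   {-# LANGUAGE BangPatterns #-}
   module Main where

   -- Hypothetical binary tree mirroring the Bin/Tip shape of Data.Map.
   data Tree a = Tip | Bin (Tree a) a (Tree a)

   -- Strict left fold. The local worker go can show up as a letrec in Core,
   -- much like go15 in the dumps above.
   foldlTree' :: (b -> a -> b) -> b -> Tree a -> b
   foldlTree' accum ans0 = go ans0
     where
       go !z Tip = z
       go !z (Bin l x r) =
         let !z' = go z l        -- fold the left subtree first
             !z'' = accum z' x   -- then combine the element at this node
         in go z'' r             -- finally fold the right subtree

   main :: IO ()
   main = print (foldlTree' (+) 0 (Bin (Bin Tip 1 Tip) 2 (Bin Tip 3 Tip)) :: Int)

The strict ``let`` bindings play the role of the ``case ... of z'' { __DEFAULT -> ...``
scrutinies in the Core: each intermediate accumulator is forced before the fold
continues.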
| 232 | + |
| 233 | +From the ``RHS size`` comment above each function signature we can see that |
| 234 | +``$cfoldl'`` on |old| is larger (272 terms) than on |new| (215 terms). Note that |
| 235 | +larger Core *is not always* worse than smaller Core; it depends on |
| 236 | +specialization and inlining behavior. In this case, the larger Core is the better |
| 237 | +performing program. On |old| we can see that the local function ``go15`` begins |
| 238 | +pattern matching on an :term:`Algebraic Data Type`. |
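To make the remark about specialization and inlining more concrete, here is a
minimal sketch of the three pragmas this chapter is concerned with. The names
``sumWith`` and ``apply`` are hypothetical and do not occur in |c-l|. Whether
GHC actually inlines or specializes a given call also depends on the
optimization level and the compiler version, so treat this as an illustration
of the syntax rather than a guaranteed outcome.

.. code-block:: haskell

   module Main where

   import Data.List (foldl')

   -- Hypothetical overloaded function. INLINABLE keeps its unfolding
   -- available so that call sites in other modules can be specialized.
   sumWith :: Num a => (b -> a) -> [b] -> a
   sumWith f = foldl' (\acc x -> acc + f x) 0
   {-# INLINABLE sumWith #-}

   -- SPECIALIZE asks GHC for a copy with the Num dictionary fixed to Int.
   {-# SPECIALIZE sumWith :: (b -> Int) -> [b] -> Int #-}

   -- INLINE asks GHC to replace saturated calls with the body itself.
   apply :: (a -> b) -> a -> b
   apply f x = f x
   {-# INLINE apply #-}

   main :: IO ()
   main = print (sumWith (apply (* 2)) [1, 2, 3 :: Int])

A specialized or inlined copy adds terms to the Core, which is one way a larger
dump can correspond to a faster program: the extra code replaces dictionary
passing and unknown function calls.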