diff --git a/CHANGELOGS.rst b/CHANGELOGS.rst
index e9835a10..02dfc4c0 100644
--- a/CHANGELOGS.rst
+++ b/CHANGELOGS.rst
@@ -4,6 +4,7 @@ Change Logs
 0.7.0
 +++++
 
+* :pr:`149`: support for StaticCache
 * :pr:`147`: simplified log processing
 * :pr:`146`: patch for IdeficsAttention, IdeficsEmbedding
 * :pr:`145`: patch for _compute_dynamic_ntk_parameters (Phi3RotaryEmbedding)
diff --git a/_doc/cmds/validate.rst b/_doc/cmds/validate.rst
index c6f47551..e09ff699 100644
--- a/_doc/cmds/validate.rst
+++ b/_doc/cmds/validate.rst
@@ -121,3 +121,27 @@ of function :func:`onnx_diagnostic.torch_models.validate.run_ort_fusion`.
     from onnx_diagnostic._command_lines_parser import main
 
     main("validate -m arnir0/Tiny-LLM --run -v 1 --export onnx-dynamo -o dump_models --patch --opt ir --ortfusiontype ALL".split())
+
+SDPA or Eager Attention, DynamicCache or StaticCache
++++++++++++++++++++++++++++++++++++++++++++++++++++++
+
+Add ``--mop cache_implementation=static --iop cls_cache=StaticCache`` to use a StaticCache instead of the default DynamicCache.
+Add ``--mop attn_implementation=eager`` to explicitly select the eager attention implementation.
+
+.. code-block:: bash
+
+    python -m onnx_diagnostic validate \
+        -m google/gemma-2b \
+        --run \
+        -v 1 \
+        --export custom \
+        -o dump_test \
+        --dtype float16 \
+        --device cpu \
+        --patch \
+        --no-quiet \
+        --opt default \
+        --rewrite \
+        --mop attn_implementation=eager \
+        --mop cache_implementation=static \
+        --iop cls_cache=StaticCache
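The documentation above already runs ``validate`` from Python by passing a split argument list to ``main`` from ``onnx_diagnostic._command_lines_parser``. The sketch below assembles the same gemma-2b command as a list, repeating ``--mop`` and ``--iop`` once per ``key=value`` pair as the bash example does. ``build_validate_argv`` is a hypothetical helper written for illustration and is not part of onnx_diagnostic. The call to ``main`` is left as a comment because it would download the model.

```python
# Hypothetical helper (NOT part of onnx_diagnostic): builds the argument list
# for the ``validate`` command, emitting one ``--mop``/``--iop`` flag per
# key=value pair, mirroring the bash example in the documentation.
from typing import Dict, Iterable, List


def build_validate_argv(
    model_id: str,
    model_options: Dict[str, str],
    input_options: Dict[str, str],
    extra: Iterable[str] = (),
) -> List[str]:
    argv = ["validate", "-m", model_id, *extra]
    # Each model option becomes its own "--mop key=value" pair.
    for key, value in model_options.items():
        argv += ["--mop", f"{key}={value}"]
    # Each input option becomes its own "--iop key=value" pair.
    for key, value in input_options.items():
        argv += ["--iop", f"{key}={value}"]
    return argv


argv = build_validate_argv(
    "google/gemma-2b",
    {"attn_implementation": "eager", "cache_implementation": "static"},
    {"cls_cache": "StaticCache"},
    extra=[
        "--run", "-v", "1", "--export", "custom", "-o", "dump_test",
        "--dtype", "float16", "--device", "cpu", "--patch", "--no-quiet",
        "--opt", "default", "--rewrite",
    ],
)
print(" ".join(argv))

# To actually run the validation (downloads the model), pass the list to the
# entry point used in the documentation:
#   from onnx_diagnostic._command_lines_parser import main
#   main(argv)
```

Keeping the options in dictionaries makes it easy to switch back to the default DynamicCache: drop ``cache_implementation`` and ``cls_cache`` from the two dictionaries.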