Add adaptToCallingConvention utility for element type conversion and byte offset handling #1709
base: main
src/enzyme_ad/jax/Utils.cpp
Outdated
if (outerType.getElementType() != innerType.getElementType()) {
  auto convertedType = RankedTensorType::get(
      cast<RankedTensorType>(adaptedArg.getType()).getShape(),
      innerType.getElementType());
@copilot you need to use BitcastConvertOp. Look at other uses of it in the repo to learn how it might force an additional tensor argument to be added or removed. At this point we do not know whether the target type has any dynamic indices, so you will need to pick between a dynamic or static reshape accordingly.
Updated to use BitcastConvertOp instead of ConvertOp. The implementation now:
- Handles element size differences by adding/removing dimensions as needed
- Supports both dynamic and static reshape operations based on whether dimensions are dynamic
- Follows the pattern from AffineToStableHLORaising.cpp
Fixed in commit 18cb37a.
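As a rough illustration of why a bitcast conversion can add or remove a dimension, the sketch below models the StableHLO bitcast_convert shape rule in plain C++: narrowing the element type appends a trailing dimension, and widening consumes one. The helper name bitcastResultShape is hypothetical and is not taken from the PR; it assumes static shapes and byte-sized element types only.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical helper (not part of the PR): models the result shape of a
// StableHLO-style bitcast_convert from the operand shape and element sizes.
// Assumes every dimension is static and element sizes are whole bytes.
std::vector<int64_t> bitcastResultShape(std::vector<int64_t> shape,
                                        int64_t srcBytes, int64_t dstBytes) {
  assert(srcBytes > 0 && dstBytes > 0);
  if (srcBytes == dstBytes)
    return shape; // same width: the shape is unchanged
  if (srcBytes > dstBytes) {
    // Narrowing: each source element splits into srcBytes / dstBytes
    // destination elements, exposed as a new trailing dimension.
    assert(srcBytes % dstBytes == 0);
    shape.push_back(srcBytes / dstBytes);
    return shape;
  }
  // Widening: the trailing dimension must hold exactly dstBytes / srcBytes
  // source elements, and the conversion consumes it.
  assert(dstBytes % srcBytes == 0);
  assert(!shape.empty() && shape.back() == dstBytes / srcBytes);
  shape.pop_back();
  return shape;
}
```

For example, under these assumptions a 4x3 tensor of 8-byte elements viewed as 4-byte elements becomes 4x3x2, while a 4x8 tensor of 1-byte elements viewed as 8-byte elements becomes a tensor of shape 4. A reshape is then needed whenever that extra or missing dimension does not match what the callee expects.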
src/enzyme_ad/jax/Utils.cpp
Outdated
// Get element size in bytes
int64_t elementBytes = 0;
if (auto complexType = dyn_cast<ComplexType>(elementType)) {
@copilot you can use AutoDiffTypeInterface to simplify this
Simplified to use AutoDiffTypeInterface::getApproxSize() instead of manual bit width calculation. This handles all element types including complex types uniformly.
Fixed in commit 30715f5.
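For context, the kind of manual computation that the interface call replaces might look like the sketch below. It is a plain C++ model, not MLIR code: ElemType is a hypothetical stand-in for an element type, and the rule that a complex value occupies two components is the only special case it handles.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an MLIR element type; this is not an MLIR class.
struct ElemType {
  int64_t bitWidth; // width of one scalar component, in bits
  bool isComplex;   // complex values store two components (real, imaginary)
};

// Manual byte-size computation with an explicit special case per type kind.
int64_t elementBytes(ElemType t) {
  assert(t.bitWidth > 0);
  int64_t componentBytes = (t.bitWidth + 7) / 8; // round sub-byte widths up
  return t.isComplex ? 2 * componentBytes : componentBytes;
}
```

Every additional type kind would need another branch here, which is the duplication a single interface method avoids.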
src/enzyme_ad/jax/Utils.h
Outdated
/// \param inputTensorTypes The tensor types for the wrapper function arguments
/// \param byteOffsets Byte offsets for each argument (0 means no offset)
/// \return A new function that adapts the calling convention
mlir::func::FuncOp adaptToCallingConvention(mlir::func::FuncOp f,
Use --sandbox_debug to see verbose messages from the sandbox and retain the sandbox build root for debugging
In file included from src/enzyme_ad/jax/Dialect/Ops.cpp:14:
./src/enzyme_ad/jax/Utils.h:850:7: error: no member named 'func' in namespace 'mlir'
850 | mlir::func::FuncOp adaptToCallingConvention(mlir::func::FuncOp f,
| ~~~~~~^
./src/enzyme_ad/jax/Utils.h:850:51: error: no member named 'func' in namespace 'mlir'
850 | mlir::func::FuncOp adaptToCallingConvention(mlir::func::FuncOp f,
| ~~~~~~^
src/enzyme_ad/jax/Dialect/Ops.cpp:1776:9: warning: use of bitwise '|' with boolean operands [-Wbitwise-instead-of-logical]
1776 | if (!srcMemRefType.hasStaticShape() | !resMemRefType.hasStaticShape())
| ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
| ||
src/enzyme_ad/jax/Dialect/Ops.cpp:1776:9: note: cast one or both operands to int to silence this warning
1 warning and 2 errors generated.
Fixed the compilation error by adding the missing include mlir/Dialect/Func/IR/FuncOps.h to Utils.h. This resolves the "no member named 'func' in namespace 'mlir'" error.
Fixed in commit 7fce353.
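The build log above also reports a -Wbitwise-instead-of-logical warning in Ops.cpp, which is separate from the missing include. The sketch below shows, in standalone C++, the behavioral difference the warning is about: a bitwise | on boolean operands always evaluates both sides, whereas || short-circuits. The predicate isDynamic and the call counter are hypothetical and exist only to make the evaluation count observable.

```cpp
#include <cassert>

// Counts how many times the predicate below is evaluated.
static int calls = 0;

// Hypothetical predicate standing in for a check such as hasStaticShape().
bool isDynamic(bool value) {
  ++calls;
  return value;
}

// Bitwise OR on boolean operands: both sides are always evaluated.
bool eitherBitwise(bool a, bool b) {
  return static_cast<int>(isDynamic(a)) | static_cast<int>(isDynamic(b));
}

// Logical OR: the right-hand side is skipped when the left is already true.
bool eitherLogical(bool a, bool b) { return isDynamic(a) || isDynamic(b); }
```

Both forms yield the same truth value for side-effect-free operands, so the warning is about intent and skipped work rather than a wrong result.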
... Fixed the compilation error in Fixed in commit a4c219a.
src/enzyme_ad/jax/Utils.cpp
Outdated
// Get dynamic dimension from original tensor
auto dimValue = builder.create<stablehlo::GetDimensionSizeOp>(
    loc, builder.getI32Type(), adaptedArg, i);
    loc, scalarI32Type, adaptedArg, i);
@copilot can you use the version that doesn't explicitly pass the return type here? It will be deduced automatically.
Moreover, there is no need for a separate path for dynamic vs. constant dimensions; you can always use GetDimensionSizeOp, as it will get optimized away later.
Simplified the reshape logic:
- Removed conditional branches for static vs dynamic shapes
- Always use GetDimensionSizeOp for all dimensions (will be optimized away for static shapes)
- Removed explicit type parameter from ConstantOp::create (type is deduced automatically)
This reduces code complexity from ~73 lines to ~27 lines while maintaining the same functionality.
Fixed in commit 33cf3f6.
EnzymeJAX Benchmarks
| Benchmark suite | Current: 0bf3be0 | Previous: 4eaf129 | Ratio |
|---|---|---|---|
| actmtch / JaX / cpu / Primal | 0.000003997090700022454 s | 0.00000432271910003692 s | 0.92 |
| actmtch / JaXPipe / cpu / Primal | 0.000003926248600009785 s | 0.000004038063999996666 s | 0.97 |
| actmtch / JaX / cpu / Forward | 0.000006498060900003111 s | 0.000006147442999917985 s | 1.06 |
| actmtch / JaXPipe / cpu / Forward | 0.000007799416799980463 s | 0.000007186103599997296 s | 1.09 |
| actmtch / JaX / cpu / BothRev | 0.000006351307799968708 s | 0.000006447609199949511 s | 0.99 |
| actmtch / JaXPipe / cpu / PreRev | 0.000007397082900024543 s | 0.000007411425099962798 s | 1.00 |
| actmtch / JaXPipe / cpu / PostRev | 0.000006514205999974365 s | 0.000006152242799998931 s | 1.06 |
| actmtch / JaXPipe / cpu / BothRev | 0.000007388681600014025 s | 0.000007574900300005538 s | 0.98 |
| actmtch / JaX / cpu / Primal | 0.000007959970994852483 s | 0.00000432271910003692 s | 1.84 |
| actmtch / JaXPipe / cpu / Primal | 0.000007407108996994793 s | 0.000004038063999996666 s | 1.83 |
| actmtch / JaX / cpu / Forward | 0.00001070372989634052 s | 0.000006147442999917985 s | 1.74 |
| actmtch / JaXPipe / cpu / Forward | 0.000013518221792764962 s | 0.000007186103599997296 s | 1.88 |
| actmtch / JaX / cpu / BothRev | 0.000010748748504556716 s | 0.000006447609199949511 s | 1.67 |
| actmtch / JaXPipe / cpu / PreRev | 0.00001305638070916757 s | 0.000007411425099962798 s | 1.76 |
| actmtch / JaXPipe / cpu / PostRev | 0.000011024106899276376 s | 0.000006152242799998931 s | 1.79 |
| actmtch / JaXPipe / cpu / BothRev | 0.000013830481108743698 s | 0.000007574900300005538 s | 1.83 |
| actmtch / JaX / gpu / Primal | 0.00007867967350175604 s | 0.0001004922082996 s | 0.78 |
| actmtch / JaXPipe / gpu / Primal | 0.00007634758140193298 s | 0.00007687406509649008 s | 0.99 |
| actmtch / JaX / gpu / Forward | 0.0001037186417961 s | 0.0001121036239084 s | 0.93 |
| actmtch / JaXPipe / gpu / Forward | 0.0001138240601983 s | 0.0001041769690113 s | 1.09 |
| actmtch / JaX / gpu / BothRev | 0.0001050011685001 s | 0.0001028986057033 s | 1.02 |
| actmtch / JaXPipe / gpu / PreRev | 0.000109642590594 s | 0.0001065279669943 s | 1.03 |
| actmtch / JaXPipe / gpu / PostRev | 0.0001110457426984 s | 0.0001057937302975 s | 1.05 |
| actmtch / JaXPipe / gpu / BothRev | 0.0001103777486947 s | 0.0001094572576927 s | 1.01 |
| actmtch / JaX / cpu / Primal | 0.000003541009000036865 s | 0.00000432271910003692 s | 0.82 |
| actmtch / JaXPipe / cpu / Primal | 0.0000037390639990917407 s | 0.000004038063999996666 s | 0.93 |
| actmtch / JaX / cpu / Forward | 0.000005564172999584116 s | 0.000006147442999917985 s | 0.91 |
| actmtch / JaXPipe / cpu / Forward | 0.000006394844000169542 s | 0.000007186103599997296 s | 0.89 |
| actmtch / JaX / cpu / BothRev | 0.000005554064999159892 s | 0.000006447609199949511 s | 0.86 |
| actmtch / JaXPipe / cpu / PreRev | 0.000006109179901250173 s | 0.000007411425099962798 s | 0.82 |
| actmtch / JaXPipe / cpu / PostRev | 0.000005265038000652566 s | 0.000006152242799998931 s | 0.86 |
| actmtch / JaXPipe / cpu / BothRev | 0.000006430619000457227 s | 0.000007574900300005538 s | 0.85 |
| actmtch / JaX / tpu / Primal | 0.0001493281617003 s | 0.0001316038411001 s | 1.13 |
| actmtch / JaXPipe / tpu / Primal | 0.0001470927938004 s | 0.0001369472692 s | 1.07 |
| actmtch / JaX / tpu / Forward | 0.0002280579597005 s | 0.0002186001958005 s | 1.04 |
| actmtch / JaXPipe / tpu / Forward | 0.0002229251657001 s | 0.0002202218143997 s | 1.01 |
| actmtch / JaX / tpu / BothRev | 0.0002199896746009 s | 0.0002034876092999 s | 1.08 |
| actmtch / JaXPipe / tpu / PreRev | 0.0002273046236994 s | 0.0001964800367997 s | 1.16 |
| actmtch / JaXPipe / tpu / PostRev | 0.0002233937755998 s | 0.0001991199131996 s | 1.12 |
| actmtch / JaXPipe / tpu / BothRev | 0.0002238546845997 s | 0.0002021701384997 s | 1.11 |
| actmtch / JaX / cpu / Primal | 0.000004986773699965852 s | 0.00000432271910003692 s | 1.15 |
| actmtch / JaXPipe / cpu / Primal | 0.000005192211400026281 s | 0.000004038063999996666 s | 1.29 |
| actmtch / JaX / cpu / Forward | 0.000007621491000008973 s | 0.000006147442999917985 s | 1.24 |
| actmtch / JaXPipe / cpu / Forward | 0.000009318313299991132 s | 0.000007186103599997296 s | 1.30 |
| actmtch / JaX / cpu / BothRev | 0.000007571081399964896 s | 0.000006447609199949511 s | 1.17 |
| actmtch / JaXPipe / cpu / PreRev | 0.000009302680199925815 s | 0.000007411425099962798 s | 1.26 |
| actmtch / JaXPipe / cpu / PostRev | 0.000007877362999988691 s | 0.000006152242799998931 s | 1.28 |
| actmtch / JaXPipe / cpu / BothRev | 0.000009788774000026023 s | 0.000007574900300005538 s | 1.29 |
| actmtch / JaX / cpu / Primal | 0.000003608279199943354 s | 0.00000432271910003692 s | 0.83 |
| actmtch / JaXPipe / cpu / Primal | 0.0000034439250001014443 s | 0.000004038063999996666 s | 0.85 |
| actmtch / JaX / cpu / Forward | 0.000005242891700072505 s | 0.000006147442999917985 s | 0.85 |
| actmtch / JaXPipe / cpu / Forward | 0.000006089400000018941 s | 0.000007186103599997296 s | 0.85 |
| actmtch / JaX / cpu / BothRev | 0.000005244912499983911 s | 0.000006447609199949511 s | 0.81 |
| actmtch / JaXPipe / cpu / PreRev | 0.000006098874999952386 s | 0.000007411425099962798 s | 0.82 |
| actmtch / JaXPipe / cpu / PostRev | 0.000005211304099975678 s | 0.000006152242799998931 s | 0.85 |
| actmtch / JaXPipe / cpu / BothRev | 0.000006314999999995052 s | 0.000007574900300005538 s | 0.83 |
| add_one / JaX / cpu / Primal | 0.000004180887699976665 s | 0.000004221974800020689 s | 0.99 |
| add_one / JaXPipe / cpu / Primal | 0.000004145650100008424 s | 0.000004216408699994645 s | 0.98 |
| add_one / JaX / cpu / Forward | 0.000007714457600013702 s | 0.000007261218699932215 s | 1.06 |
| add_one / JaXPipe / cpu / Forward | 0.000007589819599979819 s | 0.000007260521999978664 s | 1.05 |
| add_one / JaX / cpu / BothRev | 0.000008145118999982515 s | 0.000007305787399945984 s | 1.11 |
| add_one / JaXPipe / cpu / PreRev | 0.000007709183599990866 s | 0.000007288472500022181 s | 1.06 |
| add_one / JaXPipe / cpu / PostRev | 0.00000764160419998916 s | 0.000007354774000032194 s | 1.04 |
| add_one / JaXPipe / cpu / BothRev | 0.000007512363399973763 s | 0.000007398826600001485 s | 1.02 |
| add_one / JaX / cpu / Primal | 0.000007728873391170055 s | 0.000004221974800020689 s | 1.83 |
| add_one / JaXPipe / cpu / Primal | 0.000007856298692058772 s | 0.000004216408699994645 s | 1.86 |
| add_one / JaX / cpu / Forward | 0.000011979236104525626 s | 0.000007261218699932215 s | 1.65 |
| add_one / JaXPipe / cpu / Forward | 0.00001255826320266351 s | 0.000007260521999978664 s | 1.73 |
| add_one / JaX / cpu / BothRev | 0.000012746677501127124 s | 0.000007305787399945984 s | 1.74 |
| add_one / JaXPipe / cpu / PreRev | 0.000012860454805195333 s | 0.000007288472500022181 s | 1.76 |
| add_one / JaXPipe / cpu / PostRev | 0.000012946941703557968 s | 0.000007354774000032194 s | 1.76 |
| add_one / JaXPipe / cpu / BothRev | 0.000012135409796610474 s | 0.000007398826600001485 s | 1.64 |
| add_one / JaX / gpu / Primal | 0.00008333408739417791 s | 0.00008429111740551888 s | 0.99 |
| add_one / JaXPipe / gpu / Primal | 0.00008366891539189965 s | 0.00008587237860774621 s | 0.97 |
| add_one / JaX / gpu / Forward | 0.0001134636863018 s | 0.0001102711505023 s | 1.03 |
| add_one / JaXPipe / gpu / Forward | 0.0001106292591081 s | 0.0001065846313023 s | 1.04 |
| add_one / JaX / gpu / BothRev | 0.0001151589332963 s | 0.0001105405066977 s | 1.04 |
| add_one / JaXPipe / gpu / PreRev | 0.0001253630284103 s | 0.0001127191629027 s | 1.11 |
| add_one / JaXPipe / gpu / PostRev | 0.0001151895632967 s | 0.0001098064257996 s | 1.05 |
| add_one / JaXPipe / gpu / BothRev | 0.0001183863151003 s | 0.0001152891204925 s | 1.03 |
| add_one / JaX / cpu / Primal | 0.000003881492999789771 s | 0.000004221974800020689 s | 0.92 |
| add_one / JaXPipe / cpu / Primal | 0.000003868307000084314 s | 0.000004216408699994645 s | 0.92 |
| add_one / JaX / cpu / Forward | 0.000005896087001019623 s | 0.000007261218699932215 s | 0.81 |
| add_one / JaXPipe / cpu / Forward | 0.000006191526999464259 s | 0.000007260521999978664 s | 0.85 |
| add_one / JaX / cpu / BothRev | 0.000006177210999885574 s | 0.000007305787399945984 s | 0.85 |
| add_one / JaXPipe / cpu / PreRev | 0.0000061925298999994994 s | 0.000007288472500022181 s | 0.85 |
| add_one / JaXPipe / cpu / PostRev | 0.000006194044000585564 s | 0.000007354774000032194 s | 0.84 |
| add_one / JaXPipe / cpu / BothRev | 0.00000582200499920873 s | 0.000007398826600001485 s | 0.79 |
| add_one / JaX / tpu / Primal | 0.0001510572578001 s | 0.0001306560915996 s | 1.16 |
| add_one / JaXPipe / tpu / Primal | 0.0001504162108001 s | 0.0001352252734999 s | 1.11 |
| add_one / JaX / tpu / Forward | 0.0002197295777004 s | 0.0001990924121993 s | 1.10 |
| add_one / JaXPipe / tpu / Forward | 0.0002242124056007 s | 0.0001940670322997 s | 1.16 |
| add_one / JaX / tpu / BothRev | 0.0002226155757001 s | 0.0001956074509995 s | 1.14 |
| add_one / JaXPipe / tpu / PreRev | 0.0002116753627007 s | 0.0002013821997003 s | 1.05 |
| add_one / JaXPipe / tpu / PostRev | 0.0002153649667001 s | 0.0001968598917002 s | 1.09 |
| add_one / JaXPipe / tpu / BothRev | 0.0002122778846998 s | 0.0002120713192998 s | 1.00 |
| add_one / JaX / cpu / Primal | 0.000005298372900051618 s | 0.000004221974800020689 s | 1.25 |
| add_one / JaXPipe / cpu / Primal | 0.000005342707300042094 s | 0.000004216408699994645 s | 1.27 |
| add_one / JaX / cpu / Forward | 0.000008240100799957872 s | 0.000007261218699932215 s | 1.13 |
| add_one / JaXPipe / cpu / Forward | 0.000008275048399991647 s | 0.000007260521999978664 s | 1.14 |
| add_one / JaX / cpu / BothRev | 0.000008932550300050935 s | 0.000007305787399945984 s | 1.22 |
| add_one / JaXPipe / cpu / PreRev | 0.00000886622609996266 s | 0.000007288472500022181 s | 1.22 |
| add_one / JaXPipe / cpu / PostRev | 0.000008894981400044344 s | 0.000007354774000032194 s | 1.21 |
| add_one / JaXPipe / cpu / BothRev | 0.000008868470900051761 s | 0.000007398826600001485 s | 1.20 |
| add_one / JaX / cpu / Primal | 0.000003253050000057556 s | 0.000004221974800020689 s | 0.77 |
| add_one / JaXPipe / cpu / Primal | 0.000003339291700103786 s | 0.000004216408699994645 s | 0.79 |
| add_one / JaX / cpu / Forward | 0.0000053504666999288016 s | 0.000007261218699932215 s | 0.74 |
| add_one / JaXPipe / cpu / Forward | 0.000005292308299976867 s | 0.000007260521999978664 s | 0.73 |
| add_one / JaX / cpu / BothRev | 0.000005683908400169457 s | 0.000007305787399945984 s | 0.78 |
| add_one / JaXPipe / cpu / PreRev | 0.0000057016749999093005 s | 0.000007288472500022181 s | 0.78 |
| add_one / JaXPipe / cpu / PostRev | 0.00000565203330006625 s | 0.000007354774000032194 s | 0.77 |
| add_one / JaXPipe / cpu / BothRev | 0.000005653949999941687 s | 0.000007398826600001485 s | 0.76 |
| add_two / JaX / cpu / Primal | 0.000004313225599980797 s | 0.000004418934899968008 s | 0.98 |
| add_two / JaXPipe / cpu / Primal | 0.000004341828999986319 s | 0.000004456809299972519 s | 0.97 |
| add_two / JaX / cpu / Forward | 0.000007426888600002712 s | 0.000007264471400048933 s | 1.02 |
| add_two / JaXPipe / cpu / Forward | 0.000007481674000018756 s | 0.000007278208599927894 s | 1.03 |
| add_two / JaX / cpu / BothRev | 0.000010262943399993671 s | 0.000009108529399964028 s | 1.13 |
| add_two / JaXPipe / cpu / PreRev | 0.000009945009200009736 s | 0.00000916554420000466 s | 1.09 |
| add_two / JaXPipe / cpu / PostRev | 0.000009505714100032492 s | 0.000009199619099945266 s | 1.03 |
| add_two / JaXPipe / cpu / BothRev | 0.000009943389799991563 s | 0.00000909747879995848 s | 1.09 |
| add_two / JaX / cpu / Primal | 0.00000807286590570584 s | 0.000004418934899968008 s | 1.83 |
| add_two / JaXPipe / cpu / Primal | 0.000008214305096771568 s | 0.000004456809299972519 s | 1.84 |
| add_two / JaX / cpu / Forward | 0.000012371172290295363 s | 0.000007264471400048933 s | 1.70 |
| add_two / JaXPipe / cpu / Forward | 0.00001313433590112254 s | 0.000007278208599927894 s | 1.80 |
| add_two / JaX / cpu / BothRev | 0.00001532532230485231 s | 0.000009108529399964028 s | 1.68 |
| add_two / JaXPipe / cpu / PreRev | 0.00001565401619300246 s | 0.00000916554420000466 s | 1.71 |
| add_two / JaXPipe / cpu / PostRev | 0.000015370841103140265 s | 0.000009199619099945266 s | 1.67 |
| add_two / JaXPipe / cpu / BothRev | 0.00001516982209868729 s | 0.00000909747879995848 s | 1.67 |
| add_two / JaX / gpu / Primal | 0.00008471721590030939 s | 0.00007844409920508043 s | 1.08 |
| add_two / JaXPipe / gpu / Primal | 0.00007867487490875646 s | 0.00007744154639076441 s | 1.02 |
| add_two / JaX / gpu / Forward | 0.0001074575781938 s | 0.0001076571335084 s | 1.00 |
| add_two / JaXPipe / gpu / Forward | 0.0001072541330009 s | 0.0001065247646998 s | 1.01 |
| add_two / JaX / gpu / BothRev | 0.0001331359238014 s | 0.0001222250973107 s | 1.09 |
| add_two / JaXPipe / gpu / PreRev | 0.0001324179181014 s | 0.0001243835616973 s | 1.06 |
| add_two / JaXPipe / gpu / PostRev | 0.0001304696060949 s | 0.0001366734125069 s | 0.95 |
| add_two / JaXPipe / gpu / BothRev | 0.0001289001470082 s | 0.0001302451282041 s | 0.99 |
| add_two / JaX / cpu / Primal | 0.000004036082999664359 s | 0.000004418934899968008 s | 0.91 |
| add_two / JaXPipe / cpu / Primal | 0.000004039502999512479 s | 0.000004456809299972519 s | 0.91 |
| add_two / JaX / cpu / Forward | 0.000006100370999774896 s | 0.000007264471400048933 s | 0.84 |
| add_two / JaXPipe / cpu / Forward | 0.000006113002001075074 s | 0.000007278208599927894 s | 0.84 |
| add_two / JaX / cpu / BothRev | 0.000007006939900747966 s | 0.000009108529399964028 s | 0.77 |
| add_two / JaXPipe / cpu / PreRev | 0.000007338292000349611 s | 0.00000916554420000466 s | 0.80 |
| add_two / JaXPipe / cpu / PostRev | 0.000007322542999463621 s | 0.000009199619099945266 s | 0.80 |
| add_two / JaXPipe / cpu / BothRev | 0.000007309589000942651 s | 0.00000909747879995848 s | 0.80 |
| add_two / JaX / tpu / Primal | 0.0001441671007007 s | 0.0001334500969998 s | 1.08 |
| add_two / JaXPipe / tpu / Primal | 0.0001455031457997 s | 0.0001331803020002 s | 1.09 |
| add_two / JaX / tpu / Forward | 0.0002115925296995 s | 0.0002073666984004 s | 1.02 |
| add_two / JaXPipe / tpu / Forward | 0.0002229488035998 s | 0.0002246547675 s | 0.99 |
| add_two / JaX / tpu / BothRev | 0.0002298252856999 s | 0.0002315863488998 s | 0.99 |
| add_two / JaXPipe / tpu / PreRev | 0.0002332437237011 s | 0.0002238320146003 s | 1.04 |
| add_two / JaXPipe / tpu / PostRev | 0.0002353289396007 s | 0.0002091732538996 s | 1.13 |
| add_two / JaXPipe / tpu / BothRev | 0.0002287045876 s | 0.0002318442168994 s | 0.99 |
| add_two / JaX / cpu / Primal | 0.000005618042599962791 s | 0.000004418934899968008 s | 1.27 |
| add_two / JaXPipe / cpu / Primal | 0.000005658938999931706 s | 0.000004456809299972519 s | 1.27 |
| add_two / JaX / cpu / Forward | 0.000008504413599985127 s | 0.000007264471400048933 s | 1.17 |
| add_two / JaXPipe / cpu / Forward | 0.000008573224799965829 s | 0.000007278208599927894 s | 1.18 |
| add_two / JaX / cpu / BothRev | 0.00000990002819999063 s | 0.000009108529399964028 s | 1.09 |
| add_two / JaXPipe / cpu / PreRev | 0.000010519565000049624 s | 0.00000916554420000466 s | 1.15 |
| add_two / JaXPipe / cpu / PostRev | 0.00001052884950004227 s | 0.000009199619099945266 s | 1.14 |
| add_two / JaXPipe / cpu / BothRev | 0.000010488816799988852 s | 0.00000909747879995848 s | 1.15 |
| add_two / JaX / cpu / Primal | 0.0000033845667001514813 s | 0.000004418934899968008 s | 0.77 |
| add_two / JaXPipe / cpu / Primal | 0.000003580650000003516 s | 0.000004456809299972519 s | 0.80 |
| add_two / JaX / cpu / Forward | 0.000005608412500077975 s | 0.000007264471400048933 s | 0.77 |
| add_two / JaXPipe / cpu / Forward | 0.000005693983300079708 s | 0.000007278208599927894 s | 0.78 |
| add_two / JaX / cpu / BothRev | 0.000006976591599959648 s | 0.000009108529399964028 s | 0.77 |
| add_two / JaXPipe / cpu / PreRev | 0.000006991650000054506 s | 0.00000916554420000466 s | 0.76 |
| add_two / JaXPipe / cpu / PostRev | 0.000007062358300026972 s | 0.000009199619099945266 s | 0.77 |
| add_two / JaXPipe / cpu / BothRev | 0.000007053066700063937 s | 0.00000909747879995848 s | 0.78 |
| cache / JaX / cpu / Primal | 0.000003896351999992476 s | 0.000003853892299957806 s | 1.01 |
| cache / JaXPipe / cpu / Primal | 0.0000038106513999991874 s | 0.000004203896800026996 s | 0.91 |
| cache / JaX / cpu / Forward | 0.000011048681399961424 s | 0.000010372986999936984 s | 1.07 |
| cache / JaXPipe / cpu / Forward | 0.000010896471300020492 s | 0.000010036591300013242 s | 1.09 |
| cache / JaX / cpu / BothRev | 0.000015986226199993326 s | 0.000014437303700015036 s | 1.11 |
| cache / JaXPipe / cpu / PreRev | 0.000012455801899977814 s | 0.000011429817699990964 s | 1.09 |
| cache / JaXPipe / cpu / PostRev | 0.000016010530899984586 s | 0.00001467085929998575 s | 1.09 |
| cache / JaXPipe / cpu / BothRev | 0.000012278249500013772 s | 0.00001107483489995502 s | 1.11 |
| cache / JaX / cpu / Primal | 0.000007772306900005788 s | 0.000003853892299957806 s | 2.02 |
| cache / JaXPipe / cpu / Primal | 0.000007581161602865905 s | 0.000004203896800026996 s | 1.80 |
| cache / JaX / cpu / Forward | 0.0000132246355060488 s | 0.000010372986999936984 s | 1.27 |
| cache / JaXPipe / cpu / Forward | 0.000013659387407824397 s | 0.000010036591300013242 s | 1.36 |
| cache / JaX / cpu / BothRev | 0.00002220423220423981 s | 0.000014437303700015036 s | 1.54 |
| cache / JaXPipe / cpu / PreRev | 0.000016895696194842458 s | 0.000011429817699990964 s | 1.48 |
| cache / JaXPipe / cpu / PostRev | 0.00002364342039218173 s | 0.00001467085929998575 s | 1.61 |
| cache / JaXPipe / cpu / BothRev | 0.000017238127801101655 s | 0.00001107483489995502 s | 1.56 |
| cache / JaX / gpu / Primal | 0.00007449662729632109 s | 0.00008785822169156745 s | 0.85 |
| cache / JaXPipe / gpu / Primal | 0.00007461464160587638 s | 0.00007729562789900228 s | 0.97 |
| cache / JaX / gpu / Forward | 0.00009896667409921066 s | 0.0001025989543995 s | 0.96 |
| cache / JaXPipe / gpu / Forward | 0.0001025051846052 s | 0.00009873921209946275 s | 1.04 |
| cache / JaX / gpu / BothRev | 0.0001070464183925 s | 0.0001095828678924 s | 0.98 |
| cache / JaXPipe / gpu / PreRev | 0.0001092811336973 s | 0.0001113129438948 s | 0.98 |
| cache / JaXPipe / gpu / PostRev | 0.0001080206439946 s | 0.0001058010298991 s | 1.02 |
| cache / JaXPipe / gpu / BothRev | 0.0001067951785051 s | 0.000110681745701 s | 0.96 |
| cache / JaX / cpu / Primal | 0.0000036403239995706825 s | 0.000003853892299957806 s | 0.94 |
| cache / JaXPipe / cpu / Primal | 0.00000361292500019772 s | 0.000004203896800026996 s | 0.86 |
| cache / JaX / cpu / Forward | 0.000006030501000350341 s | 0.000010372986999936984 s | 0.58 |
| cache / JaXPipe / cpu / Forward | 0.0000061301370005821805 s | 0.000010036591300013242 s | 0.61 |
| cache / JaX / cpu / BothRev | 0.000026510377900558524 s | 0.000014437303700015036 s | 1.84 |
| cache / JaXPipe / cpu / PreRev | 0.000006826056999852881 s | 0.000011429817699990964 s | 0.60 |
| cache / JaXPipe / cpu / PostRev | 0.00000851549100043485 s | 0.00001467085929998575 s | 0.58 |
| cache / JaXPipe / cpu / BothRev | 0.000006731725001009181 s | 0.00001107483489995502 s | 0.61 |
| cache / JaX / tpu / Primal | 0.0001310809938004 s | 0.0001323901531999 s | 0.99 |
| cache / JaXPipe / tpu / Primal | 0.0001334001228009 s | 0.0001314969213999 s | 1.01 |
| cache / JaX / tpu / Forward | 0.0002242326886 s | 0.0002051431588995 s | 1.09 |
| cache / JaXPipe / tpu / Forward | 0.0002191389336992 s | 0.0002074977683005 s | 1.06 |
| cache / JaX / tpu / BothRev | 0.0002106504586001 s | 0.0002073500963 s | 1.02 |
| cache / JaXPipe / tpu / PreRev | 0.0002219002967001 s | 0.0002152243855998 s | 1.03 |
| cache / JaXPipe / tpu / PostRev | 0.0002248494405997 s | 0.000208854471 s | 1.08 |
| cache / JaXPipe / tpu / BothRev | 0.0002194938315995 s | 0.0002096694617997 s | 1.05 |
| cache / JaX / cpu / Primal | 0.000004955122299998039 s | 0.000003853892299957806 s | 1.29 |
| cache / JaXPipe / cpu / Primal | 0.000005353656100032822 s | 0.000004203896800026996 s | 1.27 |
| cache / JaX / cpu / Forward | 0.00000876926339997226 s | 0.000010372986999936984 s | 0.85 |
| cache / JaXPipe / cpu / Forward | 0.0000088710041000013 s | 0.000010036591300013242 s | 0.88 |
| cache / JaX / cpu / BothRev | 0.000016098988500016275 s | 0.000014437303700015036 s | 1.12 |
| cache / JaXPipe / cpu / PreRev | 0.000012188351300028444 s | 0.000011429817699990964 s | 1.07 |
| cache / JaXPipe / cpu / PostRev | 0.00001636639890002698 s | 0.00001467085929998575 s | 1.12 |
| cache / JaXPipe / cpu / BothRev | 0.000011755943699972704 s | 0.00001107483489995502 s | 1.06 |
| cache / JaX / cpu / Primal | 0.0000029558459000327274 s | 0.000003853892299957806 s | 0.77 |
| cache / JaXPipe / cpu / Primal | 0.0000030679999999847497 s | 0.000004203896800026996 s | 0.73 |
| cache / JaX / cpu / Forward | 0.000005607462499938265 s | 0.000010372986999936984 s | 0.54 |
| cache / JaXPipe / cpu / Forward | 0.00000551809160006087 s | 0.000010036591300013242 s | 0.55 |
| cache / JaX / cpu / BothRev | 0.00000717630420003843 s | 0.000014437303700015036 s | 0.50 |
| cache / JaXPipe / cpu / PreRev | 0.000006009754200022144 s | 0.000011429817699990964 s | 0.53 |
| cache / JaXPipe / cpu / PostRev | 0.000007203641600062838 s | 0.00001467085929998575 s | 0.49 |
| cache / JaXPipe / cpu / BothRev | 0.000006225337500109163 s | 0.00001107483489995502 s | 0.56 |
| Concat / JaX / cpu / Primal | 0.000004213266799979465 s | 0.000004118979400027456 s | 1.02 |
| Concat / JaXPipe / cpu / Primal | 0.000004145884199988359 s | 0.000004153594200033695 s | 1.00 |
| Concat / JaX / cpu / Forward | 0.000007232018700005938 s | 0.000006919872399976157 s | 1.05 |
| Concat / JaXPipe / cpu / Forward | 0.0000072883987000295745 s | 0.000007298058000014862 s | 1.00 |
| Concat / JaX / cpu / BothRev | 0.000007508839999991323 s | 0.000007319889400059765 s | 1.03 |
| Concat / JaXPipe / cpu / PreRev | 0.000007580204399982904 s | 0.000007450676299959013 s | 1.02 |
| Concat / JaXPipe / cpu / PostRev | 0.000007965602200010834 s | 0.000007663862000026711 s | 1.04 |
| Concat / JaXPipe / cpu / BothRev | 0.000007589218200018877 s | 0.0000072419961000377956 s | 1.05 |
| Concat / JaX / cpu / Primal | 0.000007536142005119473 s | 0.000004118979400027456 s | 1.83 |
| Concat / JaXPipe / cpu / Primal | 0.000007732204895000905 s | 0.000004153594200033695 s | 1.86 |
| Concat / JaX / cpu / Forward | 0.000011522021004930138 s | 0.000006919872399976157 s | 1.67 |
| Concat / JaXPipe / cpu / Forward | 0.00001226679200772196 s | 0.000007298058000014862 s | 1.68 |
| Concat / JaX / cpu / BothRev | 0.000012137844308745117 s | 0.000007319889400059765 s | 1.66 |
| Concat / JaXPipe / cpu / PreRev | 0.00001274053279776126 s | 0.000007450676299959013 s | 1.71 |
| Concat / JaXPipe / cpu / PostRev | 0.00001296470039524138 s | 0.000007663862000026711 s | 1.69 |
| Concat / JaXPipe / cpu / BothRev | 0.000012724650802556425 s | 0.0000072419961000377956 s | 1.76 |
| Concat / JaX / gpu / Primal | 0.00007844496759353205 s | 0.00008215341890463606 s | 0.95 |
| Concat / JaXPipe / gpu / Primal | 0.00007872560970718042 s | 0.00007829171610064804 s | 1.01 |
| Concat / JaX / gpu / Forward | 0.0001064101367956 s | 0.0001089012710959 s | 0.98 |
| Concat / JaXPipe / gpu / Forward | 0.0001070073137991 s | 0.0001312667967984 s | 0.82 |
| Concat / JaX / gpu / BothRev | 0.000100449445995 s | 0.00009998244401067495 s | 1.00 |
| Concat / JaXPipe / gpu / PreRev | 0.00009845183329889553 s | 0.0001032203846029 s | 0.95 |
| Concat / JaXPipe / gpu / PostRev | 0.00009899934709537774 s | 0.0001004549468983 s | 0.99 |
| Concat / JaXPipe / gpu / BothRev | 0.0001003304288955 s | 0.0001010157385026 s | 0.99 |
| Concat / JaX / cpu / Primal | 0.000003798493999056518 s | 0.000004118979400027456 s | 0.92 |
| Concat / JaXPipe / cpu / Primal | 0.000003801387999556028 s | 0.000004153594200033695 s | 0.92 |
| Concat / JaX / cpu / Forward | 0.000006223907999810763 s | 0.000006919872399976157 s | 0.90 |
| Concat / JaXPipe / cpu / Forward | 0.000006233251999947242 s | 0.000007298058000014862 s | 0.85 |
| Concat / JaX / cpu / BothRev | 0.0000063260359995183536 s | 0.000007319889400059765 s | 0.86 |
| Concat / JaXPipe / cpu / PreRev | 0.000006294475999311544 s | 0.000007450676299959013 s | 0.84 |
| Concat / JaXPipe / cpu / PostRev | 0.000006350402999669313 s | 0.000007663862000026711 s | 0.83 |
| Concat / JaXPipe / cpu / BothRev | 0.000006315280999115203 s | 0.0000072419961000377956 s | 0.87 |
| Concat / JaX / tpu / Primal | 0.0001319705327987 s | 0.0001325698881999 s | 1.00 |
| Concat / JaXPipe / tpu / Primal | 0.0001331458936998 s | 0.0001368016931999 s | 0.97 |
| Concat / JaX / tpu / Forward | 0.0002216227076991 s | 0.0002081366881 s | 1.06 |
| Concat / JaXPipe / tpu / Forward | 0.0002205995636002 s | 0.0002118145583001 s | 1.04 |
| Concat / JaX / tpu / BothRev | 0.0002268975886996 s | 0.0002124023682001 s | 1.07 |
| Concat / JaXPipe / tpu / PreRev | 0.0002307450266001 s | 0.0002236416785999 s | 1.03 |
| Concat / JaXPipe / tpu / PostRev | 0.0002236328875995 s | 0.0002254987172003 s | 0.99 |
| Concat / JaXPipe / tpu / BothRev | 0.0002232564115998 s | 0.0002223050649001 s | 1.00 |
| Concat / JaX / cpu / Primal | 0.00000529683879994991 s | 0.000004118979400027456 s | 1.29 |
| Concat / JaXPipe / cpu / Primal | 0.0000053370968999843175 s | 0.000004153594200033695 s | 1.28 |
| Concat / JaX / cpu / Forward | 0.000008529723099945841 s | 0.000006919872399976157 s | 1.23 |
| Concat / JaXPipe / cpu / Forward | 0.000011243588799970896 s | 0.000007298058000014862 s | 1.54 |
| Concat / JaX / cpu / BothRev | 0.000008948588099974585 s | 0.000007319889400059765 s | 1.22 |
| Concat / JaXPipe / cpu / PreRev | 0.000009273370699975205 s | 0.000007450676299959013 s | 1.24 |
| Concat / JaXPipe / cpu / PostRev | 0.000009044144000017695 s | 0.000007663862000026711 s | 1.18 |
| Concat / JaXPipe / cpu / BothRev | 0.000009044784199977584 s | 0.0000072419961000377956 s | 1.25 |
| Concat / JaX / cpu / Primal | 0.0000031528750001598384 s | 0.000004118979400027456 s | 0.77 |
| Concat / JaXPipe / cpu / Primal | 0.000003308029199979501 s | 0.000004153594200033695 s | 0.80 |
| Concat / JaX / cpu / Forward | 0.000005419037499996194 s | 0.000006919872399976157 s | 0.78 |
| Concat / JaXPipe / cpu / Forward | 0.000005445937500007858 s | 0.000007298058000014862 s | 0.75 |
| Concat / JaX / cpu / BothRev | 0.000005742929199914215 s | 0.000007319889400059765 s | 0.78 |
| Concat / JaXPipe / cpu / PreRev | 0.000005886108300001069 s | 0.000007450676299959013 s | 0.79 |
| Concat / JaXPipe / cpu / PostRev | 0.000005832987499888986 s | 0.000007663862000026711 s | 0.76 |
| Concat / JaXPipe / cpu / BothRev | 0.000005878074999964156 s | 0.0000072419961000377956 s | 0.81 |
const_scatter / JaX / cpu / Primal |
0.000007618500012540607 s |
0.000008151200017891824 s |
0.93 |
const_scatter / JaXPipe / cpu / Primal |
0.000007463300016752328 s |
0.000006514699998660945 s |
1.15 |
const_scatter / JaX / cpu / Forward |
0.000009423599976798869 s |
0.000009063400011655175 s |
1.04 |
const_scatter / JaXPipe / cpu / Forward |
0.000011239900004511584 s |
0.00001261790002899943 s |
0.89 |
const_scatter / JaX / cpu / Primal |
0.000012630305718630552 s |
0.000008151200017891824 s |
1.55 |
const_scatter / JaXPipe / cpu / Primal |
0.000013527495320886374 s |
0.000006514699998660945 s |
2.08 |
const_scatter / JaX / cpu / Forward |
0.000016224605496972798 s |
0.000009063400011655175 s |
1.79 |
const_scatter / JaXPipe / cpu / Forward |
0.000018308998551219702 s |
0.00001261790002899943 s |
1.45 |
const_scatter / JaX / gpu / Primal |
0.0001326272031292 s |
0.0001290429965592 s |
1.03 |
const_scatter / JaXPipe / gpu / Primal |
0.0001203551888465 s |
0.000115052901674 s |
1.05 |
const_scatter / JaX / gpu / Forward |
0.0001472485950216 s |
0.0001455561025068 s |
1.01 |
const_scatter / JaXPipe / gpu / Forward |
0.0001526967971585 s |
0.0001495250035077 s |
1.02 |
const_scatter / JaX / cpu / Primal |
0.000006606000533793122 s |
0.000008151200017891824 s |
0.81 |
const_scatter / JaXPipe / cpu / Primal |
0.000005460999091155827 s |
0.000006514699998660945 s |
0.84 |
const_scatter / JaX / cpu / Forward |
0.000007145000563468784 s |
0.000009063400011655175 s |
0.79 |
const_scatter / JaXPipe / cpu / Forward |
0.000008122999861370773 s |
0.00001261790002899943 s |
0.64 |
const_scatter / JaX / tpu / Primal |
0.0001594639994436 s |
0.0001576670001668 s |
1.01 |
const_scatter / JaXPipe / tpu / Primal |
0.0001627999998163 s |
0.0001574879999679 s |
1.03 |
const_scatter / JaX / tpu / Forward |
0.0002468279999447 s |
0.0002490200000465 s |
0.99 |
const_scatter / JaXPipe / tpu / Forward |
0.0002401270001428 s |
0.0002563058995292 s |
0.94 |
const_scatter / JaX / cpu / Primal |
0.00000869429995873361 s |
0.000008151200017891824 s |
1.07 |
const_scatter / JaXPipe / cpu / Primal |
0.000007896499937487533 s |
0.000006514699998660945 s |
1.21 |
const_scatter / JaX / cpu / Forward |
0.000010742500035121338 s |
0.000009063400011655175 s |
1.19 |
const_scatter / JaXPipe / cpu / Forward |
0.000012056499963364331 s |
0.00001261790002899943 s |
0.96 |
const_scatter / JaX / cpu / Primal |
0.000005795799916086253 s |
0.000008151200017891824 s |
0.71 |
const_scatter / JaXPipe / cpu / Primal |
0.000004508299934968818 s |
0.000006514699998660945 s |
0.69 |
const_scatter / JaX / cpu / Forward |
0.000007175000064307824 s |
0.000009063400011655175 s |
0.79 |
const_scatter / JaXPipe / cpu / Forward |
0.0000094292001449503 s |
0.00001261790002899943 s |
0.75 |
GenDot / JaX / cpu / Primal |
0.000004334439900003418 s |
0.0000044494697999653 s |
0.97 |
GenDot / JaXPipe / cpu / Primal |
0.00000431988650002495 s |
0.000004301712800042878 s |
1.00 |
GenDot / JaX / cpu / Forward |
0.000006589866799959055 s |
0.000006534536899926025 s |
1.01 |
GenDot / JaXPipe / cpu / Forward |
0.000007258366299993213 s |
0.000007222284700037562 s |
1.00 |
GenDot / JaX / cpu / BothRev |
0.000006922814500012464 s |
0.00000653113630005464 s |
1.06 |
GenDot / JaXPipe / cpu / PreRev |
0.000006992507899985867 s |
0.000007231539700023859 s |
0.97 |
GenDot / JaXPipe / cpu / PostRev |
0.000006916890599995895 s |
0.000006630778999988252 s |
1.04 |
GenDot / JaXPipe / cpu / BothRev |
0.000007373023400032252 s |
0.000007210804799979087 s |
1.02 |
GenDot / JaX / cpu / Primal |
0.000008419840701390058 s |
0.0000044494697999653 s |
1.89 |
GenDot / JaXPipe / cpu / Primal |
0.000008059884095564485 s |
0.000004301712800042878 s |
1.87 |
GenDot / JaX / cpu / Forward |
0.000011680582806002348 s |
0.000006534536899926025 s |
1.79 |
GenDot / JaXPipe / cpu / Forward |
0.00001409860869171098 s |
0.000007222284700037562 s |
1.95 |
GenDot / JaX / cpu / BothRev |
0.000011712606402579696 s |
0.00000653113630005464 s |
1.79 |
GenDot / JaXPipe / cpu / PreRev |
0.000014102566801011562 s |
0.000007231539700023859 s |
1.95 |
GenDot / JaXPipe / cpu / PostRev |
0.00001249842329416424 s |
0.000006630778999988252 s |
1.88 |
GenDot / JaXPipe / cpu / BothRev |
0.000014213752408977598 s |
0.000007210804799979087 s |
1.97 |
GenDot / JaX / gpu / Primal |
0.0000753886177088134 s |
0.0000770615545916371 s |
0.98 |
GenDot / JaXPipe / gpu / Primal |
0.00007696102970512583 s |
0.00007864025720627979 s |
0.98 |
GenDot / JaX / gpu / Forward |
0.0001189607416978 s |
0.0001047521258005 s |
1.14 |
GenDot / JaXPipe / gpu / Forward |
0.0001051759019959 s |
0.0001038891821051 s |
1.01 |
GenDot / JaX / gpu / BothRev |
0.0001028687915997 s |
0.0001059701371006 s |
0.97 |
GenDot / JaXPipe / gpu / PreRev |
0.0001045859346981 s |
0.0001060094214044 s |
0.99 |
GenDot / JaXPipe / gpu / PostRev |
0.0001095040092011 s |
0.0001043687017983 s |
1.05 |
GenDot / JaXPipe / gpu / BothRev |
0.0001246548113995 s |
0.0001038573455065 s |
1.20 |
GenDot / JaX / cpu / Primal |
0.000003733947999717202 s |
0.0000044494697999653 s |
0.84 |
GenDot / JaXPipe / cpu / Primal |
0.0000039676000000326895 s |
0.000004301712800042878 s |
0.92 |
GenDot / JaX / cpu / Forward |
0.0000056551309011410924 s |
0.000006534536899926025 s |
0.87 |
GenDot / JaXPipe / cpu / Forward |
0.000006361286000174005 s |
0.000007222284700037562 s |
0.88 |
GenDot / JaX / cpu / BothRev |
0.000005672235999372788 s |
0.00000653113630005464 s |
0.87 |
GenDot / JaXPipe / cpu / PreRev |
0.000006481445000099484 s |
0.000007231539700023859 s |
0.90 |
GenDot / JaXPipe / cpu / PostRev |
0.000005967179000435863 s |
0.000006630778999988252 s |
0.90 |
GenDot / JaXPipe / cpu / BothRev |
0.0000061590358993271365 s |
0.000007210804799979087 s |
0.85 |
GenDot / JaX / tpu / Primal |
0.0001416969158002 s |
0.0001336162628998 s |
1.06 |
GenDot / JaXPipe / tpu / Primal |
0.0001429140758002 s |
0.0001322941232996 s |
1.08 |
GenDot / JaX / tpu / Forward |
0.0002175676255996 s |
0.0002037468761998 s |
1.07 |
GenDot / JaXPipe / tpu / Forward |
0.0002197217856999 s |
0.0002175650440003 s |
1.01 |
GenDot / JaX / tpu / BothRev |
0.0002231334075986 s |
0.0002133815799999 s |
1.05 |
GenDot / JaXPipe / tpu / PreRev |
0.0002244213557001 s |
0.0002050958227999 s |
1.09 |
GenDot / JaXPipe / tpu / PostRev |
0.0002192814017005 s |
0.0002165789731996 s |
1.01 |
GenDot / JaXPipe / tpu / BothRev |
0.0002229245385999 s |
0.000208698716 s |
1.07 |
GenDot / JaX / cpu / Primal |
0.000005577266000000236 s |
0.0000044494697999653 s |
1.25 |
GenDot / JaXPipe / cpu / Primal |
0.000005912888100010605 s |
0.000004301712800042878 s |
1.37 |
GenDot / JaX / cpu / Forward |
0.000008400676000019302 s |
0.000006534536899926025 s |
1.29 |
GenDot / JaXPipe / cpu / Forward |
0.000009623933700004272 s |
0.000007222284700037562 s |
1.33 |
GenDot / JaX / cpu / BothRev |
0.000007919538699934492 s |
0.00000653113630005464 s |
1.21 |
GenDot / JaXPipe / cpu / PreRev |
0.000009656183599963697 s |
0.000007231539700023859 s |
1.34 |
GenDot / JaXPipe / cpu / PostRev |
0.00000815024440007619 s |
0.000006630778999988252 s |
1.23 |
GenDot / JaXPipe / cpu / BothRev |
0.000009730992799995876 s |
0.000007210804799979087 s |
1.35 |
GenDot / JaX / cpu / Primal |
0.000003409191699938674 s |
0.0000044494697999653 s |
0.77 |
GenDot / JaXPipe / cpu / Primal |
0.0000035208416000386933 s |
0.000004301712800042878 s |
0.82 |
GenDot / JaX / cpu / Forward |
0.000005384200000116835 s |
0.000006534536899926025 s |
0.82 |
GenDot / JaXPipe / cpu / Forward |
0.000006192120800005796 s |
0.000007222284700037562 s |
0.86 |
GenDot / JaX / cpu / BothRev |
0.000005396504100099264 s |
0.00000653113630005464 s |
0.83 |
GenDot / JaXPipe / cpu / PreRev |
0.0000059299874999851455 s |
0.000007231539700023859 s |
0.82 |
GenDot / JaXPipe / cpu / PostRev |
0.000005502708399944822 s |
0.000006630778999988252 s |
0.83 |
GenDot / JaXPipe / cpu / BothRev |
0.000006043787499947939 s |
0.000007210804799979087 s |
0.84 |
hlo_ffi / JaX / cpu / Primal |
0.000005901686299966969 s |
0.000005847319300028176 s |
1.01 |
hlo_ffi / JaXPipe / cpu / Primal |
0.000005750949799994487 s |
0.00000600905880000937 s |
0.96 |
hlo_ffi / JaX / cpu / Forward |
0.000009387307600036366 s |
0.000010458107100021152 s |
0.90 |
hlo_ffi / JaXPipe / cpu / Forward |
0.00000935685409999678 s |
0.000010544669800037809 s |
0.89 |
hlo_ffi / JaX / cpu / BothRev |
0.00000887118929999815 s |
0.000010068745799981116 s |
0.88 |
hlo_ffi / JaXPipe / cpu / PreRev |
0.000009006379300035406 s |
0.0000102450964000127 s |
0.88 |
hlo_ffi / JaXPipe / cpu / PostRev |
0.000009036349300004077 s |
0.00001021032769995145 s |
0.89 |
hlo_ffi / JaXPipe / cpu / BothRev |
0.000008911941999986083 s |
0.000009814178699980402 s |
0.91 |
hlo_ffi / JaX / cpu / Primal |
0.00001038980419980362 s |
0.000005847319300028176 s |
1.78 |
hlo_ffi / JaXPipe / cpu / Primal |
0.000009960035304538906 s |
0.00000600905880000937 s |
1.66 |
hlo_ffi / JaX / cpu / Forward |
0.000015598911803681402 s |
0.000010458107100021152 s |
1.49 |
hlo_ffi / JaXPipe / cpu / Forward |
0.00001595911559415981 s |
0.000010544669800037809 s |
1.51 |
hlo_ffi / JaX / cpu / BothRev |
0.00001490812300471589 s |
0.000010068745799981116 s |
1.48 |
hlo_ffi / JaXPipe / cpu / PreRev |
0.000015583891491405667 s |
0.0000102450964000127 s |
1.52 |
hlo_ffi / JaXPipe / cpu / PostRev |
0.00001497886690776795 s |
0.00001021032769995145 s |
1.47 |
hlo_ffi / JaXPipe / cpu / BothRev |
0.00001533490939764306 s |
0.000009814178699980402 s |
1.56 |
hlo_ffi / JaX / cpu / Primal |
0.000005177496999385767 s |
0.000005847319300028176 s |
0.89 |
hlo_ffi / JaXPipe / cpu / Primal |
0.000004981345000851434 s |
0.00000600905880000937 s |
0.83 |
hlo_ffi / JaX / cpu / Forward |
0.000007536344999971334 s |
0.000010458107100021152 s |
0.72 |
hlo_ffi / JaXPipe / cpu / Forward |
0.000007841493000159971 s |
0.000010544669800037809 s |
0.74 |
hlo_ffi / JaX / cpu / BothRev |
0.000007591574000252877 s |
0.000010068745799981116 s |
0.75 |
hlo_ffi / JaXPipe / cpu / PreRev |
0.000007327418999921065 s |
0.0000102450964000127 s |
0.72 |
hlo_ffi / JaXPipe / cpu / PostRev |
0.0000076197409987798894 s |
0.00001021032769995145 s |
0.75 |
hlo_ffi / JaXPipe / cpu / BothRev |
0.000007595693001348991 s |
0.000009814178699980402 s |
0.77 |
hlo_ffi / JaX / tpu / Primal |
0.00006731245789997047 s |
0.0000534417354996549 s |
1.26 |
hlo_ffi / JaXPipe / tpu / Primal |
0.00006545744989998639 s |
0.00006595535249944078 s |
0.99 |
hlo_ffi / JaX / tpu / Forward |
0.00009244867579982384 s |
0.00008828348919996642 s |
1.05 |
hlo_ffi / JaXPipe / tpu / Forward |
0.00009408290190040134 s |
0.00009019750969964662 s |
1.04 |
hlo_ffi / JaX / tpu / BothRev |
0.00009376757080026437 s |
0.00008760884430012083 s |
1.07 |
hlo_ffi / JaXPipe / tpu / PreRev |
0.0000935300779005047 s |
0.00008698806940010399 s |
1.08 |
hlo_ffi / JaXPipe / tpu / PostRev |
0.0000927415027996176 s |
0.00008632134159997804 s |
1.07 |
hlo_ffi / JaXPipe / tpu / BothRev |
0.0000934878068001126 s |
0.00008898016899984213 s |
1.05 |
hlo_ffi / JaX / cpu / Primal |
0.000007828233599957458 s |
0.000005847319300028176 s |
1.34 |
hlo_ffi / JaXPipe / cpu / Primal |
0.000007398215499961225 s |
0.00000600905880000937 s |
1.23 |
hlo_ffi / JaX / cpu / Forward |
0.000011239203500008444 s |
0.000010458107100021152 s |
1.07 |
hlo_ffi / JaXPipe / cpu / Forward |
0.000011655158399935315 s |
0.000010544669800037809 s |
1.11 |
hlo_ffi / JaX / cpu / BothRev |
0.000011028819900002416 s |
0.000010068745799981116 s |
1.10 |
hlo_ffi / JaXPipe / cpu / PreRev |
0.00001148168280005848 s |
0.0000102450964000127 s |
1.12 |
hlo_ffi / JaXPipe / cpu / PostRev |
0.000011289216699969985 s |
0.00001021032769995145 s |
1.11 |
hlo_ffi / JaXPipe / cpu / BothRev |
0.000011278157200013084 s |
0.000009814178699980402 s |
1.15 |
hlo_ffi / JaX / cpu / Primal |
0.000005022154200014483 s |
0.000005847319300028176 s |
0.86 |
hlo_ffi / JaXPipe / cpu / Primal |
0.0000056268708000061455 s |
0.00000600905880000937 s |
0.94 |
hlo_ffi / JaX / cpu / Forward |
0.000008863041599943244 s |
0.000010458107100021152 s |
0.85 |
hlo_ffi / JaXPipe / cpu / Forward |
0.00000820656659998349 s |
0.000010544669800037809 s |
0.78 |
hlo_ffi / JaX / cpu / BothRev |
0.000007775387500078069 s |
0.000010068745799981116 s |
0.77 |
hlo_ffi / JaXPipe / cpu / PreRev |
0.000007783079100045143 s |
0.0000102450964000127 s |
0.76 |
hlo_ffi / JaXPipe / cpu / PostRev |
0.000007694033299958392 s |
0.00001021032769995145 s |
0.75 |
hlo_ffi / JaXPipe / cpu / BothRev |
0.000008001700000022538 s |
0.000009814178699980402 s |
0.82 |
llama / JaXPipe / cpu / Primal |
0.0008790733800015 s |
0.0009070620700003 s |
0.97 |
llama / JaX / cpu / Primal |
0.0008664061600029 s |
0.0009183065100023 s |
0.94 |
llama / HLOOpt / cpu / Primal |
0.0009387064299971 s |
0.0009796581499995 s |
0.96 |
llama / PartOpt / cpu / Primal |
0.0008618008599978 s |
0.0008917101899987 s |
0.97 |
llama / DefOpt / cpu / Primal |
0.0009418346899974 s |
0.0010110093999992 s |
0.93 |
llama / IPartOpt / cpu / Primal |
0.0008601739799996 s |
0.0008663884799989 s |
0.99 |
llama / IDefOpt / cpu / Primal |
0.000942871899997 s |
0.0009648596399983 s |
0.98 |
llama / JaXPipe / cpu / Forward |
0.0025014982400034 s |
0.0026344963300016 s |
0.95 |
llama / JaX / cpu / Forward |
0.0025063768600011 s |
0.0027127474899953 s |
0.92 |
llama / HLOOpt / cpu / Forward |
0.0024978925200002 s |
0.0026552233999973 s |
0.94 |
llama / PartOpt / cpu / Forward |
0.0025101504299982 s |
0.0026478437200057 s |
0.95 |
llama / DefOpt / cpu / Forward |
0.0025200728300023 s |
0.0027174104200003 s |
0.93 |
llama / IPartOpt / cpu / Forward |
0.0025140163800006 s |
0.0026898783199976 s |
0.93 |
llama / IDefOpt / cpu / Forward |
0.0024844729100004 s |
0.0027245889099958 s |
0.91 |
llama / JaXPipe / cpu / PreRev |
0.0023279723299992 s |
0.0023776364099921 s |
0.98 |
llama / JaXPipe / cpu / PostRev |
0.0022288783300018 s |
0.0023017529899971 s |
0.97 |
llama / JaXPipe / cpu / BothRev |
0.0022629422299996 s |
0.0025017026099976 s |
0.90 |
llama / JaX / cpu / BothRev |
0.0029604959900007 s |
0.0023901793300046 s |
1.24 |
llama / HLOOpt / cpu / PreRev |
0.0022342365199983 s |
0.0023743590400044 s |
0.94 |
llama / HLOOpt / cpu / PostRev |
0.0031285665299992 s |
0.0025415789399994 s |
1.23 |
llama / HLOOpt / cpu / BothRev |
0.0022747619799974 s |
0.0023912669799938 s |
0.95 |
llama / PartOpt / cpu / PreRev |
0.0032696734099999 s |
0.0024195069899997 s |
1.35 |
llama / PartOpt / cpu / PostRev |
0.0022351727799969 s |
0.0025650117500026 s |
0.87 |
llama / PartOpt / cpu / BothRev |
0.002254141190001 s |
0.0030476186499981 s |
0.74 |
llama / DefOpt / cpu / PreRev |
0.0022714347100009 s |
0.0024065999800041 s |
0.94 |
llama / DefOpt / cpu / PostRev |
0.0021115485800009 s |
0.0025496250000014 s |
0.83 |
llama / DefOpt / cpu / BothRev |
0.0022421263199976 s |
0.00235812807 s |
0.95 |
llama / IPartOpt / cpu / PreRev |
0.0022785207700007 s |
0.0023482803099977 s |
0.97 |
llama / IPartOpt / cpu / PostRev |
0.0022602568899992 s |
0.0026355046899971 s |
0.86 |
llama / IPartOpt / cpu / BothRev |
0.0022635631899993 s |
0.0023735800800022 s |
0.95 |
llama / IDefOpt / cpu / PreRev |
0.0022475459400038 s |
0.0023762336199979 s |
0.95 |
llama / IDefOpt / cpu / PostRev |
0.0022911898099982 s |
0.0026436076799927 s |
0.87 |
llama / IDefOpt / cpu / BothRev |
0.0022574747499993 s |
0.0026667473199995 s |
0.85 |
llama / JaXPipe / gpu / Primal |
0.0004311890900135 s |
0.0004405646680388 s |
0.98 |
llama / JaX / gpu / Primal |
0.0004254667439963 s |
0.000442761927843 s |
0.96 |
llama / HLOOpt / gpu / Primal |
0.0004362207720987 s |
0.0004625917419325 s |
0.94 |
llama / PartOpt / gpu / Primal |
0.0004227303420193 s |
0.0004412848060019 s |
0.96 |
llama / DefOpt / gpu / Primal |
0.0004339543480891 s |
0.0004683263779152 s |
0.93 |
llama / IPartOpt / gpu / Primal |
0.0004235758960712 s |
0.0004279165139887 s |
0.99 |
llama / IDefOpt / gpu / Primal |
0.0004432579400017 s |
0.0004204566678963 s |
1.05 |
llama / JaXPipe / gpu / Forward |
0.0007122303738724 s |
0.000731867399998 s |
0.97 |
llama / JaX / gpu / Forward |
0.0006970712421461 s |
0.0007423659800551 s |
0.94 |
llama / HLOOpt / gpu / Forward |
0.0007364804260432 s |
0.00073685156391 s |
1.00 |
llama / PartOpt / gpu / Forward |
0.0007248893980868 s |
0.000728781085927 s |
0.99 |
llama / DefOpt / gpu / Forward |
0.0007087210400495 s |
0.0007214763539377 s |
0.98 |
llama / IPartOpt / gpu / Forward |
0.0007434450581204 s |
0.0007417426460888 s |
1.00 |
llama / IDefOpt / gpu / Forward |
0.000738361551892 s |
0.00071872548596 s |
1.03 |
llama / JaXPipe / gpu / PreRev |
0.0008112011461053 s |
0.0008271565560717 s |
0.98 |
llama / JaXPipe / gpu / PostRev |
0.000802167084068 s |
0.0007912808379624 s |
1.01 |
llama / JaXPipe / gpu / BothRev |
0.0007816833900287 s |
0.0007749211560003 s |
1.01 |
llama / JaX / gpu / BothRev |
0.0007702336199581 s |
0.0008062848078552 s |
0.96 |
llama / HLOOpt / gpu / PreRev |
0.0008215378639288 s |
0.0008034545360133 s |
1.02 |
llama / HLOOpt / gpu / PostRev |
0.0007941843760199 s |
0.0007821569701191 s |
1.02 |
llama / HLOOpt / gpu / BothRev |
0.0008251205959822 s |
0.0007988020300399 s |
1.03 |
llama / PartOpt / gpu / PreRev |
0.0008105801860801 s |
0.0007789686140604 s |
1.04 |
llama / PartOpt / gpu / PostRev |
0.0008206182359717 s |
0.0007617104218807 s |
1.08 |
llama / PartOpt / gpu / BothRev |
0.0007675314799416 s |
0.0007873950500506 s |
0.97 |
llama / DefOpt / gpu / PreRev |
0.0007925384538248 s |
0.00077811361989 s |
1.02 |
llama / DefOpt / gpu / PostRev |
0.0007629182240925 s |
0.0007980892502237 s |
0.96 |
llama / DefOpt / gpu / BothRev |
0.000775857924018 s |
0.0007797850021161 s |
0.99 |
llama / IPartOpt / gpu / PreRev |
0.000824996773852 s |
0.0007723383239936 s |
1.07 |
llama / IPartOpt / gpu / PostRev |
0.0007614350139629 s |
0.0007983354900497 s |
0.95 |
llama / IPartOpt / gpu / BothRev |
0.0007843219821806 s |
0.0008121179000008 s |
0.97 |
llama / IDefOpt / gpu / PreRev |
0.0008118488839827 s |
0.0007694613360799 s |
1.06 |
llama / IDefOpt / gpu / PostRev |
0.000837805734016 s |
0.0008346295780502 s |
1.00 |
llama / IDefOpt / gpu / BothRev |
0.0008061606080736 s |
0.0007705401838757 s |
1.05 |
llama / JaXPipe / tpu / Primal |
0.0003641692600212 s |
0.0003700193600088 s |
0.98 |
llama / JaX / tpu / Primal |
0.0003725840199913 s |
0.0003667153220012 s |
1.02 |
llama / HLOOpt / tpu / Primal |
0.0003493754579976 s |
0.0003621892400115 s |
0.96 |
llama / PartOpt / tpu / Primal |
0.0003572249599965 s |
0.0003722160180041 s |
0.96 |
llama / DefOpt / tpu / Primal |
0.0003298471599991 s |
0.0003532161019975 s |
0.93 |
llama / IPartOpt / tpu / Primal |
0.0003640346000029 s |
0.0003683392780076 s |
0.99 |
llama / IDefOpt / tpu / Primal |
0.0003565773179871 s |
0.0003619300999998 s |
0.99 |
llama / JaXPipe / tpu / Forward |
0.0005655303800012 s |
0.0005549284959997 s |
1.02 |
llama / JaX / tpu / Forward |
0.0007045067999861 s |
0.0006880788080015 s |
1.02 |
llama / HLOOpt / tpu / Forward |
0.000557998698001 s |
0.0005495719179889 s |
1.02 |
llama / PartOpt / tpu / Forward |
0.0005614707780187 s |
0.0005504490180028 s |
1.02 |
llama / DefOpt / tpu / Forward |
0.0005657782780181 s |
0.0005544459579978 s |
1.02 |
llama / IPartOpt / tpu / Forward |
0.0005490356780064 s |
0.000556601098011 s |
0.99 |
llama / IDefOpt / tpu / Forward |
0.0005632427599921 s |
0.0005599457759963 s |
1.01 |
llama / JaXPipe / tpu / PreRev |
0.0004078950599941 s |
0.0004063968900009 s |
1.00 |
llama / JaXPipe / tpu / PostRev |
0.0003579566380067 s |
0.0003579894800059 s |
1.00 |
llama / JaXPipe / tpu / BothRev |
0.0003915941799932 s |
0.0003915591539989 s |
1.00 |
llama / JaX / tpu / BothRev |
0.0003576611600001 s |
0.0003577332599961 s |
1.00 |
llama / HLOOpt / tpu / PreRev |
0.0003915397600212 s |
0.0003914795940072 s |
1.00 |
llama / HLOOpt / tpu / PostRev |
0.0003901715200045 s |
0.0003903120339964 s |
1.00 |
llama / HLOOpt / tpu / BothRev |
0.0003917970799957 s |
0.0003915782719996 s |
1.00 |
llama / PartOpt / tpu / PreRev |
0.000391575799993 s |
0.0003912430719938 s |
1.00 |
llama / PartOpt / tpu / PostRev |
0.0003722512999956 s |
0.0003719408580072 s |
1.00 |
llama / PartOpt / tpu / BothRev |
0.0003915020179993 s |
0.000391519013996 s |
1.00 |
llama / DefOpt / tpu / PreRev |
0.0003914260000165 s |
0.000391232274007 s |
1.00 |
llama / DefOpt / tpu / PostRev |
0.0003828718180011 s |
0.0003829775159974 s |
1.00 |
llama / DefOpt / tpu / BothRev |
0.0003919046800001 s |
0.0003912133319972 s |
1.00 |
llama / IPartOpt / tpu / PreRev |
0.0003914849600114 s |
0.0003914084940042 s |
1.00 |
llama / IPartOpt / tpu / PostRev |
0.0003719533999974 s |
0.0003719622580101 s |
1.00 |
llama / IPartOpt / tpu / BothRev |
0.0003916088000114 s |
0.0003912826739979 s |
1.00 |
llama / IDefOpt / tpu / PreRev |
0.00039188301799 s |
0.0003913627120055 s |
1.00 |
llama / IDefOpt / tpu / PostRev |
0.0003993371599935 s |
0.0003992297720105 s |
1.00 |
llama / IDefOpt / tpu / BothRev |
0.0003915145579958 s |
0.0003912762940017 s |
1.00 |
llama / JaXPipe / cpu / Primal |
0.0011827042300046 s |
0.0009070620700003 s |
1.30 |
llama / JaX / cpu / Primal |
0.0011317790800057 s |
0.0009183065100023 s |
1.23 |
llama / HLOOpt / cpu / Primal |
0.0011683593699945 s |
0.0009796581499995 s |
1.19 |
llama / PartOpt / cpu / Primal |
0.0011567602500053 s |
0.0008917101899987 s |
1.30 |
llama / DefOpt / cpu / Primal |
0.0011728591700011 s |
0.0010110093999992 s |
1.16 |
llama / IPartOpt / cpu / Primal |
0.0010938474700014 s |
0.0008663884799989 s |
1.26 |
llama / IDefOpt / cpu / Primal |
0.0011412098799974 s |
0.0009648596399983 s |
1.18 |
llama / JaXPipe / cpu / Forward |
0.0035031534300014 s |
0.0026344963300016 s |
1.33 |
llama / JaX / cpu / Forward |
0.0036933634199976 s |
0.0027127474899953 s |
1.36 |
llama / HLOOpt / cpu / Forward |
0.0039050821100045 s |
0.0026552233999973 s |
1.47 |
llama / PartOpt / cpu / Forward |
0.0035310587100048 s |
0.0026478437200057 s |
1.33 |
llama / DefOpt / cpu / Forward |
0.0035345260199937 s |
0.0027174104200003 s |
1.30 |
llama / IPartOpt / cpu / Forward |
0.0035073648500019 s |
0.0026898783199976 s |
1.30 |
llama / IDefOpt / cpu / Forward |
0.0036750978000054 s |
0.0027245889099958 s |
1.35 |
llama / JaXPipe / cpu / PreRev |
0.004674913939998 s |
0.0023776364099921 s |
1.97 |
llama / JaXPipe / cpu / PostRev |
0.0038870319899979 s |
0.0023017529899971 s |
1.69 |
llama / JaXPipe / cpu / BothRev |
0.0042478380699958 s |
0.0025017026099976 s |
1.70 |
llama / JaX / cpu / BothRev |
0.0037664859200049 s |
0.0023901793300046 s |
1.58 |
llama / HLOOpt / cpu / PreRev |
0.0046310562999951 s |
0.0023743590400044 s |
1.95 |
llama / HLOOpt / cpu / PostRev |
0.00398862516 s |
0.0025415789399994 s |
1.57 |
llama / HLOOpt / cpu / BothRev |
0.0042612219199963 s |
0.0023912669799938 s |
1.78 |
llama / PartOpt / cpu / PreRev |
0.0041838683100013 s |
0.0024195069899997 s |
1.73 |
llama / PartOpt / cpu / PostRev |
0.0040821811899968 s |
0.0025650117500026 s |
1.59 |
llama / PartOpt / cpu / BothRev |
0.0041917116199965 s |
0.0030476186499981 s |
1.38 |
llama / DefOpt / cpu / PreRev |
0.0041972936800084 s |
0.0024065999800041 s |
1.74 |
llama / DefOpt / cpu / PostRev |
0.0041700734000005 s |
0.0025496250000014 s |
1.64 |
llama / DefOpt / cpu / BothRev |
0.0042314808499941 s |
0.00235812807 s |
1.79 |
llama / IPartOpt / cpu / PreRev |
0.0042748618600035 s |
0.0023482803099977 s |
1.82 |
llama / IPartOpt / cpu / PostRev |
0.004090286770006 s |
0.0026355046899971 s |
1.55 |
llama / IPartOpt / cpu / BothRev |
0.0042200766000041 s |
0.0023735800800022 s |
1.78 |
llama / IDefOpt / cpu / PreRev |
0.0042764093700043 s |
0.0023762336199979 s |
1.80 |
llama / IDefOpt / cpu / PostRev |
0.0046539674799987 s |
0.0026436076799927 s |
1.76 |
llama / IDefOpt / cpu / BothRev |
0.0045957202099998 s |
0.0026667473199995 s |
1.72 |
llama / JaXPipe / cpu / Primal |
0.0015783020900016 s |
0.0009070620700003 s |
1.74 |
llama / JaX / cpu / Primal |
0.0015568662499936 s |
0.0009183065100023 s |
1.70 |
llama / HLOOpt / cpu / Primal |
0.0016686466700048 s |
0.0009796581499995 s |
1.70 |
llama / PartOpt / cpu / Primal |
0.0015393366599892 s |
0.0008917101899987 s |
1.73 |
llama / DefOpt / cpu / Primal |
0.0020607333299994 s |
0.0010110093999992 s |
2.04 |
llama / IPartOpt / cpu / Primal |
0.0015239512499829 s |
0.0008663884799989 s |
1.76 |
llama / IDefOpt / cpu / Primal |
0.0020287945900054 s |
0.0009648596399983 s |
2.10 |
llama / JaXPipe / cpu / Forward |
0.004846803750006 s |
0.0026344963300016 s |
1.84 |
llama / JaX / cpu / Forward |
0.0044243399999868 s |
0.0027127474899953 s |
1.63 |
llama / HLOOpt / cpu / Forward |
0.0045189637500152 s |
0.0026552233999973 s |
1.70 |
llama / PartOpt / cpu / Forward |
0.0041757208299895 s |
0.0026478437200057 s |
1.58 |
llama / DefOpt / cpu / Forward |
0.0041765691699947 s |
0.0027174104200003 s |
1.54 |
llama / IPartOpt / cpu / Forward |
0.0041769620900049 s |
0.0026898783199976 s |
1.55 |
llama / IDefOpt / cpu / Forward |
0.0041393891699954 s |
0.0027245889099958 s |
1.52 |
llama / JaXPipe / cpu / PreRev |
0.0054887808399871 s |
0.0023776364099921 s |
2.31 |
llama / JaXPipe / cpu / PostRev |
0.0089257158300097 s |
0.0023017529899971 s |
3.88 |
llama / JaXPipe / cpu / BothRev |
0.0058713929199984 s |
0.0025017026099976 s |
2.35 |
llama / JaX / cpu / BothRev |
0.0068934879199878 s |
0.0023901793300046 s |
2.88 |
llama / HLOOpt / cpu / PreRev |
0.0072235820799869 s |
0.0023743590400044 s |
3.04 |
llama / HLOOpt / cpu / PostRev |
0.0047854295899924 s |
0.0025415789399994 s |
1.88 |
llama / HLOOpt / cpu / BothRev |
0.0050135379099992 s |
0.0023912669799938 s |
2.10 |
llama / PartOpt / cpu / PreRev |
0.0057779925000068 s |
0.0024195069899997 s |
2.39 |
llama / PartOpt / cpu / PostRev |
0.007452587079988 s |
0.0025650117500026 s |
2.91 |
llama / PartOpt / cpu / BothRev |
0.0054640275000019 s |
0.0030476186499981 s |
1.79 |
llama / DefOpt / cpu / PreRev |
0.0064895804099978 s |
0.0024065999800041 s |
2.70 |
llama / DefOpt / cpu / PostRev |
0.0046924208299969 s |
0.0025496250000014 s |
1.84 |
llama / DefOpt / cpu / BothRev |
0.0053904416600016 s |
0.00235812807 s |
2.29 |
llama / IPartOpt / cpu / PreRev |
0.0065752849999989 s |
0.0023482803099977 s |
2.80 |
llama / IPartOpt / cpu / PostRev |
0.0074076679100107 s |
0.0026355046899971 s |
2.81 |
llama / IPartOpt / cpu / BothRev |
0.0067902704100015 s |
0.0023735800800022 s |
2.86 |
llama / IDefOpt / cpu / PreRev |
0.0051041420799992 s |
0.0023762336199979 s |
2.15 |
llama / IDefOpt / cpu / PostRev |
0.0064526429099896 s |
0.0026436076799927 s |
2.44 |
llama / IDefOpt / cpu / BothRev |
0.0068671083300068 s |
0.0026667473199995 s |
2.58 |
scatter_sum / JaX / cpu / Primal |
0.000004832363500008796 s |
0.0000048909934999755936 s |
0.99 |
scatter_sum / JaXPipe / cpu / Primal |
0.000004754684499994255 s |
0.000004885282499981259 s |
0.97 |
scatter_sum / JaX / cpu / Primal |
0.00000904619509819895 s |
0.0000048909934999755936 s |
1.85 |
scatter_sum / JaXPipe / cpu / Primal |
0.000009014616999775172 s |
0.000004885282499981259 s |
1.85 |
scatter_sum / JaX / gpu / Primal |
0.00009171389249386266 s |
0.00008237985650775954 s |
1.11 |
scatter_sum / JaXPipe / gpu / Primal |
0.00008020986129995435 s |
0.00008149092170642689 s |
0.98 |
scatter_sum / JaX / cpu / Primal |
0.0000044346619994030335 s |
0.0000048909934999755936 s |
0.91 |
scatter_sum / JaXPipe / cpu / Primal |
0.000004420882000704296 s |
0.000004885282499981259 s |
0.90 |
scatter_sum / JaX / tpu / Primal |
0.0001419277717999 s |
0.0001356298545004 s |
1.05 |
scatter_sum / JaXPipe / tpu / Primal |
0.0001317760477992 s |
0.0001356425694 s |
0.97 |
scatter_sum / JaX / cpu / Primal |
0.000006343503100015369 s |
0.0000048909934999755936 s |
1.30 |
scatter_sum / JaXPipe / cpu / Primal |
0.000006200589000036416 s |
0.000004885282499981259 s |
1.27 |
scatter_sum / JaX / cpu / Primal |
0.0000040987292000863816 s |
0.0000048909934999755936 s |
0.84 |
scatter_sum / JaXPipe / cpu / Primal |
0.000004008666700065078 s |
0.000004885282499981259 s |
0.82 |
slicing / JaX / cpu / Primal |
0.00000400879779999741 s |
0.000003731735099972866 s |
1.07 |
slicing / JaXPipe / cpu / Primal |
0.000003754732800007332 s |
0.000003779990699968039 s |
0.99 |
slicing / JaX / cpu / Forward |
0.000006020776099967406 s |
0.000005950906900034169 s |
1.01 |
slicing / JaXPipe / cpu / Forward |
0.000006037684999955673 s |
0.000005920857099954447 s |
1.02 |
slicing / JaX / cpu / BothRev |
0.0000065530998000213004 s |
0.000006349113600026612 s |
1.03 |
slicing / JaXPipe / cpu / PreRev |
0.000006415151100009098 s |
0.000006378296399998362 s |
1.01 |
slicing / JaXPipe / cpu / PostRev |
0.000006437805199993818 s |
0.0000063514679999570944 s |
1.01 |
slicing / JaXPipe / cpu / BothRev |
0.000006442564999997557 s |
0.000006379920299968944 s |
1.01 |
slicing / JaX / cpu / Primal |
0.000006904559303075075 s |
0.000003731735099972866 s |
1.85 |
slicing / JaXPipe / cpu / Primal |
0.00000769358049146831 s |
0.000003779990699968039 s |
2.04 |
slicing / JaX / cpu / Forward |
0.000010330012103077024 s |
0.000005950906900034169 s |
1.74 |
slicing / JaXPipe / cpu / Forward |
0.000010205377906095236 s |
0.000005920857099954447 s |
1.72 |
slicing / JaX / cpu / BothRev |
0.000011530319997109471 s |
0.000006349113600026612 s |
1.82 |
slicing / JaXPipe / cpu / PreRev |
0.000010969558695796876 s |
0.000006378296399998362 s |
1.72 |
slicing / JaXPipe / cpu / PostRev |
0.000010931938595604153 s |
0.0000063514679999570944 s |
1.72 |
slicing / JaXPipe / cpu / BothRev |
0.00001151031949557364 s |
0.000006379920299968944 s |
1.80 |
slicing / JaX / gpu / Primal |
0.00007423781889956444 s |
0.00008208741940325127 s |
0.90 |
slicing / JaXPipe / gpu / Primal |
0.00007395321869989857 s |
0.0000772790311020799 s |
0.96 |
slicing / JaX / gpu / Forward |
0.0001015023936051 s |
0.000107533605094 s |
0.94 |
slicing / JaXPipe / gpu / Forward |
0.0001101642986992 s |
0.0001029518752009 s |
1.07 |
slicing / JaX / gpu / BothRev |
0.0001240822475985 s |
0.0001048876040964 s |
1.18 |
slicing / JaXPipe / gpu / PreRev |
0.000104320134106 s |
0.0001056880464078 s |
0.99 |
slicing / JaXPipe / gpu / PostRev |
0.0001059047498041 s |
0.0001046807646052 s |
1.01 |
slicing / JaXPipe / gpu / BothRev |
0.0001027458920958 s |
0.0001054405164904 s |
0.97 |
slicing / JaX / cpu / Primal |
0.0000034296720012207515 s |
0.000003731735099972866 s |
0.92 |
slicing / JaXPipe / cpu / Primal |
0.000003790075000142679 s |
0.000003779990699968039 s |
1.00 |
slicing / JaX / cpu / Forward |
0.0000052724850000231525 s |
0.000005950906900034169 s |
0.89 |
slicing / JaXPipe / cpu / Forward |
0.000005315870000049472 s |
0.000005920857099954447 s |
0.90 |
slicing / JaX / cpu / BothRev |
0.0000054021880001528185 s |
0.000006349113600026612 s |
0.85 |
slicing / JaXPipe / cpu / PreRev |
0.0000052081179994274864 s |
0.000006378296399998362 s |
0.82 |
slicing / JaXPipe / cpu / PostRev |
0.000005716494999069255 s |
0.0000063514679999570944 s |
0.90 |
slicing / JaXPipe / cpu / BothRev |
0.000005382362999080215 s |
0.000006379920299968944 s |
0.84 |
slicing / JaX / tpu / Primal |
0.0001380333647 s |
0.0001359409484 s |
1.02 |
slicing / JaXPipe / tpu / Primal |
0.0001432499237998 s |
0.0001371587371002 s |
1.04 |
slicing / JaX / tpu / Forward |
0.0002022993186998 s |
0.0002078258362002 s |
0.97 |
slicing / JaXPipe / tpu / Forward |
0.0002218983406011 s |
0.0001961933669001 s |
1.13 |
slicing / JaX / tpu / BothRev |
0.000233848411699 s |
0.0002003272309004 s |
1.17 |
slicing / JaXPipe / tpu / PreRev |
0.0002691642966005 s |
0.0002009658026996 s |
1.34 |
slicing / JaXPipe / tpu / PostRev |
0.0002233608196998 s |
0.0001998378019998 s |
1.12 |
slicing / JaXPipe / tpu / BothRev |
0.0002024495656994 s |
0.0002012461467005 s |
1.01 |
slicing / JaX / cpu / Primal |
0.000004812631299955683 s |
0.000003731735099972866 s |
1.29 |
slicing / JaXPipe / cpu / Primal |
0.000005208300100002816 s |
0.000003779990699968039 s |
1.38 |
slicing / JaX / cpu / Forward |
0.000007152159300039784 s |
0.000005950906900034169 s |
1.20 |
slicing / JaXPipe / cpu / Forward |
0.000007214449399998557 s |
0.000005920857099954447 s |
1.22 |
slicing / JaX / cpu / BothRev |
0.000007795112799976778 s |
0.000006349113600026612 s |
1.23 |
slicing / JaXPipe / cpu / PreRev |
0.000007527211199976591 s |
0.000006378296399998362 s |
1.18 |
slicing / JaXPipe / cpu / PostRev |
0.000008171359000061785 s |
0.0000063514679999570944 s |
1.29 |
slicing / JaXPipe / cpu / BothRev |
0.000008217502899969987 s |
0.000006379920299968944 s |
1.29 |
slicing / JaX / cpu / Primal |
0.0000030225750000681726 s |
0.000003731735099972866 s |
0.81 |
slicing / JaXPipe / cpu / Primal |
0.000002962795799976448 s |
0.000003779990699968039 s |
0.78 |
slicing / JaX / cpu / Forward |
0.0000048160833001020365 s |
0.000005950906900034169 s |
0.81 |
slicing / JaXPipe / cpu / Forward |
0.000004900837499917543 s |
0.000005920857099954447 s |
0.83 |
slicing / JaX / cpu / BothRev |
0.000005454875000032189 s |
0.000006349113600026612 s |
0.86 |
slicing / JaXPipe / cpu / PreRev |
0.000005222341700027755 s |
0.000006378296399998362 s |
0.82 |
slicing / JaXPipe / cpu / PostRev |
0.000005207612500089453 s |
0.0000063514679999570944 s |
0.82 |
slicing / JaXPipe / cpu / BothRev |
0.0000051280499999847966 s |
0.000006379920299968944 s |
0.80 |
sum / JaX / cpu / Primal |
0.000005000576499969611 s |
0.000004994745799922385 s |
1.00 |
sum / JaXPipe / cpu / Primal |
0.00000501835029999711 s |
0.0000050674021000304495 s |
0.99 |
sum / JaX / cpu / Forward |
0.000008460579900020094 s |
0.000008495447899986174 s |
1.00 |
sum / JaXPipe / cpu / Forward |
0.000008415546000014728 s |
0.000008384031599962326 s |
1.00 |
sum / JaX / cpu / BothRev |
0.00000769319760001963 s |
0.00000756230199995116 s |
1.02 |
sum / JaXPipe / cpu / PreRev |
0.000007450494199974855 s |
0.000007305450000058044 s |
1.02 |
sum / JaXPipe / cpu / PostRev |
0.0000073901105000004466 s |
0.0000073038833000282465 s |
1.01 |
sum / JaXPipe / cpu / BothRev |
0.000007332328800021059 s |
0.000007324688100015919 s |
1.00 |
sum / JaX / cpu / Primal |
0.000010474413307383655 s |
0.000004994745799922385 s |
2.10 |
sum / JaXPipe / cpu / Primal |
0.000009953941102139653 s |
0.0000050674021000304495 s |
1.96 |
sum / JaX / cpu / Forward |
0.00001688075460260734 s |
0.000008495447899986174 s |
1.99 |
sum / JaXPipe / cpu / Forward |
0.000016117179393768312 s |
0.000008384031599962326 s |
1.92 |
sum / JaX / cpu / BothRev |
0.000014656212797854096 s |
0.00000756230199995116 s |
1.94 |
sum / JaXPipe / cpu / PreRev |
0.000013954302505590022 s |
0.000007305450000058044 s |
1.91 |
sum / JaXPipe / cpu / PostRev |
0.000014004719094373286 s |
0.0000073038833000282465 s |
1.92 |
sum / JaXPipe / cpu / BothRev |
0.000014472233096603303 s |
0.000007324688100015919 s |
1.98 |
sum / JaX / gpu / Primal |
0.00007706668010214344 s |
0.00007297450290061534 s |
1.06 |
sum / JaXPipe / gpu / Primal |
0.00007460516550345347 s |
0.00007500766330631449 s |
0.99 |
sum / JaX / gpu / Forward |
0.0001040763012948 s |
0.00009954135379521176 s |
1.05 |
sum / JaXPipe / gpu / Forward |
0.0001119275278062 s |
0.000104072627495 s |
1.08 |
sum / JaX / gpu / BothRev |
0.0001090507415006 s |
0.0001044097900041 s |
1.04 |
sum / JaXPipe / gpu / PreRev |
0.0001039508720976 s |
0.000104695389897 s |
0.99 |
sum / JaXPipe / gpu / PostRev |
0.0001077813245006 s |
0.0001131524424999 s |
0.95 |
sum / JaXPipe / gpu / BothRev |
0.0001105986081995 s |
0.0001031012017047 s |
1.07 |
sum / JaX / cpu / Primal |
0.000004460813998593949 s |
0.000004994745799922385 s |
0.89 |
sum / JaXPipe / cpu / Primal |
0.000004507654999906663 s |
0.0000050674021000304495 s |
0.89 |
sum / JaX / cpu / Forward |
0.000006999936999636703 s |
0.000008495447899986174 s |
0.82 |
sum / JaXPipe / cpu / Forward |
0.000007007574899762404 s |
0.000008384031599962326 s |
0.84 |
sum / JaX / cpu / BothRev |
0.00000635134900076082 s |
0.00000756230199995116 s |
0.84 |
sum / JaXPipe / cpu / PreRev |
0.000006122780000441708 s |
0.000007305450000058044 s |
0.84 |
sum / JaXPipe / cpu / PostRev |
0.000006399792000593152 s |
0.0000073038833000282465 s |
0.88 |
sum / JaXPipe / cpu / BothRev |
0.00000614823600044474 s |
0.000007324688100015919 s |
0.84 |
sum / JaX / tpu / Primal |
0.0001303706527993 s |
0.0001351165986001 s |
0.96 |
sum / JaXPipe / tpu / Primal |
0.0001312217837999 s |
0.0001330015779996 s |
0.99 |
sum / JaX / tpu / Forward |
0.0001965279337004 s |
0.0002075366742996 s |
0.95 |
sum / JaXPipe / tpu / Forward |
0.0002000890707 s |
0.0002011080066993 s |
0.99 |
sum / JaX / tpu / BothRev |
0.0002025171187007 s |
0.0001998442750002 s |
1.01 |
sum / JaXPipe / tpu / PreRev |
0.0002027557646986 s |
0.0001997326721 s |
1.02 |
sum / JaXPipe / tpu / PostRev |
0.0002025383686996 s |
0.0001958091610002 s |
1.03 |
sum / JaXPipe / tpu / BothRev |
0.0002075592057008 s |
0.0001974218725001 s |
1.05 |
sum / JaX / cpu / Primal |
0.000006841778099988005 s |
0.000004994745799922385 s |
1.37 |
sum / JaXPipe / cpu / Primal |
0.000006773068800066539 s |
0.0000050674021000304495 s |
1.34 |
sum / JaX / cpu / Forward |
0.00001051428640002996 s |
0.000008495447899986174 s |
1.24 |
sum / JaXPipe / cpu / Forward |
0.000010538372700011676 s |
0.000008384031599962326 s |
1.26 |
sum / JaX / cpu / BothRev |
0.000009628234000047088 s |
0.00000756230199995116 s |
1.27 |
sum / JaXPipe / cpu / PreRev |
0.000009252007699979005 s |
0.000007305450000058044 s |
1.27 |
sum / JaXPipe / cpu / PostRev |
0.000009653888600041682 s |
0.0000073038833000282465 s |
1.32 |
sum / JaXPipe / cpu / BothRev |
0.000009274795399960566 s |
0.000007324688100015919 s |
1.27 |
sum / JaX / cpu / Primal |
0.00000420453329988959 s |
0.000004994745799922385 s |
0.84 |
sum / JaXPipe / cpu / Primal |
0.000004748716599897307 s |
0.0000050674021000304495 s |
0.94 |
sum / JaX / cpu / Forward |
0.000007150462500067079 s |
0.000008495447899986174 s |
0.84 |
sum / JaXPipe / cpu / Forward |
0.000006845187499857275 s |
0.000008384031599962326 s |
0.82 |
sum / JaX / cpu / BothRev |
0.000005786054200143554 s |
0.00000756230199995116 s |
0.77 |
sum / JaXPipe / cpu / PreRev |
0.000005644354200012458 s |
0.000007305450000058044 s |
0.77 |
sum / JaXPipe / cpu / PostRev |
0.000006364983300045424 s |
0.0000073038833000282465 s |
0.87 |
sum / JaXPipe / cpu / BothRev |
0.000005752433399902657 s |
0.000007324688100015919 s |
0.79 |
jaxmd40 / JaXPipe / gpu / Primal |
0.0010172354057431 s |
0.0010001704911701 s |
1.02 |
jaxmd40 / JaX / gpu / Primal |
0.0010172858950681 s |
0.0010029645985923 s |
1.01 |
jaxmd40 / HLOOpt / gpu / Primal |
0.0009585542953573 s |
0.0009456511004827 s |
1.01 |
jaxmd40 / PartOpt / gpu / Primal |
0.0009529570001177 s |
0.0009376848000101 s |
1.02 |
jaxmd40 / DefOpt / gpu / Primal |
0.0006873928010463 s |
0.0006839173031039 s |
1.01 |
jaxmd40 / IPartOpt / gpu / Primal |
0.0009502304950729 s |
0.000953156594187 s |
1.00 |
jaxmd40 / IDefOpt / gpu / Primal |
0.0007023785030469 s |
0.0007083631004206 s |
0.99 |
jaxmd40 / JaX / gpu / Forward |
0.0012639587977901 s |
0.0012411848059855 s |
1.02 |
jaxmd40 / JaXPipe / gpu / PostRev |
0.0040215761051513 s |
0.0039659466943703 s |
1.01 |
jaxmd40 / JaX / gpu / BothRev |
0.0040334549965336 s |
0.0039610521052964 s |
1.02 |
jaxmd40 / HLOOpt / gpu / PostRev |
0.0040183909004554 s |
0.0039657035958953 s |
1.01 |
jaxmd40 / PartOpt / gpu / PostRev |
0.004088949796278 s |
0.0040391021058894 s |
1.01 |
jaxmd40 / DefOpt / gpu / PostRev |
0.0022164603928104 s |
0.0021969946916215 s |
1.01 |
jaxmd40 / IPartOpt / gpu / PostRev |
0.0041022886056452 s |
0.0040737178060226 s |
1.01 |
jaxmd40 / IDefOpt / gpu / PostRev |
0.0025620555039495 s |
0.0025359919993206 s |
1.01 |
jaxmd40 / JaXPipe / tpu / Primal |
0.00009052999957930296 s |
0.00009086499994737096 s |
1.00 |
jaxmd40 / JaX / tpu / Primal |
0.00009974099957617 s |
0.00009508499933872372 s |
1.05 |
jaxmd40 / HLOOpt / tpu / Primal |
0.0001051560000632 s |
0.00009627190011087804 s |
1.09 |
jaxmd40 / PartOpt / tpu / Primal |
0.0001024059994961 s |
0.00009868599954643288 s |
1.04 |
jaxmd40 / DefOpt / tpu / Primal |
0.0001017610004055 s |
0.00009729200028232298 s |
1.05 |
jaxmd40 / IPartOpt / tpu / Primal |
0.0001013629997032 s |
0.00009904199978336692 s |
1.02 |
jaxmd40 / IDefOpt / tpu / Primal |
0.00009613900037948042 s |
0.0001052719999279 s |
0.91 |
jaxmd40 / JaX / tpu / Forward |
0.000175777998811 s |
0.0001827300002332 s |
0.96 |
jaxmd40 / JaXPipe / tpu / PostRev |
0.0001896571004181 s |
0.0001836818999436 s |
1.03 |
jaxmd40 / JaX / tpu / BothRev |
0.0002049339993391 s |
0.00018644390002 s |
1.10 |
jaxmd40 / HLOOpt / tpu / PostRev |
0.0001939510009833 s |
0.0001877629998489 s |
1.03 |
jaxmd40 / PartOpt / tpu / PostRev |
0.0002162780001526 s |
0.0001905639001051 s |
1.13 |
jaxmd40 / DefOpt / tpu / PostRev |
0.0001798499011783 s |
0.0001950009995198 s |
0.92 |
jaxmd40 / IPartOpt / tpu / PostRev |
0.0002004689988098 s |
0.0002046408997557 s |
0.98 |
jaxmd40 / IDefOpt / tpu / PostRev |
0.0002008789990213 s |
0.000191778899898 s |
1.05 |
jaxmd40 / JaXPipe / cpu / Primal |
0.00004659340002035606 s |
0.00006384750004144735 s |
0.73 |
jaxmd40 / JaX / cpu / Primal |
0.00003970959996877354 s |
0.00005445619999591145 s |
0.73 |
jaxmd40 / HLOOpt / cpu / Primal |
0.00006340410000120756 s |
0.00006921930007592891 s |
0.92 |
jaxmd40 / PartOpt / cpu / Primal |
0.00007062989998303237 s |
0.00006391070000972831 s |
1.11 |
jaxmd40 / DefOpt / cpu / Primal |
0.000054476000059366925 s |
0.00006338410003081663 s |
0.86 |
jaxmd40 / IPartOpt / cpu / Primal |
0.00006889109999974607 s |
0.00006723950000377953 s |
1.02 |
jaxmd40 / IDefOpt / cpu / Primal |
0.00006541850007124594 s |
0.00006504289995064027 s |
1.01 |
jaxmd40 / JaX / cpu / Forward |
0.0001052214000083 s |
0.0001060336000591 s |
0.99 |
jaxmd40 / JaXPipe / cpu / PostRev |
0.0001080519999959 s |
0.0001510443999904 s |
0.72 |
jaxmd40 / JaX / cpu / BothRev |
0.0001423084000634 s |
0.0001601678000042 s |
0.89 |
jaxmd40 / HLOOpt / cpu / PostRev |
0.0001274494000426 s |
0.0001304764000451 s |
0.98 |
jaxmd40 / PartOpt / cpu / PostRev |
0.0001518611000392 s |
0.0001408208999237 s |
1.08 |
jaxmd40 / DefOpt / cpu / PostRev |
0.0001082072999452 s |
0.0001125810999837 s |
0.96 |
jaxmd40 / IPartOpt / cpu / PostRev |
0.000148586799969 s |
0.0001393724000081 s |
1.07 |
jaxmd40 / IDefOpt / cpu / PostRev |
0.0001205858000503 s |
0.0001274018999538 s |
0.95 |
This comment was automatically generated by workflow using github-action-benchmark.
@copilot the format is wrong. see ci |
- Added function declaration to Utils.h
- Implemented adaptToCallingConvention in Utils.cpp
- Function creates wrapper that handles element type conversion and byte offsets
- Uses stablehlo::ConvertOp for type conversion
- Uses stablehlo::SliceOp for byte offset handling
Co-authored-by: wsmoses <[email protected]>

- Convert linear byte offset to proper multi-dimensional indices
- Use row-major (C-style) layout for offset calculation
- Handle arbitrary tensor shapes correctly
Co-authored-by: wsmoses <[email protected]>

- Remove trailing whitespace
- Add proper handling for complex types in byte offset calculation
- Add validation that byte offset aligns with element boundaries
- Add bounds checking to prevent out-of-bounds slice access
- Add assertions for invalid element types
Co-authored-by: wsmoses <[email protected]>

- Use BitcastConvertOp instead of ConvertOp as requested
- Handle cases where element sizes differ (add/remove dimensions)
- Support both dynamic and static reshape operations
- Add element size calculation for complex types
- Match pattern from AffineToStableHLORaising.cpp
Co-authored-by: wsmoses <[email protected]>

- Add validation that element sizes divide evenly
- Use consistent loop variable types (size_t)
- Extract size ratio to avoid repeated division
- Fix potential integer truncation issues
Co-authored-by: wsmoses <[email protected]>

- Add include for Interfaces/AutoDiffTypeInterface.h
- Replace manual bit width calculation with getApproxSize()
- Simplifies byte offset calculation for all element types
- Simplifies size calculation for BitcastConvertOp logic
Co-authored-by: wsmoses <[email protected]>

- Rename elementBytes to elementSizeBytes
- Rename currentSize/targetSize to currentSizeBytes/targetSizeBytes
- Makes it clear that sizes are in bytes, not bits
Co-authored-by: wsmoses <[email protected]>

- Add mlir/Dialect/Func/IR/FuncOps.h include to Utils.h
- Fixes compilation error: no member named 'func' in namespace 'mlir'
Co-authored-by: wsmoses <[email protected]>

- Change ConstantOp to use RankedTensorType::get({}, i32Type) for scalar tensors
- Use cast<ElementsAttr>(makeAttr(...)) for creating constant values
- Fixes compilation error with stablehlo::ConstantOp::build
- Also fix GetDimensionSizeOp to return tensor type instead of scalar
Co-authored-by: wsmoses <[email protected]>

- Define scalarI32Type once at the beginning of type conversion block
- Remove duplicate definitions within nested scopes
- Improves code maintainability and reduces duplication
Co-authored-by: wsmoses <[email protected]>

- Remove conditional branches for static vs dynamic reshape
- Always use GetDimensionSizeOp for all dimensions (optimized away for static shapes)
- Remove explicit type parameter from ConstantOp::create (type deduced automatically)
- Simplifies code and relies on compiler optimizations
Co-authored-by: wsmoses <[email protected]>

Run clang-format on Utils.cpp and Utils.h to fix formatting issues
Co-authored-by: wsmoses <[email protected]>
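
The byte-offset handling described in the commits above (alignment check, bounds check, row-major index conversion) can be illustrated with a small standalone sketch. This is not the code from Utils.cpp: the function name byteOffsetToIndices and its signature are hypothetical, it assumes a fully static shape, and it uses plain C++ containers rather than MLIR types.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not the PR's code): convert a linear byte offset into
// row-major (C-style) element indices for a tensor with a static shape.
// Returns std::nullopt when the offset is misaligned or out of bounds.
std::optional<std::vector<int64_t>>
byteOffsetToIndices(int64_t byteOffset, int64_t elementSizeBytes,
                    const std::vector<int64_t> &shape) {
  if (elementSizeBytes <= 0 || byteOffset < 0)
    return std::nullopt;
  // The offset must land exactly on an element boundary.
  if (byteOffset % elementSizeBytes != 0)
    return std::nullopt;
  int64_t linear = byteOffset / elementSizeBytes;

  // Bounds check against the total element count.
  int64_t numElements = 1;
  for (int64_t dim : shape) {
    if (dim <= 0)
      return std::nullopt; // dynamic or empty dimensions are not modeled here
    numElements *= dim;
  }
  if (linear >= numElements)
    return std::nullopt;

  // Row-major layout: the last dimension varies fastest, so peel indices
  // off from the innermost dimension outward.
  std::vector<int64_t> indices(shape.size(), 0);
  for (size_t i = shape.size(); i-- > 0;) {
    indices[i] = linear % shape[i];
    linear /= shape[i];
  }
  assert(linear == 0 && "offset was checked to be in bounds");
  return indices;
}
```

For a 2x3x4 tensor of 4-byte elements, a byte offset of 68 corresponds to linear element 17, which this sketch maps to the indices (1, 1, 1). Indices of this kind are what a slice operation would take as its start position.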
f132335 to 0bf3be0
Implementation Complete ✅
Changes Made:
- Added adaptToCallingConvention function declaration to Utils.h
- Implemented adaptToCallingConvention in Utils.cpp
Function Capabilities:
The adaptToCallingConvention utility function:
- Uses stablehlo::BitcastConvertOp for element type conversion
- Uses AutoDiffTypeInterface::getApproxSize() for clean element size calculation
- Uses stablehlo::SliceOp for byte offset handling
Implementation Details for Type Conversion:
The implementation follows the pattern from AffineToStableHLORaising.cpp, with improved validation and type safety, clean maintainable code using AutoDiffTypeInterface, proper MLIR tensor types, and simplified reshape logic that relies on compiler optimizations.
Build Status:
✅ All compilation errors fixed
✅ Simplified implementation using uniform GetDimensionSizeOp approach
✅ Code formatting fixed
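
The "add/remove dimensions" behaviour of the bitcast step can be sketched without MLIR. The helper below is hypothetical (name, signature, and use of std::vector are assumptions, not the PR's code). It follows the StableHLO bitcast_convert shape rule as I understand it: converting to a smaller element type appends a trailing dimension equal to the size ratio, and converting to a larger element type consumes a trailing dimension of exactly that size. Any additional reshapes the actual implementation performs around the bitcast are not modeled.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Hypothetical helper (not the PR's code): compute the result shape of a
// bitcast between element types of different byte sizes.
// Returns std::nullopt when the sizes do not divide evenly or the trailing
// dimension does not match the size ratio.
std::optional<std::vector<int64_t>>
bitcastResultShape(std::vector<int64_t> shape, int64_t currentSizeBytes,
                   int64_t targetSizeBytes) {
  if (currentSizeBytes <= 0 || targetSizeBytes <= 0)
    return std::nullopt;
  if (currentSizeBytes == targetSizeBytes)
    return shape; // same width: shape is unchanged

  if (currentSizeBytes > targetSizeBytes) {
    // Wider to narrower: each element splits into `ratio` elements,
    // exposed as a new trailing dimension.
    if (currentSizeBytes % targetSizeBytes != 0)
      return std::nullopt;
    shape.push_back(currentSizeBytes / targetSizeBytes);
    return shape;
  }

  // Narrower to wider: the trailing dimension must hold exactly `ratio`
  // elements, which are packed into one and the dimension is dropped.
  if (targetSizeBytes % currentSizeBytes != 0)
    return std::nullopt;
  int64_t ratio = targetSizeBytes / currentSizeBytes;
  if (shape.empty() || shape.back() != ratio)
    return std::nullopt;
  shape.pop_back();
  return shape;
}
```

Under these assumptions, a shape of {4} with 8-byte elements bitcast to 4-byte elements becomes {4, 2}, and {4, 2} with 4-byte elements bitcast to 8-byte elements becomes {4}. Extracting the ratio once and checking divisibility up front mirrors the validation mentioned in the commit messages.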