Hey all!
NVIDIA recently open-sourced their CudaTile dialect on GitHub (https://github.com/NVIDIA/cuda-tile)! This is super exciting since we can now see the guts of their cuTile DSL. Namely, I can foresee a path that lowers from the cuTile DSL into the CudaTile IR, and then lowers this kernel-specific IR into the D2M dialect (specifically `d2m.generic` for the kernels).
Considering the massive industry momentum behind NVIDIA and the plethora of example kernels in https://github.com/NVIDIA/TileGym, I think this would be an exciting path to increasing the number of supported entry points into the TT Stack.
I'm curious to hear whether this is on the roadmap / planning for TT-Forge, what the approach might look like (i.e., what would the entry point into the TT-MLIR Stack be?), and what the team thinks of the situation. It seems like a new "tile-based parallel DSL" is released every month; hopefully this project has the chance to create unity between the largest "moat" in the industry and TT.
Below I've added some interesting code snippets from the various NVIDIA projects:
```python
# Write MLIR module to file
if CUDA_TILE_DUMP_TILEIR is not None:
    try:
        from cuda.tile_internal._internal_cext import bytecode_to_mlir_text

        mlir_text = bytecode_to_mlir_text(bytecode_buf)
        if not os.path.exists(CUDA_TILE_DUMP_TILEIR):
            os.makedirs(CUDA_TILE_DUMP_TILEIR)
        base_filename = os.path.basename(func_ir.loc.filename.split(".")[0])
        path = os.path.join(
            CUDA_TILE_DUMP_TILEIR, f"{base_filename}.ln{func_ir.loc.line}.cuda_tile.mlir"
        )
        print(f"Dumping TILEIR MLIR module to file: {path}", file=sys.stderr)
        with open(path, "w") as f:
            print(mlir_text, file=f)
    except ImportError:
        print(
            "Can't print MLIR because the internal extension is missing. "
            "This is currently not a public feature.",
            file=sys.stderr,
        )
```
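For reference, the dump-path construction in that snippet can be reproduced standalone. This is just a sketch: `Loc` and `tileir_dump_path` are hypothetical stand-ins for the compiler's real source-location object and internal logic, but the filename scheme mirrors the code above.

```python
import os

# Hypothetical stand-in for the compiler's source-location object.
class Loc:
    filename = "kernels/matmul.py"
    line = 42

def tileir_dump_path(dump_dir, loc):
    # Mirrors the snippet above: strip the extension, keep the basename,
    # and encode the source line number in the dump filename.
    base = os.path.basename(loc.filename.split(".")[0])
    return os.path.join(dump_dir, f"{base}.ln{loc.line}.cuda_tile.mlir")

print(tileir_dump_path("/tmp/tileir", Loc))  # → /tmp/tileir/matmul.ln42.cuda_tile.mlir
```

So for a kernel defined at `kernels/matmul.py:42`, setting `CUDA_TILE_DUMP_TILEIR` would produce a file named `matmul.ln42.cuda_tile.mlir` in the dump directory.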
I find it interesting that the cuTile code still seems to use an internal version of what I can only assume is the CudaTile dialect in its compilation flow. This file was last updated two weeks ago; I wonder whether development between the two will unify, or whether there will be a lag between what cuTile users are exposed to and what's open in the dialect.
Their structure for data and the tiles themselves is pretty cool. The abstraction makes sense given the architecture differences, but I think it generalizes pretty well to the TT-MLIR Stack. It will be interesting to see how these would mesh together.
In general, this is pretty exciting! Thought I would probe around and ask questions in my fav open source AI compiler project 😆