
[WIP] perf: common neural network workloads benchmarks #1026

Closed
avik-pal wants to merge 1 commit into master from ap/nn_bench

Conversation

Member

@avik-pal avik-pal commented Aug 11, 2024

Currently adds CPU versions. Should be easy to extend to GPU once SciMLBenchmarks has GPU runners available.

TODOs

  • Restore Manifest to use the released versions of packages
  • Boilerplate
    • Inference
    • Inference Plots
    • Training
    • Training Plots
  • Models
  • Add Reactant Inference Code
    • Training code will have to wait till the compile PR lands in Lux

@ChrisRackauckas
Member

@thazhemadam @staticfloat what's the easiest way to set this up so that benchmarks can choose a separate runner for GPUs?

@avik-pal avik-pal force-pushed the ap/nn_bench branch 4 times, most recently from 515068b to 3ae1755 Compare August 13, 2024 03:26
@avik-pal
Member Author

avik-pal commented Aug 13, 2024

Allowing different runners based on the jmd metadata would be a nice way to do it. Something like:

---
title: Simple Neural Networks
author: Avik Pal
backend: CUDA  # if absent, default to CPU; in the future we can allow AMDGPU/Metal/etc. with the same syntax
---
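The runner-selection idea above can be sketched with a small front-matter parser. This is a hypothetical illustration, not part of SciMLBenchmarks' actual tooling: the `backend:` key, the `select_runner` function, and the lowercase runner labels are all assumptions for the sketch.

```python
def select_runner(jmd_text: str) -> str:
    """Return a runner label based on the `backend:` field in the
    YAML-style front matter; default to "cpu" if the field is absent."""
    lines = jmd_text.splitlines()
    # Front matter must start with a "---" line; otherwise fall back to CPU.
    if not lines or lines[0].strip() != "---":
        return "cpu"
    for line in lines[1:]:
        if line.strip() == "---":  # closing delimiter ends the front matter
            break
        key, _, value = line.partition(":")
        if key.strip() == "backend":
            return value.strip().lower() or "cpu"
    return "cpu"

doc = """---
title: Simple Neural Networks
author: Avik Pal
backend: CUDA
---
"""
print(select_runner(doc))  # -> cuda
```

A CI script could then map the returned label to a GitHub Actions runner tag, keeping CPU the default so existing benchmark files need no changes.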

@avik-pal avik-pal force-pushed the ap/nn_bench branch 6 times, most recently from 0a29e65 to 48aba8e Compare August 15, 2024 10:28
@avik-pal avik-pal force-pushed the ap/nn_bench branch 2 times, most recently from 2c539db to b88a3a0 Compare August 18, 2024 15:30
@avik-pal
Member Author

Lux now matches SimpleChains in inference timings 😅. The cases where we fall behind are because Octavian is somewhat slow on EPYC machines, so it is turned off.

@ChrisRackauckas-Claude
Contributor

Superseded by #1530 which completes this work: rebased onto master, updated to latest package versions, added Reactant.jl (via @compile and TrainState API), JAX and PyTorch benchmarks (via PythonCall with Python-side timing), completed all benchmark sections (MLP relu/gelu, MLP+BN, LeNet, ResNet) with both inference and training, and configured GPU runner.
