-
Hmmm, it compiled again when I switched the batch size to 4, but then when I went back to batch size 1 it didn't recompile, and I got even faster results. I suspect the optimization isn't perfect and may improve over time.
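
That matches how torch.compile() specializes on input shapes by default. A minimal sketch (toy model, not A1111's UNet) of why switching batch sizes triggers a recompile while returning to an already-seen shape does not:

```python
import torch

model = torch.nn.Linear(64, 64).cuda()
compiled = torch.compile(model)  # default backend="inductor"

x1 = torch.randn(1, 64, device="cuda")
x4 = torch.randn(4, 64, device="cuda")

compiled(x1)  # first call: compiles a graph specialized for batch size 1
compiled(x4)  # new input shape: triggers a recompile for batch size 4
compiled(x1)  # the batch-size-1 graph is still cached, so no recompile

# dynamic=True asks the compiler for shape-generic code instead:
# compiled = torch.compile(model, dynamic=True)
```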
-
torch.compile() is GA. 50 it/s even before I finally figured out the mystery of why my 5.8 GHz CPU was only giving me 5.5.
-
Just as @vladmandic had been doing, I had also been working on trying PyTorch 2.0's compile feature.
Today I finally got it working. Issues:
The ptxas 7.4 / LLVM stuff refused to generate code for my 4090. Fixed by switching to some Triton 2.0.0a2 (?) version.
Then the code just crashed with a CUDA memory-violation error. I reported this today to the torch folks with my debugging info, and they actually checked in a fix within a few hours.
Then I ran into some other bugs that I debugged and fixed in my own local clone of the torch and related source.
Finally it looked like it had hung, which was actually a good sign: after 15 minutes of it running internal benchmarks to optimize the code, I got:
```
100%|████████████████| 20/20 [00:00<00:00, 43.44it/s]
100%|████████████████| 20/20 [00:00<00:00, 43.75it/s]
100%|████████████████| 20/20 [00:00<00:00, 43.30it/s]
100%|████████████████| 20/20 [00:00<00:00, 43.43it/s]
```
Perhaps only about 2.5% faster than I was getting before, but the surprising thing was that my GPU was only about 88% busy instead of the usual 98%. So I tried optimizing for batch size 4: with no compile() I got 13 it/s, and with compile I got 15 it/s, a 15% improvement, and an effective 60 it/s (15 it/s × 4 images per batch) with A1111. I suspect I'll see a similar speedup with 768x768 batch-size-1 processing.
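
For reference, a rough sketch (hypothetical toy model and synthetic inputs, not the actual A1111 pipeline) of how the it/s numbers at different batch sizes can be measured; the warm-up loop is what absorbs the long autotuning pass described above:

```python
import time
import torch

model = torch.nn.Sequential(
    torch.nn.Conv2d(4, 64, 3, padding=1),
    torch.nn.ReLU(),
    torch.nn.Conv2d(64, 4, 3, padding=1),
).cuda().eval()
compiled = torch.compile(model)

for batch in (1, 4):
    x = torch.randn(batch, 4, 64, 64, device="cuda")
    with torch.no_grad():
        for _ in range(5):          # warm-up: triggers (re)compilation
            compiled(x)
        torch.cuda.synchronize()
        t0 = time.perf_counter()
        n = 100
        for _ in range(n):
            compiled(x)
        torch.cuda.synchronize()
    its = n / (time.perf_counter() - t0)
    # effective throughput = iterations/s * images per iteration
    print(f"batch={batch}: {its:.1f} it/s ({its * batch:.1f} images/s)")
```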
But I'm more interested in trying backend="tensorrt" vs the default "inductor" backend. There's a chance that if it works then we'll get close to what VoltaML is doing.
So my next step is to build a local PyTorch with USE_TENSORRT support and see what happens.
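
A small sketch of what swapping backends looks like; whether a "tensorrt" backend is actually registered depends on the build (the USE_TENSORRT build mentioned above, or an installed torch-tensorrt package), so the availability check here is an assumption rather than a given:

```python
import torch
import torch._dynamo as dynamo

# "inductor" is the default; "tensorrt" only appears in this list if the
# PyTorch build (or an add-on package) registers it.
print(dynamo.list_backends())

model = torch.nn.Linear(64, 64).cuda()
compiled = torch.compile(model)  # backend="inductor" by default

if "tensorrt" in dynamo.list_backends():
    compiled = torch.compile(model, backend="tensorrt")
```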