Make faster matToNDArray safe with non-image number of channels#28
Make faster matToNDArray safe with non-image number of channels#28alanocallaghan merged 2 commits intomainfrom
Conversation
This reverts commit f2e5ab0.
|
Hmmm, guess it makes sense that OpenCV doesn't support other channel numbers. Unfortunate though, I have a vague memory that the performance cost of having to call I'll have a look tomorrow to see if I can figure out why I made this change, and why I didn't realise it was broken. I suppose a compromise might be to check the channel numbers? Even if it only works for 3 channels, that's potentially a common enough use case to be worthwhile. |
|
I have not understood the openCV/DJL code well enough to offer any real advice or objections on solutions, but generally speaking I'd say if it's faster and safe in some situations I think it's a good idea to use it then and only default to the slow version when necessary |
|
I wrote a script to debug it at https://gist.github.com/petebankhead/f088f5880e808a7976cabb7c974d76c3 Basically, I can confirm your observation that it's broken except for 1, 3 or 4 channels (which are all valid for images from OpenCV's perspective). I'm not seeing a substantial difference in performance in the script, either using CPU or GPU (MPS). What I think was happening is that it was giving me a performance bottleneck when running in real-world scenarios using MPS with InstanSeg and multithreading. The old code required creating lots of arrays and then concatenating them, and I think that ended up causing too much competition and shifting of data. VisualVM told me this was where the problem was, and switching to The other option I would have tried was to construct a Conclusion:
|
|
Reassuringly, when I change the top of the script to // How many channels do you want?
int nChannels = 4
boolean useGPU = true
try (var scope = new PointerScope()) {
Device device
if (useGPU)
device = GeneralTools.isMac() ? Device.fromName("mps") : Device.gpu()
else
device = Device.cpu()
try (var manager = NDManager.newBaseManager(device)) {
var mat = new Mat(32, 64, opencv_core.CV_32FC(nChannels))
opencv_core.randn(mat, OpenCVTools.scalarMat(1.0, mat.depth()), OpenCVTools.scalarMat(1.0, mat.depth()))
long startTime = System.nanoTime()
var array = matToNDArray(manager, mat, "NCHW", false)
var array2 = matToNDArray(manager, mat, "NCHW", true)
long endTime = System.nanoTime()
println array
println "Time: ${(endTime - startTime)/10e9} seconds"
println "Same? ${array.contentEquals(array2)}"
}
}it seems that the results are the same when using |
|
Seems like a general-purpose manual reshape would be the best way to go although I don't know that I'm volunteering to do it given I'd want to test it very well, and I certainly don't understand the task well enough to get started yet. For the time being, reverting everything but the special case sizes sounds sensible |
|
It's not immediately obvious to me how to test this code without packaging pytorch or tensorflow into the DJL extension, which we presumably don't want to do. Otherwise we get I'll revisit; with decent tests I'd be happy to write a manual transposer |
This reverts commit f2e5ab0.
The listed commit breaks matToNDArray in instanseg for at least some numbers of input channels. Seems like 1 or 3 input channels works, others fail with: