
Make faster matToNDArray safe with non-image number of channels #28

Merged
alanocallaghan merged 2 commits into main from revert-faster-conversion
Feb 11, 2025

Conversation

@alanocallaghan
Contributor

@alanocallaghan alanocallaghan commented Feb 10, 2025

This reverts commit f2e5ab0.

The listed commit breaks matToNDArray in InstanSeg for at least some numbers of input channels. It seems that 1 or 3 input channels work; others fail with:

ai.djl.translate.TranslateException: java.lang.RuntimeException: OpenCV(4.10.0) /home/runner/work/javacpp-presets/javacpp-presets/opencv/cppbuild/linux-x86_64/opencv-4.10.0/modules/core/src/arithm.cpp:658: error: (-209:Sizes of input arguments do not match) The operation is neither 'array op array' (where arrays have the same size and the same number of channels), nor 'array op scalar', nor 'scalar op array' in function 'arithm_op'

	at ai.djl.inference.Predictor.batchPredict(Predictor.java:197)
	at ai.djl.inference.Predictor.predict(Predictor.java:133)
	at qupath.ext.instanseg.core.TilePredictionProcessor.process(TilePredictionProcessor.java:143)
	at qupath.ext.instanseg.core.TilePredictionProcessor.process(TilePredictionProcessor.java:35)
	at qupath.lib.experimental.pixels.PixelProcessor$ProcessorTask.run(PixelProcessor.java:273)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:572)
	at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:317)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1144)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:642)
	at java.base/java.lang.Thread.run(Thread.java:1583)
Caused by: java.lang.RuntimeException: OpenCV(4.10.0) /home/runner/work/javacpp-presets/javacpp-presets/opencv/cppbuild/linux-x86_64/opencv-4.10.0/modules/core/src/arithm.cpp:658: error: (-209:Sizes of input arguments do not match) The operation is neither 'array op array' (where arrays have the same size and the same number of channels), nor 'array op scalar', nor 'scalar op array' in function 'arithm_op'

	at org.bytedeco.opencv.global.opencv_dnn.blobFromImage(Native Method)
	at qupath.ext.djl.DjlTools.matToNDArray(DjlTools.java:415)
	at qupath.ext.instanseg.core.MatTranslator.processInput(MatTranslator.java:61)
	at qupath.ext.instanseg.core.MatTranslator.processInput(MatTranslator.java:20)
	at ai.djl.translate.Translator.batchProcessInput(Translator.java:103)
	at ai.djl.inference.Predictor.batchPredict(Predictor.java:185)
	... 11 common frames omitted
19:03:25.184 [JavaFX Application Thread] [INFO ] q.l.g.TaskRunnerFX$PluginProgressMonitorFX - Processing complete in 0.24 seconds
19:03:25.193 [JavaFX Application Thread] [INFO ] q.l.g.TaskRunnerFX$PluginProgressMonitorFX - Processing complete in 0.01 seconds
19:03:25.412 [instanseg1] [INFO ] q.ext.instanseg.ui.InstanSegTask - Results: InstanSegResults[nPixelsProcessed=262144, nTilesProcessed=1, nTilesFailed=1, nObjectsDetected=0, processingTimeMillis=856, wasInterrupted=false]
19:03:25.436 [instanseg1] [ERROR] q.ext.instanseg.ui.InstanSegTask - 1 tiles failed. This could be a memory issue.
Consider decreasing tile size or the number of threads used.

@petebankhead
Member

Hmmm, guess it makes sense that OpenCV doesn't support other channel numbers.

Unfortunate though, I have a vague memory that the performance cost of having to call array.concat was much too big, at least under some conditions.

I'll have a look tomorrow to see if I can figure out why I made this change, and why I didn't realise it was broken.

I suppose a compromise might be to check the channel numbers? Even if it only works for 3 channels, that's potentially a common enough use case to be worthwhile.

@alanocallaghan
Contributor Author

I have not understood the OpenCV/DJL code well enough to offer any real advice or objections on solutions, but generally speaking I'd say that if it's faster and safe in some situations, it's a good idea to use it then, and only fall back to the slow version when necessary.

@petebankhead
Member

I wrote a script to debug it at https://gist.github.com/petebankhead/f088f5880e808a7976cabb7c974d76c3

Basically, I can confirm your observation that it's broken except for 1, 3 or 4 channels (which are all valid for images from OpenCV's perspective).

I'm not seeing a substantial difference in performance in the script, either using CPU or GPU (MPS).

What I think was happening is that it was giving me a performance bottleneck when running in real-world scenarios using MPS with InstanSeg and multithreading. The old code required creating lots of arrays and then concatenating them, and I think that ended up causing too much competition and shifting of data. VisualVM told me this was where the problem was, and switching to blobFromImage helped by simplifying to a single call.

The other option I would have tried was to construct a float[] or FloatBuffer on the Java side with everything reshaped properly, and then call manager.create to create the proper NDArray in a single step. But I didn't fancy having to figure out how to transpose everything properly, and I thought blobFromImage gave me an easy way out.
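The FloatBuffer option described above boils down to an HWC-to-CHW index permutation on the Java side before a single array-creation call. A minimal pure-Java sketch of that transpose (the DJL/OpenCV wiring and the hypothetical `manager.create` call are omitted, and the class and method names here are illustrative, not from DjlTools):

```java
// Sketch of the HWC -> CHW transpose that a manual FloatBuffer-based
// conversion would need before handing the data to manager.create().
// Pure-Java illustration only; the DJL/OpenCV wiring is omitted.
public class HwcToChw {

    // Interleaved (height, width, channels) -> planar (channels, height, width).
    static float[] hwcToChw(float[] hwc, int height, int width, int channels) {
        float[] chw = new float[hwc.length];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                for (int c = 0; c < channels; c++) {
                    // Source index: pixels stored channel-last (interleaved)
                    int src = (y * width + x) * channels + c;
                    // Destination index: one full H*W plane per channel
                    int dst = c * height * width + y * width + x;
                    chw[dst] = hwc[src];
                }
            }
        }
        return chw;
    }

    public static void main(String[] args) {
        // 1x2 image with 2 channels: pixel0=(1,2), pixel1=(3,4)
        float[] hwc = {1f, 2f, 3f, 4f};
        float[] chw = hwcToChw(hwc, 1, 2, 2);
        // Channel-0 plane {1,3} followed by channel-1 plane {2,4}
        System.out.println(java.util.Arrays.toString(chw));
    }
}
```

Unlike blobFromImage, an index permutation like this has no restriction on the channel count, which is why it could serve as the general fallback.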

Conclusion:

  1. I don't want to revert the commit entirely, but we do need to check if we have 3 or 4 channels and only use blobFromImage in that case.
  2. We really need tests for this code... and also to confirm that we get the same results with/without blobFromImage.
  3. Ideally we would have a more optimized conversion for other numbers of channels that doesn't rely on array.concat. For now, we could just add a comment in the code to that effect.
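The check in point 1 could be as simple as gating on the channel counts OpenCV treats as valid images (1, 3 or 4, per the debugging above). A sketch, with an assumed method name rather than the real DjlTools code:

```java
// Illustrative guard for point 1 above: only take the blobFromImage fast
// path when the Mat has a channel count OpenCV treats as an image.
// The method name and structure are assumptions, not the real DjlTools code.
public class ChannelGuard {

    // blobFromImage is only known-safe for 1, 3 or 4 channels;
    // anything else should fall back to the slower concat-based path.
    static boolean canUseBlobFromImage(int nChannels) {
        return nChannels == 1 || nChannels == 3 || nChannels == 4;
    }

    public static void main(String[] args) {
        for (int c = 1; c <= 5; c++) {
            System.out.println(c + " channels -> fast path: " + canUseBlobFromImage(c));
        }
    }
}
```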

@petebankhead
Member

Reassuringly, when I change the top of the script to

// How many channels do you want?
int nChannels = 4

boolean useGPU = true

try (var scope = new PointerScope()) {
    Device device
    if (useGPU)
        device = GeneralTools.isMac() ? Device.fromName("mps") : Device.gpu()
    else
        device = Device.cpu()

    try (var manager = NDManager.newBaseManager(device)) {
        var mat = new Mat(32, 64, opencv_core.CV_32FC(nChannels))
        opencv_core.randn(mat, OpenCVTools.scalarMat(1.0, mat.depth()), OpenCVTools.scalarMat(1.0, mat.depth()))
        long startTime = System.nanoTime()
        var array = matToNDArray(manager, mat, "NCHW", false)
        var array2 = matToNDArray(manager, mat, "NCHW", true)
        long endTime = System.nanoTime()
        println array
        println "Time: ${(endTime - startTime)/1e9} seconds"
        println "Same? ${array.contentEquals(array2)}"
    }
}

it seems that the results are the same when using blobFromImage and when skipping it to revert to the previous code.

@alanocallaghan
Contributor Author

Seems like a general-purpose manual reshape would be the best way to go, although I don't know that I'm volunteering to do it, given that I'd want to test it very well and I certainly don't understand the task well enough to get started yet.

For the time being, reverting everything but the special-case sizes sounds sensible.

@alanocallaghan
Contributor Author

It's not immediately obvious to me how to test this code without packaging PyTorch or TensorFlow into the DJL extension, which we presumably don't want to do. Otherwise, I think we get ai.djl.engine.EngineException: No deep learning engine found.

I'll revisit; with decent tests I'd be happy to write a manual transposer.

@alanocallaghan alanocallaghan changed the title Revert "Faster Mat to NDArray channels-first conversion" Make faster matToNDArray safe with non-image number of channels Feb 11, 2025
@alanocallaghan alanocallaghan merged commit 072b603 into main Feb 11, 2025
1 check passed
@alanocallaghan alanocallaghan deleted the revert-faster-conversion branch February 11, 2025 19:40
