-
Hi, I have two simple questions about memory management in Mitsuba in 'cuda_ad_rgb' mode:

Question 1: When calling …

Question 2: I have examples where the GPU memory usage increases at every iteration (similar to a leak), and it sometimes runs out of memory after many iterations. How can I avoid this? Thanks
-
Question 1: Are you using the …

Question 2: You will need to schedule the sampler state after every call to render and force the evaluation of the resulting image.
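To see why forcing evaluation matters, here is a toy, stdlib-only model of lazy evaluation: each un-evaluated result keeps its whole computation graph alive, so retained memory grows every iteration unless the result is materialized. This only mimics the behaviour described above; the `Lazy` class is illustrative and is not the real Dr.Jit API (in practice one would call `dr.schedule()`/`dr.eval()` on the rendered image).

```python
class Lazy:
    """A deferred value that references its inputs until evaluated."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = list(parents)   # retained graph edges

    def eval(self):
        """Materialize the value and drop references to the graph."""
        self.parents.clear()
        return self.value

def graph_size(node):
    """Count nodes reachable from `node` (a proxy for retained memory)."""
    seen, stack = set(), [node]
    while stack:
        n = stack.pop()
        if id(n) not in seen:
            seen.add(id(n))
            stack.extend(n.parents)
    return len(seen)

def iterate(n_iters, force_eval):
    state = Lazy(0.0)
    for _ in range(n_iters):
        # One "render" step that depends on the previous state
        state = Lazy(state.value + 1.0, parents=[state])
        if force_eval:
            # Analogous to evaluating the image after each render call
            state.eval()
    return graph_size(state)
```

Without per-iteration evaluation the retained graph grows linearly with the number of iterations; with it, the retained graph stays constant in size.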
-
Q1: Yes, but even for forward rendering, the minimum possible unit of computation is 'samples_per_pass=1', and that might not be small enough at high resolution and high spp. Is there any other way to make the computation units smaller?

Q2: Aren't these already handled in mi.render()? Here: mitsuba3/src/render/integrator.cpp Line 663 in 2d89762
-
A key limit is that OptiX cannot trace more than 1 billion rays in a single kernel. Dr.Jit can deal with at most 4 billion samples at a time. In the future, it might be possible to tweak Dr.Jit's OptiX backend so that it can launch up to 4 kernels in sequence without re-tracing to at least reach 2^32 rays.

However, at that point the end is truly reached. You will need multiple passes if you want more samples. If you are rendering more than 2^32 pixels at a time, you will have to devise your own method of rendering multiple image sub-regions and piecing the results back together. However, I think you will have difficulties opening the resulting image with standard tools.

We will not be adapting the spiral class to the GPU; it is meant for interactive CPU rendering (which we in any case don't even support in Mitsuba at the moment).
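The multi-pass approach suggested above can be sketched as follows: split the total sample budget into passes that each fit in one kernel launch, render each pass with a distinct seed, and average the per-pass images weighted by their sample counts. `render_pass` is a hypothetical stand-in for a call like `mi.render(scene, spp=..., seed=...)`; the splitting and averaging logic is the point.

```python
def split_spp(total_spp, max_spp_per_pass):
    """Break `total_spp` into pass sizes that each fit in one kernel."""
    passes = []
    remaining = total_spp
    while remaining > 0:
        n = min(remaining, max_spp_per_pass)
        passes.append(n)
        remaining -= n
    return passes

def render_in_passes(render_pass, total_spp, max_spp_per_pass):
    """Average per-pass images, weighted by their sample counts.

    Each image is treated as a flat list of pixel values; in practice
    one would also evaluate/detach each pass result before accumulating
    so its computation graph does not stay alive across passes.
    """
    passes = split_spp(total_spp, max_spp_per_pass)
    accum = None
    for seed, spp in enumerate(passes):
        img = render_pass(spp=spp, seed=seed)
        contrib = [p * spp for p in img]
        accum = contrib if accum is None else [a + c for a, c in zip(accum, contrib)]
    return [a / total_spp for a in accum]
```

With a per-pass cap corresponding to the kernel limit, this reaches arbitrarily large effective sample counts while each launch stays below the limit.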
-
By the way, as an idea, I still think a Spiral-based rendering would be helpful on the GPU backends; basically, it could be orthogonal to any backend. It is true that with samples_per_pass=1 you can fit your rendering in GPU memory in most cases, even with high spp and high resolution using path tracing, but once you have additional components in PyTorch (for example BRDF parameters etc.) the computation would be too expensive even at samples_per_pass=1. I wonder what would be the best way to break the computation into even smaller pieces, other than Spiral...
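The tile-based idea above can be sketched backend-agnostically: render rectangular crop windows one at a time and stitch them back together. The `render_tile` callable is a hypothetical stand-in for a renderer invoked with a crop offset and size (e.g. via a film's crop window); only the tiling and reassembly logic is shown.

```python
def make_tiles(width, height, tile):
    """Yield (x, y, w, h) crop windows covering the full image."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            yield x, y, min(tile, width - x), min(tile, height - y)

def render_tiled(render_tile, width, height, tile):
    """Assemble a full image from independently rendered tiles.

    The image is a list of rows; each tile is rendered separately
    (bounding peak memory by the tile size) and copied into place.
    """
    image = [[0.0] * width for _ in range(height)]
    for x, y, w, h in make_tiles(width, height, tile):
        patch = render_tile(x, y, w, h)   # h rows of w pixels
        for j in range(h):
            image[y + j][x:x + w] = patch[j]
    return image
```

Since each tile is an independent render, this composes with the per-pass splitting discussed earlier: peak memory is bounded by one tile at one pass's sample count.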