How can I render volpath fast enough for learning from 50k images at 512x512 resolution? #635

osylum · 2023-04-03T12:56:12Z

osylum
Apr 3, 2023

Summary

As a target, I would like to learn SVBSSDF on 50k images at 1k resolution. But currently my volumetric renders are too slow (even without considering other operations during training). I don't know if I am reaching a hard limit in rendering time, or if I am not setting things up correctly. Could you please help me with that?

System configuration

OS: Windows-10
CPU: Intel64 Family 6 Model 165 Stepping 5, GenuineIntel
GPU: NVIDIA RTX A4000
Python: 3.9.7 (tags/v3.9.7:1016ef3, Aug 30 2021, 20:19:38) [MSC v.1929 64 bit (AMD64)]
NVidia driver: 517.40
CUDA: 10.0.130
LLVM: 15.-1.-1

Dr.Jit: 0.4.0
Mitsuba: 3.2.0
Is custom build? False
Compiled with: MSVC 19.34.31937.0
Variants:
scalar_rgb
scalar_spectral
cuda_ad_rgb
llvm_ad_rgb

Description

I am rendering 128x128 images using volpath (or pbrvolpath). Currently it takes in the order of 1s per image. A dataset of 10'000 images takes me about a day. Learning from half of the dataset for 50 epochs is about 20 days. This is already a lot for iterating on a solution - my training currently does not work yet. Now if I want to render 1k resolution (or at the very least 512 resolution) images and have 50'000 images, the renders are too slow.

How can I render it much faster?

Thank you

Steps to reproduce

The scene is shown below. Meshes have at most around 50k vertices. I can create a minimal example if necessary but would have to provide the model and env map as well.

def create_scene(meshmodel, angle,
			sigmaT, albedo, g,
			nsamples, render_resolution,
			integrator_type = 'vol'):

	meshmodel_dir = '../ITNSceneFiles/models'
	meshmodel_path = os.path.join(meshmodel_dir, meshmodel + ".obj")

	meshmodels = ['armadillo', 'buddha', 'bun', 'bunny', 'bust', 'cap', 'cube', 'dragon', 'lucy', 'soap', 'star_smooth']
	meshes_to_world = {
		'armadillo': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.5, -1.9], 'scale': [1., 1., 1.]}, # armadillo
		'buddha': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.5, -1.9], 'scale': [1.25, 1.25, 1.25]},  # buddha
		'bun': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.5, -1.75], 'scale': [1., 1., 1.]},  # bun
		'bunny': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.5, -1.75], 'scale': [1., 1., 1.]},  # bunny
		'bust': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.6, -2.2], 'scale': [1.2, 1.2, 1.2]},  # bust
		'cap': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.427, -1.75], 'scale': [1.1, 1.1, 1.1]},  # cap
		'cube': {'axis': [0., 1., 0.], 'angle': 45., 'translate': [0., 0.5, -1.5], 'scale': [0.72, 0.72, 0.72]},  # cube
		'dragon': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.44, -1.5], 'scale': [1.25, 1.25, 1.25]},  # dragon
		'lucy': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.625, -2.1875], 'scale': [1.25, 1.25, 1.25]},  # lucy
		'soap': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.39, -1.45], 'scale': [1.1, 1.1, 1.1]},  # soap
		'star_smooth': {'axis': [0., 1., 0.], 'angle': 180., 'translate': [0., 0.5, -1.75], 'scale': [1., 1., 1.]},  # star_smooth
	}
	meshes_center = {
		'armadillo': [-0.01517753, 0.56882549, -1.88521034],
		'buddha': [-0.02132128,  0.39635773, -1.8725358],
		'bun': [-0.0019998 ,  0.49937153, -1.73687635],
		'bunny': [0.06323654,  0.39033739, -1.81459443],
		'bust': [0.00833537,  0.65504917, -2.20630523],
		'cap': [4.69748088e-09, 5.44655545e-01, -1.74999995e+00],
		'cube': [-9.68450061e-09, 5.00000000e-01, -1.49999999e+00],
		'dragon': [0.03068777, 0.34255508, -1.47833066],
		'lucy': [0.00486583, 0.56312339, -2.18508213],
		'soap': [-2.49274629e-05, 3.96504139e-01, -1.44948833e+00],
		'star_smooth': [5.82000555e-08, 4.99999999e-01, -1.75000007e+00],
	}

	# main object
	phase = {
		'type': 'hg',
		'g': g
	}
	mesh_medium = {
		'type': 'homogeneous',
		'sigma_t': sigmaT,
		'albedo': albedo,
		'sample_emitters': True, # should be true with bsdf=null
		'phase': phase
	}
	mesh_translate = meshes_to_world[meshmodel]['translate']
	mesh_rotation_axis = meshes_to_world[meshmodel]['axis']
	mesh_rotation_angle = meshes_to_world[meshmodel]['angle']
	mesh_scale = meshes_to_world[meshmodel]['scale']
	mesh_to_world = mi.ScalarTransform4f.translate(mesh_translate).scale(mesh_scale).rotate(mesh_rotation_axis, mesh_rotation_angle) # note: reverse from xml
	if (integrator_type == 'vol') or (integrator_type == 'prbvol'):
		mesh = {
			'type': 'obj',
			'filename': meshmodel_path,
			'bsdf': {'type': 'null'},
			'interior': mesh_medium,
			'face_normals': True, # buddha has wrong face normals
			'flip_normals': False,
			'to_world': mesh_to_world
		}
	else:
		mesh = {
			'type': 'obj',
			'filename': meshmodel_path,
			'bsdf': {'type': 'diffuse'},
			'face_normals': True, # buddha has wrong face normals
			'flip_normals': False,
			'to_world': mesh_to_world
		}

	# floor
	if False:
		rectangle_reflectance_texture = {
			"type": "bitmap",
			"filename": "../ITNSceneFiles/textures/concrete_darker.png",
			'to_uv': mi.ScalarTransform4f.scale([20., 20., 0.])
		}
		rectangle_bsdf = {
			'type': 'diffuse',
			'reflectance': rectangle_reflectance_texture
		}
		rectangle_to_world = mi.ScalarTransform4f.scale([20.,20.,20.]).rotate([1.,0.,0.], -90.) # note: reverse from xml
		rectangle = {
			'type': 'rectangle',
			'to_world': rectangle_to_world,
			'bsdf': rectangle_bsdf
		}

	# light
	# sunsky emitter does not exist in mitsuba3
	if True:
		envmap_to_world = mi.ScalarTransform4f.rotate([0.,1.,0.], angle)
		envmaps_dir = "../ITNSceneFiles/envmaps"
		envmap_filepath =os.path.join(envmaps_dir, 'dreifaltigkeitsberg_512.exr')
		emitter = {
			'type': 'envmap',
			'filename': envmap_filepath,
			'scale': 1.,
			'to_world': envmap_to_world
		}
	else:
		origin = [0.0, 8.784608840942383, -32.78460693359375]
		light_to_world = mi.ScalarTransform4f.rotate([0.,1.,0.], 60.)
		origin = light_to_world @ origin
		emitter = {
			'type': 'point',
			'position': origin,
			'intensity': 2000.,
			#'to_world': light_to_world
		}

	if True: # from xml
		fov = 2.5
		camera_to_world = mi.ScalarTransform4f.rotate([1., 0., 0.], -30.).look_at( #-30
			origin=[0, 24, -24],
			target=[0, 23, -23],
			up=[0, 1, 1]
		) # note: reverse from xml
		if False:
			origin, target, up = inverse_look_at(camera_to_world)
			print(f'origin: {origin}')
			print(f'target: {target}')
			print(f'up: {up}')
	else: # bring camera closer instead of changing fov (does not make rendering faster)
		fov = 10.5
		mesh_center = [-0.01517753, 0.56882549, -1.88521034]
		origin = [0.0, 8.784608840942383, -32.78460693359375]
		target = mesh_center #[0.0, 8.525790214538574, -31.818681716918945]
		up = [0.0, 0.9659256935119629, 0.25881898403167725]
		dir = np.array(target) - np.array(origin)
		dist = np.linalg.norm(dir)
		print(f'{dist=}')
		dir /= dist
		origin = target - dir * dist * 0.25
		camera_to_world = mi.ScalarTransform4f.look_at(
			origin=origin,
			target=target,
			up=up
		)
	sensor = mi.load_dict({
		"type": "perspective",
		#"fov_axis": "x",
		"fov": fov, #10.5, #2.5, 43.5
		"film": {
			"type": "hdrfilm",
			"width": render_resolution[0],
			"height": render_resolution[1],
			"pixel_format": "luminance", # "rgb", "rgba", luminance
			"rfilter": {"type": "gaussian"}, #, "stddev": 0.25},
			# "rfilter": {"type": "lanczos"},
			"banner": False
		},
		'near_clip': 1.0,
		'far_clip': 100.,
		"to_world": camera_to_world,
		'sampler': {
			'type': 'ldsampler', # ldsampler, independent, stratified, multijitter, orthogonal
			'sample_count': nsamples # seems to read value slightly above nsamples and then upsample to next power of 2
		}
	})

	# integrator
	if integrator_type == 'aov':
		integrator = {
			"type": "aov",
			"aovs": "pp:position,nn:sh_normal",  # gnn:geo_normal,uv:uv",
			"shadow_integrator": {
				"type": "direct",
				"hide_emitters": True,
			}
		}
	elif integrator_type == 'vol':
		integrator = {
			'type': 'volpath', # volpath, volpathmis, prbvolpath
			'max_depth': -1,
			'rr_depth': 5,  # 5: 24s, 20: 31s
			"hide_emitters": True,
		}
	elif integrator_type == 'prbvol':
		integrator = {
			'type': 'volpath',  # volpath, volpathmis, prbvolpath
			'max_depth': -1,
			'rr_depth': 5,  # 5: 24s, 20: 31s
			"hide_emitters": True,
		}
	else:
		integrator = {
			'type': 'direct', # volpath, volpathmis, prbvolpath
			"hide_emitters": False,
		}


	# scene
	scene = mi.load_dict(
		{
			"type": "scene",
			"emitter": emitter,
			"object": mesh,
			#"floor": rectangle,
			"sensor": sensor,
			"integrator": integrator
		}
	)

	return scene, sensor

njroussel · 2023-04-03T14:04:52Z

njroussel
Apr 3, 2023
Collaborator

Hi @osylum

These numbers don't seem unreasonable. I think you might just have hit a system limit. Is there anything in particular that leaves you to believe this can be speedup ?

0 replies

osylum · 2023-04-03T14:25:12Z

osylum
Apr 3, 2023
Author

I see. Thanks, it is helpful to know that I reached a limit.

I was just unclear if it was possible because of my lack of experience in rendering. I was comparing with NeRF that can learn 1k images and get per-voxel color and density. It is not the same type of learning of course. But I thought I could learn the BSSDF still as fast. From my understanding, NeRF only need to do ray casting for rendering. Maybe that is why it is significantly faster - not sure though.

1 reply

njroussel Apr 3, 2023
Collaborator

I mean, I obviously can't tell for sure, but nothing in what you sent seems to be super unreasonable to me.

I'm a bit confused by your setup. You are generating reference images with Mitsuba? Are you actually using any inverse rendering features? Or is this just traditional rendering?

osylum · 2023-04-03T14:33:35Z

osylum
Apr 3, 2023
Author

I was surprised that activating cuda variant was not faster than using llvm. Maybe because of the loading of the data to gpu. Should I however expect a (strong) speed up in this case?

1 reply

njroussel Apr 4, 2023
Collaborator

The only migrations you should need are loading the scene onto the gpu and then migrating the rendered output to the host. That's it. You should also only load the scene once, and re-use it as much as possible.

osylum · 2023-04-03T15:42:51Z

osylum
Apr 3, 2023
Author

I am first generating a dataset using traditional rendering. The dataset is created from 10 different models, variations in the 3 BSSDF parameters and rotation of the env map, giving a total of around 10k images. Then I use the dataset (part of it as train set) for inverse rendering (pbrvolpath) to learn the BSSDF model parameters through a torch CNN neural network. At each epoch I have to render the whole train set using the predicted BSSDF model parameters, which is expensive.

0 replies

osylum · 2023-04-04T12:51:15Z

osylum
Apr 4, 2023
Author

I believe I re-use the scene. I define first a list of scenes for each model and rotations of the env map (maybe the later can be changed with traverse as well). In the rendering loop I pick one scene from the list, traverse, update the bssdf parameters and render (mi.render(scene)).

Here are some timings for rendering 11 images (a bust geo model):

using llvm and picking one of the scenes (mi.render(scene)): total time: 73.1724349, time per image: 6.652039536363636
using cuda and picking one of the scenes (mi.render(scene)): total time: 125.7214474, time per image: 11.42922249090909
using cuda and picking one of the scenes (mi.render(scene_00)): total time: 55.500948400000006, time per image: 5.045540763636364
note: IO time is negligeable here in comparison

rendered image for illustration:

Below is the part of the code I use for rendering. Enabling the line mi.render(scene_00) was faster for cuda as compared to grabing the scene from the scenes list, but not significantly different than using llvm.

I refer to https://mitsuba.readthedocs.io/en/stable/src/rendering/editing_a_scene.html to re-use/update the scene, but maybe I am doing something no optimal here in the re-use of the scene?

    render_resolution = [128,128]
    nsamples = 2048 # 1024 2048
    # note: taking half because mitsuba raise to next power of two
    nsamples_str = f'{nsamples*2}'
    traintest = 'test' # train, test, all
    
    dataset_path = os.path.join(datasets_path, f'{render_resolution[0]}x{render_resolution[1]}')
    print(f'{dataset_path=}')
    
    
    # define mesh models
    meshmodels_train = ['armadillo', 'buddha', 'bun', 'cube', 'dragon', 'star_smooth'] # train models
    meshmodels_test = ['bust', 'cap'] #['bunny', 'bust', 'cap', 'lucy', 'soap'] # test models
    meshmodels_all = ['armadillo', 'buddha', 'bun', 'bunny', 'bust', 'cap', 'cube', 'dragon', 'lucy', 'soap', 'star_smooth']
    numvertices = [43243,        49929,  23787,   34835,   18187, 158708, 8887,   50000,   50001,   1233,    1152]
    
    if traintest == 'train':
    meshmodels = meshmodels_train
    elif traintest == 'test':
    meshmodels = meshmodels_test
    elif traintest == 'all':
    meshmodels = meshmodels_all
    
    list_angle = [0., -30., -60., -90., -120., -150., -180.]
    list_sigma_t = [30,  36,  44,  54,  67,  82, 100, 122, 150, 184, 225, 276]
    list_albedo = [0.39, 0.59, 0.74, 0.87, 0.95]
    list_g = [0., 0.2, 0.4, 0.6, 0.8]
    dataset_params = [{
			    'filename': f"{meshmodel}_e{angle}_d{sigma_t}_a{albedo}_g{g}_q{nsamples_str}.exr",
			    'filename_weights': f"{meshmodel}_e{angle}_d{sigma_t}_a{albedo}_g{g}_q{nsamples_str}.exr",
			    'filename_mask': f"{meshmodel}_mask.exr",
			    'meshmodel':meshmodel,
			    'angle': angle,
			    'sigma_t':sigma_t,
			    'albedo':albedo,
			    'g':g
		    }
			     for meshmodel in meshmodels
			     for angle in list_angle
			     for sigma_t in list_sigma_t
			     for albedo in list_albedo
			     for g in list_g
	    ]
    print(f'len(dataset_params): {len(dataset_params)}')
    dataset_numsamples = len(meshmodels) * len(list_angle) * len(list_sigma_t) * len(list_albedo) * len(list_g)
    print(f'dataset_numsamples: {dataset_numsamples}')
    print(f'dataset_params\n: {dataset_params[:10]}')

    renders_path = os.path.join(dataset_path, 'renders')
    if not os.path.exists(renders_path):
	    os.makedirs(renders_path)
    
    # create scene for each model
    scenes = {}
    for meshmodel in meshmodels:
	    for angle in list_angle: # FIXME: how to update angle
		    scenes[meshmodel + '_' + str(angle)], _ = create_scene(meshmodel, angle,
															 	    sigmaT = 30., albedo = 0.8, g = 0.2,
															 	    nsamples = nsamples, render_resolution = render_resolution,
																    integrator_type='vol')
    # access param and define keys to change params values
    print('scenes keys:', scenes.keys())
    scene_name = meshmodels[0] + '_' + str(list_angle[0])
    print(f'{scene_name=}')
    scene_00 = scenes[scene_name]
    scene_params = mi.traverse(scene_00)
    print(scene_params)
    key_sigma_t = 'object.interior_medium.sigma_t.value.value'
    key_albedo = 'object.interior_medium.albedo.value.value'
    key_g = 'object.interior_medium.phase_function.g'
    
    # export renders
    meshmodel_prev = ""
    start_time = timer()
    for i, dataset_param in enumerate(dataset_params):
    
	    if i > 10: # REMOVE ME
		    break
    
	    meshmodel = dataset_param['meshmodel']
    
	    angle = dataset_param['angle']
    
	    sigma_t = dataset_param['sigma_t']
	    albedo = dataset_param['albedo']
	    g = dataset_param['g']
    
	    #scene = scenes[meshmodel + '_' + str(angle)]
    
	    scene_params = mi.traverse(scene_00)
	    scene_params[key_sigma_t] = sigma_t
	    scene_params[key_albedo] = albedo
	    scene_params[key_g] = g
	    scene_params.update()
    
	    #img = mi.render(scene).numpy()
	    img = mi.render(scene_00).numpy()
    
	    filename = f"{meshmodel}_e{angle}_d{sigma_t}_a{albedo}_g{g}_q{nsamples_str}"
	    filepath = os.path.join(renders_path, filename + '.exr')
	    pyexr.write(filepath, img.astype(np.float32))
    
	    elapsed_time = timer() - start_time
	    print(f'wrote file {i}: {filename} (elapsed time: {elapsed_time}, per step: {elapsed_time / (i+1)})')

1 reply

njroussel Apr 4, 2023
Collaborator

I think there is a typo in your message:

using cuda and picking one of the scenes (mi.render(scene)): total time: 125.7214474, time per image: 11.42922249090909
using cuda and picking one of the scenes (mi.render(scene)): total time: 55.500948400000006, time per image: 5.045540763636364

What's the difference between these two?

osylum · 2023-04-04T13:37:08Z

osylum
Apr 4, 2023
Author

Yes, last line should be scene_00. I corrected.

0 replies

How can I render volpath fast enough for learning from 50k images at 512x512 resolution? #635

Uh oh!

Uh oh!

osylum Apr 3, 2023

Summary

System configuration

Description

Steps to reproduce

Replies: 6 comments · 3 replies

Uh oh!

njroussel Apr 3, 2023 Collaborator

Uh oh!

osylum Apr 3, 2023 Author

Uh oh!

njroussel Apr 3, 2023 Collaborator

Uh oh!

osylum Apr 3, 2023 Author

Uh oh!

njroussel Apr 4, 2023 Collaborator

Uh oh!

osylum Apr 3, 2023 Author

Uh oh!

Uh oh!

osylum Apr 4, 2023 Author

Uh oh!

njroussel Apr 4, 2023 Collaborator

Uh oh!

osylum Apr 4, 2023 Author

osylum
Apr 3, 2023

Replies: 6 comments 3 replies

njroussel
Apr 3, 2023
Collaborator

osylum
Apr 3, 2023
Author

njroussel Apr 3, 2023
Collaborator

osylum
Apr 3, 2023
Author

njroussel Apr 4, 2023
Collaborator

osylum
Apr 3, 2023
Author

osylum
Apr 4, 2023
Author

njroussel Apr 4, 2023
Collaborator

osylum
Apr 4, 2023
Author