Potential optimizations

Returning early from interact kernels before constructing thread views (@pcanal)
Pulling RNG state into local memory in the RNGEngine constructor, then writing it back to global memory in the RNGEngine destructor
Rearranging the memory layout of data to use more struct-of-arrays access (e.g. making MaterialTrackView::element_scratch aligned and strided by the number of tracks, and storing particle energy and def_id as separate contiguous arrays)
Possibly allowing inter-thread cooperation by refactoring track views and related classes so that they perform null-ops for inactive threads (except when cooperating with other threads) rather than simply returning early
For EM cross sections: instead of splitting the energy range into a regular scaling and a 1/E scaling, store the actual cross section values and replace the 1/E special case with log-log or semi-log interpolation.
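The two approaches to inactive threads (returning early before constructing thread views, or having views act as null-ops) can be contrasted in a minimal host-side C++ sketch. `TrackStates`, `EnergyView`, `MaskedEnergyView`, and the `interact_*` functions are hypothetical stand-ins, not Celeritas classes. In a real kernel the "view" would load several fields from global memory, which is the cost the early return avoids.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical track state: a liveness flag and an energy per track slot.
struct TrackStates {
    std::vector<bool> alive;
    std::vector<double> energy;
};

// Stand-in for a thread view that is costly to construct on a GPU.
class EnergyView {
  public:
    EnergyView(TrackStates& s, std::size_t tid) : s_(s), tid_(tid) {
        assert(tid < s.energy.size());
    }
    void scale(double f) { s_.energy[tid_] *= f; }

  private:
    TrackStates& s_;
    std::size_t tid_;
};

// Option 1: return before constructing any view for an inactive track.
void interact_early_return(TrackStates& s, std::size_t tid, double f) {
    if (!s.alive[tid]) {
        return;  // no view is built, so no track state is loaded
    }
    EnergyView view(s, tid);
    view.scale(f);
}

// Option 2: a view whose mutators are null-ops for inactive tracks, so every
// thread follows the same code path (and could join a cooperative step such
// as a block-wide barrier on a GPU) without changing inactive-track state.
class MaskedEnergyView {
  public:
    MaskedEnergyView(TrackStates& s, std::size_t tid)
        : s_(s), tid_(tid), active_(s.alive[tid]) {
        assert(tid < s.energy.size());
    }
    void scale(double f) {
        if (active_) {
            s_.energy[tid_] *= f;
        }
    }

  private:
    TrackStates& s_;
    std::size_t tid_;
    bool active_;
};

void interact_masked(TrackStates& s, std::size_t tid, double f) {
    MaskedEnergyView view(s, tid);
    // ... a cooperative step involving all threads could go here ...
    view.scale(f);
}
```

Both variants leave inactive tracks untouched. The trade-off is that option 2 gives up some of the savings of option 1 in exchange for keeping all threads converged.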
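The struct-of-arrays suggestion can be illustrated with a host-side C++ sketch. `ParticleTrackAos`, `ParticleStateSoa`, `scale_energy`, and `scratch_index` are hypothetical names, not the actual Celeritas data types. The strided `scratch_index` shows one way `element_scratch` could be laid out: entry `element` of track `track_slot` is stored at `element * num_tracks + track_slot`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Array-of-structs (AoS): each track's fields are adjacent, so threads t and
// t+1 reading `energy` touch addresses sizeof(ParticleTrackAos) bytes apart.
struct ParticleTrackAos {
    unsigned int def_id;
    double energy;
};

// Struct-of-arrays (SoA): each field is its own contiguous array indexed by
// track slot, so threads t and t+1 read adjacent elements, which a GPU can
// coalesce into fewer memory transactions.
struct ParticleStateSoa {
    std::vector<unsigned int> def_id;
    std::vector<double> energy;

    explicit ParticleStateSoa(std::size_t num_tracks)
        : def_id(num_tracks), energy(num_tracks) {}
};

// Per-thread work on the SoA layout: only the energy array is touched.
void scale_energy(ParticleStateSoa& state, std::size_t track_slot, double factor) {
    state.energy[track_slot] *= factor;
}

// Hypothetical strided scratch indexing: threads working on the same element
// index access adjacent entries instead of entries a whole track apart.
inline std::size_t scratch_index(std::size_t element, std::size_t track_slot,
                                 std::size_t num_tracks) {
    assert(track_slot < num_tracks);
    return element * num_tracks + track_slot;
}
```

With this layout, a kernel that reads only energies never loads `def_id` values into cache alongside them. The same holds in reverse for kernels that read only `def_id`.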
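The cross-section idea works because a 1/E region needs no special case under log-log interpolation. If the stored values are σ_i = c/E_i, then ln σ = ln c − ln E is linear in ln E, so interpolating ln σ linearly in ln E reproduces 1/E exactly. The sketch below is a minimal illustration with several assumptions:

- `XsTable` and `interp_loglog` are hypothetical, not the Celeritas grid classes.
- It uses log-log rather than semi-log interpolation.
- It clamps outside the table instead of extrapolating.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical tabulated cross section with values stored directly.
struct XsTable {
    std::vector<double> energy;  // strictly increasing, all > 0
    std::vector<double> xs;      // cross section values, all > 0
};

// Log-log interpolation: linear interpolation of ln(xs) versus ln(E).
// Any power law xs = c * E^k (including 1/E, k = -1) is reproduced exactly.
double interp_loglog(const XsTable& t, double e) {
    assert(t.energy.size() == t.xs.size() && t.energy.size() >= 2);
    // Clamp to the table bounds (an assumption made for this sketch)
    if (e <= t.energy.front()) return t.xs.front();
    if (e >= t.energy.back()) return t.xs.back();
    // Find i such that energy[i] <= e < energy[i + 1]
    auto it = std::upper_bound(t.energy.begin(), t.energy.end(), e);
    auto i = static_cast<std::size_t>(it - t.energy.begin()) - 1;
    double frac = std::log(e / t.energy[i]) / std::log(t.energy[i + 1] / t.energy[i]);
    // Equivalent to exp(ln xs_i + frac * (ln xs_{i+1} - ln xs_i))
    return t.xs[i] * std::pow(t.xs[i + 1] / t.xs[i], frac);
}
```

For example, a table holding 2/E at E = 1, 10, 100 returns 2/3.7 at E = 3.7, up to rounding.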
