Development
Encapsulation is about making a piece of code into a black box. The fewer lines connecting these black boxes, the more maintainable the code. Black boxes can often be improved internally by making tiny black boxes inside the larger black box.
Motivation:
- Developers don't have to understand implementation details when looking at a class interface.
- Compilers can optimize better when dealing with more localized components.
- Good encapsulation allows components to be interchanged easily because they have well-defined interfaces.
- Pausing to think about how to minimize input and output from an algorithm can make it easier to write.
Applications:
- Refactor large functions (> 50 statements ish?) into small functors that take "invariant" values (the larger context) for constructors and use operator() to transform some input into the desired output.
- Use only const data when sharing. Non-const shared data is almost like using global variables.
- Use OpaqueId instead of integers and magic sentinel values for integer identifiers that aren't supposed to be arithmetical.
Examples:
- Random number sampling: write a unit sphere sampling functor instead of replicating a polar-to-cartesian transform in a thousand places
- Cell IDs: Opaque IDs add type safety so that you can't accidentally convert a cell identifier into a double or swap a cell ID and a material ID. They also make the code more readable.
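As a rough illustration of the sampling example above, a unit sphere sampler could be written as a single-function functor along these lines. The names Real3 and IsotropicDirectionSampler are hypothetical, and the standard library random number facilities stand in for whatever generator the project actually uses:

```cpp
#include <cmath>
#include <random>

// Hypothetical 3-vector type, for illustration only
struct Real3
{
    double x;
    double y;
    double z;
};

// Functor that samples a direction uniformly on the unit sphere.
// It needs no "invariant" context, so the default constructor suffices.
class IsotropicDirectionSampler
{
  public:
    template<class Generator>
    Real3 operator()(Generator& rng) const
    {
        constexpr double pi = 3.14159265358979323846;
        std::uniform_real_distribution<double> sample_uniform(0.0, 1.0);

        // cos(theta) is uniform on [-1, 1) and phi is uniform on [0, 2 pi)
        double costheta = 2 * sample_uniform(rng) - 1;
        double phi = 2 * pi * sample_uniform(rng);
        double sintheta = std::sqrt(1 - costheta * costheta);

        // The polar-to-cartesian transform lives in exactly one place
        return {sintheta * std::cos(phi), sintheta * std::sin(phi), costheta};
    }
};
```

A caller would construct the functor once (for example `IsotropicDirectionSampler sample_direction;`) and then write `sample_direction(rng)` wherever a random direction is needed.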
No explanation needed.
Code performance is important, but so is developer time. When possible, minimize the amount of code touched by NVCC. (NVCC's error output is also rudimentary compared to modern clang/gcc, so that's another reason to prefer them compiling your code.)
As much as possible, make classes "complete" and valid after calling the constructor. Don't have a series of functions that have to be called in a specific order to put the class in a workable state.
When a class has a single function (especially if you name that function
operator()), its usage is obvious. The reader also doesn't have to know
whether a class uses doIt or do_it or build.
When you have a class that needs a lot of data to start in a valid state, use a
struct of intuitive objects to pass the data to the class's constructor.
The constructor can do any necessary validation on the input data.
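A minimal sketch of this pattern follows, assuming a made-up MaterialInput struct and Material class (neither is taken from the code base). The object is either fully valid after construction or it is never created:

```cpp
#include <stdexcept>
#include <string>
#include <utility>
#include <vector>

// Hypothetical input struct: plain public data with intuitive names
struct MaterialInput
{
    std::string name;
    double density{0};  // [g/cm^3]
    std::vector<double> mass_fractions;
};

// Hypothetical class that is complete and valid once constructed
class Material
{
  public:
    explicit Material(MaterialInput inp) : data_(std::move(inp))
    {
        // All validation happens here, so no "setup" calls are needed later
        if (data_.name.empty())
            throw std::invalid_argument("material name is empty");
        if (!(data_.density > 0))
            throw std::invalid_argument("density must be positive");
        if (data_.mass_fractions.empty())
            throw std::invalid_argument("no mass fractions were given");
    }

    std::string const& name() const { return data_.name; }
    double density() const { return data_.density; }

  private:
    MaterialInput data_;
};
```

Because the input is a struct with named members, call sites stay readable even as the number of required inputs grows.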
Other entities devoted to sustainable programming have their own guidelines. The ISO C++ guidelines are very long but offer a number of insightful suggestions about C++ programming. The Google style guide is a little more targeted toward legacy code and large production environments, but it still offers good suggestions.
Functions should use programmatic assertions whenever assumptions are made:
- Use the REQUIRE(x) assertion macro to test preconditions about incoming data or initial internal states.
- Use CHECK(x) to express an assumption internal to a function (e.g. "this index is not out of range of the array").
- Use ENSURE(x) to mark expectations about data being returned from a function and side effects resulting from the function.
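The three kinds of assertion might be combined in one function as sketched below. The macro definitions here are stand-ins that simply forward to the standard assert; the project's real macros are defined elsewhere and may report failures differently. The function find_max_index is a made-up example:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in definitions for illustration only (assumption: the real macros
// have richer failure reporting and can be disabled independently)
#define REQUIRE(COND) assert(COND)
#define CHECK(COND) assert(COND)
#define ENSURE(COND) assert(COND)

// Return the index of the largest element of a non-empty vector
std::size_t find_max_index(std::vector<double> const& values)
{
    REQUIRE(!values.empty());  // precondition on incoming data

    std::size_t result = 0;
    for (std::size_t i = 1; i < values.size(); ++i)
    {
        CHECK(result < i);  // internal assumption: best-so-far precedes i
        if (values[i] > values[result])
        {
            result = i;
        }
    }

    ENSURE(result < values.size());  // postcondition on the returned value
    return result;
}
```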
Each class must be thoroughly tested with an independent unit test in the test directory. For complete coverage, each function of the class must have at least as many tests as the number of possible code flow paths (cyclomatic complexity).
Implementation detail classes (in the celeritas::detail namespace, in detail/
subdirectories) are exempt from the testing requirement, but testing the detail classes is
a good way to simplify edge case testing compared to testing the higher-level code.
Having a consistent code style makes it more readable and maintainable. (For example, you don't have to guess whether a symbol is a function or class.)
As a historical note, many of the style conventions in Celeritas derive from the Draco project style of which Tom Evans was primary author and which became the style standard for the GPU-enabled Monte Carlo code Shift.
Formatting is determined by the clang-format file inside the top-level
directory. One key restriction is the 80-column limit, which enables multiple
code windows to be open side-by-side. Generally, statements longer than 80
columns should be broken into sub-expressions for improved readability anyway
-- the auto keyword can help a lot with this.
A certain amount of decoration (separators, doxygen comment structure,
etc.) is standard throughout the code. Use the celeritas-gen.py script
(in the scripts/dev directory) to generate skeletons for new files, and use
existing source code as a guide to how to structure the decorations.
Functions should be verbs; classes should be names. As in standard Python
(PEP-8-compliant) code, classes should use CapWordsStyle and variables use
snake_case_style.
Functors (classes whose instances act like a function) should be an agent noun: the noun form of an action verb. Instances of a functor should be a verb. For example:
ModelEvaluator evaluate_something(parameters...);
auto result = evaluate_something(arguments...);
There are many opportunities to use OpaqueId in GPU code to indicate indexing into particular vectors. To maintain consistency, we let an index into a vector of Foo have a corresponding OpaqueId type:
using FooId = OpaqueId<Foo>;
and ideally be defined either immediately after Foo or in a Types.hh file. Some OpaqueId types may have only a "symbolic" corresponding type, in which case a tag struct can be defined inline:
using BarId = OpaqueId<struct Bar>;
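To show why this adds type safety, here is a deliberately minimal OpaqueId sketch. It is not the actual Celeritas implementation: the member names, the default size type, and the invalid-value convention are all assumptions made for illustration.

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of an opaque ID (NOT the real Celeritas class)
template<class Tag, class SizeT = std::size_t>
class OpaqueId
{
  public:
    // Default-constructed IDs are invalid; callers never see the sentinel
    OpaqueId() = default;
    explicit OpaqueId(SizeT index) : value_(index) {}

    // True if the ID has been assigned an index
    explicit operator bool() const { return value_ != invalid_value(); }

    // Explicit access to the underlying index
    SizeT get() const { return value_; }

  private:
    static constexpr SizeT invalid_value() { return static_cast<SizeT>(-1); }
    SizeT value_{invalid_value()};
};

struct Foo
{
    double value;
};

using FooId = OpaqueId<Foo>;
using BarId = OpaqueId<struct Bar>;  // "symbolic" tag declared inline

// The signature documents which vector the index refers to
inline double get_value(std::vector<Foo> const& foos, FooId id)
{
    return foos[id.get()].value;
}
```

With these definitions, passing a BarId or a bare integer to get_value fails to compile, and there is no implicit conversion from FooId to an arithmetic type.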
All __device__ and __global__ code must be compiled with NVCC to generate
device objects. However, code that merely uses CUDA API calls such as
cudaMalloc does not have to be compiled with NVCC. Instead, it only has to
be linked against the CUDA runtime library and include cuda_runtime_api.h.
The exception to this is VecGeom's code, which compiles differently when run
through NVCC. (Macro magic puts much of the code in a different namespace.)
Since NVCC is slower and other compilers' warning/error output is more readable, it's preferable to use NVCC for as little compilation as possible. Furthermore, not requiring NVCC lets us play nicer with downstream libraries and front-end apps. Host code will not be restricted to the minimum version supported by NVCC (C++14).
Of course, the standard compilers cannot include any CUDA code containing
kernel launches, since those require special parsing by the compiler. So kernel
launches and __global__ code must be in a .cu file. However, the
CUDA runtime does define the special __host__ and __device__ macros (among
others). Therefore it is OK for a CUDA file to be included by host code as long
as it #includes the CUDA API. (Note that if such a file is to be included by
downstream code, it will also have to propagate the CUDA include directories.)
Choosing to compile code with the host compiler rather than NVCC also provides a check against surprise kernel launches. For example, the declaration:
thrust::device_vector<double> dv(10);
actually launches a kernel to fill the vector's initial state. The code will
not compile in a .cc file run through the host compiler, but it will
automatically (and silently) generate kernel code when run through NVCC.
Finally, we choose the convention of .cc for C++ translation units and
corresponding .hh files for C++ headers.
Thus we have the following rules:
- .hh is for C++ header code compatible with host compilers. The code in this file can be compatible with device code if it uses the CELER_FUNCTION family of macros defined in base/Macros.hh.
- .cc is for C++ code that will invariably be compiled by the host compiler.
- .cu is for __global__ kernels and functions that launch them, including code that initializes device memory.
- .cuh is for header files that require compilation by NVCC: they contain __device__-only code or include CUDA directives without #include <cuda_runtime_api.h>.
- .cuda.hh and .cuda.cc require CUDA to be enabled and use CUDA runtime libraries and headers, but they do not execute any device code and thus can be built by a host compiler. If the CUDA-related code is guarded by #if CELERITAS_USE_CUDA macros then the special extension is not necessary. A .nocuda.cc file can specify alternative code paths to .cuda.cc files (mainly for error checking code).
- .mpi.cc and .nompi.cc code requires MPI to be enabled or disabled, respectively.
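The idea behind a macro such as CELER_FUNCTION can be sketched as follows. SKETCH_FUNCTION is a hypothetical stand-in; the real definitions in base/Macros.hh may differ. The sketch relies only on the fact that NVCC defines __CUDACC__ when it compiles CUDA source files:

```cpp
// Hypothetical stand-in for the CELER_FUNCTION macro
#if defined(__CUDACC__)
// Compiled by NVCC: generate both host and device versions
#    define SKETCH_FUNCTION __host__ __device__
#else
// Compiled by a host compiler: the decoration vanishes
#    define SKETCH_FUNCTION
#endif

// In a .hh file: usable from both .cc (host compiler) and .cu (NVCC) code
SKETCH_FUNCTION inline double square(double x)
{
    return x * x;
}
```

A header written this way never needs NVCC on its own; it only picks up the device decorations when a .cu file includes it.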
Some files have special extensions:
- .i.hh is for inline function implementations. If a function or member function is marked inline in the main header file, its definition should be provided here. No inline modifier is needed for the .i.hh definition but it must be present in the .hh file.
- .t.hh is for non-inlined template implementations: if they're marked inline in their corresponding declaration in the .hh, their implementation should be provided here.
- .test.cc are unit test executables corresponding to the main .cc file. These should only be in the main /test directory.
If there are only a few short inline methods (and especially if it's for a small class) they can be included at the bottom of the main header file under a suitable separator that demarcates the declarations from the definitions.
Generally speaking, variables should have short lifetimes and should be
self-documenting. Avoid shorthand and "transliterated" mathematical
expressions: prefer constants::na_avogadro to N_A (or express the
constant functionally with atoms_per_mole) and use atomic_number
instead of Z. Physical constants should try to have the symbol concatenated
to the context or meaning (e.g. c_light or h_planck).
Use scoped enumerations (enum class) where possible (named like classes) so
their values can safely be named like member variables (lowercase with
underscores).
Although struct and class are interchangeable for class definitions
(modifying only the default visibility as public or private), use struct
for data-oriented classes that don't declare constructors and have only
public data; and class for classes designed to encapsulate functionality
and/or data.
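The enumeration and struct/class conventions above might look like this in practice. All of the names (Interp, GridParams, GridPointCalculator) are invented for the example:

```cpp
#include <cmath>
#include <stdexcept>

// Scoped enumeration: type named like a class, values like variables
enum class Interp
{
    linear,
    logarithmic,
};

// Data-oriented, no constructors, only public data: use struct
struct GridParams
{
    double lower;
    double upper;
    Interp interp;
};

// Encapsulates functionality and validates its input: use class
class GridPointCalculator
{
  public:
    explicit GridPointCalculator(GridParams const& params) : params_(params)
    {
        if (!(params_.lower < params_.upper))
            throw std::invalid_argument("grid bounds are not increasing");
        if (params_.interp == Interp::logarithmic && !(params_.lower > 0))
            throw std::invalid_argument("log grid needs a positive bound");
    }

    // Interpolate between the bounds at a fraction in [0, 1]
    double operator()(double fraction) const
    {
        if (params_.interp == Interp::linear)
        {
            return params_.lower + fraction * (params_.upper - params_.lower);
        }
        return params_.lower
               * std::pow(params_.upper / params_.lower, fraction);
    }

  private:
    GridParams params_;
};
```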
With template parameters, typename T and class T are also interchangeable,
but use template <class T> to be consistent internally and with the
standard library. (It's also possible to have template <typename where
typename doesn't mean a class: namely,
template <typename U::value_type Value>.)
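A small example of that last case, with a hypothetical IsEqualTo functor: the first parameter uses class T style as recommended, while the typename in the second parameter is required to name the dependent type of a non-type parameter.

```cpp
#include <vector>

// `typename` here disambiguates the dependent name U::value_type; it does
// not introduce a type parameter. Value is a compile-time constant whose
// type is U::value_type.
template<class U, typename U::value_type Value>
struct IsEqualTo
{
    bool operator()(typename U::value_type const& x) const
    {
        return x == Value;
    }
};

// Instantiate with a container type and a constant of its element type
using IsThree = IsEqualTo<std::vector<int>, 3>;
```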
Data management should be isolated from data use as much as possible. This allows low-level physics classes to operate on references to data using the exact same device/host code. Furthermore, state data (one per track) and shared data (definitions, persistent data, model data) should be separately allocated and managed.
- Store
- Generic name for a class that manages GPU data by means of a host class, using celeritas::DeviceVector (or thrust or VecGeom wrappers as needed) to manage the on-device data. Use DeviceVectors for containers that don't need special initialization (i.e. have "plain old data").
- Params (model parameters)
- Provide a CPU-based interface to manage and provide access to constant shared GPU data, usually model parameters or the like. The Params class itself can only be accessed via host code, but it should use the celeritas::DeviceVector or thrust or VecGeom wrappers to manage on-device data. A params class can contain metadata (string names, etc.) suitable for host-side debug output and for helping related classes convert from user-friendly input (e.g. particle name) to device-friendly IDs (e.g. particle def ID).
- State (state variables)
- Thread-local data specifying the state of a single particle track with respect to a corresponding model (FooParams). The state data resides on device but is managed by a host class FooStateStore using DeviceVector or the like. It is an implementation detail whether the state data is stored as a struct of arrays (SOA) or an array of structs (AOS), but if stored as AOS then the per-track state struct should be named TrackFooState.
- Pointers
- A standalone, plain-old-data struct to data owned by another class (e.g. a Params class) but stored on the GPU. This struct is used to transfer GPU pointers and other kernel parameters between host and device. A Pointers struct can hold other Pointers structs as data members. Inside unit tests for debugging, Pointers can reference host data if the corresponding functions being called are also on-host. Defining Pointers structs in separate files from the memory management classes means that NVCC doesn't have to include those headers, speeding up compilation and perhaps allowing the host code to use newer implementations of the C++ standard.
- TrackView
- Device-instantiated class that provides read/write access to the data for a single track, in the spirit of std::string_view which adds functionality to data owned by someone else. It combines the state variables and model parameters into a single class. The constructor always takes const references to ParamsPointers and StatePointers as well as the track ID. It encapsulates the storage/layout of the state and parameters, as well as what (if any) data is cached in the state.
- View
- Device-instantiated class with read/write access for data shared across threads. For example, allocation for Secondary particles is performed on device, but the data is not specific to a thread.
Example: All SM physics particles share a common set of properties such as mass and charge, and each instance of a particle has a particular set of associated variables such as kinetic energy. The shared data (SM parameters) reside in ParticleParams, and the particle track properties are managed by a ParticleStateStore class. A separate class, the ParticleTrackView, is instantiated with a specific thread ID so that it acts as an accessor to the stored data for a particular track. It can calculate properties that depend on both the state and parameters. For example, momentum depends on both the mass of a particle (constant, set by the model) and the speed (variable, depends on particle track state).