All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog.
- add vcpkg manifest #700
- add missing license identifier in all source files #707
- add `View::extents()` #718
- add `View::blobs()` accessor #719
- add a CUDA viewcopy benchmark #721, #834
- allow string literals and reflection in/of record dimension #737
- add `Stacked` accessor #755
- support `RecordRef::at<"str">()` #757
- specialize `std::hash` for `RecordRef` #758
- add a concept for `View` #759
- demonstrate mapping detecting false sharing #772
- add `Locked` accessor #773
- add a compile time benchmark #796, #805, #806
- add oneAPI SYCL n-body #799
- allow `SubView` to default construct inner view #801
- add gather/scatter to SIMD traits #815
- support a color palette and custom text color for `toSvg()` #846
- require Boost 1.74 #710
- rename `forEachADCoord` to `forEachArrayIndex` #763
- upgrade to alpaka 1.0 #771
- rename linearizers towards mdspan jargon #784
- align AoSoA sub arrays #786
- relicense all LGPL-3.0+ content to MPL-2.0 #808
- extend output of `FieldAccessCount` #703
- small improvements to `Heatmap` #704
- improve LHCb HEP analysis example #706, #711, #731, #741, #746, #824, #825, #826, #827, #828
- CI fixes and improvements: #708, #720, #723, #725, #732, #745, #747, #748, #750, #752, #756, #762, #764, #778, #781, #783, #785, #789, #791, #807, #812, #820, #775, #749
- documentation fixes: #709, #714, #727, #731
- unit test improvements: #712, #765, #770, #779
- add diagnostic when CUDA is not found #726
- remove nvcc flattener workaround for nvcc > 11.6 #734
- small refactoring #740, #701, #761, #769, #774, #802, #810
- small internal fixes #716, #735, #736, #742, #743, #776, #804, #836, #837, #840
- improve color hashing for dumping images #744
- drop nested `ArrayIndex` requirement for mappings #751
- make view transformations forward views #753
- make `RecordRef` weakly-equality-comparable #760
- allow STREAM with larger problem sizes #767
- add SSE4a implementation of NT-store accessor in STREAM #768
- separate record dim flattening from field permutations #782
- annotate all public APIs with `LLAMA_EXPORT` #787
- improve n-body #798, #813, #814, #816, #818, #822, #823, #829, #832, #833, #794
- fix `structName` for single letter structs #800
- improve benchmark plots and reported statistics #803
- generalize blob copying functions #809
- extend viewcopy benchmark and improve `llama::copy` #811, #842, #843, #844
- improve SIMD loading/storing #817
- implement SIMD load/store between different record dimensions #819
- align BabelStream with alpaka version #831
- add missing host/acc macro to decayCopy #835
- fix initialization of device memory from host in async blur example #838
- add a dump test for packed AoSoA4 #845
- refine `AnyView` concept and test semiregularity #847
- update readthedocs config to version 2 #849
- fix most documentation issues #850
- drop support for nvcc < 11.6 #766, #797
- remove support for clang < 12 #792, #802
- disable alpaka PIC example for recent MSVC #793
- drop support for g++-9 #830
- remove deprecated tree mapping #839
- allow record coords in `llama::mapping::ChangeType`'s replacement map #468
- converted the daxpy example to alpaka, so it can be used on more architectures #469
- added new CUDA demo for pitched allocation #473
- added small utilities `llama::divCeil`, `llama::roundToMultiple` and `llama::dot(Array)` #477
- added support for new compilers/OSes: clang-14 #484, clang-15 #590, gcc-12 #490, nvcc 11.7 #501, nvcc 11.8 #591, nvcc 12.0 #654, MacOS-12 #540, nvc++ 22.9 (nvhpc) #547, #589
- support array extents with arbitrary value types #488
- the creation of the single amalgamated header is now available as script: #497, #535
- a single amalgamated header from LLAMA is now published on each commit: #535
- the `Trace` mapping is now supported on GPUs #503
- the `Heatmap` mapping is now supported on GPUs #587
- added macros for likely and unlikely attributes #506
- added `front()` and `back()` to `llama::Array` #517, #528
- added `data()` to `llama::Array` #553
- allow in-place construction of `llama::Trace`'s inner mapping #517
- make printing API in `llama::Trace` more versatile #517
- added a documentation section comparing C++ and LLAMA data structure access #522
- documented interplay of member functions and proxy references #524
- added new utility functions `llama::transformBlobs()` and `llama::shallowCopy()` #525
- added `llama::isTrace` trait #529
- documented how to form references to `llama::One` #532
- documented new LLAMA mappings and accessors #545, #583, #640
- added `llama::isOne` trait #549
- added `llama::isProxyReference` trait #550
- added `llama::ScopedUpdate`, a tool to generically update values through a (proxy) reference #550
- added an API for explicit SIMD programming #577, #578, #581
- data access can now be customized using accessors #579, #611, #612, #642
- the `README.md` has been updated with a link to our first publication on LLAMA #596
- all mappings now re-expose their template parameters as nested types/values #599
- added the `Projection` and `Byteswap` mappings #607, #612
- added an example viewing a memory mapped file #608
- heatmaps can now be written to binary files in addition to ASCII #615
- added meta mapping `llama::mapping::PermuteArrayIndex` to permute array indices #616, #636
- heatmap output can be trimmed #618
- added blob allocator `llama::bloballoc::UniquePtr` #630
- added STREAM benchmark #643
- added some preliminary support for HIP (not CI tested yet) #651
- added the BabelStream benchmark #650
- added ROOT LHCB B2HHH analysis example #660, #672, #684
- the `Split` mapping now additionally supports tag lists as selectors #674
- allow the `BitPackedInt*` mappings to omit the sign bit #675
- added new mapping `BitPackedIntAoS` #678
- added new mapping `BitPackedFloatAoS` #687
- improved array handling of `recordCoordTags` #693
- the template parameter list for `llama::ArrayExtents` changed to support specifying the index type: #488
- the CI now uses alpaka 0.9 and not the development version #492
- LLAMA's cmake project now builds in Release mode by default with tests/examples off #509
- the unit tests now require Catch2 v3 to build, which can be downloaded automatically or taken from the system #511, #570
- cmake 3.18.3 is now required by LLAMA and all examples #526
- renamed `llama::VirtualRecord` into `llama::RecordRef` #551
- the `Vc` library has been replaced by `xsimd` for explicit vectorization #557
- the requirements on computed mappings have been tightened #627
- renamed blob allocator `llama::bloballoc::Stack` to `llama::bloballoc::Array` #629
- renamed `llama::VirtualView` to `llama::SubView` #638
- the `SoA` mapping now aligns subarrays by default if a single blob is used #648
- replaced Boolean parameters of mappings by enums to increase readability #655
- the `Trace` mapping has been renamed to `FieldAccessCount` #690
- replaced `.zenodo.json` by `CITATION.cff` #696
- renamed `recordCoordTags` into `prettyRecordCoord` #693
- fixed various compilation flags #470
- aligned `std::vector` in `daxpy` baseline benchmark #471
- refactored common mapping code into a shared base class #472
- fixed alpaka examples to support alpaka 0.9 #474, #504
- made Codecov reports on PRs less verbose and allow for small coverage decreases #475
- removed some MSVC workarounds #476
- various minor CI fixes and updates: #478, #479, #483, #485, #491, #493, #494, #505, #512, #515, #519, #533, #538, #546, #556, #558, #562, #569, #571, #586, #600, #601, #602, #619, #620, #621, #622, #645, #646, #686, #688
- various small code fixes: #486, #489, #495, #500, #502, #507, #527, #560, #575, #584, #597, #598, #603, #617, #631, #632, #641, #649, #658, #659, #673
- various documentation fixes: #496, #514, #543, #563, #588, #624, #644, #649, #689
- various unit test improvements: #531, #534, #537, #568, #609, #613, #661, #698
- fixed empty base optimization for MSVC: #499
- `llama::structName<T>()` and `llama::recordCoordTags<T>` are now `constexpr` #521
- cmake variables from Catch are now hidden by default in cmake guis #548
- fixed warnings and asserts, and improve bitpacked mappings #549, #671, #677, #681
- fixed some edge cases and improved mapping dumping #552, #647
- allow assigning Trace references directly to each other #555
- the naming of identifiers in LLAMA code is now enforced by `clang-tidy` #565
- code formatting now requires `clang-format-15` #508, #564, #685
- support proxy references in RecordRef tuple interface #572
- comply to new CRP clang-tidy checks #573
- the runs of the n-body example are now verified against each other #574
- suppress unnecessary CUDA warnings #580
- the n-body and alpaka n-body example are now more similar and support explicit SIMD #582
- the gnuplot scripts for heatmaps have been improved #623
- a view constructed without a blob array argument will now value initialize the blob array #649
- the `SoA` mapping's performance has been improved when the array extents are fully known at compile time #653
- fix `llama::structName<T>()` for `T`s in unnamed namespaces
- support for Visual Studio 2019 has been dropped #539
- support for MacOS 11.15 has been dropped #561
- support for AppleClang has been dropped, use brew's clang on MacOS #593
- the obsolete `nbody_benchmark` example has been removed #595
- added `operator<<` for `llama::VirtualRecord`, `llama::RecordCoord`, `llama::Array` and `llama::ArrayExtents` #279, #243, #373, #374
- allow to use static arrays as record dimension #285, #244
- added `llama::copy` for layout aware copying between two views #289
- added `llama::Vector` as analog to `std::vector`, but supports LLAMA mappings #296, #300
- added CI tests for MacOS 10.15 and 11 #297, #306, #393
- added `push_front`, `pop_front`, `push_back` and `pop_back` for `llama::Array` #307
- added `operator==` and `operator!=` for `llama::RecordCoord` #308
- support arbitrarily many record coords in `llama::Cat` and `llama::cat` #308
- added example showing a particle-in-cell (PIC) simulation #319
- `llama::Array` now has a member function `size` #325
- added `llama::isComputed<Mapping, RecordCoord>` to query whether a field is computed by a mapping #325
- added `llama::swap` for `VirtualRecord`, used by STL algorithms #344
- extended blob allocators to allow requesting blob alignment, now used by `llama::allocView` #339, #355
- added `llama::alignOf` and `llama::flatAlignOf` #355
- added traits to detect whether a type is a certain LLAMA mapping #359, #456
- added `TransformLeaves<RecordDim, TypeFunctor>` meta function #365
- added macros `LLAMA_FORCE_INLINE` and `LLAMA_HOST_ACC` #366
- support clang as CUDA compiler #366
- `llama::mapping::SoA` and `llama::mapping::AoSoA` now support custom record dimension flatteners #371
- added the `llama::mapping::PermuteFieldsIncreasingAlignment`, `llama::mapping::PermuteFieldsDecreasingAlignment` and `llama::mapping::PermuteFieldsMinimizePadding` record dimension flatteners #371
- added new mapping `llama::mapping::BitPackedIntSoA` bitpacking integers in the record dimension into SoA arrays, and added new example #372, #427, #441, #446
- added new mapping `llama::mapping::BitPackedFloatSoA` bitpacking floating-point types in the record dimension into SoA arrays, and added new example #414, #427, #446
- `LLAMA_FORCE_INLINE` views can be created on `const` blobs #375
- added `llama::allocViewUninitialized` to create a `llama::View` without running the field type's constructors #377
- added `llama::constructFields` to run the constructors of all field types in a view #377
- LLAMA's unit tests can now be run from the `ctest` test driver (not recommended because slower) #384
- added support for compile time array dimensions with new classes `llama::ArrayExtents` #391
- allow suppressing console output from `llama::mapping::Trace` on destruction #391
- added new mapping `llama::mapping::Bytesplit` that allows to split each field type into a byte array and map using a further mapping, and added example #395, #398, #399, #441
- added macro `LLAMA_UNROLL` to request unrolling of a loop #403
- allow `llama::VirtualView` to store its inner view #406
- `llama::mapping::Split` now supports multiple record coords to select how the record dimension is split #407
- added clang-12, clang-13, g++-9, g++-11, nvcc 11.3, 11.4, 11.5, 11.6, Visual Studio 2022 to CI #314, #315, #317, #335, #396, #408, #412, #461
- added `CopyConst` type function #419
- added new mapping `llama::mapping::ChangeType` that replaces types from the record dimension for other types when storing #421, #441
- added new mixin `llama::ProxyRefOpMixin` to help supporting compound assignment and increment/decrement operators on proxy references #430
- added unit test coverage analysis and reports for each PR #432
- added new `llama::mapping::Null` mapping, that maps elements to nothing, discarding written values and returning default constructed values when reading #442
- added new example `daxpy` focusing on the mappings `llama::mapping::BitPackedFloatSoA`, `llama::mapping::Bytesplit` and `llama::mapping::ChangeType` #450, #452, #455
- added `llama::ReplacePlaceholders` meta function #451
- develop is the new default branch on GitHub, master was deleted #280
- `llama::One` is now a zero-dimensional view (instead of one-dimensional) #286
- `llama::mapping::AoS` is aligned and `llama::mapping::SoA` is multiblob by default #312
- all alpaka examples now require alpaka 0.7 #321
- updated clang-format to version 12.0.1 #326, #404
- stricter checking whether a type is allowed as field type in general #284
- stricter checking whether a type is allowed as field type in `llama::copy` #329
- `llama::allocView` will now execute the constructors of the field types #377
- brightened the colors used for dumped mapping visualizations #387
- renamed `llama::forEachLeaf` to `llama::forEachLeafCoord` and added new `llama::forEachLeaf` iterating over the fields of a record #388
- replaced `llama::ArrayDims` by `llama::ArrayExtents` and `llama::ArrayIndex` #391
- renamed `llama::ArrayDimsIndexIterator` to `llama::ArrayIndexIterator` #391
- renamed `llama::ArrayDimsIndexRange` to `llama::ArrayIndexRange` #391
- renamed `llama::mapping::Mapping::arrayDims()` to `llama::mapping::Mapping::extents()` #391
- the `ASAN_FOR_TESTS` CMake option has been renamed to `LLAMA_ENABLE_ASAN_FOR_TESTS` #425
- renamed all `llama::mapping::PreconfiguredMapping` meta functions to `llama::mapping::BindMapping` #456
- updated zenodo file and provided a DOI to LLAMA's releases #282, #291, #292
- views can be indexed with signed integer types as well #283
- improve `LLAMA_INDEPENDENT_DATA` for clang compilers and the Intel LLVM compiler (icx) #302, #411
- fixed a missing include #304
- made `llama::Tuple` more similar to `std::tuple` #309
- added clang-tidy CI checks #310, #367
- all CMake projects now only request C++ as language #321
- `llama::One` now respects the field type's alignment and minimizes its size #323
- fixed `LLAMA_LAMBDA_INLINE_WITH_SPECIFIERS` for nvcc when using MSVC as host compiler #334
- fixed AoSoA blob size when the flat array extent is not divisible by the `Lanes` parameter #336
- switched MSVC C++ standard flag from `/std:c++20` to `/std:c++latest` for unit tests #338, #443
- added more unit tests for `std::transform` on LLAMA views #343
- fixed `value_type` of `View::iterator` to be STL compatible #346
- fixed default arguments for `llama::mapping::PreconfiguredAoS` to match `llama::mapping::AoS` #347
- fixed default arguments for `llama::mapping::PreconfiguredSoA` to match `llama::mapping::SoA` #369
- improved `llama::VirtualRecord`'s and `llama::View`'s size using empty base optimization #348
- updated `stb` third-party libraries #352
- ensured proper truncation of empty space after `hostname()` in common example utilities #353
- a mapping's `blobNrAndOffset` can now deduce the record coordinates from a passed instance of `llama::RecordCoord` #368
- provided `boost::mp11::mp_flatten` if Boost version is too old #370
- ensured that `llama::VirtualView` supports negative indices #379
- documented the behavior of the array extents linearizers #380
- the fmt library is now an optional dependency for the llama CMake target #382, #383
- the unit tests now compile with higher warning levels #386
- better checking for unnecessary `const` qualifiers on `Mapping` and `ArrayExtents` template arguments #391
- refactoring CMake optimization flags #392
- refactored unit tests #299, #397
- added more unit tests for `llama::bloballoc::AlignedAllocator` and `llama::mapping::Trace` #437
- fixed generating invalid CSS class names for HTML dumps #410
- avoid blurry heatmaps dumped by `llama::mapping::Heatmap` #416
- ensure that a fully-static `llama::ArrayExtents` and `llama::mapping::One` are stateless #417
- `DumpMapping.hpp` is now included via `llama.hpp` (with disabled content when the fmt library is not available) #251, #422
- added `Bytesplit` and `BitpackedFloatSoA` mappings to n-body and heatequation examples #431
- simplified implementation of `llama::tupleReplace` #435
- `llama::Tuple` no longer reserves space for empty types #436
- improved documentation and README.md #440, #445, #453, #454, #457
- fixed detection whether compilers support C++20 ranges #443
- refined mapping related concepts #444
- CI switched to Boost 1.74 because of alpaka
- support templates in `llama::structName` #449
- dropped support for the Intel C++ Compiler Classic (icp) #351
- removed `llama::Array::rank`, replaced by `llama::Array::size` #391
- added multi-blob SoA mapping allowing to map to one blob per field #111
- added `llama::FlatRecordDim` and `llama::flatRecordCoord` to flatten record dimension and record coordinates #241
- added `llama::VirtualRecord::asTuple` and `llama::VirtualRecord::asFlatTuple` to create tuples of references from virtual records #139, #141
- added an iterator for `llama::View` and `llama::View::begin`/`llama::View::end`, allowing it to be used with the STL #158, #162, #207, #259, #259
- added support for arrays of static size inside `llama::Field` #164
- added `llama::mapping::maxLanes` to help building AoSoA mappings #181
- added `llama::flatSizeOf` and `llama::flatOffetOf` working on type lists #241
- added `llama::fieldCount<RecordDim>` #241
- added `llama::LeafRecordCoords` creating a type list of record coordinates of the leaf fields #254
- added literal operator `_RC` to ease creating record coordinates #144
- added concepts `llama::BlobAllocator` and `llama::StorageBlob` #145, #146
- added new `Heatmap` mapping, tracking bytewise memory access #192
- added `llama::forEachADCoord` to iterate over array dimensions #198
- added a parameter to AoS to support alignment #156
- made ArrayDomainIndexIterator and ArrayDomainIndexRange constexpr #130
- added support for structured bindings on `llama::VirtualRecord` #142
- added load and store support between virtual datum and any tuple like type #143
- added prototype of computed properties (experimental and undocumented) #171
- added `LLAMA_LAMBDA_INLINE` to force inlining of lambda functions #264
- added CUDA n-body #129, #132, #220, #221
- extended n-body, vectoradd and heatequation examples with more variants #115, #116, #118, #124, #133, #134, #135, #207, #213, #216, #270, #273
- added new bufferguard example #166
- added new viewcopy example, comparing various approaches to copy between `llama::View`s #119, #120, #223, #224, #25, #228, #235, #247, #268
- added new alpaka nbody example using Vc #128
- added icpc, icpx and clang to CI #157, #172
- added .clang-tidy file #195
- added clang-format check to CI #127
- extended `llama::sizeOf` and `llama::offsetOf` to support alignment and padding #156
- `llama::ArrayDomainIndexIterator` is now random access and supports C++20 ranges #199
- `llama::structName` can now be used with a type argument as well #241
- `llama::One` can be constructed from other virtual records #256
- added `llama::AlignedAllocator`
- made `llama::forEachLeaf` constexpr
- made all mappings constexpr
- renamed datum domain to record dimension, including corresponding files, helper functions, variables etc. #194, notably:
  - renamed `llama::VirtualDatum` to `llama::VirtualRecord`
  - renamed `llama::DatumCoord` to `llama::RecordCoord`
  - renamed `llama::DatumStruct` to `llama::Record`
  - renamed `llama::DatumElement` to `llama::Field`
- replaced `llama::allocVirtualDatumStack` by `llama::One` #140
- bumped required alpaka version in examples to 0.7 and adapted to changes in alpaka #113
- bumped required CMake version to 3.16 #122
- added `arrayDims` getter to all mappings and made `ArrayDims` member private #210
- renamed `llama::mapping::SplitMapping` to `llama::mapping::Split` #155
- renamed namespace `llama::allocator` to `llama::bloballoc` #188
- renamed `getBlobSize`/`getBlobNrAndOffset` in all mappings to `blobSize`/`blobNrAndOffset` #191
- removed unnecessary size argument of `llama::VirtualView` constructor
- replaced parallel STL by OpenMP in examples to remove dependency on TBB #198
- switched to clang-format 12 #202
- reorganized internal LLAMA headers #123
- `llama::offsetOf` now requires a `RecordCoord` instead of integral indices
- bumped required Boost version to 1.70
- added a few missing asymmetric arithmetic and relational operators to `llama::VirtualRecord` #115
- fixed blob splitting in `llama::mapping::Split` #155
- only write back velocity in n-body example #249
- improved output of dumped mapping visualizations #154, #265
- improved and expanded documentation and added new figures
- improved compilation time #241, #246, #254
- improve annotation of llama functions with LLAMA_FN_HOST_ACC_INLINE #152
- removed some dependencies on Boost #161, #204, #266
- updated .zenodo.json #121
- fix wrong distance calculation in body example
- refactored common timing functions in examples into class `Stopwatch`
- CMakeLists.txt cleanup
- refactored internals
- removed `llama::Index<I>`, use `llama::RecordCoord` now #144
- removed `llama::DatumArray` and `llama::DA`, replaced by using C++ native fixed size arrays #164
- C++17 and CUDA 11
- MSVC support
- improved API using C++17 CTAD
- improved integration with Alpaka
- dump mapping visualizations
- add experimental Trace and Split meta mappings
- lots of refactoring and code improvements
- greatly updated documentation
- turn some examples into proper unit tests
- add more unit tests
- CI support with unit tests, address sanitizer, amalgamated llama.hpp, doxygen etc.
- replace png++ by stb_image
- added .clang-format file
Basic functionality implemented