Serialize arrays, making the code much simpler and more general#1293
Serialize arrays, making the code much simpler and more general#1293feldergast merged 4 commits intosstsimulator:develfrom
Conversation
|
CLANG-FORMAT TEST - FAILED (on last commit): |
|
CMAKE-FORMAT TEST - PASSED |
|
CLANG-FORMAT TEST - PASSED |
|
CMAKE-FORMAT TEST - PASSED |
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
CLANG-FORMAT TEST - FAILED (on last commit): |
|
CMAKE-FORMAT TEST - PASSED |
|
CMAKE-FORMAT TEST - PASSED |
|
CLANG-FORMAT TEST - PASSED |
|
Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
| serialize_array_element(serializer& ser, void* data, size_t index) | ||
| { | ||
| return pvt::ser_array_wrapper<void, IntType>(buf, size); | ||
| ser& static_cast<ELEM_T*>(data)[index]; |
There was a problem hiding this comment.
None of the serializer internals should use operator& as it's being deprecated. sst_map_object() is the entry point to serialization (I have a PR almost ready that renames this to sst_ser_object() to make it more clear that it is used for all serialization modes, including mapping). This is the function that the SST_SER() macro will also call when it's used.
There was a problem hiding this comment.
I know but I didn't want to use it until everything was settled as far as macro names, etc. Should a macro be called or a function, etc.? Is there a different function/macro to be called or a different argument to use when doing sub-objects which don't need annotation, etc.?
I was waiting for all of this stuff to settle.
sst_map_object() seems very map-centric and does not sound like a general serialization function.
There was a problem hiding this comment.
You can just use SST_SER.
| raw_ptr(TPtr*& ptr) | ||
| // Return a new map representing canonical fixed sized array (whether the original array came from ELEM_T[SIZE] or | ||
| // std::array<ELEM_T, SIZE>) | ||
| template <typename ELEM_T, size_t SIZE> |
There was a problem hiding this comment.
Why does this this need to be templated on array size? This would seem to create a new template expansion of ObjectMapContainer for every array size that is compiled. Seems like the size could be a runtime parameter passed to ObjectMapContainer.
Though I suppose I have less of an issue with the function being templated on size (as the function is likely to just get inlined), than I do having ObjectMapContainer depend on the array size through ELEM_T[SIZE]. The ObjectMapContainer doesn't care whether the data you put in it comes from a fixed or dynamic sized array. Can we just cast this to the right base pointer type that doesn't indicate whether it's fixed or dynamic? That way we only template on element type.
There was a problem hiding this comment.
It needs to be templated for array size because array size is a part of the type. We have to use templating on the array size somewhere. I have abstracted it out as much as possible to where you don't get large functions instantiated for each array size. Only a very small bit of code is instantiated for each array size, and then it calls a function passing it the array size as a runtime argument. I spent many days optimizing this and choosing the right balance. Some things just must be dependent on template parameters, such as the function which serializes an element (depends on element type as a template parameter) or the code cannot work in general. I've abstracted out the parts which are not dependent on template parameters such as ELEM_T and SIZE, and isolated the parts which are.
The code has been written in such a way as to reduce code size across a large number of types and fixed array sizes, by separating out the parts which depend on the element type and the array size. Where there is a template dependency on the element type or a fixed array size (which is a part of the type), it is necessary to include templated code, and trying to split it out much further would not be beneficial -- I've tried abstracting it out in different ways, and this current way produces a small number of Lines of Code (LOC) while still not making template instantiations grow the code size too large.
if constexpris used to make parts of the code conditional and optimized out where possible.
The ObjectMapContainer is still in flux and I wanted to discuss with you. For now there is a simply a function which creates an ObjectMapContainer which must be templated on ELEM_T and SIZE but we can turn SIZE into a runtime parameter if we want to. I wanted to discuss how to create an array MapObject and then we could construct it. This was lower priority than getting the array serialization stuff working for std::array
There was a problem hiding this comment.
I'm not sure ObjectMapContainer even needs to be templated anymore with the new way the ObjectMap is created in the new container serialization. The idea was that this would be a base class for a specific object map for each type of container so you could do the mapping appropriately for each. The new code just sticks the contents in as variables.
| */ | ||
| template <class T, size_t N> | ||
| class serialize_impl<T[N], std::enable_if_t<!std::is_arithmetic_v<T> && !std::is_enum_v<T>>> | ||
| // Serialize fixed arrays |
There was a problem hiding this comment.
The probably needs more comments, including what the difference between OBJ_TYPE and ELEM_T is.
There was a problem hiding this comment.
I can add comments, but I thought it was obvious.
| template <class T> | ||
| void | ||
| sst_map_object(serializer& ser, T& t, std::string name = "") | ||
| sst_map_object(serializer& ser, T&& obj) |
There was a problem hiding this comment.
Can you change it so that the basic sst_map_object always takes a const char* name? You could also have another variant that takes string or something convertible to string. I want this because the PR I'm working on no longer has a default for name, as we are moving to SST_SER being the primary user facing API for serialization, and it will always pass in something for name. I want it to be const char* because that is what the macro will put in and I don't want to have to generate a string every time this is called. We will also allow the user to directly call sst_map_object if, for some reason, they want to give the variable a different name in the ObjectMap than what it is called in the class. The operator& and operator| should just pass in "" for this variable.
There was a problem hiding this comment.
It adds runtime overhead if we always pass strings as arguments -- we should have two versions, one without a name argument and one with one. The one without one is simple and doesn't have to do any mapping. You were concerned about non-mapping serialization speed.
If we are passing a const char* in, I prefer using nullptr instead of "" for empty strings, since then the pointer just needs to be tested without any dereferencing to see if the string is empty.
I don't like the name sst_map_object() for serialization, since it is map-centric and imposes overhead of a name argument and "mapping" to all serialization. The name should not use "map" but be something like sst_ser and mapping should be an "afterthought", an "optional argument".
There was a problem hiding this comment.
The macro interface will always pass a name, so having a version without name is not needed.
With the new API, there is an option to control whether mapping of an object happens or not, so the name variable should never need to be tested unless you are checking for errors. So, passing either "" or nullptr should work, but really, the only time the function would be called directly is if you want to have a different name for mapping than the variable name in the class, in which case you wouldn't pass either of those.
I've already changed the name to sst_ser_object() to mirror the SST_SER macro.
| When ObjectMapContext is destroyed, the serializer goes back to the previous ObjectMapContext. | ||
| */ | ||
|
|
||
| class ObjectMapContext |
There was a problem hiding this comment.
This seems like a good replacement for the name stack. However, the plan is to still also pass an integer options value to each serialize_impl call as a bitwise map of options. These options are needed in SIZE, PACK and UNPACK mode and we're not willing to create a context object in those modes as they are used in the serialization for synchronization fast path.
There was a problem hiding this comment.
I think the flags should be a mostly-opaque type, such as SerFlag. I would not want uint32_t or uint64_t to be exposed in the API as the type of the flags. There should be a separate using (liketypedef) or enum class for the flags, using whatever integer underlying type wanted. The names should be prevented from collision with other names, such as putting them in an enum class or an enum inside the public section of a class, or a namespace with a short name like Ser.
Like how RevFlag is implemented.
There was a problem hiding this comment.
That's how I've got them implemented.
|
|
||
| // TODO: Implement mapping mode | ||
| switch ( const auto mode = ser.mode() ) { | ||
| case serializer::MAP: |
There was a problem hiding this comment.
Can you add this mode? I know of several dynamically allocated arrays that will need this feature immediately.
There was a problem hiding this comment.
I can add mapping mode but I was waiting to see how the ObjectMap should be implemented. We have a container ObjectMap -- should arrays just use ObjectMapContainer or create a ObjectMapArray class which inherits from ObjectMapContainer? How will we store the array size in it, and how does the size stored change w.r.t. serialization (I mean, how to keep the ObjectMapArray class or whatever in-sync with the array address and size in the program? Or do we only need the ObjectMapArray to live the life of a single serialization call and then it can be destroyed and none of the array or other information stored in it matters antmore?).
| // TODO: Implement mapping mode | ||
| switch ( ser.mode() ) { | ||
| case serializer::MAP: | ||
| // TODO: Implement mapping mode |
There was a problem hiding this comment.
Mapping mode does not not need to be implemented for raw pointers. If it's being serialized as just the value of the pointer, then we don't want to follow it to map the data it points to.
There was a problem hiding this comment.
I can remove that comment then and clean up the code. The previous code had a TODO about mapping mode in the raw pointers serialization, but I wasn't sure what needed to be done.
|
Even though changers are requested, I'm going to release this for testing to make sure it is compatible with sst-macro. |
|
Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed. |
|
Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects: Pull Request Auto Testing STARTING (click to expand)Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-macro_withsstcore
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-core_Make-Dist
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_Clang-Format_sst-core
Build InformationTest Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements
Build InformationTest Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-macro_withsstcore
Using Repos:
Pull Request Author: leekillough |
|
Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED Pull Request Auto Testing has PASSED (click to expand)Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-macro_withsstcore
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-core_Make-Dist
Build InformationTest Name: SST__AutotestGen2_NewFW_sst-test_Clang-Format_sst-core
Build InformationTest Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements
Build InformationTest Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-macro_withsstcore
|
|
Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging |
|
All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur... |
|
Since this passed testing and we're getting close to release, I'm going to just merge this and we can revisit the comments in another PR. Getting this merged now will let us start testing functionality (I'll add the tests to my next PR since I'm adding a lot of other tests to the serialization test suite). There will be some final cleanup across all of the serialization we'll probably want to do as we finalize the APIs and we can address these then. |
feldergast
left a comment
There was a problem hiding this comment.
Looks good. Can address comments in a future PR. See prior comment regarding this.
Do you want me to merge |
|
Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ feldergast ]! |
|
Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge |
No. That's an unfortunate name clash. sst-macro is a separate repository. It's a network simulator that can use set-core as its pdes framework. It isn't a branch of this repo, which I suspect is what you were thinking. I'll do the merge with my work and figure out all the clashes. I had also already made the array() function work for general types and based a lot of what I was doing off of that. Once I get everything merged and all the tests added, it would be great if you could review things. |
Please hold off. I have more commits in response to your comments, including the name string. |
|
I will reopen another PR |
This rewrites the array serialization, adding support for
std::arrayand dynamically allocated arrays of types besides arithmetic andenum.It still supports the existing wrapper classes
SST::Core:Serialization::array(ptr, size)andSST::Core::Serialization::raw_ptr(ptr)which, when serialized, serialize a dynamic array and avoid*raw pointer, respectively.ser& SST::Core::Serialization::array(ptr, size)during deserialization, will getsizeand allocateptrwithnew ELEM_T[size]whereELEM_Tis the type thatptrpoints to.std::array<ELEM_T,SIZE>andELEM_T[SIZE], and pointers to them, are handled with the same templated class. A pointer to a fixed array, likeint (*aptr)[10], will be allocated when deserialized.The old
serializer::array()function was renamedserializer::raw()and no longer uses the parameter syntaxarray(T a[N])becauseais turned into a pointer, and it cannot match array arguments that way (it would need to usearray(T (&a)[N])to match a reference to an array).serializer::raw()just takes avoid*andsize_tand reads/writes the raw data.Right now the code potentially leaks because it does not call
deleteordelete[]on the original pointer before callingnewornew[]to set it. This may need to be fixed but if the pointer is uninitialized, it could pose a problem. If the serialization documentation says that pointers must be initialized tonullptror some value returned bynewornew[]before serializing them, this would help.The code has been written in such a way as to reduce code size across a large number of types and fixed array sizes, by separating out the parts which depend on the element type and the array size. Where there is a template dependency on the element type or a fixed array size (which is a part of the type), it is necessary to include templated code, and trying to split it out much further would not be beneficial -- I've tried abstracting it out in different ways, and this current way produces a small number of Lines of Code (LOC) while still not making template instantiations grow the code size too large.
if constexpris used to make parts of the code conditional and optimized out where possible.All existing Core and Elements test pass; I did not see where in the tests where array serialization was tested. We may need to add it, particularly for
std::array.An
ObjectMapContextclass has been created which can be used to push options for mapping mode on the stack without having to usestd::stackor any other containers. A pointer in theserializerclass points to the currentObjectMapContext. The creation of a newObjectMapContextpoints the serializer to it and saves the old context. When destroyed, it automatically restores the old context. A spurious compiler warning about pointers to local variables had to be suppressed (I verified a Bugzilla was submitted against it on GCC 13, which is my current compiler).Even though
ObjectMapContextis lightweight and doesn't dynamically allocate memory (except forstd::string), it still does not need to be used in the non-mapping mode, as shown in the code above. Besides thenamestring, other attributes can be added later and will be accessible through accessor functions in theserializerclass without having to pass them in every serialization call. Even if you have a bitwiseflagsoption passed in macros, it doesn't have to be passed in every function call, just passed to anObjectMapContextconstructor and then adding a method to return it (if attributes like flags apply to more than mapping mode, a different name likeSerializerContextcan be used). If performance is a concern with multiple pointer dereferences,ObjectMapContextcan store any attributes passed to it directly into theserializerclass (and restore the old one upon destruction).Because the
SST::Core::Serialization::array(ptr, size)wrapper class is passed at the time of serialization and is an Rvalue, universal/forwarding references are used, but within the serialization code, the reference is treated as an Lvalue reference so that functions which expect Lvalue references can be called:As the serialization code is changed to use macros, the functions that get called need to support serializing Rvalue references as well as Lvalue references in case wrappers are used. The old code used a kludgy workaround by providing its own overloads for
operator&(ser, SST::Core::Serialization::array ary)which accepted wrapper classes by value instead of the usual Lvalue reference. But with universal/forwarding references at the top level, and then using the reference name as an Lvalue, that's not necessary.Cc: @kpgriesser