Serialize arrays, making the code much simpler and more general #1293

Merged
feldergast merged 4 commits into sstsimulator:devel from leekillough:serialize_array
Apr 14, 2025
Conversation

@leekillough
Contributor

This rewrites the array serialization, adding support for std::array and dynamically allocated arrays of types besides arithmetic and enum.

It still supports the existing wrapper classes SST::Core::Serialization::array(ptr, size) and SST::Core::Serialization::raw_ptr(ptr) which, when serialized, serialize a dynamic array and a void* raw pointer, respectively.

During deserialization, ser & SST::Core::Serialization::array(ptr, size) reads size and allocates ptr with new ELEM_T[size], where ELEM_T is the type that ptr points to.

std::array<ELEM_T,SIZE> and ELEM_T[SIZE], and pointers to them, are handled with the same templated class. A pointer to a fixed array, like int (*aptr)[10], will be allocated when deserialized.
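To illustrate how std::array<ELEM_T, SIZE> and ELEM_T[SIZE] can share one code path, here is a minimal sketch. It is not the code in this PR: fixed_array_traits and for_each_element are hypothetical names, and the sketch only visits elements rather than serializing them.

```cpp
#include <array>
#include <cstddef>
#include <iterator>
#include <type_traits>

// Hypothetical trait (not from the PR): the primary template means "not a fixed-size array".
template <typename T>
struct fixed_array_traits : std::false_type {};

// Built-in array ELEM_T[SIZE].
template <typename ELEM_T, std::size_t SIZE>
struct fixed_array_traits<ELEM_T[SIZE]> : std::true_type {
    using elem_type = ELEM_T;
    static constexpr std::size_t size = SIZE;
};

// std::array<ELEM_T, SIZE> reports the same element type and size.
template <typename ELEM_T, std::size_t SIZE>
struct fixed_array_traits<std::array<ELEM_T, SIZE>> : std::true_type {
    using elem_type = ELEM_T;
    static constexpr std::size_t size = SIZE;
};

// One function handles both kinds of array: std::data() yields a pointer to
// the first element for built-in arrays and for std::array alike.
template <typename ARRAY_T, typename FUNC>
void for_each_element(ARRAY_T& a, FUNC f)
{
    static_assert(fixed_array_traits<ARRAY_T>::value, "not a fixed-size array");
    auto* p = std::data(a);
    for ( std::size_t i = 0; i < fixed_array_traits<ARRAY_T>::size; ++i )
        f(p[i]);
}
```

A real serializer would call its per-element serialization where this sketch calls f.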


The old serializer::array() function was renamed serializer::raw() and no longer uses the parameter syntax array(T a[N]) because a is turned into a pointer, and it cannot match array arguments that way (it would need to use array(T (&a)[N]) to match a reference to an array). serializer::raw() just takes a void* and size_t and reads/writes the raw data.
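The parameter adjustment described above can be seen in a small standalone sketch. The function names are hypothetical; raw_copy only stands in for the idea of a size-agnostic void* plus size_t interface and is not the serializer::raw() implementation.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>

// Declared with array syntax, but the parameter type is adjusted to int*,
// so the extent 10 is not part of the function's signature.
inline bool param_is_pointer(int a[10])
{
    return std::is_pointer_v<decltype(a)>;
}

// A reference to an array keeps the extent in the type, so N can be deduced.
template <typename T, std::size_t N>
constexpr std::size_t deduced_extent(T (&)[N])
{
    return N;
}

// Hypothetical raw helper: only an address and a byte count, no type information.
inline void raw_copy(void* dst, const void* src, std::size_t bytes)
{
    std::memcpy(dst, src, bytes);
}
```

Passing an int[3] to param_is_pointer compiles without complaint, which is why the array(T a[N]) form cannot distinguish array arguments from plain pointers.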


Right now the code potentially leaks because it does not call delete or delete[] on the original pointer before calling new or new[] to set it. This may need to be fixed but if the pointer is uninitialized, it could pose a problem. If the serialization documentation says that pointers must be initialized to nullptr or some value returned by new or new[] before serializing them, this would help.
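One possible shape of the fix, as a sketch only: reset_array is a hypothetical helper, and it relies on the convention suggested above (the pointer is nullptr or came from new[]), which this PR does not enforce.

```cpp
#include <cstddef>

// Hypothetical helper: release the old array, then allocate the new one.
// Precondition (an assumption, not enforced by the PR): ptr is nullptr or was
// returned by new ELEM_T[]. An uninitialized ptr makes delete[] undefined behavior.
template <typename ELEM_T>
void reset_array(ELEM_T*& ptr, std::size_t size)
{
    delete[] ptr;                               // no-op when ptr is nullptr
    ptr = size ? new ELEM_T[size]() : nullptr;  // value-initialize the elements
}
```

With that precondition documented, deserializing into an already populated pointer no longer leaks the previous allocation.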


The code has been written in such a way as to reduce code size across a large number of types and fixed array sizes, by separating out the parts which depend on the element type and the array size. Where there is a template dependency on the element type or a fixed array size (which is a part of the type), it is necessary to include templated code, and trying to split it out much further would not be beneficial -- I've tried abstracting it out in different ways, and this current way produces a small number of Lines of Code (LOC) while still not making template instantiations grow the code size too large. if constexpr is used to make parts of the code conditional and optimized out where possible.
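As an illustration of the if constexpr technique mentioned above, here is a minimal sketch that packs arithmetic and enum elements as one raw block and everything else element by element. Buffer, pack_raw, pack_one and pack_array are hypothetical names, and the byte layout is an assumption made for the example, not the SST format.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <vector>

// Hypothetical byte sink standing in for a serializer in pack mode.
using Buffer = std::vector<unsigned char>;

// Non-template part: independent of element type and array size.
inline void pack_raw(Buffer& buf, const void* data, std::size_t bytes)
{
    const auto* p = static_cast<const unsigned char*>(data);
    buf.insert(buf.end(), p, p + bytes);
}

// Example of a non-trivial element: length prefix followed by the characters.
inline void pack_one(Buffer& buf, const std::string& s)
{
    const std::size_t n = s.size();
    pack_raw(buf, &n, sizeof(n));
    pack_raw(buf, s.data(), n);
}

// Templated part: only the branch that applies to ELEM_T is instantiated.
template <typename ELEM_T>
void pack_array(Buffer& buf, const ELEM_T* data, std::size_t size)
{
    if constexpr ( std::is_arithmetic_v<ELEM_T> || std::is_enum_v<ELEM_T> ) {
        pack_raw(buf, data, size * sizeof(ELEM_T));  // one bulk copy
    }
    else {
        for ( std::size_t i = 0; i < size; ++i )
            pack_one(buf, data[i]);                  // per-element path
    }
}
```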


All existing Core and Elements tests pass; I did not see where in the tests array serialization is exercised. We may need to add it, particularly for std::array.


An ObjectMapContext class has been created which can be used to push options for mapping mode on the stack without having to use std::stack or any other containers. A pointer in the serializer class points to the current ObjectMapContext. The creation of a new ObjectMapContext points the serializer to it and saves the old context. When destroyed, it automatically restores the old context. A spurious compiler warning about pointers to local variables had to be suppressed (I verified a Bugzilla was submitted against it on GCC 13, which is my current compiler).

template <class T>
void
sst_map_object(serializer& ser, T& t, std::string_view name = "")
{
    if ( ser.mode() == serializer::MAP ) {
        if ( !name.empty() ) { // Do nothing if name is empty
            // A new ObjectMapContext is created with name and ser points to it
            ObjectMapContext c(ser, name);
            serialize<T>()(t, ser);
            // c is destroyed and ser's old context is restored
        }
    }
    else {
        serialize<T>()(t, ser);
    }
}

Even though ObjectMapContext is lightweight and doesn't dynamically allocate memory (except for std::string), it still does not need to be used in the non-mapping mode, as shown in the code above. Besides the name string, other attributes can be added later and will be accessible through accessor functions in the serializer class without having to pass them in every serialization call. Even if you have a bitwise flags option passed in macros, it doesn't have to be passed in every function call, just passed to an ObjectMapContext constructor and then adding a method to return it (if attributes like flags apply to more than mapping mode, a different name like SerializerContext can be used). If performance is a concern with multiple pointer dereferences, ObjectMapContext can store any attributes passed to it directly into the serializer class (and restore the old one upon destruction).
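The save-and-restore behavior can be sketched with a small RAII class. This is a simplified stand-in, not the ObjectMapContext from this PR: MiniSerializer and NameContext are hypothetical names, and the only attribute carried is the name.

```cpp
#include <string>
#include <string_view>

class NameContext;  // hypothetical stand-in for ObjectMapContext

// Hypothetical stand-in for the serializer: it only tracks the current context.
class MiniSerializer
{
public:
    std::string_view current_name() const;

private:
    friend class NameContext;
    const NameContext* context_ = nullptr;
};

class NameContext
{
public:
    // Constructing a context saves the old one and makes this one current.
    NameContext(MiniSerializer& ser, std::string_view name) :
        ser_(ser), name_(name), prev_(ser.context_)
    {
        ser_.context_ = this;
    }

    // Destroying it restores the previous context, so nesting forms a stack
    // that lives entirely in automatic storage.
    ~NameContext() { ser_.context_ = prev_; }

    NameContext(const NameContext&)            = delete;
    NameContext& operator=(const NameContext&) = delete;

    std::string_view name() const { return name_; }

private:
    MiniSerializer&    ser_;
    std::string        name_;
    const NameContext* prev_;
};

inline std::string_view MiniSerializer::current_name() const
{
    return context_ ? context_->name() : std::string_view {};
}
```

Storing this in the serializer is the kind of pointer-to-local pattern that can trigger the compiler warning mentioned earlier, even though the destructor always clears it.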


Because the SST::Core::Serialization::array(ptr, size) wrapper class is passed at the time of serialization and is an Rvalue, universal/forwarding references are used, but within the serialization code, the reference is treated as an Lvalue reference so that functions which expect Lvalue references can be called:

template<typename T>
void operator&(serializer& ser, T&& t)    // T may be plain type or Lvalue reference to it; t is a Rvalue or Lvalue reference
{
   // t is used locally as an Lvalue to call functions which expect Lvalue references
}

As the serialization code is changed to use macros, the functions that get called need to support serializing Rvalue references as well as Lvalue references in case wrappers are used. The old code used a kludgy workaround by providing its own overloads for operator&(ser, SST::Core::Serialization::array ary) which accepted wrapper classes by value instead of the usual Lvalue reference. But with universal/forwarding references at the top level, and then using the reference name as an Lvalue, that's not necessary.
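A minimal sketch of that pattern follows. array_ref, make_array, count_elements and visit are hypothetical names loosely modeled on the wrapper described above, not the SST classes.

```cpp
#include <cstddef>

// Hypothetical wrapper: refers to a pointer and its element count.
template <typename ELEM_T>
struct array_ref
{
    ELEM_T*&     ptr;
    std::size_t& size;
};

template <typename ELEM_T>
array_ref<ELEM_T> make_array(ELEM_T*& ptr, std::size_t& size)
{
    return { ptr, size };
}

// Functions that only accept Lvalue references.
template <typename ELEM_T>
std::size_t count_elements(array_ref<ELEM_T>& a)
{
    return a.size;
}

inline std::size_t count_elements(int&)
{
    return 1;
}

// Forwarding-reference entry point: accepts temporaries and named objects.
template <typename T>
std::size_t visit(T&& t)
{
    // Inside the function t has a name, so it is an Lvalue and can bind to
    // the Lvalue-reference parameters above without std::forward.
    return count_elements(t);
}
```

With this, visit(make_array(p, n)) accepts the temporary wrapper directly, so no by-value overload for the wrapper is needed.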

Cc: @kpgriesser

@github-actions github-actions bot added AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) AT: CLANG-FORMAT FAIL labels Apr 14, 2025
@github-actions

CLANG-FORMAT TEST - FAILED (on last commit):
Run > ./scripts/clang-format-test.sh using clang-format v12 to check formatting

@github-actions github-actions bot added AT: CMAKE-FORMAT PASS and removed AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) labels Apr 14, 2025
@github-actions

CMAKE-FORMAT TEST - PASSED

@github-actions github-actions bot added AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) AT: CLANG-FORMAT PASS and removed AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) AT: CLANG-FORMAT FAIL labels Apr 14, 2025
@github-actions

CLANG-FORMAT TEST - PASSED

@github-actions

CMAKE-FORMAT TEST - PASSED

@sst-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

@github-actions github-actions bot added AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) AT: CLANG-FORMAT FAIL and removed AT: CLANG-FORMAT PASS labels Apr 14, 2025
@github-actions

CLANG-FORMAT TEST - FAILED (on last commit):
Run > ./scripts/clang-format-test.sh using clang-format v12 to check formatting

@github-actions github-actions bot removed the AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) label Apr 14, 2025
@github-actions

CMAKE-FORMAT TEST - PASSED

@github-actions github-actions bot added AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) AT: CLANG-FORMAT PASS and removed AT: WIP Mark PR as a Work in Progress (No Autotesting Performed) labels Apr 14, 2025
@github-actions

CMAKE-FORMAT TEST - PASSED

@github-actions

CLANG-FORMAT TEST - PASSED

@sst-autotester
Contributor

Status Flag 'Pre-Test Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
NO INSPECTION HAS BEEN PERFORMED ON THIS PULL REQUEST! - This PR must be inspected by setting label 'AT: PRE-TEST INSPECTED'.

serialize_array_element(serializer& ser, void* data, size_t index)
{
return pvt::ser_array_wrapper<void, IntType>(buf, size);
ser& static_cast<ELEM_T*>(data)[index];
Contributor

None of the serializer internals should use operator& as it's being deprecated. sst_map_object() is the entry point to serialization (I have a PR almost ready that renames this to sst_ser_object() to make it more clear that it is used for all serialization modes, including mapping). This is the function that the SST_SER() macro will also call when it's used.

Contributor Author

I know but I didn't want to use it until everything was settled as far as macro names, etc. Should a macro be called or a function, etc.? Is there a different function/macro to be called or a different argument to use when doing sub-objects which don't need annotation, etc.?

I was waiting for all of this stuff to settle.

sst_map_object() seems very map-centric and does not sound like a general serialization function.

Contributor

You can just use SST_SER.

raw_ptr(TPtr*& ptr)
// Return a new map representing canonical fixed sized array (whether the original array came from ELEM_T[SIZE] or
// std::array<ELEM_T, SIZE>)
template <typename ELEM_T, size_t SIZE>
Contributor

Why does this need to be templated on array size? This would seem to create a new template expansion of ObjectMapContainer for every array size that is compiled. Seems like the size could be a runtime parameter passed to ObjectMapContainer.

Though I suppose I have less of an issue with the function being templated on size (as the function is likely to just get inlined), than I do having ObjectMapContainer depend on the array size through ELEM_T[SIZE]. The ObjectMapContainer doesn't care whether the data you put in it comes from a fixed or dynamic sized array. Can we just cast this to the right base pointer type that doesn't indicate whether it's fixed or dynamic? That way we only template on element type.

Contributor Author

It needs to be templated for array size because array size is a part of the type. We have to use templating on the array size somewhere. I have abstracted it out as much as possible to where you don't get large functions instantiated for each array size. Only a very small bit of code is instantiated for each array size, and then it calls a function passing it the array size as a runtime argument. I spent many days optimizing this and choosing the right balance. Some things just must be dependent on template parameters, such as the function which serializes an element (depends on element type as a template parameter) or the code cannot work in general. I've abstracted out the parts which are not dependent on template parameters such as ELEM_T and SIZE, and isolated the parts which are.
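The split described here can be sketched as follows. The names are hypothetical and summing elements only stands in for the real per-element work; the point is that the loop is instantiated once per element type while the per-size wrappers do nothing but forward SIZE as a value.

```cpp
#include <array>
#include <cstddef>

// Instantiated once per element type; the size is an ordinary runtime argument.
template <typename ELEM_T>
ELEM_T sum_elements(const ELEM_T* data, std::size_t size)
{
    ELEM_T total {};
    for ( std::size_t i = 0; i < size; ++i )
        total += data[i];
    return total;
}

// Thin wrappers instantiated per (ELEM_T, SIZE). They are small enough to be
// inlined, so each new array size adds very little code.
template <typename ELEM_T, std::size_t SIZE>
ELEM_T sum_fixed(const ELEM_T (&a)[SIZE])
{
    return sum_elements(a, SIZE);
}

template <typename ELEM_T, std::size_t SIZE>
ELEM_T sum_fixed(const std::array<ELEM_T, SIZE>& a)
{
    return sum_elements(a.data(), SIZE);
}
```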

The code has been written in such a way as to reduce code size across a large number of types and fixed array sizes, by separating out the parts which depend on the element type and the array size. Where there is a template dependency on the element type or a fixed array size (which is a part of the type), it is necessary to include templated code, and trying to split it out much further would not be beneficial -- I've tried abstracting it out in different ways, and this current way produces a small number of Lines of Code (LOC) while still not making template instantiations grow the code size too large. if constexpr is used to make parts of the code conditional and optimized out where possible.

The ObjectMapContainer is still in flux and I wanted to discuss it with you. For now there is simply a function which creates an ObjectMapContainer which must be templated on ELEM_T and SIZE, but we can turn SIZE into a runtime parameter if we want to. I wanted to discuss how to create an array MapObject and then we could construct it. This was lower priority than getting the array serialization working for std::array.

Contributor

I'm not sure ObjectMapContainer even needs to be templated anymore with the new way the ObjectMap is created in the new container serialization. The idea was that this would be a base class for a specific object map for each type of container so you could do the mapping appropriately for each. The new code just sticks the contents in as variables.

*/
template <class T, size_t N>
class serialize_impl<T[N], std::enable_if_t<!std::is_arithmetic_v<T> && !std::is_enum_v<T>>>
// Serialize fixed arrays
Contributor

This probably needs more comments, including what the difference between OBJ_TYPE and ELEM_T is.

Contributor Author

I can add comments, but I thought it was obvious.

template <class T>
void
sst_map_object(serializer& ser, T& t, std::string name = "")
sst_map_object(serializer& ser, T&& obj)
Contributor

Can you change it so that the basic sst_map_object always takes a const char* name? You could also have another variant that takes string or something convertible to string. I want this because the PR I'm working on no longer has a default for name, as we are moving to SST_SER being the primary user facing API for serialization, and it will always pass in something for name. I want it to be const char* because that is what the macro will put in and I don't want to have to generate a string every time this is called. We will also allow the user to directly call sst_map_object if, for some reason, they want to give the variable a different name in the ObjectMap than what it is called in the class. The operator& and operator| should just pass in "" for this variable.

Contributor Author

It adds runtime overhead if we always pass strings as arguments -- we should have two versions, one without a name argument and one with it. The version without a name is simple and doesn't have to do any mapping. You were concerned about non-mapping serialization speed.

If we are passing a const char* in, I prefer using nullptr instead of "" for empty strings, since then the pointer just needs to be tested without any dereferencing to see if the string is empty.

I don't like the name sst_map_object() for serialization, since it is map-centric and imposes overhead of a name argument and "mapping" to all serialization. The name should not use "map" but be something like sst_ser and mapping should be an "afterthought", an "optional argument".

Contributor

The macro interface will always pass a name, so having a version without name is not needed.

With the new API, there is an option to control whether mapping of an object happens or not, so the name variable should never need to be tested unless you are checking for errors. So, passing either "" or nullptr should work, but really, the only time the function would be called directly is if you want to have a different name for mapping than the variable name in the class, in which case you wouldn't pass either of those.

I've already changed the name to sst_ser_object() to mirror the SST_SER macro.

When ObjectMapContext is destroyed, the serializer goes back to the previous ObjectMapContext.
*/

class ObjectMapContext
Contributor

This seems like a good replacement for the name stack. However, the plan is to still also pass an integer options value to each serialize_impl call as a bitwise map of options. These options are needed in SIZE, PACK and UNPACK mode and we're not willing to create a context object in those modes as they are used in the serialization for synchronization fast path.

Contributor Author

I think the flags should be a mostly-opaque type, such as SerFlag. I would not want uint32_t or uint64_t to be exposed in the API as the type of the flags. There should be a separate using (like typedef) or enum class for the flags, using whatever underlying integer type is wanted. The names should be prevented from colliding with other names, such as by putting them in an enum class, an enum inside the public section of a class, or a namespace with a short name like Ser.

Like how RevFlag is implemented.
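A sketch of what such a mostly-opaque flag type could look like. SerFlag is the name suggested above, but the enumerators and helper here are hypothetical and are not the flags SST defines.

```cpp
#include <cstdint>
#include <type_traits>

// Scoped enum: the underlying integer type is an implementation detail and the
// enumerator names cannot collide with names outside SerFlag.
enum class SerFlag : std::uint32_t {
    none      = 0,
    no_map    = 1u << 0,  // hypothetical option
    read_only = 1u << 1,  // hypothetical option
};

constexpr SerFlag operator|(SerFlag a, SerFlag b)
{
    using U = std::underlying_type_t<SerFlag>;
    return static_cast<SerFlag>(static_cast<U>(a) | static_cast<U>(b));
}

constexpr SerFlag operator&(SerFlag a, SerFlag b)
{
    using U = std::underlying_type_t<SerFlag>;
    return static_cast<SerFlag>(static_cast<U>(a) & static_cast<U>(b));
}

// True when any bit of f is set in flags.
constexpr bool has_flag(SerFlag flags, SerFlag f)
{
    return (flags & f) != SerFlag::none;
}
```

Callers combine options with | and test them with has_flag, without ever naming the integer type.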

Contributor

That's how I've got them implemented.


// TODO: Implement mapping mode
switch ( const auto mode = ser.mode() ) {
case serializer::MAP:
Contributor

Can you add this mode? I know of several dynamically allocated arrays that will need this feature immediately.

Contributor Author

I can add mapping mode but I was waiting to see how the ObjectMap should be implemented. We have a container ObjectMap -- should arrays just use ObjectMapContainer or create an ObjectMapArray class which inherits from ObjectMapContainer? How will we store the array size in it, and how does the stored size change w.r.t. serialization? I mean, how do we keep the ObjectMapArray class (or whatever it ends up being) in sync with the array address and size in the program? Or do we only need the ObjectMapArray to live for a single serialization call, after which it can be destroyed and none of the array or other information stored in it matters anymore?

// TODO: Implement mapping mode
switch ( ser.mode() ) {
case serializer::MAP:
// TODO: Implement mapping mode
Contributor

Mapping mode does not need to be implemented for raw pointers. If it's being serialized as just the value of the pointer, then we don't want to follow it to map the data it points to.

Contributor Author

I can remove that comment then and clean up the code. The previous code had a TODO about mapping mode in the raw pointers serialization, but I wasn't sure what needed to be done.

@feldergast
Contributor

Even though changes are requested, I'm going to release this for testing to make sure it is compatible with sst-macro.

@sst-autotester
Contributor

Status Flag 'Pre-Test Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED by label AT: PRE-TEST INSPECTED! Autotester is Removing Label; this inspection will remain valid until a new commit to source branch is performed.

@sst-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Testing Jenkins Projects:

Pull Request Auto Testing STARTING

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements

  • Build Num: 1962
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2

  • Build Num: 1918
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2

  • Build Num: 1917
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-macro_withsstcore

  • Build Num: 866
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-core_Make-Dist

  • Build Num: 719
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_Clang-Format_sst-core

  • Build Num: 673
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements

  • Build Num: 484
  • Status: STARTED

Build Information

Test Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-macro_withsstcore

  • Build Num: 291
  • Status: STARTED

Using Repos:

Repo: CORE (leekillough/sst-core)
  • Branch: serialize_array
  • SHA: 3206073
  • Mode: TEST_REPO
Repo: SQE (sstsimulator/sst-sqe)
  • Branch: devel
  • SHA: ad7d76e1f96debf721ec577ebd005c3026a55edc
  • Mode: SUPPORT_REPO
Repo: ELEMENTS (sstsimulator/sst-elements)
  • Branch: devel
  • SHA: 5881035a5ff9312d0c3b84e4f1d84344f2726699
  • Mode: SUPPORT_REPO
Repo: MACRO (sstsimulator/sst-macro)
  • Branch: devel
  • SHA: 42e85e1689d473c65fdbcc008ce57fd53fe80865
  • Mode: SUPPORT_REPO

Pull Request Author: leekillough

@sst-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Jenkins Testing: all Jobs PASSED

Pull Request Auto Testing has PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements

  • Build Num: 1962
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MR-2

  • Build Num: 1918
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-elements_MT-2

  • Build Num: 1917
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-macro_withsstcore

  • Build Num: 866
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_OMPI-4.1.4_PY3.6_sst-core_Make-Dist

  • Build Num: 719
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_sst-test_Clang-Format_sst-core

  • Build Num: 673
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-elements

  • Build Num: 484
  • Status: PASSED

Build Information

Test Name: SST__AutotestGen2_NewFW_OSX-14-XC15-ARM2_OMPI-4.1.6_PY3.10_sst-macro_withsstcore

  • Build Num: 291
  • Status: PASSED

@sst-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - - This Pull Request Requires Inspection... The code must be inspected by a member of the Team before Testing/Merging
THE LAST COMMIT TO THIS PULL REQUEST HAS BEEN REVIEWED, BUT NOT ACCEPTED OR REQUIRES CHANGES!

@sst-autotester
Contributor

All Jobs Finished; status = PASSED, However Inspection must be performed before merge can occur...

@feldergast
Contributor

Since this passed testing and we're getting close to release, I'm going to just merge this and we can revisit the comments in another PR. Getting this merged now will let us start testing functionality (I'll add the tests to my next PR since I'm adding a lot of other tests to the serialization test suite). There will be some final cleanup across all of the serialization we'll probably want to do as we finalize the APIs and we can address these then.

Contributor

@feldergast feldergast left a comment

Looks good. Can address comments in a future PR. See prior comment regarding this.

@leekillough
Contributor Author

Even though changes are requested, I'm going to release this for testing to make sure it is compatible with sst-macro.

Do you want me to merge sst-macro into this branch, resolving conflicts and making sure that it conforms to the new macro/function call syntax? Where is sst-macro?

@sst-autotester
Contributor

Status Flag 'Pre-Merge Inspection' - SUCCESS: The last commit to this Pull Request has been INSPECTED AND APPROVED by [ feldergast ]!

@sst-autotester
Contributor

Status Flag 'Pull Request AutoTester' - Pull Request MUST BE MERGED MANUALLY BY Project Team - This Repo does not support Automerge

@feldergast
Contributor

Even though changes are requested, I'm going to release this for testing to make sure it is compatible with sst-macro.

Do you want me to merge sst-macro into this branch, resolving conflicts and making sure that it conforms to the new macro/function call syntax? Where is sst-macro?

No. That's an unfortunate name clash. sst-macro is a separate repository. It's a network simulator that can use sst-core as its PDES framework. It isn't a branch of this repo, which I suspect is what you were thinking.

I'll do the merge with my work and figure out all the clashes. I had also already made the array() function work for general types and based a lot of what I was doing off of that. Once I get everything merged and all the tests added, it would be great if you could review things.

@feldergast feldergast merged commit e3e5c6f into sstsimulator:devel Apr 14, 2025
7 checks passed
@leekillough
Contributor Author

Since this passed testing and we're getting close to release, I'm going to just merge this and we can revisit the comments in another PR. Getting this merged now will let us start testing functionality (I'll add the tests to my next PR since I'm adding a lot of other tests to the serialization test suite). There will be some final cleanup across all of the serialization we'll probably want to do as we finalize the APIs and we can address these then.

Please hold off. I have more commits in response to your comments, including the name string.

@leekillough
Contributor Author

I will open another PR.
