Contributor

plavin commented Aug 15, 2025

This PR makes the classes in Miranda checkpointable, with the two exceptions listed further down.

  • serialize_order methods have been added
  • ImplementSerializable macros have been added where necessary
  • Default constructors have been added
  • Whitespace has been cleaned up
  • ELI info has been rearranged so that it comes before ImplementSerializable, as required
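
For readers unfamiliar with the pattern, here is a minimal, self-contained sketch of why a class needs both a default constructor and a serialize_order method. It does not use SST-Core: ToySerializer and ToyGenerator are hypothetical stand-ins invented for illustration, and the real SST serializer and macros differ in detail. The idea shown is only that one serialize_order function lists the fields once, and the same function is used both to write a checkpoint and to fill in a default-constructed object on restart.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Toy stand-in for a serializer. This is NOT the SST-Core API.
class ToySerializer {
public:
    enum class Mode { Pack, Unpack };

    explicit ToySerializer(Mode m) : mode_(m) {}

    // Switch from writing the buffer to reading it back from the start.
    void startUnpack() { mode_ = Mode::Unpack; pos_ = 0; }

    // Pack or unpack one trivially copyable field, depending on the mode.
    template <typename T>
    void field(T& value) {
        static_assert(std::is_trivially_copyable<T>::value,
                      "toy serializer only handles trivially copyable fields");
        if (mode_ == Mode::Pack) {
            const auto* p = reinterpret_cast<const unsigned char*>(&value);
            buf_.insert(buf_.end(), p, p + sizeof(T));
        } else {
            assert(pos_ + sizeof(T) <= buf_.size());
            std::memcpy(&value, buf_.data() + pos_, sizeof(T));
            pos_ += sizeof(T);
        }
    }

private:
    Mode mode_;
    std::vector<unsigned char> buf_;
    std::size_t pos_ = 0;
};

// Hypothetical checkpointable class (not a real Miranda class).
class ToyGenerator {
public:
    ToyGenerator() = default;  // lets the restart path build an empty object
    ToyGenerator(std::uint64_t start, std::uint64_t stride)
        : next_addr_(start), stride_(stride) {}

    std::uint64_t nextAddress() {
        const std::uint64_t a = next_addr_;
        next_addr_ += stride_;
        return a;
    }

    // Fields are listed once; the order is the same for pack and unpack.
    void serialize_order(ToySerializer& ser) {
        ser.field(next_addr_);
        ser.field(stride_);
    }

private:
    std::uint64_t next_addr_ = 0;
    std::uint64_t stride_ = 0;
};

// Checkpoint a generator, then restore it into a default-constructed object.
ToyGenerator roundTrip(ToyGenerator& original) {
    ToySerializer ser(ToySerializer::Mode::Pack);
    original.serialize_order(ser);
    ser.startUnpack();
    ToyGenerator restored;
    restored.serialize_order(ser);
    return restored;
}
```

In this sketch a restored generator continues from the address the original would have produced next, which is the behavior a checkpoint/restart test would check.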

Two classes are not checkpointable:

  • GeneratorRequest (missing serialize_impl)
  • RequestGenCPU (depends on GeneratorRequest)

Todo:

  • Test checkpoint/restart functionality
  • Test what happens with default constructors that have side effects
    • MirandaRequestQueue
    • GeneratorRequest
  • Check proper visibility of default constructors and serialize_order methods

plavin added the SST-miranda and AT: WIP (Mark PR as a Work in Progress, No Autotesting Performed) labels on Aug 15, 2025
Contributor Author

plavin commented Oct 20, 2025

I am getting a segfault when attempting to checkpoint. Would appreciate some guidance on this:

Segfault:

$ sst --checkpoint-period=500us tests/inorderstream.py
# Creating simulation checkpoint at simulated time period of 500us.
# Simulation Checkpoint: Simulated Time 500 us (Real CPU time since last checkpoint 1.88415 seconds)
[sst-devel-new:3855286] *** Process received signal ***
[sst-devel-new:3855286] Signal: Segmentation fault (11)
[sst-devel-new:3855286] Signal code: Address not mapped (1)
[sst-devel-new:3855286] Failing at address: 0x40
[sst-devel-new:3855286] [ 0] /lib64/libpthread.so.0(+0x12ce0)[0x7fda9b75bce0]
[sst-devel-new:3855286] [ 1] /home/prlavin/sst-miranda/install/lib/sst-elements-library/libmiranda.so(_ZN3SST4Core13Serialization14serialize_implIPNS_10Statistics9StatisticImEEvEclERS6_RNS1_10serializerEj+0x73)[0x7fda7a708d67]
[sst-devel-new:3855286] [ 2] /home/prlavin/sst-miranda/install/lib/sst-elements-library/libmiranda.so(_ZN3SST4Core13Serialization3pvt9serializeIPNS_10Statistics9StatisticImEEEclERS7_RNS1_10serializerEj+0x18f)[0x7fda7a6fbff3]
[sst-devel-new:3855286] [ 3] /home/prlavin/sst-miranda/install/lib/sst-elements-library/libmiranda.so(_ZN3SST4Core13Serialization14sst_ser_objectIRPNS_10Statistics9StatisticImEEEEvRNS1_10serializerEOT_jPKc+0xc9)[0x7fda7a6f0ea2]
[sst-devel-new:3855286] [ 4] /home/prlavin/sst-miranda/install/lib/sst-elements-library/libmiranda.so(_ZN3SST4Core13Serialization3pvt23serialize_array_elementIPNS_10Statistics9StatisticImEEEEvRNS1_10serializerEPvjm+0x3f)[0x7fda7a713784]
[sst-devel-new:3855286] [ 5] sst(_ZN3SST4Core13Serialization3pvt15serialize_arrayERNS1_10serializerEPvjmPFvS4_S5_jmE+0x3e)[0xc09224]
[sst-devel-new:3855286] [ 6] /home/prlavin/sst-miranda/install/lib/sst-elements-library/libmiranda.so(_ZN3SST4Core13Serialization3pvt26serialize_impl_fixed_arrayIA4_PNS_10Statistics9StatisticImEES7_Lm4EEclERS8_RNS1_10serializerEj+0xed)[0x7fda7a708cc5]
[sst-devel-new:3855286] [ 7] /home/prlavin/sst-miranda/install/lib/sst-elements-library/libmiranda.so(_ZN3SST4Core13Serialization3pvt9serializeIA4_PNS_10Statistics9StatisticImEEEclERS8_RNS1_10serializerEj+0x2e)[0x7fda7a6fbcce]
[sst-devel-new:3855286] [ 8] /home/prlavin/sst-miranda/install/lib/sst-elements-library/libmiranda.so(_ZN3SST4Core13Serialization14sst_ser_objectIRA4_PNS_10Statistics9StatisticImEEEEvRNS1_10serializerEOT_jPKc+0xf5)[0x7fda7a6f0db7]
[sst-devel-new:3855286] [ 9] /home/prlavin/sst-miranda/install/lib/sst-elements-library/libmiranda.so(_ZN3SST7Miranda13RequestGenCPU15serialize_orderERNS_4Core13Serialization10serializerE+0x2c3)[0x7fda7a6e9f7f]
[sst-devel-new:3855286] [10] sst(_ZN3SST4Core13Serialization3pvt28SerializeBaseComponentHelper18size_basecomponentEPNS1_17serializable_baseERNS1_10serializerE+0x50)[0xab1e28]
[sst-devel-new:3855286] [11] sst(_ZN3SST4Core13Serialization14serialize_implIPNS_13BaseComponentEvEclERS4_RNS1_10serializerEj+0x73)[0xac4b3d]
[sst-devel-new:3855286] [12] sst(_ZN3SST4Core13Serialization3pvt9serializeIPNS_13BaseComponentEEclERS5_RNS1_10serializerEj+0x18f)[0xabd555]
[sst-devel-new:3855286] [13] sst(_ZN3SST4Core13Serialization14sst_ser_objectIRPNS_13BaseComponentEEEvRNS1_10serializerEOT_jPKc+0xc9)[0xab85bf]
[sst-devel-new:3855286] [14] sst(_ZN3SST13ComponentInfo14serialize_compERNS_4Core13Serialization10serializerE+0x30)[0xaebb9a]
[sst-devel-new:3855286] [15] sst(_ZN3SST13ComponentInfo15serialize_orderERNS_4Core13Serialization10serializerE+0x565)[0xaec18f]
[sst-devel-new:3855286] [16] sst(_ZN3SST4Core13Serialization14serialize_implIPNS_13ComponentInfoEvEclERS4_RNS1_10serializerEj+0x1e0)[0xac4036]
[sst-devel-new:3855286] [17] sst(_ZN3SST4Core13Serialization3pvt9serializeIPNS_13ComponentInfoEEclERS5_RNS1_10serializerEj+0x18f)[0xabca29]
[sst-devel-new:3855286] [18] sst(_ZN3SST4Core13Serialization14sst_ser_objectIRPNS_13ComponentInfoEEEvRNS1_10serializerEOT_jPKc+0xc9)[0xab7ddb]
[sst-devel-new:3855286] [19] sst(_ZN3SST15Simulation_impl10checkpointERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE+0x344)[0xbc4e18]
[sst-devel-new:3855286] [20] sst(_ZN3SST16CheckpointAction16createCheckpointEPNS_15Simulation_implE+0x4b3)[0xae2ad9]
[sst-devel-new:3855286] [21] sst(_ZN3SST16CheckpointAction7executeEv+0x28)[0xae25e4]
[sst-devel-new:3855286] [22] sst(_ZN3SST15Simulation_impl3runEv+0x333)[0xbc0fcd]
[sst-devel-new:3855286] [23] sst[0xa48f7e]
[sst-devel-new:3855286] [24] sst(main+0x1cfd)[0xa4b0a0]
[sst-devel-new:3855286] [25] /lib64/libc.so.6(__libc_start_main+0xf3)[0x7fda99620cf3]
[sst-devel-new:3855286] [26] sst(_start+0x2e)[0xa4655e]
[sst-devel-new:3855286] *** End of error message ***

Backtrace from gdb:

# Creating simulation checkpoint at simulated time period of 500us.
# Simulation Checkpoint: Simulated Time 500 us (Real CPU time since last checkpoint 2.54308 seconds)

Thread 1 "sstsim.x" received signal SIGSEGV, Segmentation fault.
0x00007fffd62d3d67 in SST::Core::Serialization::serialize_impl<SST::Statistics::Statistic<unsigned long>*, void>::operator() (this=0x7fffffff790d, s=@0x13c0450: 0x40, ser=..., options=0) at /home/prlavin/sst-miranda/install/include/sst/core/statapi/statbase.h:671
671                 std::string    stat_eli_type = s->getELIName();
Missing separate debuginfos, use: yum debuginfo-install glibc-2.28-189.5.el8_6.x86_64 hdf5-1.10.5-4.el8.x86_64 libaec-1.0.2-3.el8.x86_64 zlib-1.2.11-26.el8.x86_64
(gdb) bt
#0  0x00007fffd62d3d67 in SST::Core::Serialization::serialize_impl<SST::Statistics::Statistic<unsigned long>*, void>::operator() (this=0x7fffffff790d, s=@0x13c0450: 0x40, ser=..., options=0) at /home/prlavin/sst-miranda/install/include/sst/core/statapi/statbase.h:671
#1  0x00007fffd62c6ff3 in SST::Core::Serialization::pvt::serialize<SST::Statistics::Statistic<unsigned long>*>::operator() (this=0x7fffffff79af, t=@0x13c0450: 0x40, ser=..., options=0) at /home/prlavin/sst-miranda/install/include/sst/core/serialization/serialize.h:273
#2  0x00007fffd62bbea2 in SST::Core::Serialization::sst_ser_object<SST::Statistics::Statistic<unsigned long>*&> (ser=..., obj=@0x13c0450: 0x40, options=0, name=0x0) at /home/prlavin/sst-miranda/install/include/sst/core/serialization/serialize.h:370
#3  0x00007fffd62de784 in SST::Core::Serialization::pvt::serialize_array_element<SST::Statistics::Statistic<unsigned long>*> (ser=..., data=0x13c0440, opt=0, index=2) at /home/prlavin/sst-miranda/install/include/sst/core/serialization/impl/serialize_array.h:59
#4  0x0000000000c09224 in SST::Core::Serialization::pvt::serialize_array (ser=..., data=0x13c0440, opt=0, size=4,
    serialize_array_element=0x7fffd62de745 <SST::Core::Serialization::pvt::serialize_array_element<SST::Statistics::Statistic<unsigned long>*>(SST::Core::Serialization::serializer&, void*, unsigned int, unsigned long)>) at serialization/impl/serialize_array.cc:25
#5  0x00007fffd62d3cc5 in SST::Core::Serialization::pvt::serialize_impl_fixed_array<SST::Statistics::Statistic<unsigned long>* [4], SST::Statistics::Statistic<unsigned long>*, 4ul>::operator() (this=0x7fffffff7aef, ary=..., ser=..., opt=0)
    at /home/prlavin/sst-miranda/install/include/sst/core/serialization/impl/serialize_array.h:101
#6  0x00007fffd62c6cce in SST::Core::Serialization::pvt::serialize<SST::Statistics::Statistic<unsigned long>* [4]>::operator() (this=0x7fffffff7b3f, t=..., ser=..., options=0) at /home/prlavin/sst-miranda/install/include/sst/core/serialization/serialize.h:168
#7  0x00007fffd62bbdb7 in SST::Core::Serialization::sst_ser_object<SST::Statistics::Statistic<unsigned long>* (&) [4]> (ser=..., obj=..., options=0, name=0x7fffd6351b7b "statReqs") at /home/prlavin/sst-miranda/install/include/sst/core/serialization/serialize.h:365
#8  0x00007fffd62b4f7f in SST::Miranda::RequestGenCPU::serialize_order (this=0x13c0270, ser=...) at mirandaCPU.h:164
#9  0x0000000000ab1e28 in SST::Core::Serialization::pvt::SerializeBaseComponentHelper::size_basecomponent (s=0x13c0270, ser=...) at baseComponent.cc:1183
#10 0x0000000000ac4b3d in SST::Core::Serialization::serialize_impl<SST::BaseComponent*, void>::operator() (this=0x7fffffff7c6d, s=@0x1311338: 0x13c0270, ser=..., options=0) at ../../../src/sst/core/baseComponent.h:1377
#11 0x0000000000abd555 in SST::Core::Serialization::pvt::serialize<SST::BaseComponent*>::operator() (this=0x7fffffff7d0f, t=@0x1311338: 0x13c0270, ser=..., options=0) at ../../../src/sst/core/serialization/serialize.h:273
#12 0x0000000000ab85bf in SST::Core::Serialization::sst_ser_object<SST::BaseComponent*&> (ser=..., obj=@0x1311338: 0x13c0270, options=0, name=0xd2fb3e "component") at ../../../src/sst/core/serialization/serialize.h:370
#13 0x0000000000aebb9a in SST::ComponentInfo::serialize_comp (this=0x13112e0, ser=...) at componentInfo.cc:195
#14 0x0000000000aec18f in SST::ComponentInfo::serialize_order (this=0x13112e0, ser=...) at componentInfo.cc:288
#15 0x0000000000ac4036 in SST::Core::Serialization::serialize_impl<SST::ComponentInfo*, void>::operator() (this=0x7fffffff7fbd, t=@0x7fffffff8090: 0x13112e0, ser=..., options=0) at ../../../src/sst/core/serialization/serialize.h:148
#16 0x0000000000abca29 in SST::Core::Serialization::pvt::serialize<SST::ComponentInfo*>::operator() (this=0x7fffffff805f, t=@0x7fffffff8090: 0x13112e0, ser=..., options=0) at ../../../src/sst/core/serialization/serialize.h:273
#17 0x0000000000ab7ddb in SST::Core::Serialization::sst_ser_object<SST::ComponentInfo*&> (ser=..., obj=@0x7fffffff8090: 0x13112e0, options=0, name=0xd4eb39 "compinfo") at ../../../src/sst/core/serialization/serialize.h:370
#18 0x0000000000bc4e18 in SST::Simulation_impl::checkpoint (this=0x13c0730, checkpoint_filename=...) at simulation.cc:1844
#19 0x0000000000ae2ad9 in SST::CheckpointAction::createCheckpoint (this=0x7fffe938cf48, sim=0x13c0730) at checkpointAction.cc:194
#20 0x0000000000ae25e4 in SST::CheckpointAction::execute (this=0x7fffe938cf48) at checkpointAction.cc:148
#21 0x0000000000bc0fcd in SST::Simulation_impl::run (this=0x13c0730) at simulation.cc:980
#22 0x0000000000a48f7e in start_simulation (tid=0, info=..., barrier=..., currentSimCycle=0, currentPriority=0) at main.cc:649
#23 0x0000000000a4b0a0 in main (argc=3, argv=0x7fffffffb398) at main.cc:1224

@gvoskuilen @feldergast any tips?
Update: the cause was that I was not properly serializing arrays of stats.
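
The gdb output above shows element index 2 of the four-entry statReqs array holding 0x40, which is not a valid pointer. The actual fix is not shown in this thread. As a general illustration only, the sketch below shows one defensive way to handle a fixed-size array of pointers in which some slots may be unused: keep unused slots null and record a presence flag per slot, so that nothing is dereferenced unless it was really set. ToyStat, StatArray, packStats and unpackStats are hypothetical names and are not part of SST-Core.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical stand-in for a statistic object (not SST::Statistics::Statistic).
struct ToyStat {
    std::uint64_t count = 0;
};

constexpr std::size_t kNumStats = 4;

// std::unique_ptr default-constructs to null, so unused slots are never garbage.
using StatArray = std::array<std::unique_ptr<ToyStat>, kNumStats>;

// Pack: for each slot, write a presence flag, then the value if present.
std::vector<std::uint64_t> packStats(const StatArray& stats) {
    std::vector<std::uint64_t> out;
    for (const auto& s : stats) {
        out.push_back(s ? 1 : 0);
        if (s) {
            out.push_back(s->count);
        }
    }
    return out;
}

// Unpack: rebuild only the slots that were present when the data was packed.
StatArray unpackStats(const std::vector<std::uint64_t>& in) {
    StatArray stats;
    std::size_t pos = 0;
    for (auto& s : stats) {
        assert(pos < in.size());
        const bool present = in[pos++] != 0;
        if (present) {
            assert(pos < in.size());
            s = std::make_unique<ToyStat>();
            s->count = in[pos++];
        }
    }
    return stats;
}
```

A raw pointer array such as `Statistic<uint64_t>* statReqs[4]` gives no such guarantee unless every element is explicitly initialized before the first checkpoint.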

Contributor Author

plavin commented Oct 29, 2025

Current status: the code segfaults in ComponentInfo::prepareForComplete. The steps to reproduce are below, using this input file:
inorderstream_noprefetch.py

SST-Core: ecc9bb124cb1218cc3f3405f58b91744d489c795

Create checkpoint: sst --checkpoint-period=500us inorderstream_noprefetch.py

Restart: sst --load-checkpoint <checkpoint>.sstcpt

Running in gdb gives the following:

Warning:  BaseComponent destructor failed to remove ComponentInfo from parent.

Thread 1 "sstsim.x" received signal SIGSEGV, Segmentation fault.
0x0000000000aec4e6 in SST::ComponentInfo::prepareForComplete (this=0x28) at componentInfo.cc:337
337         if ( nullptr != link_map ) {

Backtrace:

#0  0x00000000005bc6f7 in SST::ComponentInfo::prepareForComplete (this=this@entry=0x28) at componentInfo.cc:340
#1  0x00000000005bc73f in SST::ComponentInfo::prepareForComplete (this=<optimized out>) at componentInfo.cc:346
#2  0x0000000000672949 in SST::Simulation_impl::complete (this=this@entry=0xd55b70) at simulation.cc:846
#3  0x0000000000569cbe in start_simulation (tid=<optimized out>, info=..., barrier=..., currentSimCycle=<optimized out>, currentPriority=<optimized out>) at main.cc:710
#4  0x000000000053fba1 in main (argc=<optimized out>, argv=<optimized out>) at main.cc:1325
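
A note on reading `this=0x28` in the frames above: a very small `this` value is often (though not always) the result of calling a member function through a null pointer to an enclosing object, with the offset of a member or base added. Whether that is what happens here is not confirmed in this thread. The sketch below only demonstrates the arithmetic with a made-up layout; ToyInfo, ToyOwner and looksLikeNullPlusOffset are hypothetical and do not reflect the real SST::ComponentInfo.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical layout, chosen so that `info` sits 0x28 bytes into ToyOwner.
struct ToyInfo {
    void* link_map;
};

struct ToyOwner {
    std::uint64_t padding[5];  // 5 * 8 = 40 = 0x28 bytes before `info`
    ToyInfo info;
};

// The address a debugger would show for `info` if the ToyOwner pointer were null.
constexpr std::size_t kInfoOffset = offsetof(ToyOwner, info);

// Heuristic: a `this` value inside the first page is almost certainly
// "null plus a small offset" rather than a real heap or stack object.
bool looksLikeNullPlusOffset(std::uintptr_t this_value,
                             std::size_t max_offset = 4096) {
    return this_value < max_offset;
}
```

By this heuristic 0x28 looks like null plus an offset, while a value such as 0x13112e0 (the ComponentInfo address in the earlier checkpoint backtrace) does not.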

hughes-c added this to the SST v16.0.0 milestone on Nov 2, 2025