Commit ce05312

[csm] regenerate all processes with colorsum/simd patches and a separate mgOnGpuVectorsSplitMerge.h
1 parent 4b3a141 commit ce05312

135 files changed: +9009 -4217 lines changed

epochX/cudacpp/ee_mumu.mad/CODEGEN_mad_ee_mumu_log.txt

Lines changed: 34 additions & 31 deletions
@@ -46,9 +46,10 @@ Please set the 'lhapdf' variable to the (absolute) /PATH/TO/lhapdf-config (inclu
 Note that you can still compile and run aMC@NLO with the built-in PDFs
 MG5_aMC> set lhapdf /PATH/TO/lhapdf-config
 
+Using default text editor "vi". Set another one in ./input/mg5_configuration.txt
 Using default eps viewer "evince". Set another one in ./input/mg5_configuration.txt
 Using default web browser "firefox". Set another one in ./input/mg5_configuration.txt
-import /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu.mg
+import /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu.mg
 The import format was not given, so we guess it as command
 set stdout_level DEBUG
 set output information to level: 10
@@ -57,7 +58,7 @@ generate e+ e- > mu+ mu-
 No model currently active, so we import the Standard Model
 INFO: load particles
 INFO: load vertices
-DEBUG: model prefixing takes 0.004445075988769531
+DEBUG: model prefixing takes 0.005223274230957031
 INFO: Restrict model sm with file models/sm/restrict_default.dat .
 DEBUG: Simplifying conditional expressions
 DEBUG: remove interactions: u s w+ at order: QED=1
@@ -149,21 +150,21 @@ INFO: Checking for minimal orders which gives processes.
 INFO: Please specify coupling orders to bypass this step.
 INFO: Trying process: e+ e- > mu+ mu- WEIGHTED<=4 @1
 INFO: Process has 2 diagrams
-1 processes with 2 diagrams generated in 0.003 s
+1 processes with 2 diagrams generated in 0.004 s
 Total: 1 processes with 2 diagrams
 output madevent_simd ../TMPOUT/CODEGEN_mad_ee_mumu --hel_recycling=False --vector_size=32
 Output will be done with PLUGIN: CUDACPP_OUTPUT
 Addition matrix-element will be done with PLUGIN: CUDACPP_OUTPUT
 DEBUG: opt['output_options']['vector_size'] =  32 [export_v4.py at line 4168]
 Output will be done with PLUGIN: CUDACPP_OUTPUT
-DEBUG: Entering PLUGIN_ProcessExporter.__init__ (initialise the exporter) [output.py at line 175]
+DEBUG: Entering PLUGIN_ProcessExporter.__init__ (initialise the exporter) [output.py at line 176]
 INFO: initialize a new directory: CODEGEN_mad_ee_mumu
 INFO: remove old information in CODEGEN_mad_ee_mumu
-DEBUG: Entering PLUGIN_ProcessExporter.copy_template (initialise the directory) [output.py at line 180]
-WARNING: File exists /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu
-INFO: Creating subdirectories in directory /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu
-WARNING: File exists /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards
-WARNING: File exists /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/SubProcesses
+DEBUG: Entering PLUGIN_ProcessExporter.copy_template (initialise the directory) [output.py at line 181]
+WARNING: File exists /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu
+INFO: Creating subdirectories in directory /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu
+WARNING: File exists /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards
+WARNING: File exists /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/SubProcesses
 INFO: Organizing processes into subprocess groups
 INFO: Generating Helas calls for process: e+ e- > mu+ mu- WEIGHTED<=4 @1
 INFO: Processing color information for process: e+ e- > mu+ mu- @1
@@ -179,18 +180,18 @@ INFO: Finding symmetric diagrams for subprocess group epem_mupmum
 DEBUG: iconfig_to_diag =  {1: 1, 2: 2} [model_handling.py at line 1576]
 DEBUG: diag_to_iconfig =  {1: 1, 2: 2} [model_handling.py at line 1577]
 Generated helas calls for 1 subprocesses (2 diagrams) in 0.004 s
-Wrote files for 8 helas calls in 0.060 s
+Wrote files for 8 helas calls in 0.068 s
 ALOHA: aloha starts to compute helicity amplitudes
 ALOHA: aloha creates FFV1 routines
 ALOHA: aloha creates FFV2 routines
 ALOHA: aloha creates FFV4 routines
-ALOHA: aloha creates 3 routines in 0.170 s
+ALOHA: aloha creates 3 routines in 0.188 s
 ALOHA: aloha starts to compute helicity amplitudes
 ALOHA: aloha creates FFV1 routines
 ALOHA: aloha creates FFV2 routines
 ALOHA: aloha creates FFV4 routines
 ALOHA: aloha creates FFV2_4 routines
-ALOHA: aloha creates 7 routines in 0.184 s
+ALOHA: aloha creates 7 routines in 0.240 s
 <class 'aloha.create_aloha.AbstractRoutine'> FFV1
 <class 'aloha.create_aloha.AbstractRoutine'> FFV1
 <class 'aloha.create_aloha.AbstractRoutine'> FFV2
@@ -199,32 +200,32 @@ ALOHA: aloha creates 7 routines in 0.184 s
 <class 'aloha.create_aloha.AbstractRoutine'> FFV4
 <class 'aloha.create_aloha.AbstractRoutine'> FFV2_4
 <class 'aloha.create_aloha.AbstractRoutine'> FFV2_4
-FileWriter <class 'MG5aMC_PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/./HelAmps_sm.h
-INFO: Created file HelAmps_sm.h in directory /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/.
+FileWriter <class 'MG5aMC_PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/./HelAmps_sm.h
+INFO: Created file HelAmps_sm.h in directory /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/.
 super_write_set_parameters_onlyfixMajorana (hardcoded=False)
 super_write_set_parameters_onlyfixMajorana (hardcoded=True)
-FileWriter <class 'MG5aMC_PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/./Parameters_sm.h
-FileWriter <class 'MG5aMC_PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/./Parameters_sm.cc
+FileWriter <class 'MG5aMC_PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/./Parameters_sm.h
+FileWriter <class 'MG5aMC_PLUGIN.CUDACPP_OUTPUT.model_handling.PLUGIN_CPPWriter'> for /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/./Parameters_sm.cc
 INFO: Created files Parameters_sm.h and Parameters_sm.cc in directory
-INFO: /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/. and /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/.
+INFO: /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/. and /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/src/.
 The option zerowidth_tchannel is modified [True] but will not be written in the configuration files.
 If you want to make this value the default for future session, you can run 'save options --all'
-save configuration file to /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
+save configuration file to /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
 INFO: Use Fortran compiler gfortran
 INFO: Use c++ compiler g++
 INFO: Generate jpeg diagrams
 INFO: Generate web pages
-DEBUG: result.returncode =  0 [output.py at line 273]
-Output to directory /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu done.
+DEBUG: result.returncode =  0 [output.py at line 274]
+Output to directory /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu done.
 Type "launch" to generate events from this process, or see
-/home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/README
+/data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/README
 Run "open index.html" to see more information about this process.
 quit
 
-real 0m2.396s
-user 0m1.798s
-sys 0m0.425s
-Code generation completed in 2 seconds
+real 0m2.135s
+user 0m1.760s
+sys 0m0.316s
+Code generation completed in 3 seconds
 ************************************************************
 * *
 * W E L C O M E to *
@@ -245,9 +246,10 @@ Code generation completed in 2 seconds
 * Type 'help' for in-line help. *
 * *
 ************************************************************
-INFO: load configuration from /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
-INFO: load configuration from /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/mg5amcnlo/input/mg5_configuration.txt
-INFO: load configuration from /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
+INFO: load configuration from /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
+INFO: load configuration from /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/mg5amcnlo/input/mg5_configuration.txt
+INFO: load configuration from /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
+Using default text editor "vi". Set another one in ./input/mg5_configuration.txt
 Using default eps viewer "evince". Set another one in ./input/mg5_configuration.txt
 Using default web browser "firefox". Set another one in ./input/mg5_configuration.txt
 treatcards run
@@ -274,9 +276,10 @@ launch in debug mode
 * Type 'help' for in-line help. *
 * *
 ************************************************************
-INFO: load configuration from /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
-INFO: load configuration from /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/mg5amcnlo/input/mg5_configuration.txt
-INFO: load configuration from /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
+INFO: load configuration from /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
+INFO: load configuration from /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/mg5amcnlo/input/mg5_configuration.txt
+INFO: load configuration from /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/TMPOUT/CODEGEN_mad_ee_mumu/Cards/me5_configuration.txt
+Using default text editor "vi". Set another one in ./input/mg5_configuration.txt
 Using default eps viewer "evince". Set another one in ./input/mg5_configuration.txt
 Using default web browser "firefox". Set another one in ./input/mg5_configuration.txt
 treatcards param

epochX/cudacpp/ee_mumu.mad/Cards/me5_configuration.txt

Lines changed: 2 additions & 2 deletions
@@ -235,7 +235,7 @@
 # pineappl = pineappl
 
 
-#mg5_path = /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/mg5amcnlo
+#mg5_path = /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/mg5amcnlo
 
 # MG5 MAIN DIRECTORY
-#mg5_path = /home/dmass/Development/madgraph4gpu/release-v1.01.01/MG5aMC/mg5amcnlo
+#mg5_path = /data/avalassi/GPU2025/test-madgraph4gpu/MG5aMC/mg5amcnlo

epochX/cudacpp/ee_mumu.mad/SubProcesses/P1_epem_mupmum/color_sum.cc

Lines changed: 38 additions & 22 deletions
@@ -3,9 +3,16 @@
 // Created by: A. Valassi (Sep 2025) for the MG5aMC CUDACPP plugin.
 // Further modified by: A. Valassi (2025) for the MG5aMC CUDACPP plugin.
 
+#include "mgOnGpuConfig.h"
+
+// For tests: disable autovectorization in gcc (in the cppnone mode only)
+//#ifndef MGONGPU_CPPSIMD
+//#pragma GCC optimize("no-tree-vectorize")
+//#endif
+
 #include "color_sum.h"
 
-#include "mgOnGpuConfig.h"
+#include "mgOnGpuVectorsSplitMerge.h"
 
 #include "MemoryAccessMatrixElements.h"
 
@@ -95,60 +102,69 @@ namespace mg5amcCpu
     // and also use constexpr to compute "2*" and "/colorDenom[icol]" once and for all at compile time:
     // we gain (not a factor 2...) in speed here as we only loop over the up diagonal part of the matrix.
     // Strangely, CUDA is slower instead, so keep the old implementation for the moment.
-    fptype_sv deltaMEs = { 0 };
-#if defined MGONGPU_CPPSIMD and defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT
-    fptype_sv deltaMEs_next = { 0 };
-    // Mixed mode: merge two neppV vectors into one neppV2 vector
+    fptype2_sv deltaMEs2 = { 0 };
+#if not defined MGONGPU_CPPSIMD or ( defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT )
+    // Mixed mode: must convert from double to float and possibly merge SIMD vectors
+    // Double/float mode without SIMD: pre-create jampR_sv/jampI_sv vectors (faster and more robust)
     fptype2_sv jampR_sv[ncolor];
     fptype2_sv jampI_sv[ncolor];
     for( int icol = 0; icol < ncolor; icol++ )
     {
+#if defined MGONGPU_CPPSIMD
+      // Mixed mode with SIMD: merge two neppV double vectors into one neppV2 float vector
       jampR_sv[icol] = fpvmerge( cxreal( allJamp_sv[icol] ), cxreal( allJamp_sv[ncolor + icol] ) );
       jampI_sv[icol] = fpvmerge( cximag( allJamp_sv[icol] ), cximag( allJamp_sv[ncolor + icol] ) );
+#else
+      // Mixed mode without SIMD: convert double to float
+      // Double/float mode without SIMD: pre-create jampR_sv/jampI_sv vectors (faster and more robust)
+      jampR_sv[icol] = cxreal( allJamp_sv[icol] );
+      jampI_sv[icol] = cximag( allJamp_sv[icol] );
+#endif
     }
 #else
+    // Double/float mode with SIMD: do not pre-create jampR_sv/jampI_sv vectors (would be slower)
     const cxtype_sv* jamp_sv = allJamp_sv;
 #endif
     // Loop over icol
     for( int icol = 0; icol < ncolor; icol++ )
     {
       // Diagonal terms
-#if defined MGONGPU_CPPSIMD and defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT
-      fptype2_sv& jampRi_sv = jampR_sv[icol];
-      fptype2_sv& jampIi_sv = jampI_sv[icol];
+#if not defined MGONGPU_CPPSIMD or ( defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT )
+      const fptype2_sv& jampRi_sv = jampR_sv[icol];
+      const fptype2_sv& jampIi_sv = jampI_sv[icol];
 #else
-      fptype2_sv jampRi_sv = (fptype2_sv)( cxreal( jamp_sv[icol] ) );
-      fptype2_sv jampIi_sv = (fptype2_sv)( cximag( jamp_sv[icol] ) );
+      const fptype2_sv& jampRi_sv = cxreal( jamp_sv[icol] );
+      const fptype2_sv& jampIi_sv = cximag( jamp_sv[icol] );
 #endif
       fptype2_sv ztempR_sv = cf2.value[icol][icol] * jampRi_sv;
       fptype2_sv ztempI_sv = cf2.value[icol][icol] * jampIi_sv;
       // Loop over jcol
       for( int jcol = icol + 1; jcol < ncolor; jcol++ )
       {
         // Off-diagonal terms
-#if defined MGONGPU_CPPSIMD and defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT
-        fptype2_sv& jampRj_sv = jampR_sv[jcol];
-        fptype2_sv& jampIj_sv = jampI_sv[jcol];
+#if not defined MGONGPU_CPPSIMD or ( defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT )
+        const fptype2_sv& jampRj_sv = jampR_sv[jcol];
+        const fptype2_sv& jampIj_sv = jampI_sv[jcol];
 #else
-        fptype2_sv jampRj_sv = (fptype2_sv)( cxreal( jamp_sv[jcol] ) );
-        fptype2_sv jampIj_sv = (fptype2_sv)( cximag( jamp_sv[jcol] ) );
+        const fptype2_sv& jampRj_sv = cxreal( jamp_sv[jcol] );
+        const fptype2_sv& jampIj_sv = cximag( jamp_sv[jcol] );
 #endif
         ztempR_sv += cf2.value[icol][jcol] * jampRj_sv;
         ztempI_sv += cf2.value[icol][jcol] * jampIj_sv;
       }
-      fptype2_sv deltaMEs2 = ( jampRi_sv * ztempR_sv + jampIi_sv * ztempI_sv ); // may underflow #831
-#if defined MGONGPU_CPPSIMD and defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT
-      deltaMEs += fpvsplit0( deltaMEs2 );
-      deltaMEs_next += fpvsplit1( deltaMEs2 );
-#else
-      deltaMEs += deltaMEs2;
-#endif
+      deltaMEs2 += ( jampRi_sv * ztempR_sv + jampIi_sv * ztempI_sv ); // may underflow #831
     }
     // *** STORE THE RESULTS ***
     using E_ACCESS = HostAccessMatrixElements; // non-trivial access: buffer includes all events
     fptype* MEs = E_ACCESS::ieventAccessRecord( allMEs, ievt0 );
     // NB: color_sum ADDS |M|^2 for one helicity to the running sum of |M|^2 over helicities for the given event(s)
     fptype_sv& MEs_sv = E_ACCESS::kernelAccess( MEs );
+#if defined MGONGPU_CPPSIMD and defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT
+    fptype_sv deltaMEs = fpvsplit0( deltaMEs2 );
+    fptype_sv deltaMEs_next = fpvsplit1( deltaMEs2 );
+#else
+    fptype_sv deltaMEs = deltaMEs2;
+#endif
     MEs_sv += deltaMEs; // fix #435
 #if defined MGONGPU_CPPSIMD and defined MGONGPU_FPTYPE_DOUBLE and defined MGONGPU_FPTYPE2_FLOAT
     fptype* MEs_next = E_ACCESS::ieventAccessRecord( allMEs, ievt0 + neppV );
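
Note on the hunk above: the code comments say the color sum loops only over the upper-diagonal part of the color matrix, with the factor "2*" folded into the matrix ahead of time. The scalar sketch below illustrates that idea only. It is not the plugin's implementation: `ColorMatrix`, `Jamps`, `foldUpper`, `colorSumFull` and `colorSumUpper` are hypothetical names, `ncolor = 3` is an arbitrary size, and the "/colorDenom[icol]" normalisation and all SIMD or mixed-precision handling are left out.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical size and type names, for illustration only (not taken from the plugin).
constexpr std::size_t ncolor = 3;
using ColorMatrix = std::array<std::array<double, ncolor>, ncolor>;
using Jamps = std::array<double, ncolor>;

// Reference version: full quadratic form over all (i,j) pairs,
// summing the real-part and imaginary-part contributions.
double colorSumFull( const ColorMatrix& cf, const Jamps& jR, const Jamps& jI )
{
  double me = 0;
  for( std::size_t i = 0; i < ncolor; i++ )
  {
    double zR = 0, zI = 0;
    for( std::size_t j = 0; j < ncolor; j++ )
    {
      zR += cf[i][j] * jR[j];
      zI += cf[i][j] * jI[j];
    }
    me += jR[i] * zR + jI[i] * zI;
  }
  return me;
}

// Fold the factor 2 into the upper off-diagonal elements once.
// The lower triangle of the result is left at zero and is never read.
ColorMatrix foldUpper( const ColorMatrix& cf )
{
  ColorMatrix cf2{};
  for( std::size_t i = 0; i < ncolor; i++ )
    for( std::size_t j = i; j < ncolor; j++ )
    {
      assert( cf[i][j] == cf[j][i] ); // the folding is only valid for a symmetric matrix
      cf2[i][j] = ( i == j ? 1. : 2. ) * cf[i][j];
    }
  return cf2;
}

// Upper-triangle version: a diagonal term, then only the terms with j > i.
double colorSumUpper( const ColorMatrix& cf2, const Jamps& jR, const Jamps& jI )
{
  double me = 0;
  for( std::size_t i = 0; i < ncolor; i++ )
  {
    double zR = cf2[i][i] * jR[i];
    double zI = cf2[i][i] * jI[i];
    for( std::size_t j = i + 1; j < ncolor; j++ )
    {
      zR += cf2[i][j] * jR[j];
      zI += cf2[i][j] * jI[j];
    }
    me += jR[i] * zR + jI[i] * zI; // accumulate into a single running sum
  }
  return me;
}
```

For a symmetric matrix, `colorSumUpper( foldUpper( cf ), jR, jI )` gives the same value as `colorSumFull( cf, jR, jI )` up to floating-point rounding, while visiting roughly half of the matrix elements. Accumulating into one running sum and converting only at the end mirrors the way the patched code accumulates `deltaMEs2` and splits it once after the loop.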
