benchmarks for openmp parallel for skip flag #7372
|
If atomic write alone would do, then it seems the best fit. |
Codecov Report
❌ Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## master #7372 +/- ##
==========================================
- Coverage 99.11% 98.90% -0.22%
==========================================
Files 85 86 +1
Lines 16443 16479 +36
==========================================
Hits 16298 16298
- Misses 145 181 +36
|
Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>
|
@ben-schwen it seems that the new options are not of much use, maybe it is a compiler issue? I am on a recent gcc, omp 201511. |
|
Interesting. With 20 threads I get this (which was my main motivation to include the reduction). I have gcc 11.4.0 and openmp 201511.
halt variant V1 V2 V3 V4 V5 V6 V7 V8
<char> <char> <int> <int> <int> <int> <int> <int> <int> <int>
1: 1e8+1 nothing 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
2: 1e8+1 volatile 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
3: 1e8+1 volatile+shared 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
4: 1e8+1 atomic write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
5: 1e8+1 atomic read write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
6: 1e8+1 reduction 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
7: 1e8+1 cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
8: 1e7 nothing 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
9: 1e7 volatile 10000000 7523756 6139470 10163068 5543296 9819239 10011390 10223716
10: 1e7 volatile+shared 10000000 9721537 10062042 2737256 10091377 2760686 9725714 9918532
11: 1e7 atomic write 10000000 10143687 9958940 10128160 5549333 5473646 9742693 10022550
12: 1e7 atomic read write 10000000 9916360 9925820 2955716 10075786 10073474 9892110 2933117
13: 1e7 reduction 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
14: 1e7 cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
15: 10 nothing 10 0 0 12500000 0 12500000 0 0
16: 10 volatile 10 0 0 0 83056 0 0 74028
17: 10 volatile+shared 10 46021 19186 55439 0 28853 75632 0
18: 10 atomic write 10 90927 14187 0 95527 36864 0 142229
19: 10 atomic read write 10 0 69753 129379 0 0 90041 0
20: 10 reduction 10 12500000 12500000 12500000 12500000 12500000 12500000 12500000
21: 10 cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
halt variant V1 V2 V3 V4 V5 V6 V7 V8
<char> <char> <int> <int> <int> <int> <int> <int> <int> <int>
halt variant user.self sys.self elapsed user.child sys.child
<char> <char> <num> <num> <num> <num> <num>
1: 1e8+1 nothing 0.125 0.000 0.017 0.002 0.001
2: 1e8+1 volatile 0.055 0.000 0.012 0.002 0.000
3: 1e8+1 volatile+shared 0.080 0.000 0.014 0.003 0.000
4: 1e8+1 atomic write 0.047 0.000 0.010 0.002 0.000
5: 1e8+1 atomic read write 0.080 0.000 0.014 0.002 0.000
6: 1e8+1 reduction 0.033 0.000 0.009 0.002 0.001
7: 1e8+1 cancellation 0.964 0.000 0.126 0.002 0.000
8: 1e7 nothing 0.070 0.000 0.011 0.003 0.000
9: 1e7 volatile 0.053 0.000 0.012 0.002 0.000
10: 1e7 volatile+shared 0.057 0.000 0.011 0.002 0.000
11: 1e7 atomic write 0.015 0.004 0.007 0.003 0.000
12: 1e7 atomic read write 0.098 0.000 0.014 0.001 0.001
13: 1e7 reduction 0.062 0.000 0.011 0.002 0.000
14: 1e7 cancellation 1.012 0.000 0.123 0.002 0.001
15: 10 nothing 0.031 0.000 0.008 0.002 0.001
16: 10 volatile 0.069 0.000 0.010 0.002 0.000
17: 10 volatile+shared 0.054 0.000 0.010 0.002 0.000
18: 10 atomic write 0.001 0.000 0.003 0.001 0.002
19: 10 atomic read write 0.085 0.000 0.012 0.003 0.000
20: 10 reduction 0.031 0.000 0.008 0.001 0.002
21: 10 cancellation 1.061 0.000 0.152 0.002 0.000
halt variant user.self sys.self elapsed user.child sys.child
<char> <char> <num> <num> <num> <num> <num> |
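One possible reading of the `reduction` rows above, where V1 stops at the halt point but the other threads still run all their iterations: a variable listed in a `reduction` clause is private to each thread until the loop ends, so a flag set by one thread is not seen by the others. The sketch below illustrates that behaviour only; it assumes the flag is used as a `reduction(||:...)` variable, and the function name and parameters are hypothetical, not taken from the PR.

```c
/* Hypothetical sketch, not the PR's benchmark code.
   'skip' is a reduction variable: every thread gets its own private copy,
   and the copies are combined with || only when the loop finishes. */
long iterations_with_reduction_flag(long n, long halt_at, int *halted)
{
    int skip = 0;   /* combined across threads after the loop */
    long iters = 0; /* total iterations that did real work */

    #pragma omp parallel for reduction(||:skip) reduction(+:iters)
    for (long i = 0; i < n; i++) {
        if (skip) continue;      /* only sees this thread's private copy */
        if (i == halt_at) {
            skip = 1;            /* invisible to other threads until the end */
            continue;
        }
        iters++;
    }

    *halted = skip;
    return iters;
}
```

Under these assumptions only the thread that reaches `halt_at` stops early, so the flag is reliable as a result after the loop but does not save work in the other threads.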
|
@ben-schwen can you also share the iterations made by each thread? |
|
@jangorecki I have added the iterations above. I have an |
Towards #7371
this code assumes you have 8+ threads
btw. I read that volatile should not be used for this and that atomic should be preferred instead
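For reference, a minimal sketch of what a skip flag with atomic read and atomic write could look like. In C, `volatile` alone guarantees neither atomicity nor ordering between threads, which is why the OpenMP `atomic` construct is used here. The function `count_processed` and its arguments are hypothetical and only serve the illustration; this is not the code benchmarked in this PR.

```c
/* Hypothetical sketch, not the PR's benchmark code.
   Counts the elements processed before a negative value is met;
   once the flag is set, remaining iterations do no work. */
long count_processed(const int *x, int n, int *stopped)
{
    int skip = 0;        /* shared flag, accessed only through atomics */
    long processed = 0;  /* summed across threads by the reduction */

    #pragma omp parallel for shared(skip) reduction(+:processed)
    for (int i = 0; i < n; i++) {
        int local_skip;
        #pragma omp atomic read
        local_skip = skip;
        if (local_skip) continue;  /* break is not allowed in an omp for loop */

        if (x[i] < 0) {
            #pragma omp atomic write
            skip = 1;
            continue;
        }
        processed++;
    }

    *stopped = skip;
    return processed;
}
```

Compiled without OpenMP support the pragmas are ignored and the loop runs serially with the same result semantics. With several threads the count of processed elements depends on timing, since other threads may finish some iterations before they observe the flag.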