benchmarks for openmp parallel for skip flag by jangorecki · Pull Request #7372 · Rdatatable/data.table

jangorecki · 2025-10-17T07:05:13Z

Towards #7371

This code assumes you have 8+ threads.

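cc()  # data.table dev helper: recompile and reload the package
do = function(what) {
  n = 1e8  # total number of loop iterations
  # half: iteration at which the skip flag is raised; 1e8+1 is never reached
  rbindlist(lapply(c("1e8+1"=1e8+1, "1e7"=1e7, "10"=10), function(half) {
    rbindlist(lapply(c("nothing"=1L,"volatile"=2L,"volatile+shared"=3L,"atomic write"=4L,"atomic read write"=5L, "reduction"=6L, "cancellation"=7L), function(variant) {
      # a1: iterations made by each of the 8 threads, a2: timing of the call
      a2 <- system.time(
        a1 <- omp_flags(variant, n, half, 8)
      )
      if (what == "time") as.list(a2)
      else if (what == "iters") as.list(a1)
    }), idcol="variant")
  }), idcol="halt")
}
do("iters")  # per-thread iteration counts (first table below)
do("time")   # timings (second table below)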

      halt           variant       V1       V2       V3       V4       V5       V6       V7       V8
    <char>            <char>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
 1:  1e8+1           nothing 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 2:  1e8+1          volatile 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 3:  1e8+1   volatile+shared 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 4:  1e8+1      atomic write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 5:  1e8+1 atomic read write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 6:  1e8+1         reduction 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 7:  1e8+1      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 8:    1e7           nothing 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 9:    1e7          volatile 10000000  9368188  2219459  2374732  2843583  2794889  2256903  2389781
10:    1e7   volatile+shared 10000000  9561233  2291372  2512503  3533221  3080863  2444200  2472036
11:    1e7      atomic write 10000000  9732664  2249224  2339365  2973864  2526774  2218488  2205693
12:    1e7 atomic read write 10000000 10005487  1710672  1566658  1930782  1774714  1555453  1497075
13:    1e7         reduction 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
14:    1e7      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
15:     10           nothing       10        0        0        0        0        0        0        0
16:     10          volatile       10        0        0        0        0        0        0        0
17:     10   volatile+shared       10        0        0        0        0        0        0        0
18:     10      atomic write       10        0        0        0        0        0        0        0
19:     10 atomic read write       10        0        0        0        0        0        0        0
20:     10         reduction       10 12500000 12500000 12500000 12500000 12500000 12500000 12500000
21:     10      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
      halt           variant       V1       V2       V3       V4       V5       V6       V7       V8
    <char>            <char>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
      halt           variant user.self sys.self elapsed user.child sys.child
    <char>            <char>     <num>    <num>   <num>      <num>     <num>
 1:  1e8+1           nothing     3.596    0.000   0.472      0.002     0.002
 2:  1e8+1          volatile     3.588    0.001   0.481      0.000     0.005
 3:  1e8+1   volatile+shared     3.529    0.001   0.477      0.001     0.003
 4:  1e8+1      atomic write     3.689    0.001   0.494      0.002     0.003
 5:  1e8+1 atomic read write     3.668    0.000   0.493      0.001     0.004
 6:  1e8+1         reduction     3.493    0.000   0.467      0.000     0.004
 7:  1e8+1      cancellation     4.284    0.001   0.572      0.000     0.005
 8:    1e7           nothing     3.460    0.000   0.461      0.001     0.003
 9:    1e7          volatile     2.852    0.000   0.363      0.001     0.004
10:    1e7   volatile+shared     2.654    0.000   0.335      0.002     0.003
11:    1e7      atomic write     2.608    0.000   0.330      0.002     0.002
12:    1e7 atomic read write     2.534    0.000   0.320      0.002     0.003
13:    1e7         reduction     3.279    0.001   0.446      0.000     0.005
14:    1e7      cancellation     4.361    0.001   0.581      0.000     0.004
15:     10           nothing     0.001    0.000   0.004      0.000     0.006
16:     10          volatile     0.045    0.000   0.010      0.002     0.002
17:     10   volatile+shared     0.038    0.000   0.009      0.000     0.005
18:     10      atomic write     0.000    0.000   0.004      0.000     0.005
19:     10 atomic read write     0.044    0.000   0.010      0.002     0.003
20:     10         reduction     2.520    0.000   0.382      0.000     0.004
21:     10      cancellation     4.296    0.002   0.576      0.000     0.005
      halt           variant user.self sys.self elapsed user.child sys.child
    <char>            <char>     <num>    <num>   <num>      <num>     <num>

Btw, I have read that volatile should be avoided in favor of atomic.

jangorecki · 2025-10-17T07:10:47Z

If atomic write alone would do, then it seems the best fit.
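For readers without the branch checked out, here is a minimal sketch of the skip-flag pattern these variants compare. It is not the code from src/omp-flags.c: the function name `skip_flag_atomic` and its signature are made up for illustration. It shows the "atomic read write" flavour; dropping the `atomic read` pragma would give something like the "atomic write" flavour. With a static schedule, 1e8 iterations over 8 threads is 12500000 per thread, which is the number seen in the tables whenever a thread never observes the flag.

```c
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper (an assumption, not the PR's omp_flags()).
// Runs n iterations on up to nth threads; at iteration `halt` a shared flag
// is raised and every later iteration is skipped with `continue`, because
// `break` is not allowed inside an OpenMP for loop.
// iters[t] receives the number of iterations thread t executed in full.
void skip_flag_atomic(int64_t n, int64_t halt, int nth, int64_t *iters) {
  int skip = 0;  // shared flag
  for (int t = 0; t < nth; t++) iters[t] = 0;
  #pragma omp parallel for num_threads(nth) schedule(static) shared(skip)
  for (int64_t i = 0; i < n; i++) {
    int stop;
    #pragma omp atomic read
    stop = skip;
    if (stop) continue;  // flag already raised: skip the work
    if (i == halt) {
      #pragma omp atomic write
      skip = 1;
      continue;
    }
    int me = 0;
#ifdef _OPENMP
    me = omp_get_thread_num();
#endif
    iters[me]++;  // each thread only touches its own slot
  }
}
```

Calling `skip_flag_atomic(1e8, 10, 8, iters)` should leave `iters[0]` at 10, while the other slots depend on how quickly each thread sees the flag, as in the "10" rows above.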

codecov · 2025-10-17T07:20:06Z

Codecov Report

❌ Patch coverage is 0% with 36 lines in your changes missing coverage. Please review.
✅ Project coverage is 98.90%. Comparing base (55b0de6) to head (e023ba3).
⚠️ Report is 4 commits behind head on master.

Files with missing lines	Patch %	Lines
src/omp-flags.c	0.00%	29 Missing ⚠️
R/utils.R	0.00%	7 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #7372      +/-   ##
==========================================
- Coverage   99.11%   98.90%   -0.22%     
==========================================
  Files          85       86       +1     
  Lines       16443    16479      +36     
==========================================
  Hits        16298    16298              
- Misses        145      181      +36

src/omp-flags.c

Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>

jangorecki · 2025-10-17T17:50:41Z

@ben-schwen it seems that the new options are not of much use; maybe it is a compiler issue? I am on a recent gcc, omp 201511.
I also observed this warning:

omp-flags.c: In function ‘benchmark_omp_flag’:
omp-flags.c:95:17: warning: ‘cancel for’ inside ‘nowait’ for construct [-Wopenmp]
   95 |         #pragma omp cancel for
      |                 ^~~
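A possible factor behind the cancellation rows, offered as a guess rather than a diagnosis: `cancel` only has an effect when cancellation is enabled in the runtime, which is normally done by setting the `OMP_CANCELLATION` environment variable to `true` before the process starts, and `omp_get_cancellation()` reports whether it is active. The sketch below uses a separate `parallel` and `for` without `nowait`. I am not certain that this is what triggers the warning in the PR's code, and the names `cancellation_enabled` and `count_until_cancel` are hypothetical.

```c
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: 1 if the runtime has cancellation activated.
int cancellation_enabled(void) {
#ifdef _OPENMP
  return omp_get_cancellation();
#else
  return 0;
#endif
}

// Hypothetical helper: counts the iterations that ran in full when the loop
// requests cancellation at iteration `halt`.
int64_t count_until_cancel(int64_t n, int64_t halt, int nth) {
  int64_t done = 0;
  #pragma omp parallel num_threads(nth)
  {
    int64_t mine = 0;  // per-thread counter, merged after the loop
    #pragma omp for schedule(static)
    for (int64_t i = 0; i < n; i++) {
      if (i == halt) {
        // no-op unless cancellation is enabled
        #pragma omp cancel for
      }
      // other threads notice a pending cancellation here
      #pragma omp cancellation point for
      mine++;
    }
    #pragma omp atomic
    done += mine;
  }
  return done;
}
```

When `cancellation_enabled()` returns 0, `count_until_cancel(n, halt, nth)` returns `n`, that is, every thread runs all of its iterations.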

ben-schwen · 2025-10-18T15:25:24Z

Interesting. With 20 threads I get this (which was my main motivation to include the reduction). I have gcc 11.4.0 and openmp 201511

      halt           variant       V1       V2       V3       V4       V5       V6       V7       V8
    <char>            <char>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
 1:  1e8+1           nothing 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 2:  1e8+1          volatile 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 3:  1e8+1   volatile+shared 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 4:  1e8+1      atomic write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 5:  1e8+1 atomic read write 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 6:  1e8+1         reduction 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 7:  1e8+1      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 8:    1e7           nothing 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
 9:    1e7          volatile 10000000  7523756  6139470 10163068  5543296  9819239 10011390 10223716
10:    1e7   volatile+shared 10000000  9721537 10062042  2737256 10091377  2760686  9725714  9918532
11:    1e7      atomic write 10000000 10143687  9958940 10128160  5549333  5473646  9742693 10022550
12:    1e7 atomic read write 10000000  9916360  9925820  2955716 10075786 10073474  9892110  2933117
13:    1e7         reduction 10000000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
14:    1e7      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
15:     10           nothing       10        0        0 12500000        0 12500000        0        0
16:     10          volatile       10        0        0        0    83056        0        0    74028
17:     10   volatile+shared       10    46021    19186    55439        0    28853    75632        0
18:     10      atomic write       10    90927    14187        0    95527    36864        0   142229
19:     10 atomic read write       10        0    69753   129379        0        0    90041        0
20:     10         reduction       10 12500000 12500000 12500000 12500000 12500000 12500000 12500000
21:     10      cancellation 12500000 12500000 12500000 12500000 12500000 12500000 12500000 12500000
      halt           variant       V1       V2       V3       V4       V5       V6       V7       V8
    <char>            <char>    <int>    <int>    <int>    <int>    <int>    <int>    <int>    <int>
      halt           variant user.self sys.self elapsed user.child sys.child
    <char>            <char>     <num>    <num>   <num>      <num>     <num>
 1:  1e8+1           nothing     0.125    0.000   0.017      0.002     0.001
 2:  1e8+1          volatile     0.055    0.000   0.012      0.002     0.000
 3:  1e8+1   volatile+shared     0.080    0.000   0.014      0.003     0.000
 4:  1e8+1      atomic write     0.047    0.000   0.010      0.002     0.000
 5:  1e8+1 atomic read write     0.080    0.000   0.014      0.002     0.000
 6:  1e8+1         reduction     0.033    0.000   0.009      0.002     0.001
 7:  1e8+1      cancellation     0.964    0.000   0.126      0.002     0.000
 8:    1e7           nothing     0.070    0.000   0.011      0.003     0.000
 9:    1e7          volatile     0.053    0.000   0.012      0.002     0.000
10:    1e7   volatile+shared     0.057    0.000   0.011      0.002     0.000
11:    1e7      atomic write     0.015    0.004   0.007      0.003     0.000
12:    1e7 atomic read write     0.098    0.000   0.014      0.001     0.001
13:    1e7         reduction     0.062    0.000   0.011      0.002     0.000
14:    1e7      cancellation     1.012    0.000   0.123      0.002     0.001
15:     10           nothing     0.031    0.000   0.008      0.002     0.001
16:     10          volatile     0.069    0.000   0.010      0.002     0.000
17:     10   volatile+shared     0.054    0.000   0.010      0.002     0.000
18:     10      atomic write     0.001    0.000   0.003      0.001     0.002
19:     10 atomic read write     0.085    0.000   0.012      0.003     0.000
20:     10         reduction     0.031    0.000   0.008      0.001     0.002
21:     10      cancellation     1.061    0.000   0.152      0.002     0.000
      halt           variant user.self sys.self elapsed user.child sys.child
    <char>            <char>     <num>    <num>   <num>      <num>     <num>
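One reading of the reduction rows in the iteration tables, sketched under assumptions: with `reduction(||:skip)` every thread works on its own private copy of the flag, so only the thread that raises it stops early and the copies are combined only at the end of the loop. The function `skip_flag_reduction` below is hypothetical and is not taken from src/omp-flags.c.

```c
#include <stdint.h>
#ifdef _OPENMP
#include <omp.h>
#endif

// Hypothetical helper: skip flag implemented as an OpenMP reduction.
// Returns the combined flag; iters[t] is the number of iterations thread t
// executed in full.
int skip_flag_reduction(int64_t n, int64_t halt, int nth, int64_t *iters) {
  int skip = 0;
  for (int t = 0; t < nth; t++) iters[t] = 0;
  #pragma omp parallel for num_threads(nth) schedule(static) reduction(||:skip)
  for (int64_t i = 0; i < n; i++) {
    if (skip) continue;  // reads this thread's private copy only
    if (i == halt) {
      skip = 1;          // not visible to the other threads
      continue;
    }
    int me = 0;
#ifdef _OPENMP
    me = omp_get_thread_num();
#endif
    iters[me]++;
  }
  return skip;  // private copies are OR-ed together after the loop
}
```

In this sketch the thread that owns iteration `halt` stops there and the remaining threads complete their whole chunk.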

jangorecki · 2025-10-18T16:53:13Z

@ben-schwen can you also share the iterations made by each thread?
It is quite surprising that your code runs so much faster...

ben-schwen · 2025-10-19T13:19:31Z

@jangorecki I have added the iterations above. I have a 13th Gen Intel(R) Core(TM) i7-1370P CPU, which can apparently clock up to 5.2 GHz; that might be the reason why my times are flying.

jangorecki · 2025-10-19T18:52:09Z

I think we can close this PR; the changes suggested by this benchmark have been made in #7376 and #7361.

benchmarks

ee6fb9d

jangorecki requested a review from MichaelChirico as a code owner October 17, 2025 07:05

jangorecki added the openmp label Oct 17, 2025

jangorecki requested a review from ben-schwen October 17, 2025 07:05

ben-schwen reviewed Oct 17, 2025

Update src/omp-flags.c

e023ba3

Co-authored-by: Benjamin Schwendinger <52290390+ben-schwen@users.noreply.github.com>

jangorecki mentioned this pull request Oct 18, 2025

proper skip omp parallel loops in froll #7376

Merged

jangorecki closed this Oct 19, 2025

jangorecki wants to merge 2 commits into master from omp-flag
