CUDA RT Build Inconsistencies #52

@jasonltorchinsky

Hello!

I am just starting to familiarize myself with the code and have encountered an issue:

I have cloned the main repository and have added a .cmake file to the config subdirectory for my machine, which I'll call alpha.

After building the executable in a build directory via

cmake .. -DSYST=alpha -DUSECUDA=TRUE && make

I navigate to the rcemip directory to run the rcemip experiment, where I run the following commands

./make_links.sh
python test_rcemip_input_rt.py
bsub -I -n 1 -W 00:10 -gpu num=1 ../build/test_rte_rrtmgp_rt_gpu

and it successfully runs the experiment, i.e., returns values that look reasonable for the problem.

However, sometimes when I re-compile and re-run the experiment, all values returned for the ray tracer are zero.

For clarity, let a "good" executable be one that consistently (i.e., 10 runs out of 10) returns non-zero values for the ray tracer and a "bad" executable be one that consistently returns zeroes for the ray tracer. We have not observed any executables that are inconsistently good or bad.
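
To make the good/bad definition above concrete, here is a minimal sketch of how repeated runs could be classified. It assumes the ray tracer values from each run have already been read into a list of numbers; how they are read from the experiment output is not shown. The function name `classify_executable` and the "inconsistent" label are hypothetical and not part of the repository, and treating a run as non-zero when any single value differs from zero is also an assumption.

```python
from typing import Iterable, Sequence


def classify_executable(runs: Sequence[Iterable[float]]) -> str:
    """Classify an executable from the ray tracer values of repeated runs.

    Each element of `runs` holds the values returned by one run.
    Assumption: a run counts as non-zero if any of its values differs from 0.
    """
    if not runs:
        raise ValueError("need at least one run to classify")

    # One boolean per run: did this run return any non-zero value?
    nonzero = [any(v != 0.0 for v in run) for run in runs]

    if all(nonzero):
        return "good"          # every run returned non-zero values
    if not any(nonzero):
        return "bad"           # every run returned only zeroes
    return "inconsistent"      # mixed behavior, not observed so far
```

With 10 runs per executable, passing the 10 value lists to `classify_executable` would return "good" or "bad" for every executable described in this issue.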

The following is behavior that I have observed:

  • Sometimes cloning the original repository, adding a .cmake file for our machine, building the executable, and running the rcemip experiment yields a good executable, and sometimes it yields a bad executable.
  • A good executable will sometimes generate different rte_rrtmgp_kernel_tuning.txt files for different runs of the rcemip experiment.
  • A bad executable supplied with a rte_rrtmgp_kernel_tuning.txt file generated from a good executable will still give bad results.
  • A good executable supplied with a rte_rrtmgp_kernel_tuning.txt file generated by a bad executable will still give good results.
  • Recompiling a good executable, i.e., simply running make in the build directory instead of the full build command given above, has so far always yielded a bad executable.
  • In the machine .cmake file, we have compiled executables with each of the flags -O0, -O2, and -O3, with seemingly no correlation between the flag and the goodness of the executable.
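
For the observation that a good executable sometimes generates different rte_rrtmgp_kernel_tuning.txt files, a minimal sketch of how saved copies could be compared is given below. It assumes the tuning file was copied to a separate path after each run; those paths and the helper names `file_digest` and `group_identical_files` are hypothetical. It only checks whether files are byte-identical and says nothing about which tuning parameters differ.

```python
import hashlib
from pathlib import Path
from typing import Dict, Iterable, List


def file_digest(path: str) -> str:
    """Return the SHA-256 hex digest of the file's raw bytes."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def group_identical_files(paths: Iterable[str]) -> List[List[str]]:
    """Group file paths so that each group holds byte-identical files."""
    groups: Dict[str, List[str]] = {}
    for p in paths:
        # Files with the same digest land in the same group.
        groups.setdefault(file_digest(p), []).append(str(p))
    return list(groups.values())
```

If every saved copy lands in a single group, the tuning files were identical across runs; more than one group means at least two runs produced different files.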
