RF native failed to compile / load, inconsistent behavior to pure Python, dividing a tensor of type int #1749

@albertz

Description

PyExtModCompiler call: g++ -shared -O2 -std=c++11 -fno-strict-overflow -Wsign-compare -DDYNAMIC_ANNOTATIONS_ENABLED=1 -DNDEBUG -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -O2 -fexceptions -g -grecord-gcc-switches -pipe -Wall -Werror=format-security -Wp,-D_FORTIFY_SOURCE=2 -Wp,-D_GLIBCXX_ASSERTIONS -fstack-protector-strong -m64 -march=x86-64-v2 -mtune=generic -fasynchronous-unwind-tables -fstack-clash-protection -fcf-protection -I /rwthfs/rz/cluster/home/az668407/setups/combined/2021-05-31/tools/returnn/returnn/frontend/_native -I /usr/include/python3.12 -fPIC -v -D_GLIBCXX_USE_CXX11_ABI=0 -g /w0/tmp/slurm_az668407.60282320/az668407/returnn_py_ext_mod_cache/_returnn_frontend_native/b20035631a/_returnn_frontend_native.cc -o /w0/tmp/slurm_az668407.60282320/az668407/returnn_py_ext_mod_cache/_returnn_frontend_native/b20035631a/_returnn_frontend_native.so
RETURNN frontend _native backend: Error while getting module:
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.30' not found (required by /w0/tmp/slurm_az668407.60282320/az668407/returnn_py_ext_mod_cache/_returnn_frontend_native/b20035631a/_returnn_frontend_native.so)
This is optional (although very recommended), so we continue without it.

So the compilation (or just the load of the native module) fails with "/lib64/libstdc++.so.6: version `GLIBCXX_3.4.30' not found".
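A standard way to diagnose this (not from the issue itself) is to list which GLIBCXX symbol versions the system libstdc++ actually provides; the load fails because GLIBCXX_3.4.30 is not in that list. The paths below are common defaults and may differ per system:

```shell
# Locate the system libstdc++ (the /lib64 path matches the error message;
# other paths are common fallbacks), then list the GLIBCXX symbol versions
# embedded in it.  grep -a treats the shared object as text.
lib=$(find /lib64 /usr/lib64 /usr/lib -name 'libstdc++.so.6' 2>/dev/null | head -n 1)
grep -ao 'GLIBCXX_3\.4\.[0-9]*' "$lib" | sort -t. -k3 -n -u | tail -n 3
```

If the highest version printed is below 3.4.30, the g++ used for compilation is newer than the libstdc++ the dynamic loader picks up at import time, which is a typical cluster-environment mismatch.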

That then causes the following error:

...
  File "/rwthfs/rz/cluster/home/az668407/setups/combined/2021-05-31/tools/returnn/returnn/torch/frontend/_backend.py", line 1489, in TorchBackend.reduce
    line: correction_factor = rf.masked_fraction_of_shape(axis, inverse=True)
    locals:
      correction_factor = <local> None
      axis = <local> [Dim{B}, Dim{'⌈((-199+time)+-200)/160⌉'[B]}]
  File "/rwthfs/rz/cluster/home/az668407/setups/combined/2021-05-31/tools/returnn/returnn/frontend/dims.py", line 283, in masked_fraction_of_shape
    line: return (num_elems_masked / num_elems_total) if not inverse else (num_elems_total / num_elems_masked)
    locals:
      num_elems_masked = <local> Tensor{'reduce_sum', [], dtype='int64'}
      num_elems_total = <local> Tensor{'mul', [], dtype='int32'}
      inverse = <local> True
  File "/rwthfs/rz/cluster/home/az668407/setups/combined/2021-05-31/tools/returnn/returnn/tensor/_tensor_op_overloads.py", line 84, in _TensorOpOverloadsMixin.__truediv__
    line: return _rf().combine(self, "/", other)
    locals:
      self = <local> Tensor{'mul', [], dtype='int32'}
      other = <local> Tensor{'reduce_sum', [], dtype='int64'}
  File "/rwthfs/rz/cluster/home/az668407/setups/combined/2021-05-31/tools/returnn/returnn/frontend/math_.py", line 211, in combine
    line: raise ValueError(
              "Dividing a Tensor of type int by an integer is disallowed. Please convert the Tensor to float."
          )
ValueError: Dividing a Tensor of type int by an integer is disallowed. Please convert the Tensor to float.

...
Module call stack:
(Model.__call__) (root)
(BatchNorm.__call__) feature_batch_norm
(BatchNorm.__call__.<locals>.<lambda>) feature_batch_norm

This particular symptom / error was also described in #1637 (comment). The underlying issue is that the optimized native RF code behaves differently from the pure Python RF code: the native code allows such int/int tensor division, while the pure Python fallback rejects it.
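The pure-Python guard that produces the error above can be illustrated with a minimal sketch. The `Tensor` class here is a simplified stand-in, not the actual RETURNN implementation; only the dtype check is modeled:

```python
class Tensor:
    """Simplified stand-in for a RETURNN-style tensor (dtype handling only)."""

    def __init__(self, dtype: str):
        self.dtype = dtype

    def __truediv__(self, other: "Tensor") -> "Tensor":
        # The pure-Python RF path rejects true division of int tensors,
        # as in the ValueError from returnn/frontend/math_.py above.
        if self.dtype.startswith("int") and other.dtype.startswith("int"):
            raise ValueError(
                "Dividing a Tensor of type int by an integer is disallowed. "
                "Please convert the Tensor to float."
            )
        return Tensor("float32")


a = Tensor("int32")   # like num_elems_total in the traceback
b = Tensor("int64")   # like num_elems_masked in the traceback
try:
    a / b
except ValueError as exc:
    print("raised:", exc)

# After an explicit cast to float, the division goes through:
print((Tensor("float32") / b).dtype)  # -> float32
```

The native code path skips this check, so whether a model runs or crashes here depends on whether the native module compiled, which is exactly the inconsistency this issue is about.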
