
RuntimeError: Error building extension 'wkv_1024_bf16' #23

@vision-zhao


Here is the command I ran:
`export RWKV_CUDA_ON=0
export RWKV_JIT_ON=1

python3 train.py \
--load_model ../../pretrained/RWKV-4-Raven-1B5-v9-Eng99-20230411-ctx4096.pth \
--proj_dir ./saved_models \
--data_file ./data/intent_train_prompt.json \
--data_type 'utf-8' \
--vocab_size 50277 --ctx_len 1024 --epoch_steps 1000 --epoch_count 1000 --epoch_begin 0 --epoch_save 5 --micro_bsz 2 \
--n_layer 24 --n_embd 2048 --pre_ffn 0 --head_qk 0 --lr_init 1e-4 --lr_final 1e-4 --warmup_steps 0 --beta1 0.9 --beta2 0.999 \
--adam_eps 1e-8 --accelerator gpu --devices 1 --precision bf16 --strategy deepspeed_stage_2 --grad_cp 0 \
--lora --lora_r 8 --lora_alpha 16 --lora_dropout 0.01 \
--lora_parts=att,ffn,time,ln # configure which parts to fine-tune`

Here is my environment:
torch 1.13.1+cu117, deepspeed 0.7.0, pytorch_lightning 1.9.1

Then here is the error output:
[1/2] /data/app/summerzhao/resource/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=wkv_1024_bf16 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data/app/anaconda3/envs/pre-t/lib/python3.8/site-packages/torch/include -isystem /data/app/anaconda3/envs/pre-t/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/app/anaconda3/envs/pre-t/lib/python3.8/site-packages/torch/include/TH -isystem /data/app/anaconda3/envs/pre-t/lib/python3.8/site-packages/torch/include/THC -isystem /data/app/summerzhao/resource/cuda-11.0/include -isystem /data/app/anaconda3/envs/pre-t/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -t 4 -std=c++14 -res-usage --maxrregcount 60 --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DTmax=1024 -c /data/app/summerzhao/llm/rwkv/RWKV-LM-LoRA-main/RWKV-v4neo/cuda/wkv_cuda_bf16.cu -o wkv_cuda_bf16.cuda.o
FAILED: wkv_cuda_bf16.cuda.o
/data/app/summerzhao/resource/cuda-11.0/bin/nvcc -DTORCH_EXTENSION_NAME=wkv_1024_bf16 -DTORCH_API_INCLUDE_EXTENSION_H -DPYBIND11_COMPILER_TYPE=\"_gcc\" -DPYBIND11_STDLIB=\"_libstdcpp\" -DPYBIND11_BUILD_ABI=\"_cxxabi1011\" -isystem /data/app/anaconda3/envs/pre-t/lib/python3.8/site-packages/torch/include -isystem /data/app/anaconda3/envs/pre-t/lib/python3.8/site-packages/torch/include/torch/csrc/api/include -isystem /data/app/anaconda3/envs/pre-t/lib/python3.8/site-packages/torch/include/TH -isystem /data/app/anaconda3/envs/pre-t/lib/python3.8/site-packages/torch/include/THC -isystem /data/app/summerzhao/resource/cuda-11.0/include -isystem /data/app/anaconda3/envs/pre-t/include/python3.8 -D_GLIBCXX_USE_CXX11_ABI=0 -D__CUDA_NO_HALF_OPERATORS__ -D__CUDA_NO_HALF_CONVERSIONS__ -D__CUDA_NO_BFLOAT16_CONVERSIONS__ -D__CUDA_NO_HALF2_OPERATORS__ --expt-relaxed-constexpr -gencode=arch=compute_70,code=compute_70 -gencode=arch=compute_70,code=sm_70 --compiler-options '-fPIC' -t 4 -std=c++14 -res-usage --maxrregcount 60 --use_fast_math -O3 -Xptxas -O3 --extra-device-vectorization -DTmax=1024 -c /data/app/summerzhao/llm/rwkv/RWKV-LM-LoRA-main/RWKV-v4neo/cuda/wkv_cuda_bf16.cu -o wkv_cuda_bf16.cuda.o
nvcc fatal : Unknown option '-t'
ninja: build stopped: subcommand failed.

How can I fix this? If anyone could provide suggestions, I would be very grateful.
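One possible reading of the log, offered as a sketch rather than a confirmed diagnosis: the build invokes `nvcc` from a `cuda-11.0` directory while torch is a `cu117` build, and the compile line passes `-t 4`. Assuming `nvcc` only accepts `-t`/`--threads` from CUDA 11.2 onward, an 11.0 toolkit would reject it with exactly this message. The snippet below checks which `nvcc` release is on the `PATH`; the helper names (`parse_nvcc_release`, `supports_threads_flag`, `check_local_nvcc`) and the 11.2 threshold are illustrative assumptions, not part of RWKV or PyTorch.

```python
import re
import shutil
import subprocess

# Assumption: nvcc accepts -t/--threads starting with CUDA 11.2.
MIN_THREADS_FLAG_RELEASE = (11, 2)


def parse_nvcc_release(version_text):
    """Return (major, minor) parsed from `nvcc --version` output, or None."""
    match = re.search(r"release (\d+)\.(\d+)", version_text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def supports_threads_flag(release):
    """True if the given (major, minor) release is new enough for `-t`."""
    return release is not None and release >= MIN_THREADS_FLAG_RELEASE


def check_local_nvcc(nvcc_path=None):
    """Run `nvcc --version` and return its release, or None if nvcc is missing."""
    nvcc_path = nvcc_path or shutil.which("nvcc")
    if nvcc_path is None:
        return None
    result = subprocess.run(
        [nvcc_path, "--version"], capture_output=True, text=True, check=False
    )
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    release = check_local_nvcc()
    if release is None:
        print("nvcc not found on PATH or version not recognized")
    elif supports_threads_flag(release):
        print(f"nvcc {release[0]}.{release[1]} should accept -t")
    else:
        print(f"nvcc {release[0]}.{release[1]} is likely too old for -t")
```

If the check reports an older release, two things may be worth trying: pointing `CUDA_HOME` and `PATH` at a toolkit that matches the torch build (11.7 here), or removing `-t 4` from wherever the extension's compile flags are defined.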
