Conversation

@ggerganov
Member

No description provided.

@ggerganov
Member Author

@JohannesGaessler The OPT_STEP_ADAMW test produced a NaN in the CPU reference test:

[OPT_STEP_ADAMW] NaN at index 0 (CUDA0=0.051476 CPU=-nan) ggml_backend_cuda_graph_compute: disabling CUDA graphs due to GPU architecture

https://github.com/ggml-org/ci/blob/05d107abc73d309ef1374fe13092815cb7bfa254/llama.cpp/19/d012e78a699c4df8c7f1ac325db4632a710d66/ggml-4-x86-cuda-v100/stdall#L9496

@JohannesGaessler
Collaborator

The changes to the CPU implementation of the AdamW optimizer weren't synced. This PR still has the previous implementation, where the optimizer parameters were stored with a different layout; as a result, an integer is interpreted as a float, leading to garbage results.

@ggerganov
Member Author

Ah sorry. My mistake, I forgot to update the sync scripts.

@github-actions github-actions bot added labels: testing (Everything test related), Nvidia GPU (Issues specific to Nvidia GPUs), ggml (changes relating to the ggml tensor library for machine learning) — Nov 16, 2024
@ggerganov ggerganov closed this Nov 16, 2024
@ggerganov ggerganov deleted the sync branch November 16, 2024 20:51