@hosseinal

This commit introduces two main changes:

1.  A new section in README.md providing a user-friendly guide on how to:
    *   Select a small model from Hugging Face (`microsoft/phi-2` as an example).
    *   Download and convert the model to GGUF format using the provided Python scripts.
    *   Run the converted model using `llama-cli`.
    This aims to improve the onboarding experience for new users.

2.  The `llamafile_sgemm` CPU matrix multiplication kernel in `ggml/src/ggml-cpu/llamafile/sgemm.cpp` has been replaced with a basic triple-nested-loop implementation.
    This change is for demonstrative and educational purposes, to show a very simple version of the kernel. The original optimized code has been commented out for reference.
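
The model-selection workflow in item 1 can be sketched with shell commands along these lines (a sketch, not the exact README text; `convert_hf_to_gguf.py` and `llama-cli` are the current llama.cpp script and binary names, and `microsoft/phi-2` is the example model named above):

```shell
# Download the example model from Hugging Face (uses the huggingface_hub CLI)
huggingface-cli download microsoft/phi-2 --local-dir phi-2

# Convert the Hugging Face checkpoint to GGUF with the repo's conversion script
python convert_hf_to_gguf.py phi-2 --outfile phi-2.gguf

# Run the converted model with llama-cli
./llama-cli -m phi-2.gguf -p "Hello, world" -n 64
```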
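
A triple-nested-loop GEMM of the kind described in item 2 looks roughly like the following. This is a self-contained sketch, not the actual `llamafile_sgemm` signature (which takes ggml-specific types and strides); it shows the naive baseline that the optimized kernel exists to outperform:

```cpp
#include <cassert>
#include <cstdio>

// Naive matrix multiplication: C(m x n) = A(m x k) * B(k x n), row-major.
// No blocking, no vectorization, no threading: the simplest correct version.
static void naive_sgemm(int m, int n, int k,
                        const float *A, const float *B, float *C) {
    for (int i = 0; i < m; ++i) {
        for (int j = 0; j < n; ++j) {
            float sum = 0.0f;
            for (int p = 0; p < k; ++p) {
                sum += A[i * k + p] * B[p * n + j];
            }
            C[i * n + j] = sum;
        }
    }
}

int main() {
    // 2x2 example: [[1,2],[3,4]] * [[5,6],[7,8]] = [[19,22],[43,50]]
    const float A[] = {1, 2, 3, 4};
    const float B[] = {5, 6, 7, 8};
    float C[4] = {};
    naive_sgemm(2, 2, 2, A, B, C);
    assert(C[0] == 19 && C[1] == 22 && C[2] == 43 && C[3] == 50);
    printf("C = [%.0f %.0f; %.0f %.0f]\n", C[0], C[1], C[2], C[3]);
    return 0;
}
```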

This commit also updates the README.md with more detailed instructions for building llama.cpp with CUDA support. It adds a troubleshooting subsection to help address common build issues, such as the `std_function.h` parameter pack error that can occur due to compiler/CUDA version incompatibilities.

The changes include:
- More specific CMake commands for building with CUDA.
- A direct link to NVIDIA's documentation for finding GPU compute
  capabilities.
- Guidance on checking compiler/CUDA compatibility and consulting
  community resources for build error solutions.
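
The build steps listed above can be summarized roughly as follows (the `GGML_CUDA` flag matches the current llama.cpp CMake options; `86` is only an example compute capability, so look up the correct value for your GPU in NVIDIA's documentation as linked from the README):

```shell
# Configure with CUDA enabled; set the compute capability for your GPU
# (86 here is an example value for an RTX 30-series card)
cmake -B build -DGGML_CUDA=ON -DCMAKE_CUDA_ARCHITECTURES=86

# Build in Release mode
cmake --build build --config Release

# If the build fails inside std_function.h with a parameter pack error,
# check that your host compiler is supported by your CUDA toolkit:
nvcc --version
g++ --version
```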
@hosseinal hosseinal closed this May 29, 2025
@github-actions github-actions bot added the `ggml` label (changes relating to the ggml tensor library for machine learning) May 29, 2025
