@glide-the glide-the commented May 6, 2025

Support <|observation|> for function-call behavior by adding it to the EOG (end-of-generation) detection logic at src/llama-vocab.cpp#L1976-L1977.
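
As a rough illustration of what text-based EOG detection does, here is a minimal Python sketch. It is not the C++ code in src/llama-vocab.cpp: the function name collect_eog_ids, the short list of default token texts, and the toy token ids are all assumptions made up for this example.

```python
# Illustrative subset of token texts treated as end-of-generation (EOG).
# The real list lives in src/llama-vocab.cpp and may differ.
DEFAULT_EOG_TEXTS = {"<|endoftext|>", "<|im_end|>", "<|eot_id|>"}


def collect_eog_ids(token_to_id, extra_texts=()):
    """Return the ids of all vocabulary entries whose text marks EOG."""
    eog_texts = DEFAULT_EOG_TEXTS | set(extra_texts)
    return {tid for text, tid in token_to_id.items() if text in eog_texts}


# Toy vocabulary with made-up ids (not the real GLM-4 ids).
toy_vocab = {
    "hello": 1,
    "<|endoftext|>": 100,
    "<|user|>": 101,
    "<|observation|>": 102,
}

# Before the change: <|observation|> is not recognized as EOG.
without_observation = collect_eog_ids(toy_vocab)

# After the change: <|observation|> is part of the EOG set.
with_observation = collect_eog_ids(toy_vocab, extra_texts=["<|observation|>"])
```

In this sketch, once <|observation|> is in the EOG set, generation would stop when the model emits it after a tool call, which is the behavior this PR is after.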

Verifying the PR

Check out Ref: #13058

1. Build

cmake llama.cpp -B llama.cpp/build \
  -DBUILD_SHARED_LIBS=OFF \
  -DGGML_CUDA=ON \
  -DLLAMA_CURL=ON \
  -DCMAKE_C_COMPILER=gcc-13 \
  -DCMAKE_CXX_COMPILER=g++-13 \
  -DCMAKE_CUDA_COMPILER=/usr/local/cuda-12.1/bin/nvcc

cmake --build llama.cpp/build --config Release -j --clean-first --target llama-quantize llama-cli llama-gguf-split llama-server

2. Convert HF Weights

python convert_hf_to_gguf.py THUDM/glm-4-9b-chat-hf \
  --outfile glm-4-9b.gguf \
  --outtype q8_0
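
A quick way to sanity-check the converted file before loading it is to read the fixed-size GGUF header. The sketch below is a hedged example, not part of llama.cpp: the helper name read_gguf_header is hypothetical, and it assumes a little-endian file using GGUF version 2 or later (64-bit counts).

```python
import struct


def read_gguf_header(path):
    """Read the GGUF magic, version, tensor count, and metadata KV count."""
    with open(path, "rb") as f:
        magic = f.read(4)
        if magic != b"GGUF":
            raise ValueError(f"{path} is not a GGUF file (magic={magic!r})")
        raw = f.read(20)
        if len(raw) != 20:
            raise ValueError(f"{path} is truncated: header is incomplete")
        # Assumed layout: little-endian uint32 version, then two uint64 counts.
        version, n_tensors, n_kv = struct.unpack("<IQQ", raw)
    return {"version": version, "tensors": n_tensors, "metadata_kv": n_kv}
```

For the file produced above, this would be called as read_gguf_header("glm-4-9b.gguf"); a ValueError suggests the conversion did not produce a valid file.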

3. Run Inference (VS Code launch.json debug configuration for llama-server)

{
    "name": "C++ Server Launch",
    "type": "cppdbg",
    "request": "launch",
    "program": "${workspaceFolder}/build/bin/llama-server",
    "args": [
        "--jinja",
        "-m",
        "/mnt/ceph/develop/jiawei/model_checkpoint/glm-4-9b-chat-hf.gguf",
        "--port",
        "8000"
    ],
    "stopAtEntry": false,
    "cwd": "${workspaceFolder}",
    "environment": [],
    "externalConsole": false,
    "MIMode": "gdb",
    "setupCommands": [
        {
            "description": "Enable pretty-printing for gdb",
            "text": "-enable-pretty-printing",
            "ignoreFailures": true
        }
    ],
    "miDebuggerPath": "/usr/bin/gdb"
}

4. Function Call Test

{
    "max_tokens": 1000,

    "tools": [
        {
            "type": "function",
            "function": {
                "name": "get_current_weather",
                "description": "Get the current weather in a given location",
                "parameters": {
                    "type": "object",
                    "properties": {
                        "location": {
                            "type": "string",
                            "description": "The city and state, e.g. San Francisco, CA"
                        },
                        "unit": {
                            "type": "string",
                            "enum": [
                                "celsius",
                                "fahrenheit"
                            ]
                        }
                    },
                    "required": [
                        "location"
                    ]
                }
            }
        }
    ], 
    "stream": false,
    "temperature": 0.5,
    "messages": [
        {
            "role": "user",
            "content": "What's the weather like in Beijing?"
        } 
    ],
    
    "model": "glm-4",
    "request_id": "mycompany-1713755521779"
}
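
One way to send a request like this and inspect the result is a short standard-library Python client. This is a sketch under assumptions: it assumes the server from step 3 is reachable at http://localhost:8000 and that the body is posted to the OpenAI-compatible /v1/chat/completions endpoint; the helper names (build_request, post_chat_completion, extract_tool_calls) are hypothetical.

```python
import json
import urllib.request

# Assumed base URL: the server from step 3, listening on port 8000 locally.
BASE_URL = "http://localhost:8000"


def build_request(user_message, tools, max_tokens=1000, temperature=0.5):
    """Build a non-streaming chat completion body with tool definitions."""
    return {
        "max_tokens": max_tokens,
        "temperature": temperature,
        "stream": False,
        "tools": tools,
        "messages": [{"role": "user", "content": user_message}],
    }


def post_chat_completion(payload, base_url=BASE_URL, timeout=120):
    """POST the payload to the chat endpoint and decode the JSON reply."""
    req = urllib.request.Request(
        f"{base_url}/v1/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def extract_tool_calls(response):
    """Return (name, arguments) pairs from the first choice, or [] if none."""
    message = response["choices"][0]["message"]
    calls = []
    for call in message.get("tool_calls") or []:
        fn = call["function"]
        args = fn["arguments"]
        # Arguments usually arrive as a JSON-encoded string; decode when so.
        if isinstance(args, str):
            args = json.loads(args)
        calls.append((fn["name"], args))
    return calls
```

With the server running, extract_tool_calls(post_chat_completion(build_request(message, tools))) should list a get_current_weather call if the model's tool call was parsed; an empty list together with raw <|observation|> text in the message content would suggest generation did not stop where expected.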

@ngxson ngxson (Collaborator) commented May 6, 2025

This should be done when converting HF --> GGUF. Please update convert_hf_to_gguf.py instead.
