This is a small proof-of-concept demo showing how agent workflows with tool-calling SLMs (small language models) can be implemented in CPU-only deployments.
A growing number of small (and not so large) language models exist, some specifically geared towards tool calling:
- Gemma 3 270M, 1B and 4B
- Phi-3 Mini
- Qwen3 0.6B and 4B
- Hermes 3 (Llama 3.2) 3B
In this proof-of-concept we deploy a model using llama.cpp
and run it with the llama-server utility. You can deploy this demo comfortably on a CPU-only VM,
such as an Oracle Cloud Infrastructure (OCI) VM.Standard.E6.Flex shape with 8 OCPUs and 32 GB of memory.
The demo shows the use case of an AI assistant supporting operations at a (fictitious) small enterprise logistics company. The chatbot interface has access to a number of tools/workflows for checking the delivery truck fleet, listing pending deliveries, scheduling new deliveries, and assigning trucks to deliveries, as well as routing and weather APIs. The (human) operator interacts with the different tools via the chatbot, using it as a querying interface or to start workflows. Multi-task workflows, such as assigning a truck to a delivery, fetching routing time estimates and a weather forecast, and updating the different databases, are all handled transparently by the chatbot through the application server.
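To make the idea of such a composite workflow concrete, here is a heavily simplified, hypothetical sketch; the function names, records and return values are illustrative stand-ins and not taken from the repo. It only shows how one chatbot action can chain several tool steps (routing lookup, weather lookup, database updates) behind the scenes:

```python
# Hypothetical sketch of a multi-step workflow; all names and data are
# illustrative stand-ins, not the demo's actual tools or databases.

# Toy in-memory "databases" standing in for the demo's real state.
DELIVERIES = {"D-17": {"destination": "Graz", "assigned_truck": None}}
TRUCKS = {"T-42": {"status": "available"}}

def estimate_route_minutes(truck_id: str, destination: str) -> int:
    """Stand-in for a routing API call returning a travel-time estimate."""
    return 95

def fetch_weather(destination: str) -> str:
    """Stand-in for a weather API call."""
    return "light rain"

def assign_truck_to_delivery(delivery_id: str, truck_id: str) -> dict:
    """Chain the individual tools into one workflow the chatbot can trigger."""
    delivery = DELIVERIES[delivery_id]
    eta = estimate_route_minutes(truck_id, delivery["destination"])
    weather = fetch_weather(delivery["destination"])
    delivery["assigned_truck"] = truck_id      # update the delivery record
    TRUCKS[truck_id]["status"] = "assigned"    # update the fleet record
    return {"delivery": delivery_id, "truck": truck_id,
            "eta_minutes": eta, "weather": weather}

print(assign_truck_to_delivery("D-17", "T-42"))
```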
The application server is implemented using Streamlit. The tool backend uses the LangChain tools API to expose Python functions as tools and present them in the appropriate format for an OpenAI-compatible server. It also implements the business logic and tool workflows, and communicates with the language model over the OpenAI REST API.
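As a rough illustration of this plumbing (a sketch, not the repo's actual code), the example below defines a made-up `get_truck_status` tool with LangChain's `@tool` decorator, converts it to the OpenAI tool schema, and sends it to the llama-server endpoint through the standard OpenAI Python client; the tool name, its body and the example query are assumptions for illustration:

```python
# Minimal sketch: expose a Python function as a LangChain tool and pass it
# to the OpenAI-compatible llama-server endpoint.
from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool
from openai import OpenAI

@tool
def get_truck_status(truck_id: str) -> str:
    """Return the current status of a delivery truck."""
    # Hypothetical lookup; the real demo reads from its own fleet data.
    return f"Truck {truck_id}: available"

# Convert the LangChain tool into the JSON schema the OpenAI API expects.
openai_tools = [convert_to_openai_tool(get_truck_status)]

# llama-server speaks the OpenAI API; no API key is configured by default.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")

response = client.chat.completions.create(
    # llama-server serves a single model, so the name is mostly informational.
    model="Hermes-3-Llama-3.2-3B",
    messages=[{"role": "user", "content": "Is truck T-42 available?"}],
    tools=openai_tools,
)

# If the model decides to call the tool, the call shows up here.
print(response.choices[0].message.tool_calls)
```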
- Get some system dependencies; on Ubuntu this should suffice:

```bash
sudo apt install git tmux htop \
    build-essential gcc-12 g++-12 cmake libcurl4-openssl-dev
mkdir $HOME/.local
```

- Get and build llama.cpp:

```bash
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build
cmake --build build --config Release
cmake --install build --config Release --prefix $HOME/.local
echo "export PATH=$HOME/.local/bin:$PATH" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=$HOME/.local/lib:$LD_LIBRARY_PATH" >> ~/.bashrc
source ~/.bashrc
```

- Run the llama.cpp server (inside a tmux or screen session) like this:

```bash
llama-server -hf NousResearch/Hermes-3-Llama-3.2-3B-GGUF -c 0 -fa on --jinja --chat-template-file models/templates/NousResearch-Hermes-3-Llama-3.1-8B-tool_use.jinja
```

- Use `uv` for easy deployment:

```bash
curl -LsSf https://astral.sh/uv/install.sh | sh
```

- Get the code:

```bash
git clone https://github.com/ohm314/slm_tool_calling_demo.git
cd slm_tool_calling_demo
```

- Run the Streamlit app:

```bash
uv run streamlit run src/logistics_poc/app.py
```

The app assumes `llama-server` is running on `127.0.0.1:8080`.
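As a quick sanity check (a sketch, not part of the repo), you can verify from Python that the OpenAI-compatible endpoint is reachable before launching the app:

```python
# List the models served by llama-server at the endpoint the app expects.
from openai import OpenAI

client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="none")
print([m.id for m in client.models.list().data])
```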
Copyright (c) 2025 Omar Awile. Licensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl/
