Tool-calling with SLMs Demo

This is a small proof-of-concept demo showing how agent workflows with tool-calling small language models (SLMs) can be implemented on CPU-only deployments.

A growing number of small (and not-so-large) language models exist, some specifically geared towards tool calling:

  • Gemma 3 270M, 1B and 4B
  • Phi-3 Mini
  • Qwen3 0.6B and 4B
  • Hermes 3 (Llama 3.2) 3B

In this proof-of-concept we deploy a model using llama.cpp and run it with the llama-server utility. You can deploy the demo comfortably on a CPU-only VM, such as an Oracle Cloud Infrastructure (OCI) VM.Standard.E6.Flex shape with 8 OCPUs and 32 GB of memory.

[Diagram: overview of the demo app deployment]

Logistics PoC

The demo shows the use case of an AI assistant supporting operations at a (fictitious) small enterprise logistics company. The chatbot interface has access to a number of tools/workflows for checking the delivery truck fleet, listing pending deliveries, scheduling new deliveries, and assigning trucks to deliveries, as well as routing and weather APIs. The (human) operator interacts with the different tools through the chatbot, using it as a querying interface or to start workflows. Multi-step workflows, such as assigning a truck to a delivery, fetching routing time estimates and the weather forecast, and updating the various databases, are all handled transparently by the chatbot through the application server.
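To make that loop concrete, here is a minimal sketch written against the OpenAI-compatible API that llama-server exposes. The tool name assign_truck, its schema, and the model name "local" are illustrative placeholders, not the repository's actual code:

import json

from openai import OpenAI

# llama-server exposes an OpenAI-compatible API; the api_key is unused locally.
client = OpenAI(base_url="http://127.0.0.1:8080/v1", api_key="unused")

def assign_truck(truck_id: str, delivery_id: str) -> str:
    # Stand-in for the real database update done by the application server.
    return f"truck {truck_id} assigned to delivery {delivery_id}"

TOOLS = [{
    "type": "function",
    "function": {
        "name": "assign_truck",
        "description": "Assign a truck to a pending delivery.",
        "parameters": {
            "type": "object",
            "properties": {
                "truck_id": {"type": "string"},
                "delivery_id": {"type": "string"},
            },
            "required": ["truck_id", "delivery_id"],
        },
    },
}]

messages = [{"role": "user", "content": "Assign truck T-7 to delivery D-42."}]
resp = client.chat.completions.create(model="local", messages=messages, tools=TOOLS)
msg = resp.choices[0].message

# Keep executing requested tool calls and feeding the results back until the
# model answers in plain text.
while msg.tool_calls:
    messages.append(msg)
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = assign_truck(**args)  # dispatch; only one tool in this sketch
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
    resp = client.chat.completions.create(model="local", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message

print(msg.content)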

The application server is implemented using streamlit. The tool backend uses the LangChain tools API to expose Python functions as tools and present them in the appropriate format for an OpenAI-compatible server. It also implements the business logic and tool workflows, and talks to the language model over the OpenAI REST API.
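As an illustration of that pattern (not the repository's actual tool definitions), a Python function can be turned into a tool with LangChain's @tool decorator and converted to the OpenAI tool schema with convert_to_openai_tool; the get_route_eta stub below is hypothetical:

from langchain_core.tools import tool
from langchain_core.utils.function_calling import convert_to_openai_tool

@tool
def get_route_eta(origin: str, destination: str) -> str:
    """Estimate the driving time between two locations."""
    # Stand-in for a call to a real routing API.
    return f"ETA from {origin} to {destination}: 3h 40m"

# The resulting JSON schema is what gets sent in the `tools` field of an
# OpenAI-compatible chat completion request.
openai_tool = convert_to_openai_tool(get_route_eta)
print(openai_tool["function"]["name"])        # get_route_eta
print(openai_tool["function"]["parameters"])  # schema for origin/destination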

Setup instructions

  • Install some system dependencies; on Ubuntu this should suffice:
sudo apt install git tmux htop \
    build-essential gcc-12 g++-12 cmake libcurl4-openssl-dev
mkdir -p $HOME/.local
  • Get and build llama.cpp:
git clone https://github.com/ggml-org/llama.cpp.git && cd llama.cpp
cmake -B build
cmake --build build --config Release
cmake --install build --config Release --prefix $HOME/.local
echo "export PATH=$HOME/.local/bin:$PATH" >> ~/.bashrc
echo "export LD_LIBRARY_PATH=$HOME/.local/lib:$LD_LIBRARY_PATH" >> ~/.bashrc
source ~/.bashrc
  • Run llama-server (inside a tmux or screen session) like this:
llama-server -hf NousResearch/Hermes-3-Llama-3.2-3B-GGUF -c 0 -fa on --jinja --chat-template-file models/templates/NousResearch-Hermes-3-Llama-3.1-8B-tool_use.jinja
Here -c 0 takes the context size from the model, -fa on enables flash attention, and --jinja enables the Jinja chat-template engine that tool calling relies on; the template file ships with the llama.cpp sources.
  • Install uv for easy deployment:
curl -LsSf https://astral.sh/uv/install.sh | sh
  • Get the code:
git clone https://github.com/ohm314/slm_tool_calling_demo.git
cd slm_tool_calling_demo
  • Run the streamlit app:
uv run streamlit run src/logistics_poc/app.py

The app assumes llama-server is running at 127.0.0.1:8080.
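Before launching the app you can sanity-check the server, for instance with a small Python snippet against llama-server's /health endpoint (a sketch; adjust host and port if you changed them):

import requests

# llama-server reports readiness on /health once the model is loaded.
r = requests.get("http://127.0.0.1:8080/health", timeout=5)
print(r.status_code, r.text)  # expect 200 and an "ok" status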

License

Copyright (c) 2025 Omar Awile. Licensed under the Universal Permissive License v1.0, as shown at https://oss.oracle.com/licenses/upl/.
