
Commit 0007de8

proof guide.en-gb.md
1 parent 189fcc1 commit 0007de8

File tree

1 file changed: +45 −35 lines

  • pages/public_cloud/ai_machine_learning/endpoints_guide_06_function_calling


pages/public_cloud/ai_machine_learning/endpoints_guide_06_function_calling/guide.en-gb.md

Lines changed: 45 additions & 35 deletions
@@ -1,7 +1,7 @@
 ---
 title: AI Endpoints - Function Calling
 excerpt: Learn how to use Function Calling with OVHcloud AI Endpoints
-updated: 2025-08-01
+updated: 2025-08-06
 ---
 
 > [!primary]
@@ -15,8 +15,7 @@ updated: 2025-08-01
 
 **Function Calling**, also known as tool calling, is a feature that enables a large language model (LLM) to trigger user-defined functions (also named tools). These tools are defined by the developer and implement specific behaviors such as calling an API, fetching data or calculating values, which extends the capabilities of the LLM.
 
-The LLM will identify which tool(s) to call and the arguments to use.
-This feature can be used to develop assistants or agents, for instance.
+The LLM will identify which tool(s) to call and the arguments to use. This feature can be used to develop assistants or agents, for instance.
 
 ## Objective
 
@@ -27,51 +26,55 @@ Visit our [Catalog](https://endpoints.ai.cloud.ovh.net/catalog) to find out whic
 
 ## Requirements
 
-We use Python for the examples provided through this guide.
+We use Python for the examples provided in this guide.
 
 Make sure you have a [Python](https://www.python.org/) environment configured, and install the [openai client](https://pypi.org/project/openai/).
+
 ```sh
 pip install openai
 ```
 
 ### Authentication & rate limiting
 
-All the examples provided in this guide are using the anonymous authentication which makes it simpler to use but may cause rate limiting issues.
-If you wish to enable authentication using your own token, simply specify your API key within the requests.
-Follow the following instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) for more information on authentication.
+All the examples provided in this guide use anonymous authentication, which is simpler to use but may cause rate limiting issues. If you wish to enable authentication using your own token, simply specify your API key within the requests.
+
+Follow the instructions in the [AI Endpoints - Getting Started](/pages/public_cloud/ai_machine_learning/endpoints_guide_01_getting_started) guide for more information on authentication.
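
For reference, a minimal sketch of an authenticated client setup with the openai SDK (the endpoint URL matches the cURL example later in this guide; the environment variable name is a placeholder, not from the guide):

```python
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://llama-3-1-8b-instruct.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1",
    api_key=os.environ.get("AI_ENDPOINTS_API_KEY", ""),  # empty key = anonymous, rate-limited
)
```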
 
 ## Function Calling overview
 
 The workflow to use function calling is described below:
+
 1. **Define tools**: tell the model what tools it can use, with a JSON schema for each tool.
 2. **Call the model with tools**: pass tools along with your system and user messages to the model, which may generate tool calls.
 3. **Process tool calls**: for each tool call returned by the model, execute the actual implementation of the tool in your code.
 4. **Call the model with tool responses**: send a new request to the model, with the conversation updated with the tool call results.
-4. **Final response**: process the final generated answer, which takes the tools results into account.
+5. **Final response**: process the final generated answer, which takes the tool results into account.
 
 This diagram illustrates the workflow:
 
 ![Function calling workflow](images/function_calling_workflow.png)
 
-## Example: a time-tracking assistant
+**Example: a time-tracking assistant**
 
 To illustrate the use of function calling and progressively introduce the important notions related to this feature, we are going to develop a time-tracking assistant, step by step.
 
 The assistant will be able to:
-* log time spent on a task
-* generate a time report
 
-Each task has a name, category and total duration in minutes. Categories are a fixed list of strings, for example "Code" or "Meetings".
-A time report can be generated for a category of task.
+* log time spent on a task
+* generate a time report
+
+Each task has a name, a category and a total duration in minutes. Categories are a fixed list of strings, for example "Code" or "Meetings". A time report can be generated for a category of tasks.
 
 The user will be able to interact with the assistant to log time and get information about how time was spent.
 
 ### Define tools
 
-Our time-tracking assistant will use two tools :
+Our time-tracking assistant will use two tools:
+
 * `log_work`: log time spent on a task. Takes the name of the task, the category, the duration and the unit (minutes or hours).
 For example, to log 2 hours on documentation writing, you would call `log_work("User guide", "Documentation", 2, "hours")`.
-* `time_report`: get JSON data about all tasks of a given category, and the total duration, in a given time unit (minutes or hours).
+
+* `time_report`: get JSON data about all tasks of a given category, and the total duration in a given time unit (minutes or hours).
 For example, to get the breakdown of time spent on coding tasks, in hours, you would call `time_report("Code", "hours")`.
 
 To get the model to use those tools, first we have to declare them with JSON schemas, in a `tools` list that we will pass to the Chat Completion API.
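
The full `TOOLS` list is elided from this diff (the next hunk shows only its opening line). A minimal sketch of what the `log_work` declaration could look like, following the OpenAI Chat Completions tool schema; the parameter names match the arguments shown in the outputs later in this guide, but the descriptions are illustrative:

```python
TOOLS = [
    {
        "type": "function",
        "function": {
            "name": "log_work",
            "description": "Log time spent on a task.",
            "parameters": {
                "type": "object",
                "properties": {
                    "task_name": {"type": "string", "description": "Name of the task"},
                    "task_category": {"type": "string", "description": "Category of the task"},
                    "duration": {"type": "number", "description": "Time spent on the task"},
                    "unit": {"type": "string", "enum": ["minutes", "hours"]},
                },
                "required": ["task_name", "task_category", "duration", "unit"],
            },
        },
    },
    # ... the `time_report` tool is declared the same way
]
```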
@@ -143,8 +146,7 @@ TOOLS = [
 
 ### Generate tool calls
 
-With our tools ready, we can now try to call the model and see if it understands our tools definition.
-We use the OpenAI Python SDK to call the ``/v1/chat/completions`` route on the endpoint, passing the tools definition in the `tools` parameter.
+With our tools ready, we can now try to call the model and see if it understands our tool definitions. We use the OpenAI Python SDK to call the ``/v1/chat/completions`` route on the endpoint, passing the tool definitions in the `tools` parameter.
 
 Let's send a simple user message: `log 1 hour team meeting` and see what the model answers.
 
@@ -181,6 +183,7 @@ print(assistant_response.to_json())
 ```
 
 Output:
+
 ```json
 {
   "role": "assistant",
@@ -202,6 +205,7 @@ We see that the model correctly identified that it needed to call the `log_work`
 The `tool_calls` list contains the tool calls the model generated in response to our user message.
 The `name` and `arguments` fields specify which tool to call and which parameters to pass to the function.
 The `id` is a unique identifier for this tool call, that we will need later on.
+
 You can have multiple tool calls in this list.
 
 Under the hood, the model has recognized that the user's intent was related to the set of tools provided, and generated a sequence of specific tokens that were post-processed to create a tool call object.
@@ -210,9 +214,9 @@ We add this message to the conversation so that the model is aware of this tool
 
 ### Process tool calls
 
-Now that we see that the model is able to generate tool calls, we need to code the Python implementation of the tools, so that we can process the tool calls the LLM will generate and actually start to log time!
-Each task is stored in a dict, with the name as the key.
-Categories are a fixed list.
+Now that we have seen that the model is able to generate tool calls, we need to code the Python implementation of the tools, so that we can process the tool calls generated by the LLM and actually start logging time!
+
+Each task is stored in a dict, with the name as the key; categories are a fixed list.
 
 We define the two functions, `log_work` and `time_report`, in the Python code below:
 
@@ -279,8 +283,7 @@ def to_minutes(duration: float, unit: str) -> float:
     raise ValueError("Invalid unit. Must be 'minutes', or 'hours'.")
 ```
 
-Now, let's see how we can process tool calls generated by the model.
-
+Now, let's see how we can process tool calls generated by the model:
 
 ```python
 # this map tells us which Python function to call for a given tool
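 # (the dispatch code is elided from this diff; a minimal sketch, assuming the
 #  tool implementations above)
 NAME_TO_FUNCTION = {"log_work": log_work, "time_report": time_report}
 # for each generated tool call: parse the JSON arguments, look up the Python
 # function in this map, and execute it with those arguments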
@@ -307,6 +310,7 @@ if assistant_response.tool_calls:
 ```
 
 Output:
+
 ```
 < 1 tool(s) to call
 > Execute tool log_work with arguments {'task_name': 'team meeting', 'task_category': 'Meetings', 'duration': 1, 'unit': 'hours'}
@@ -317,9 +321,10 @@ We see that we successfully created a task called "team meeting", in the "Meetin
 
 ### Send tool call results and get the final response
 
-Now that we have executed our tool calls, we have to send the result back to the model, so that it can generate a new response that takes this new information into account, to tell the user the task has been created successfully or to give the time report for instance.
+Now that we have executed our tool calls, we need to send the results back to the model so that it can generate a new response taking this new information into account, for example to inform the user that the task has been successfully created or to provide the time report.
 
 All we have to do is add the tool results as new `tool` messages into the conversation, so we'll update our code:
+
 ```python
 if assistant_response.tool_calls:
     print(f"<\t{len(assistant_response.tool_calls)} tool(s) to call")
@@ -360,6 +365,7 @@ print(f"<\t\tAssistant final answer:\n{response.choices[0].message.content}")
 ```
 
 Output:
+
 ```
 < 1 tool(s) to call
 > Execute tool log_work with arguments {'task_name': 'team meeting', 'task_category': 'Meetings', 'duration': 1, 'unit': 'hours'}
@@ -378,8 +384,9 @@ The assistant has generated a response acknowledging the creation of the task.
 ### Add a system prompt
 
 To make our assistant more robust and powerful, it can be useful to add a system prompt that:
-* explains what is expected from the model
-* provides useful information to the model, such as the current existing tasks and categories
+
+* explains what is expected from the model.
+* provides useful information to the model, such as the current existing tasks and categories.
 
 ```python
 SYSTEM_PROMPT = \
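 # (the prompt string is elided from this diff; per the surrounding text, it
 #  explains the assistant's job and injects the current tasks and categories,
 #  e.g. through placeholders filled in before each call)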
@@ -409,10 +416,11 @@ With this system prompt, the model will be able to use data about existing tasks
 ### Putting it all together
 
 Now we can combine all the notions we've seen so far to create a `query` method that will:
-* call the model with the formatted system prompt and user message
-* process tool calls
-* call the model a second time with the tool results
-* output the final answer
+
+* call the model with the formatted system prompt and user message
+* process tool calls
+* call the model a second time with the tool results
+* output the final answer
 
 ```python
 def query(user_prompt: str):
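     # (body elided from this diff; it chains the steps listed above:
     #  call the model with the formatted system prompt, user message and TOOLS,
     #  execute each generated tool call, append the results as "tool" messages,
     #  then call the model again and print the final answer)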
@@ -486,6 +494,7 @@ query("on which task did I spent most hours coding?")
 ```
 
 Output:
+
 ```
 > Querying assistant with user prompt: Spent 2 hours coding on Feature A
 < 1 tool(s) to call
@@ -585,6 +594,7 @@ Based on the tracking data, the task on which you spent the most hours coding is
 Mission accomplished!
 
 ## Tips and best practices
+
 This section contains additional tips that may improve the performance of Function Calling queries.
 
 ### Tool choice
@@ -605,6 +615,7 @@ Here are the available values for this parameter and the impact on the output.
 It is possible to use Function Calling in streaming mode, by setting `stream` to `true` in your request.
 
 Let's see an example with cURL and the LLaMa 3.1 8B model:
+
 ```bash
 curl -X 'POST' \
   'https://llama-3-1-8b-instruct.endpoints.kepler.ai.cloud.ovh.net/api/openai_compat/v1/chat/completions' \
@@ -661,6 +672,7 @@ curl -X 'POST'
 ```
 
 You will get tool call deltas in the server-sent events chunks, with this format:
+
 ```
 data: {...,"choices":[{"index":0,"delta":{"role":"assistant","content":""}}],...,"object":"chat.completion.chunk"}
 data: {...,"choices":[{"index":0,"delta":{"role":"assistant","tool_calls":[{"index":0,"id":"chatcmpl-tool-e41bfee4ae1346bbbb4336061037e2b5","type":"function","function":{"name":"get_current_weather","arguments":""}}]}}],...,"object":"chat.completion.chunk"}
@@ -699,19 +711,18 @@ final_tool_calls = [v for (k, v) in sorted(final_tool_calls_dict.items())]
 
 ### Parallel tool calls
 
-Some models are able to generate multiple tool calls in one round (see the time-tracking tutorial above for an example).
-To control this behavior, the OpenAI specification allows to pass a `parallel_tool_calls` boolean parameter.
+Some models are able to generate multiple tool calls in one round (see the time-tracking tutorial above for an example). To control this behavior, the OpenAI specification allows passing a `parallel_tool_calls` boolean parameter.
 
-If `false`, the model can only generate one tool call at most.
-This case is currently not supported by AI Endpoints.
+If `false`, the model can generate one tool call at most. This case is currently not supported by AI Endpoints.
 
 If you need your system to process only one tool call at a time, or if the model you are using doesn't support multiple tool calls, we suggest you pick the first one, process it, and call the model again, as sketched below.
 
-Please note that LLaMa models don't support multiple tool calls between users and assistants messages.
+Please note that LLaMa models do not support multiple tool calls between user and assistant messages.
 
 ### Prompting & additional parameters
 
 Some additional considerations regarding prompts and model parameters:
+
 - Most models tend to perform better when using lower temperature for function calling.
 - The use of a system prompt is recommended, to ground the model into using the tools at its disposal. Whether a system prompt is defined or not, a description of the tools will usually be included in the tokens sent to the model (see the model chat template for more details).
 - If you know in advance that your model needs to call tools, use the `tool_choice=required` parameter to make sure it generates at least one tool call.
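A sketch applying these parameters with the openai SDK (reusing the `client`, `messages` and `TOOLS` names assumed in the earlier sketches; the model name is a placeholder):

```python
response = client.chat.completions.create(
    model="Meta-Llama-3_1-8B-Instruct",  # placeholder model name
    messages=messages,
    tools=TOOLS,
    temperature=0.2,         # lower temperature tends to help function calling
    tool_choice="required",  # force the model to generate at least one tool call
)
```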
@@ -735,4 +746,3 @@ If you need training or technical assistance to implement our solutions, contact
 Please send us your questions, feedback and suggestions to improve the service:
 
 - On the OVHcloud [Discord server](https://discord.gg/ovhcloud)
-