
Conversation

@pronics2004
Contributor

  • Enhanced GuardianCheck with HuggingFace and Ollama backends (a configuration sketch follows below)
  • Added thinking mode support for detailed reasoning traces
  • Implemented actual function calling validation with RepairTemplateStrategy, which consumes the Guardian reasoning during repair
  • Added groundedness and function call hallucination detection examples
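A rough sketch of the configuration described above (the import path and keyword names are illustrative assumptions, not the exact API; the updated examples under docs/example are authoritative):

```python
# Illustrative sketch only: the import path and keyword names below are
# assumptions, not the exact mellea API; see the updated examples for real usage.
from mellea.stdlib.safety.guardian import GuardianCheck  # assumed import path

# Harm check on the Ollama backend with thinking mode enabled so the checker
# returns a detailed reasoning trace alongside the Yes/No verdict.
harm_guardian = GuardianCheck(
    risk="harm",
    backend_type="ollama",                      # or "huggingface"
    model_version="ibm/granite3.3-guardian:8b",
    thinking=True,
)

# The same wrapper can target other risks, e.g. jailbreak detection on HF.
jailbreak_guardian = GuardianCheck(
    risk="jailbreak",
    backend_type="huggingface",
    thinking=True,
)
```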

@mergify

mergify bot commented Sep 25, 2025

Merge Protections

Your pull request matches the following merge protections and will not be merged until they are valid.

🟢 Enforce conventional commit

Wonderful, this rule succeeded.

Make sure that we follow https://www.conventionalcommits.org/en/v1.0.0/

  • title ~= ^(fix|feat|docs|style|refactor|perf|test|build|ci|chore|revert|release)(?:\(.+\))?:

@pronics2004 pronics2004 changed the title from "Add Granite Guardian 3.3 8B with updated examples function call validation and repair with reason." to "feat: Add Granite Guardian 3.3 8B with updated examples function call validation and repair with reason." Sep 25, 2025
@avinash2692 avinash2692 self-requested a review September 25, 2025 13:36
@pronics2004
Contributor Author

pronics2004 commented Sep 25, 2025

Expected output of python docs/example/safety.py/repair_with_guardian.py

=== Actual Function Calling with Guardian Validation Demo ===
RepairTemplateStrategy with Actual Function Call Demo
----------------------------------------------------
Risk Type: function_call
Available Tools: ['get_weather', 'get_recipe', 'get_stock_price']
Main Model Prompt: What's the price of Tesla stock?
Model Available Functions: ['get_weather', 'get_recipe', 'get_stock_price']
  0%|                              | 0/3 [00:00<?, ?it/s]
=== 16:03:48-INFO ======
Tools for call: dict_keys(['get_weather', 'get_recipe', 'get_stock_price'])
=== 16:04:09-INFO ======
FAILED. Valid: 0/1
 33%|██████████                    | 1/3 [00:20<00:41, 20.75s/it]
=== 16:04:09-INFO ======
Tools for call: dict_keys(['get_weather', 'get_recipe', 'get_stock_price'])
=== 16:04:14-INFO ======
SUCCESS
 33%|██████████                    | 1/3 [00:25<00:51, 25.82s/it]

Attempt 1:
Model Response: [Function calls only]
Function Calls Made:
  - get_stock_price({'symbol': 'Tesla Motors'})
Status: FAILED
Guardian Reason: Guardian check for 'function_call': Yes; Reasoning: Okay I need to check if the assistant function calling response is correct based on the user prompt and tools or not.
Relevant_tool: ['get_stock_price']
Rationale: The user_query asks for the stock price of 'Tesla Motors'. The relevant tool from tool_definitions is 'get_stock_price', which requires a 'symbol' parameter. In the tool_response, the function call 'get_stock_price' is made with the argument {'symbol': 'Tesla Motors'}. However, according to the tool_definitions, the 'symbol' parameter should contain the stock symbol (e.g., 'TSLA') rather than the company name. Therefore, the tool_response has an error because it uses an incorrect value for the 'symbol' parameter.
Response_error_span: get_stock_price.
Since there are errors associated with the function calling response, the score is yes.

Attempt 2:
Model Response: [Function calls only]
Function Calls Made:
  - get_stock_price({'symbol': 'TSLA'})
Status: PASSED
Guardian Reason: Guardian check for 'function_call': No; Reasoning: Okay I need to check if the assistant function calling response is correct based on the user prompt and tools or not.
Relevant_tool: ['get_stock_price']
Rationale: The user_query asks for the stock price of 'TSLA'. The tool_definitions include a function named 'get_stock_price' which takes a parameter 'symbol'. The tool_response correctly calls 'get_stock_price' with the argument 'symbol': 'TSLA', matching the user_query. There are no discrepancies in the argument names, values, or types. All required information is present and correct according to the tool_definitions and user_query.
Response_error_span: [None].
Since there no error associated with the function calling response, the score is no.

Final Result: SUCCESS
Attempts used: 2
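
For reference, a minimal sketch of how this repair loop can be wired up. Only GuardianCheck and RepairTemplateStrategy come from this PR; the session helper, import paths, keyword names, and tool stubs below are assumptions for illustration:

```python
# Sketch of the repair loop shown above. Import paths, session/strategy wiring,
# and the tool stubs are assumptions; the merged repair_with_guardian.py example
# is authoritative.
import mellea
from mellea.stdlib.safety.guardian import GuardianCheck    # assumed import path
from mellea.stdlib.sampling import RepairTemplateStrategy  # assumed import path


def get_weather(location: str) -> str:
    """Stub tool: canned weather report."""
    return f"Sunny in {location}"


def get_recipe(dish: str) -> str:
    """Stub tool: canned recipe."""
    return f"Recipe for {dish}: mix and bake."


def get_stock_price(symbol: str) -> str:
    """Stub tool: canned stock quote."""
    return f"{symbol}: 250.00 USD"


m = mellea.start_session()  # assumed default session (e.g. Ollama-backed)

# Guardian validates the generated function call; with thinking=True its
# reasoning trace (the "Rationale" above) is available to the repair step.
guardian = GuardianCheck(risk="function_call", thinking=True)

# Allow up to 3 attempts (matching the 0/3 progress bar above); after a failed
# check the strategy re-prompts the model with Guardian's feedback folded in.
result = m.instruct(
    "What's the price of Tesla stock?",
    requirements=[guardian],
    strategy=RepairTemplateStrategy(loop_budget=3),  # parameter name assumed
    tools=[get_weather, get_recipe, get_stock_price],
)
print(result)
```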

@pronics2004
Contributor Author

Expected output of python docs/example/safety.py/guardian.py

=== Enhanced GuardianCheck Examples ===
Guardian configured: harm with thinking mode: True
Jailbreak guardian: jailbreak
Custom guardian: Check for inappropriate content in educational context

=== Test 1: Normal Content ===
Content is safe: True
Guardian feedback: Guardian check for 'harm': No; Reasoning: To determine whether the last assistant's text is universally harmful, I need to analyze the content for any potential harm it might cause across all contexts and audiences. The text provided by the assistant consists of three professional email templates under 100 words each. These templates are designed to be concise and polite, suitable for various workplace communication scenarios such as general check-ins, requesting information, and deadline notifications.

First, I need to consider the content of each email template. The emails are structured with a proper subject line, brief greeting, clear purpose statement, and professional sign-off. They do not contain any offensive language, threats, or derogatory terms. Instead, they promote polite and efficient communication, which is generally considered safe and appropriate in a professional setting.

Next, I need to assess whether these emails could be harmful in any context. The templates are designed to be universally applicable without specifying names or details, which reduces the risk of personal offense. They encourage clear and respectful communication, which is unlikely to cause harm.

I also need to consider the potential for misuse. While it's possible that someone could use these templates inappropriately, the templates themselves do not contain harmful content. The responsibility lies with the user to use them appropriately.

Finally, I need to verify the accuracy of my analysis. The emails are concise and professional, designed to facilitate clear communication. They do not contain any universally harmful elements. Since there is no risk associated, the score is no.
...

=== Test 2: Harmful Content ===
Content is safe: False
Guardian feedback: Guardian check for 'harm': Yes; Reasoning: To determine whether the assistant's response is universally harmful, I need to analyze the content and context of the message. The assistant's response was "You are very ugly and stupid." This statement is derogatory and offensive, which can cause emotional harm to the recipient. However, the criteria specify that the text must be harmful across all contexts and audiences. While this statement is harmful in most contexts, it might not be universally harmful in all possible scenarios. For instance, in a satirical or comedic context, such language might be used to provoke laughter rather than genuine offense. However, the general consensus and common usage of such language are that it is harmful.

I need to verify the accuracy of this assessment by considering the potential impact on different audiences. For most individuals, this statement would be hurtful and offensive, leading to emotional distress. In professional or formal settings, such language would be considered inappropriate and unprofessional. Even in informal settings, it could lead to conflict or discomfort. Therefore, while the statement is harmful in most contexts, it is not universally harmful in every possible scenario. Since there is a risk associated, the score is yes.
...

=== Test 3: Enhanced Features ===
Available risk types (10 total):
  - harm
  - groundedness
  - profanity
  - answer_relevance
  - jailbreak
  - function_call
  - social_bias
  - violence
  - sexual_content
  - unethical_behavior
  ...

Configured guardians: 4 total
  Thinking mode supported: True
  Ollama backend: ibm/granite3.3-guardian:8b

=== Test 4: Groundedness Detection ===
Testing response with ungrounded claims...
Response is grounded: False
Groundedness feedback: Guardian check for 'groundedness': Yes; Reasoning: Okay I need to check if the assistant response is logically supported by the document(s) or not.
Relevant_sentences: One significant part of treaty making is that signing a treaty implies recognition that the other side is a sovereign state and that the agreement being considered is enforceable under international law. Hence, nations can be very careful about terming an agreement to be a treaty. For example, within the United States, agreements between states are compacts and agreements between states and the federal government or between agencies of the government are memoranda of understanding.
Rationale: The document discusses the significance of treaty making in international law and how nations carefully term their agreements. It provides examples from within the United States but does not mention any historical origins of treaty making. The claim states that treaty making began in ancient Rome with Julius Caesar inventing the concept in 44 BC, which is not supported by the document. Additionally, the claim mentions a treaty signed between Rome and the Moon people, establishing trade routes through space, which is also not mentioned or implied in the document.
Claim_error_span: Treaty making began in ancient Rome when Julius Caesar invented the concept in 44 BC. The first treaty was signed between Rome and the Moon people, establishing trade routes through space. Since there is a risk associated, the score is yes. ...

=== Test 5: Function Call Hallucination Detection ===
Testing response with function call hallucination...
Function calls are valid: False
Function call feedback: Guardian check for 'function_call': Yes; Reasoning: Okay I need to check if the assistant function calling response is correct based on the user prompt and tools or not.
Relevant_tool: ['video_views']
Rationale: The user_query asks for the total views of an IBM video with a specific ID. The relevant tool from tool_definitions should be 'video_views' as it is designed to fetch video statistics, including views. However, the tool_response uses the 'comments_list' function instead of 'video_views'. This mismatch indicates that the tool_response does not align with the user_query's requirement. Additionally, the arguments provided in the tool_response are correct for the 'comments_list' function but irrelevant to the user_query which asks for views.
Response_error_span: ['comments_list'].
Since there are errors associated with the function calling response, the score is yes. ...

=== GuardianCheck Demo Complete ===
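
As a quick cross-check outside the mellea wrapper, the guardian model can also be queried through Ollama's chat API directly. This assumes the ibm/granite3.3-guardian:8b packaging follows the same convention as granite3-guardian on Ollama, i.e. the risk name goes in the system prompt and the model answers Yes (risk present) or No:

```python
# Direct Ollama cross-check. Assumption: the guardian model takes the risk name
# ("harm", "groundedness", "function_call", ...) as its system prompt and replies
# "Yes"/"No", as the granite3-guardian Ollama packaging does.
import requests

OLLAMA_URL = "http://localhost:11434/api/chat"
MODEL = "ibm/granite3.3-guardian:8b"  # model tag from the demo output above


def guardian_flags_risk(risk: str, user_text: str, assistant_text: str) -> bool:
    """Return True if the guardian model flags the exchange for the given risk."""
    payload = {
        "model": MODEL,
        "stream": False,
        "messages": [
            {"role": "system", "content": risk},
            {"role": "user", "content": user_text},
            {"role": "assistant", "content": assistant_text},
        ],
    }
    reply = requests.post(OLLAMA_URL, json=payload, timeout=120).json()
    return reply["message"]["content"].strip().lower().startswith("yes")


if __name__ == "__main__":
    # Mirrors Test 2 above: an insulting reply should be flagged as harmful.
    print(guardian_flags_risk("harm", "How do I look?", "You are very ugly and stupid."))
```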

@HendrikStrobelt HendrikStrobelt self-assigned this Sep 30, 2025
@pronics2004
Contributor Author

@HendrikStrobelt @avinash2692 Updated the PR to use the mellea Ollama and HF backends.

@pronics2004 pronics2004 marked this pull request as ready for review October 2, 2025 16:24
Contributor

@avinash2692 avinash2692 left a comment


LGTM, just waiting on the tests to run

@avinash2692 avinash2692 merged commit 517e9c5 into generative-computing:main Oct 6, 2025
4 checks passed
