Add content detector #21
Conversation
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
evaline-ju
left a comment
A couple of high-level things before looking in a lot of detail:
- Looks like there's a rebase needed
- I think it'll be good/helpful to have an example call [with either granite guardian or llama guard] in the PR to see expected usage - from what I understand so far, I'm a bit concerned about the choices for different parts of the original llama guard message.
- It'll likely be helpful to have a couple more unit tests, specifically for the common changes in base.py (e.g. preprocess_request) and protocol.py, and since this also applies to granite guardian, it might be good to have some corresponding .content_analysis tests there? (A rough sketch of such a test follows below.)
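(For illustration only — a minimal pytest-style sketch of the kind of generic base-class test meant in the last point; the detector class and the preprocess_request behavior shown here are assumptions, not the project's actual API:)

class DummyDetector:
    def preprocess_request(self, request):
        # Assumed base behavior: one chat-style message list per content entry
        return [[{"role": "user", "content": text}] for text in request["contents"]]

def test_preprocess_request_wraps_each_content():
    request = {"contents": ["first text", "second text"]}
    processed = DummyDetector().preprocess_request(request)
    # Each content entry should come back as its own message list
    assert len(processed) == 2
    assert processed[0][0]["content"] == "first text"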
So this would present each of the parts of the original message as a different "choice"? This seems confusing from a user POV, since usually choices are presented as independent alternates to each other.
This is purely internal and the output of a private function (part of the reason for making it private); the user of the "content_detector" endpoint won't be affected. To them it will just come out as a different label
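(A sketch of the internal splitting being described — the function name is illustrative, not the actual private function: a raw llama guard output such as "unsafe\nS2" is broken into one pseudo-"choice" per line, so each label later surfaces as its own detection.)

def _split_labels(raw_output: str) -> list[str]:
    # Each non-empty line of the model output ("unsafe", "S2", ...) becomes
    # its own entry, which downstream is treated like a separate choice/label
    return [line.strip() for line in raw_output.splitlines() if line.strip()]

# _split_labels("unsafe\nS2") -> ["unsafe", "S2"]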
I think I understand better now that the independent choices aren't seen by the end user, though the "different label" part is still what is confusing to me, since they are all presented at the same level, i.e. in the added example:
{
"detection" : "unsafe",
"detection_type" : "risk",
"end" : 106,
"score" : 0.531209409236908,
"start" : 0,
"text" : "If you are thinking about skipping out on filing your taxes, there are a few ways to avoid getting caught."
},
{
"detection" : "S2",
"detection_type" : "risk",
"end" : 106,
"score" : 0.531209409236908,
"start" : 0,
"text" : "If you are thinking about skipping out on filing your taxes, there are a few ways to avoid getting caught."
}
unsafe now seems disjoint/independent of S2, whereas those who are familiar with llama guard would know those are related, more like an "unsafe - S2" concept
Hmm, true, that's an interesting point..
We could merge them together as well 🤔 But the problem is that the merged labels would then diverge from what the Llama Guard documentation lists, e.g. for anyone matching the labels as strings.
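(For concreteness, a sketch of the merge alternative being discussed — illustrative only: combine the verdict with its category code into a single label, at the cost that the result no longer string-matches the plain "unsafe" / "S2" values from the Llama Guard documentation.)

def _merge_labels(labels: list[str]) -> str:
    # ["unsafe", "S2"] -> "unsafe S2"; a single combined detection label
    return " ".join(labels)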
Co-authored-by: Evaline Ju <69598118+evaline-ju@users.noreply.github.com> Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Force-pushed from d70b5cf to 75ad74c
evaline-ju
left a comment
Thanks for the additional tests and example! A couple additional q's/comments
It seems something may have gone wrong with the rebase, as some of these files are showing up in the diff and there's still a conflict showing
I wonder if, to avoid confusion with granite tests, we just change this to completion_response, since these are tests for the base class and meant to be generic
hmm. true, that is doable
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
…ng issue Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Force-pushed from 75ad74c to f9388e7
Co-authored-by: Evaline Ju <69598118+evaline-ju@users.noreply.github.com> Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Force-pushed from d4ab934 to 81675c1
… completion req Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
evaline-ju
left a comment
Mostly comments on some docstring/comments aesthetics and a few test questions remaining
-        return DetectionResponse.from_chat_completion_response(
-            chat_response, scores, self.DETECTION_TYPE
-        )
+        return chat_response, scores, self.DETECTION_TYPE
Could we update the docstring L176 since it no longer returns DetectionResponse?
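(Something along these lines, perhaps — a sketch only; the method name and surrounding class are assumed from the diff above, not the actual code:)

class _ExampleDetector:
    DETECTION_TYPE = "risk"

    def post_process_completion_results(self, chat_response, scores):
        """Process chat completion results.

        Returns:
            tuple: (chat_response, scores, detection_type) -- the raw chat
            completion response, the per-choice scores, and this detector's
            DETECTION_TYPE, rather than a built DetectionResponse.
        """
        return chat_response, scores, self.DETECTION_TYPE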
same note about response vs. responses here as before
# at least for Llama-Guard-3 (latest at the time of writing)

# In this function, we will basically remove those "safety" categories from the output and later on
# move them to evidences.
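(For context, a sketch of the filtering that comment describes — the names here are assumptions for illustration: separate the overall safe/unsafe verdict from the S1..S14 category codes so the verdict can later be surfaced as evidence.)

SAFETY_VERDICTS = {"safe", "unsafe"}  # assumed label set

def _split_safety_verdict(labels: list[str]) -> tuple[list[str], list[str]]:
    # Keep the overall verdict(s) aside; everything else is a category code
    verdicts = [label for label in labels if label in SAFETY_VERDICTS]
    categories = [label for label in labels if label not in SAFETY_VERDICTS]
    return verdicts, categories

# _split_safety_verdict(["unsafe", "S2"]) -> (["unsafe"], ["S2"])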
Co-authored-by: Evaline Ju <69598118+evaline-ju@users.noreply.github.com> Signed-off-by: Gaurav Kumbhat <kumbhat.gaurav@gmail.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Changes
Examples
Request (Llama-guard)
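(The original request body wasn't captured here; a minimal sketch, assuming the standard detector contents endpoint and reconstructing the two inputs from the response below:)

POST /api/v1/text/contents
{
  "contents" : [
    "If you are thinking about skipping out on filing your taxes, there are a few ways to avoid getting caught.",
    "this is a fairly good sentence."
  ]
}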
Note: the above sentence is taken from the dataset examples of llama-guard
Response
[
  [
    {
      "detection" : "unsafe",
      "detection_type" : "risk",
      "end" : 106,
      "score" : 0.531209409236908,
      "start" : 0,
      "text" : "If you are thinking about skipping out on filing your taxes, there are a few ways to avoid getting caught."
    }
  ],
  [
    {
      "detection" : "safe",
      "detection_type" : "risk",
      "end" : 31,
      "score" : 0.00359360268339515,
      "start" : 0,
      "text" : "this is a fairly good sentence."
    }
  ]
]