Add content detector #21
Conversation
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
evaline-ju
left a comment
A couple of high-level things before looking in a lot of detail:
- Looks like there's a rebase needed
- I think it'll be good/helpful to have an example call [with either granite guardian or llama guard] in the PR to see expected usage - from what I understand so far, I'm a bit concerned about the choices for different parts of the original llama guard message.
- It'll likely be helpful to have a couple more unit tests, specifically for the common changes in base.py (e.g. preprocess_request) and protocol.py, and since this also applies to granite guardian, it might be good to have some corresponding .content_analysis tests there? (A rough sketch of such a test follows below.)
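(For illustration only — a minimal pytest-style sketch of the kind of generic base-class test meant in the last point; the detector class and the preprocess_request behavior shown here are assumptions, not the project's actual API:)

class DummyDetector:
    def preprocess_request(self, request):
        # Assumed base behavior: one chat-style message list per content entry
        return [[{"role": "user", "content": text}] for text in request["contents"]]

def test_preprocess_request_wraps_each_content():
    request = {"contents": ["first text", "second text"]}
    processed = DummyDetector().preprocess_request(request)
    # Each content entry should come back as its own message list
    assert len(processed) == 2
    assert processed[0][0]["content"] == "first text"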
So this would present each of the parts of the original message as a different "choice"? This seems confusing from a user POV, since usually choices are presented as independent alternates to each other.
This is purely internal and the output of a private function (part of the reason for making it private); the user of the "content_detector" endpoint won't be affected. To them it will just come out as a different label
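(A sketch of the internal splitting being described — the function name is illustrative, not the actual private function: a raw llama guard output such as "unsafe\nS2" is broken into one pseudo-"choice" per line, so each label later surfaces as its own detection.)

def _split_labels(raw_output: str) -> list[str]:
    # Each non-empty line of the model output ("unsafe", "S2", ...) becomes
    # its own entry, which downstream is treated like a separate choice/label
    return [line.strip() for line in raw_output.splitlines() if line.strip()]

# _split_labels("unsafe\nS2") -> ["unsafe", "S2"]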
I think I understand better now that the independent choices aren't seen by the end user, though the "different label" part is still what is confusing to me, since they are all presented at the same level, i.e. in the added example:
{
"detection" : "unsafe",
"detection_type" : "risk",
"end" : 106,
"score" : 0.531209409236908,
"start" : 0,
"text" : "If you are thinking about skipping out on filing your taxes, there are a few ways to avoid getting caught."
},
{
"detection" : "S2",
"detection_type" : "risk",
"end" : 106,
"score" : 0.531209409236908,
"start" : 0,
"text" : "If you are thinking about skipping out on filing your taxes, there are a few ways to avoid getting caught."
}
unsafe now seems disjoint/independent of S2, whereas those who are familiar with llama guard would know those are related, more like an "unsafe - S2" concept
Hmm, true, that's an interesting point..
We could merge them together as well 🤔 But the problem is that the merged labels would then diverge from what the Llama Guard documentation lists, e.g. for anyone matching the labels as strings.
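(For concreteness, a sketch of the merge alternative being discussed — illustrative only: combine the verdict with its category code into a single label, at the cost that the result no longer string-matches the plain "unsafe" / "S2" values from the Llama Guard documentation.)

def _merge_labels(labels: list[str]) -> str:
    # ["unsafe", "S2"] -> "unsafe S2"; a single combined detection label
    return " ".join(labels)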
Co-authored-by: Evaline Ju <69598118+evaline-ju@users.noreply.github.com> Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Force-pushed from d70b5cf to 75ad74c
evaline-ju
left a comment
Thanks for the additional tests and example! A couple additional q's/comments
It seems something may have gone wrong with the rebase, as some of these files are showing up in the diff and there's still a conflict showing
I wonder if, to avoid confusion with granite tests, we just change this to completion_response, since these are tests for the base class and meant to be generic
hmm. true, that is doable
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
…ng issue Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Force-pushed from 75ad74c to f9388e7
Co-authored-by: Evaline Ju <69598118+evaline-ju@users.noreply.github.com> Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Force-pushed from d4ab934 to 81675c1
… completion req Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
evaline-ju
left a comment
Mostly comments on some docstring/comments aesthetics and a few test questions remaining
-        return DetectionResponse.from_chat_completion_response(
-            chat_response, scores, self.DETECTION_TYPE
-        )
+        return chat_response, scores, self.DETECTION_TYPE
Could we update the docstring L176 since it no longer returns DetectionResponse?
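(Something along these lines, perhaps — a sketch only; the method name and surrounding class are assumed from the diff above, not the actual code:)

class _ExampleDetector:
    DETECTION_TYPE = "risk"

    def post_process_completion_results(self, chat_response, scores):
        """Process chat completion results.

        Returns:
            tuple: (chat_response, scores, detection_type) -- the raw chat
            completion response, the per-choice scores, and this detector's
            DETECTION_TYPE, rather than a built DetectionResponse.
        """
        return chat_response, scores, self.DETECTION_TYPE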
same note about response vs. responses here as before
# at least for Llama-Guard-3 (latest at the time of writing)

# In this function, we will basically remove those "safety" categories from the output and later on
# move them to evidences.
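(For context, a sketch of the filtering that comment describes — the names here are assumptions for illustration: separate the overall safe/unsafe verdict from the S1..S14 category codes so the verdict can later be surfaced as evidence.)

SAFETY_VERDICTS = {"safe", "unsafe"}  # assumed label set

def _split_safety_verdict(labels: list[str]) -> tuple[list[str], list[str]]:
    # Keep the overall verdict(s) aside; everything else is a category code
    verdicts = [label for label in labels if label in SAFETY_VERDICTS]
    categories = [label for label in labels if label not in SAFETY_VERDICTS]
    return verdicts, categories

# _split_safety_verdict(["unsafe", "S2"]) -> (["unsafe"], ["S2"])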
Co-authored-by: Evaline Ju <69598118+evaline-ju@users.noreply.github.com> Signed-off-by: Gaurav Kumbhat <kumbhat.gaurav@gmail.com>
Signed-off-by: Gaurav-Kumbhat <Gaurav.Kumbhat@ibm.com>
Changes
Examples
Request (Llama-guard)
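(The original request body wasn't captured here; a minimal sketch, assuming the standard detector contents endpoint and reconstructing the two inputs from the response below:)

POST /api/v1/text/contents
{
  "contents" : [
    "If you are thinking about skipping out on filing your taxes, there are a few ways to avoid getting caught.",
    "this is a fairly good sentence."
  ]
}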
Note: the above sentence is taken from the dataset examples of llama-guard
Response
[
  [
    {
      "detection" : "unsafe",
      "detection_type" : "risk",
      "end" : 106,
      "score" : 0.531209409236908,
      "start" : 0,
      "text" : "If you are thinking about skipping out on filing your taxes, there are a few ways to avoid getting caught."
    }
  ],
  [
    {
      "detection" : "safe",
      "detection_type" : "risk",
      "end" : 31,
      "score" : 0.00359360268339515,
      "start" : 0,
      "text" : "this is a fairly good sentence."
    }
  ]
]