Commit 5af4e77

jsondai authored and copybara-github committed

feat: GenAI SDK client(evals) - Add Generate Rubrics API config and internal method

PiperOrigin-RevId: 777655592

1 parent 6e5c421 · commit 5af4e77

File tree

3 files changed: +598 −0 lines changed
Lines changed: 170 additions & 0 deletions

@@ -0,0 +1,170 @@
````python
# Copyright 2025 Google LLC
#
# Licensed under the Apache License, Version 2.0 (the "License");
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an "AS IS" BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
#
# pylint: disable=protected-access,bad-continuation,missing-function-docstring


from tests.unit.vertexai.genai.replays import pytest_helper
from vertexai._genai import types

_TEST_RUBRIC_GENERATION_PROMPT = """SPECIAL INSTRUCTION: think silently. Silent thinking token budget: 16384.

You are a teacher who is responsible for scoring a student\'s response to a prompt. In order to score that response, you must write down a rubric for each prompt. That rubric states what properties the response must have in order to be a valid response to the prompt. Properties are weighted by importance via the "importance" field.

Rubric requirements:
- Properties either exist or don\'t exist.
- Properties can be either implicit in the prompt or made explicit by the prompt.
- Make sure to always include the correct expected human language as one of the properties. If the prompt asks for code, the programming language should be covered by a separate property.
- The correct expected language may be explicit in the text of the prompt but is usually simply implicit in the prompt itself.
- Be as comprehensive as possible with the list of properties in the rubric.
- All properties in the rubric must be in English, regardless of the language of the prompt.
- Rubric properties should not specify correct answers in their descriptions, e.g. to math and factoid questions if the prompt calls for such an answer. Rather, it should check that the response contains an answer and optional supporting evidence if relevant, and assume some other process will later validate correctness. A rubric property should however call out any false premises present in the prompt.

About importance:
- Most properties will be of medium importance by default.
- Properties of high importance are critical to be fulfilled in a good response.
- Properties of low importance are considered optional or supplementary nice-to-haves.

You will see prompts in many different languages, not just English. For each prompt you see, you will write down this rubric in JSON format.

IMPORTANT: Never respond to the prompt given. Only write a rubric.

Example:
What is the tallest building in the world?

```json
{
  "criteria":[
    {
      "rubric_id": "00001",
      "property": "The response is in English.",
      "type": "LANGUAGE:PRIMARY_RESPONSE_LANGUAGE",
      "importance": "high"
    },
    {
      "rubric_id": "00002",
      "property": "Contains the name of the tallest building in the world.",
      "type": "QA_ANSWER:FACTOID",
      "importance": "high"
    },
    {
      "rubric_id": "00003",
      "property": "Contains the exact height of the tallest building.",
      "type": "QA_SUPPORTING_EVIDENCE:HEIGHT",
      "importance": "low"
    },
    {
      "rubric_id": "00004",
      "property": "Contains the location of the tallest building.",
      "type": "QA_SUPPORTING_EVIDENCE:LOCATION",
      "importance": "low"
    },
    ...
  ]
}
```

Write me a letter to my HOA asking them to reconsider the fees they are asking me to pay because I haven\'t mowed my lawn on time. I have been very busy at work.
```json
{
  "criteria": [
    {
      "rubric_id": "00001",
      "property": "The response is in English.",
      "type": "LANGUAGE:PRIMARY_RESPONSE_LANGUAGE",
      "importance": "high"
    },
    {
      "rubric_id": "00002",
      "property": "The response is formatted as a letter.",
      "type": "FORMAT_REQUIREMENT:FORMAL_LETTER",
      "importance": "medium"
    },
    {
      "rubric_id": "00003",
      "property": "The letter is addressed to the Homeowners Association (HOA).",
      "type": "CONTENT_REQUIREMENT:ADDRESSEE",
      "importance": "medium"
    },
    {
      "rubric_id": "00004",
      "property": "The letter explains that the sender has not mowed their lawn on time.",
      "type": "CONTENT_REQUIREMENT:BACKGROUND_CONTEXT:TARDINESS",
      "importance": "medium"
    },
    {
      "rubric_id": "00005",
      "property": "The letter provides a reason for not mowing the lawn, specifically being busy at work.",
      "type": "CONTENT_REQUIREMENT:EXPLANATION:EXCUSE:BUSY",
      "importance": "medium"
    },
    {
      "rubric_id": "00006",
      "property": "The letter discusses that the sender has been in compliance until now.",
      "type": "OPTIONAL_CONTENT:SUPPORTING_EVIDENCE:COMPLIANCE",
      "importance": "low"
    },
    {
      "rubric_id": "00007",
      "property": "The letter requests that the HOA reconsider the fees associated with not mowing the lawn on time.",
      "type": "CONTENT_REQUIREMENT:REQUEST:FEE_WAIVER",
      "importance": "high"
    },
    {
      "rubric_id": "00008",
      "property": "The letter maintains a polite and respectful tone.",
      "type": "CONTENT_REQUIREMENT:FORMALITY:FORMAL",
      "importance": "high"
    },
    {
      "rubric_id": "00009",
      "property": "The letter includes a closing (e.g., \'Sincerely\') and the sender\'s name.",
      "type": "CONTENT_REQUIREMENT:SIGNATURE",
      "importance": "medium"
    }
  ]
}
```

Now write a rubric for the following user prompt. Remember to write only the rubric, NOT response to the prompt.

User prompt:
{prompt}"""


def test_internal_method_generate_rubrics(client):
    """Tests the internal _generate_rubrics method."""
    test_contents = [
        types.Content(
            parts=[
                types.Part(
                    text="Generate a short story about a friendly dragon.",
                ),
            ],
        )
    ]
    response = client.evals._generate_rubrics(
        contents=test_contents,
        rubric_generation_spec=types.RubricGenerationSpec(
            prompt_template=_TEST_RUBRIC_GENERATION_PROMPT,
        ),
    )
    assert len(response.generated_rubrics) >= 1


pytestmark = pytest_helper.setup(
    file=__file__,
    globals_for_file=globals(),
    test_method="evals._generate_rubrics",
)
````
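For orientation, here is a minimal sketch of how the same internal call might be driven outside the replay harness. The `vertexai.Client(project=..., location=...)` setup, the project and location values, and the dict form of the spec (suggested by the `RubricGenerationSpecOrDict` annotation) are assumptions for illustration, not part of this commit:

```python
# Hypothetical standalone driver; reuses _TEST_RUBRIC_GENERATION_PROMPT from above.
import vertexai
from google.genai import types as genai_types

client = vertexai.Client(project="my-project", location="us-central1")  # assumed setup
response = client.evals._generate_rubrics(
    contents=[genai_types.Content(parts=[genai_types.Part(text="Plan a picnic.")])],
    rubric_generation_spec={"prompt_template": _TEST_RUBRIC_GENERATION_PROMPT},
)
for rubric in response.generated_rubrics:
    print(rubric)
```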

vertexai/_genai/evals.py

Lines changed: 188 additions & 0 deletions
```diff
@@ -664,6 +664,65 @@ def _EvaluateInstancesRequestParameters_to_vertex(
     return to_object
 
 
+def _RubricGenerationSpec_to_vertex(
+    from_object: Union[dict[str, Any], object],
+    parent_object: Optional[dict[str, Any]] = None,
+) -> dict[str, Any]:
+    to_object: dict[str, Any] = {}
+    if getv(from_object, ["prompt_template"]) is not None:
+        setv(
+            to_object,
+            ["promptTemplate"],
+            getv(from_object, ["prompt_template"]),
+        )
+
+    if getv(from_object, ["generator_model_config"]) is not None:
+        setv(
+            to_object,
+            ["model_config"],
+            getv(from_object, ["generator_model_config"]),
+        )
+
+    if getv(from_object, ["rubric_content_type"]) is not None:
+        setv(
+            to_object,
+            ["rubricContentType"],
+            getv(from_object, ["rubric_content_type"]),
+        )
+
+    if getv(from_object, ["rubric_type_ontology"]) is not None:
+        setv(
+            to_object,
+            ["rubricTypeOntology"],
+            getv(from_object, ["rubric_type_ontology"]),
+        )
+
+    return to_object
+
+
+def _GenerateInstanceRubricsRequest_to_vertex(
+    from_object: Union[dict[str, Any], object],
+    parent_object: Optional[dict[str, Any]] = None,
+) -> dict[str, Any]:
+    to_object: dict[str, Any] = {}
+    if getv(from_object, ["contents"]) is not None:
+        setv(to_object, ["contents"], getv(from_object, ["contents"]))
+
+    if getv(from_object, ["rubric_generation_spec"]) is not None:
+        setv(
+            to_object,
+            ["rubricGenerationSpec"],
+            _RubricGenerationSpec_to_vertex(
+                getv(from_object, ["rubric_generation_spec"]), to_object
+            ),
+        )
+
+    if getv(from_object, ["config"]) is not None:
+        setv(to_object, ["config"], getv(from_object, ["config"]))
+
+    return to_object
+
+
 def _EvaluateInstancesResponse_from_vertex(
     from_object: Union[dict[str, Any], object],
     parent_object: Optional[dict[str, Any]] = None,
```
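The two converters above translate the SDK's snake_case fields into the Vertex wire format. Below is a standalone sketch of that mapping using plain dicts instead of the SDK's `getv`/`setv` helpers; the example spec values are made up, and note that `generator_model_config` maps to `model_config`, not a camelCase key, per the diff:

```python
# Mirror of _RubricGenerationSpec_to_vertex for plain dicts (illustration only).
_SPEC_FIELD_MAP = {
    "prompt_template": "promptTemplate",
    "generator_model_config": "model_config",  # stays snake_case on the wire, per the diff
    "rubric_content_type": "rubricContentType",
    "rubric_type_ontology": "rubricTypeOntology",
}

def spec_to_vertex(spec: dict) -> dict:
    # Copy only the fields that are set, renaming them to their wire names.
    return {_SPEC_FIELD_MAP[k]: v for k, v in spec.items() if v is not None}

print(spec_to_vertex({"prompt_template": "Write a rubric for: {prompt}"}))
# -> {'promptTemplate': 'Write a rubric for: {prompt}'}
```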
```diff
@@ -790,6 +849,21 @@ def _EvaluateInstancesResponse_from_vertex(
     return to_object
 
 
+def _GenerateInstanceRubricsResponse_from_vertex(
+    from_object: Union[dict[str, Any], object],
+    parent_object: Optional[dict[str, Any]] = None,
+) -> dict[str, Any]:
+    to_object: dict[str, Any] = {}
+    if getv(from_object, ["generatedRubrics"]) is not None:
+        setv(
+            to_object,
+            ["generated_rubrics"],
+            getv(from_object, ["generatedRubrics"]),
+        )
+
+    return to_object
+
+
 class Evals(_api_module.BaseModule):
     def _evaluate_instances(
         self,
```
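The response-side converter is the inverse, renaming the wire field back to snake_case. A tiny illustration with a fabricated payload:

```python
# generatedRubrics (wire) -> generated_rubrics (SDK), as performed above.
raw = {"generatedRubrics": [{"rubric_id": "00001"}]}  # fabricated example payload
converted = {"generated_rubrics": raw["generatedRubrics"]}
assert converted["generated_rubrics"][0]["rubric_id"] == "00001"
```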
```diff
@@ -869,6 +943,62 @@ def _evaluate_instances(
         self._api_client._verify_response(return_value)
         return return_value
 
+    def _generate_rubrics(
+        self,
+        *,
+        contents: list[genai_types.ContentOrDict],
+        rubric_generation_spec: types.RubricGenerationSpecOrDict,
+        config: Optional[types.RubricGenerationConfigOrDict] = None,
+    ) -> types.GenerateInstanceRubricsResponse:
+        """Generates rubrics for a given prompt."""
+
+        parameter_model = types._GenerateInstanceRubricsRequest(
+            contents=contents,
+            rubric_generation_spec=rubric_generation_spec,
+            config=config,
+        )
+
+        request_url_dict: Optional[dict[str, str]]
+        if not self._api_client.vertexai:
+            raise ValueError("This method is only supported in the Vertex AI client.")
+        else:
+            request_dict = _GenerateInstanceRubricsRequest_to_vertex(parameter_model)
+            request_url_dict = request_dict.get("_url")
+            if request_url_dict:
+                path = ":generateInstanceRubrics".format_map(request_url_dict)
+            else:
+                path = ":generateInstanceRubrics"
+
+        query_params = request_dict.get("_query")
+        if query_params:
+            path = f"{path}?{urlencode(query_params)}"
+        # TODO: remove the hack that pops config.
+        request_dict.pop("config", None)
+
+        http_options: Optional[types.HttpOptions] = None
+        if (
+            parameter_model.config is not None
+            and parameter_model.config.http_options is not None
+        ):
+            http_options = parameter_model.config.http_options
+
+        request_dict = _common.convert_to_dict(request_dict)
+        request_dict = _common.encode_unserializable_types(request_dict)
+
+        response = self._api_client.request("post", path, request_dict, http_options)
+
+        response_dict = "" if not response.body else json.loads(response.body)
+
+        if self._api_client.vertexai:
+            response_dict = _GenerateInstanceRubricsResponse_from_vertex(response_dict)
+
+        return_value = types.GenerateInstanceRubricsResponse._from_response(
+            response=response_dict, kwargs=parameter_model.model_dump()
+        )
+
+        self._api_client._verify_response(return_value)
+        return return_value
+
     def run(self) -> types.EvaluateInstancesResponse:
         """Evaluates an instance of a model.
 
```
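Putting the converters together, the synchronous method above POSTs to the `:generateInstanceRubrics` path. Here is a sketch of the kind of JSON body it would send after the `config` key is popped and the spec is converted; the content text and template value are illustrative, not taken from a real request:

```python
import json

# Approximate request body produced by the code above (illustration only).
request_body = {
    "contents": [
        {"parts": [{"text": "Generate a short story about a friendly dragon."}]}
    ],
    "rubricGenerationSpec": {"promptTemplate": "Write a rubric for: {prompt}"},
}
print(json.dumps(request_body, indent=2))
```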
```diff
@@ -1133,6 +1263,64 @@ async def _evaluate_instances(
         self._api_client._verify_response(return_value)
         return return_value
 
+    async def _generate_rubrics(
+        self,
+        *,
+        contents: list[genai_types.ContentOrDict],
+        rubric_generation_spec: types.RubricGenerationSpecOrDict,
+        config: Optional[types.RubricGenerationConfigOrDict] = None,
+    ) -> types.GenerateInstanceRubricsResponse:
+        """Generates rubrics for a given prompt."""
+
+        parameter_model = types._GenerateInstanceRubricsRequest(
+            contents=contents,
+            rubric_generation_spec=rubric_generation_spec,
+            config=config,
+        )
+
+        request_url_dict: Optional[dict[str, str]]
+        if not self._api_client.vertexai:
+            raise ValueError("This method is only supported in the Vertex AI client.")
+        else:
+            request_dict = _GenerateInstanceRubricsRequest_to_vertex(parameter_model)
+            request_url_dict = request_dict.get("_url")
+            if request_url_dict:
+                path = ":generateInstanceRubrics".format_map(request_url_dict)
+            else:
+                path = ":generateInstanceRubrics"
+
+        query_params = request_dict.get("_query")
+        if query_params:
+            path = f"{path}?{urlencode(query_params)}"
+        # TODO: remove the hack that pops config.
+        request_dict.pop("config", None)
+
+        http_options: Optional[types.HttpOptions] = None
+        if (
+            parameter_model.config is not None
+            and parameter_model.config.http_options is not None
+        ):
+            http_options = parameter_model.config.http_options
+
+        request_dict = _common.convert_to_dict(request_dict)
+        request_dict = _common.encode_unserializable_types(request_dict)
+
+        response = await self._api_client.async_request(
+            "post", path, request_dict, http_options
+        )
+
+        response_dict = "" if not response.body else json.loads(response.body)
+
+        if self._api_client.vertexai:
+            response_dict = _GenerateInstanceRubricsResponse_from_vertex(response_dict)
+
+        return_value = types.GenerateInstanceRubricsResponse._from_response(
+            response=response_dict, kwargs=parameter_model.model_dump()
+        )
+
+        self._api_client._verify_response(return_value)
+        return return_value
+
     async def batch_evaluate(
         self,
         *,
```
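The async variant is identical except that it awaits `async_request`. A hypothetical call site, assuming the preview client exposes the async surface as `client.aio.evals` (mirroring the google-genai SDK's `client.aio` convention; this accessor is an assumption, not shown in this commit):

```python
import asyncio

async def main():
    # client.aio.evals is assumed here; see the caveat above.
    response = await client.aio.evals._generate_rubrics(
        contents=[{"parts": [{"text": "Summarize this article."}]}],
        rubric_generation_spec={"prompt_template": _TEST_RUBRIC_GENERATION_PROMPT},
    )
    print(len(response.generated_rubrics))

asyncio.run(main())
```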
