Commit 38d2a97

feat: add accuracy patch to behave scenarios
1 parent 35cd601 commit 38d2a97

7 files changed: +208 -7 lines changed

CHANGELOG.rst

Lines changed: 4 additions & 1 deletion
@@ -6,13 +6,16 @@ v3.6.0
 
 *Release date: In development*
 
-- Removed xcuitest deprecated get_window_size() method and replaced it with get_window_rect() in all mobile actions
+- Remove xcuitest deprecated `get_window_size` method and replaced it with `get_window_rect` in all mobile actions
 - Add text comparison methods based on AI libraries. To use them, install the `ai` extra dependency:
 
 .. code:: console
 
     $ pip install toolium[ai]
 
+- Add accuracy tag to behave scenarios using `@accuracy_<percent>_<retries>`, e.g. `@accuracy_80_10` for 80%
+  accuracy with 10 retries
+
 v3.5.0
 ------
 

docs/ai_utils.rst

Lines changed: 28 additions & 0 deletions
@@ -88,3 +88,31 @@ you have (direct OpenAI access or Azure OpenAI):
     AZURE_OPENAI_API_KEY=<your_api_key>
     AZURE_OPENAI_ENDPOINT=<your_endpoint>
     OPENAI_API_VERSION=<your_api_version>
+
+
+Accuracy tag for Behave scenarios
+---------------------------------
+
+You can use accuracy tags in your Behave scenarios to specify the desired accuracy level and number of retries for
+scenarios that involve AI-generated content. The accuracy tag follows the format `@accuracy_<percent>_<retries>`,
+where `<percent>` is the desired accuracy percentage (0-100) and `<retries>` is the number of retries to achieve that
+accuracy. For example, `@accuracy_80_10` indicates that the scenario should achieve at least 80% accuracy retrying the
+scenario execution 10 times.
+
+.. code-block:: bash
+
+    @accuracy_80_10
+    Scenario: Validate AI-generated response accuracy
+      Given the AI model generates a response
+      When the user sends a message
+      Then the AI response should be accurate
+
+When a scenario is tagged with an accuracy tag, Toolium will automatically execute the scenario multiple times. If the
+scenario does not meet the specified accuracy after the given number of retries, it will be marked as failed.
+
+Other examples of accuracy tags:
+- `@accuracy_percent_85_retries_10`: 85% accuracy, 10 retries
+- `@accuracy_percent_75`: 75% accuracy, default 10 retries
+- `@accuracy_90_5`: 90% accuracy, 5 retries
+- `@accuracy_80`: 80% accuracy, default 10 retries
+- `@accuracy`: default 90% accuracy, 10 retries
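
To make the documented threshold concrete, the following is a minimal sketch of how many runs must pass for a given tag. It assumes the rule stated in the docs above (pass ratio must be at least `<percent>` over `<retries>` runs); `min_passes_required` is a hypothetical helper for illustration and is not part of Toolium.

```python
def min_passes_required(percent, retries=10):
    """Smallest number of passing runs whose ratio reaches `percent`.

    Uses an integer ceiling of percent * retries / 100, so no float rounding is involved.
    """
    return -(-percent * retries // 100)


# Tag values taken from the examples in the documentation above.
for tag, percent, retries in (('@accuracy_80_10', 80, 10),
                              ('@accuracy_90_5', 90, 5),
                              ('@accuracy_75', 75, 10)):
    print(f'{tag}: at least {min_passes_required(percent, retries)} of {retries} runs must pass')
```

Under this reading, `@accuracy_90_5` effectively requires all 5 runs to pass, since 4 of 5 is only 80%.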

docs/bdd_integration.rst

Lines changed: 5 additions & 1 deletion
@@ -70,14 +70,18 @@ Toolium defines three tags to configure driver:
 * :code:`@reset_driver`: identifies a scenario that should not reuse the driver. The browser will be closed and reopen before this test.
 * :code:`@no_driver`: identifies a scenario or feature that should not start the driver, typically in API tests.
 
-And other scenario tags to configure Appium tests:
+It also supports other scenario tags to configure Appium tests:
 
 * :code:`@no_reset_app`: mobile app will not be reset before test (i.e. no-reset Appium capability is set to true)
 * :code:`@reset_app`: mobile app will be reset before test (i.e. no-reset and full-reset Appium capabilities are set to false)
 * :code:`@full_reset_app`: mobile app will be full reset before test (i.e. full-reset Appium capability is set to true)
 * :code:`@android_only`: identifies a scenario that should only be executed in Android
 * :code:`@ios_only`: identifies a scenario that should only be executed in iOS
 
+And also supports accuracy tag for AI related scenarios:
+
+* :code:`@accuracy_<percent>_<retries>`: identifies a scenario that should achieve at least `<percent>` accuracy retrying up to `<retries>` times (view :ref:`Accuracy tag for Behave scenarios <ai_utils>` for more details)
+
 Behave - Dynamic Environment
 ----------------------------
 

toolium/behave/environment.py

Lines changed: 7 additions & 3 deletions
@@ -23,13 +23,14 @@
 
 from behave.api.async_step import use_or_create_async_context
 
-from toolium.utils import dataset
+from toolium.behave.env_utils import DynamicEnvironment
 from toolium.config_files import ConfigFiles
 from toolium.driver_wrapper import DriverWrappersPool
 from toolium.jira import add_jira_status, change_all_jira_status, save_jira_conf
-from toolium.visual_test import VisualTest
 from toolium.pageelements import PageElement
-from toolium.behave.env_utils import DynamicEnvironment
+from toolium.utils import dataset
+from toolium.utils.ai_utils.accuracy import patch_feature_scenarios_with_accuracy
+from toolium.visual_test import VisualTest
 
 
 def before_all(context):
@@ -86,6 +87,9 @@ def before_feature(context, feature):
     context.feature_storage = dict()
     context.storage = collections.ChainMap(context.feature_storage, context.run_storage)
 
+    # Patch scenarios when accuracy tags are present
+    patch_feature_scenarios_with_accuracy(context, feature)
+
     # Behave dynamic environment
     context.dyn_env.get_steps_from_feature_description(feature.description)
     context.dyn_env.execute_before_feature_steps(context)
Lines changed: 40 additions & 0 deletions
@@ -0,0 +1,40 @@
+# -*- coding: utf-8 -*-
+"""
+Copyright 2025 Telefónica Innovación Digital, S.L.
+This file is part of Toolium.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import pytest
+
+from toolium.utils.ai_utils.accuracy import get_accuracy_and_retries_from_tags
+
+
+accuracy_tags_examples = (
+    (['accuracy'], {'accuracy': 0.9, 'retries': 10}),
+    (['accuracy_85'], {'accuracy': 0.85, 'retries': 10}),
+    (['accuracy_percent_80'], {'accuracy': 0.8, 'retries': 10}),
+    (['accuracy_75_5'], {'accuracy': 0.75, 'retries': 5}),
+    (['accuracy_percent_70_retries_3'], {'accuracy': 0.7, 'retries': 3}),
+    (['other_tag', 'accuracy_95_15'], {'accuracy': 0.95, 'retries': 15}),
+    (['no_accuracy_tag'], None),
+    (['accuracy_85', 'accuracy_95_15'], {'accuracy': 0.85, 'retries': 10}),
+    ([], None),
+)
+
+
+@pytest.mark.parametrize('tags, expected_accuracy_data', accuracy_tags_examples)
+def test_get_accuracy_and_retries_from_tags(tags, expected_accuracy_data):
+    accuracy_data = get_accuracy_and_retries_from_tags(tags)
+    assert accuracy_data == expected_accuracy_data
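
The test cases above cover the documented tag forms, including the case where the first accuracy tag wins. One case they do not cover: the committed pattern is anchored only at the start and applied with `search`, so, reading the regex, a tag such as `accuracy_check` would appear to match with default values. The following is a sketch of a stricter variant, assuming that rejecting such tags is desired. `strict_accuracy_from_tags` and `STRICT_ACCURACY` are hypothetical names, not part of this commit.

```python
import re

# Same alternatives as the committed pattern, but intended for fullmatch so trailing text is rejected.
STRICT_ACCURACY = re.compile(r'accuracy(?:_(?:percent_)?(\d+)(?:_retries_(\d+)|_(\d+))?)?', re.IGNORECASE)


def strict_accuracy_from_tags(tags):
    """Hypothetical stricter variant: return data for the first fully matching accuracy tag, else None."""
    for tag in tags:
        match = STRICT_ACCURACY.fullmatch(tag)
        if match:
            percent, named_retries, short_retries = match.groups()
            return {
                # Defaults mirror the commit: 90% accuracy, 10 retries
                'accuracy': int(percent) / 100.0 if percent else 0.9,
                'retries': int(named_retries or short_retries or 10),
            }
    return None


for tags in (['accuracy_75_5'], ['other_tag', 'accuracy_95_15'], ['accuracy_check'], []):
    print(tags, '->', strict_accuracy_from_tags(tags))
```

Whether unrelated tags that merely start with `accuracy` should be ignored is a design decision for the project; the sketch only shows one way to express it.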

toolium/utils/ai_utils/accuracy.py

Lines changed: 122 additions & 0 deletions
@@ -0,0 +1,122 @@
+# -*- coding: utf-8 -*-
+"""
+Copyright 2025 Telefónica Innovación Digital, S.L.
+This file is part of Toolium.
+
+Licensed under the Apache License, Version 2.0 (the "License");
+you may not use this file except in compliance with the License.
+You may obtain a copy of the License at
+
+    http://www.apache.org/licenses/LICENSE-2.0
+
+Unless required by applicable law or agreed to in writing, software
+distributed under the License is distributed on an "AS IS" BASIS,
+WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+See the License for the specific language governing permissions and
+limitations under the License.
+"""
+
+import functools
+import re
+from behave.model import ScenarioOutline
+from behave.model_core import Status
+
+
+def get_accuracy_and_retries_from_tags(tags):
+    """
+    Extract accuracy and retries values from accuracy tag using regex.
+    Examples of valid tags:
+    - accuracy
+    - accuracy_90
+    - accuracy_percent_90
+    - accuracy_90_10
+    - accuracy_percent_90_retries_10
+
+    :param tags: behave tags
+    :return: dict with 'accuracy' and 'retries' keys if tag matches, None otherwise
+    """
+    accuracy_regex = re.compile(r'^accuracy(?:_(?:percent_)?(\d+)(?:_retries_(\d+)|_(\d+))?)?', re.IGNORECASE)
+    for tag in tags:
+        match = accuracy_regex.search(tag)
+        if match:
+            # Default values: 90% accuracy, 10 retries
+            accuracy_percent = (int(match.group(1)) / 100.0) if match.group(1) else 0.9
+            # Check if retries is in group 2 (accuracy_percent_90_retries_10) or group 3 (accuracy_90_10)
+            retries = int(match.group(2)) if match.group(2) else (int(match.group(3)) if match.group(3) else 10)
+            return {'accuracy': accuracy_percent, 'retries': retries}
+    return None
+
+
+def patch_scenario_with_accuracy(context, scenario, accuracy=0.9, retries=10):
+    """Monkey-patches :func:`~behave.model.Scenario.run()` to execute multiple times and calculate the accuracy of the
+    results.
+
+    This is helpful when the test is flaky due to unreliable test infrastructure or when the application under test is
+    AI based and its responses may vary slightly.
+
+    :param context: behave context
+    :param scenario: Scenario or ScenarioOutline to patch
+    :param accuracy: Minimum accuracy required to consider the scenario as passed
+    :param retries: Number of times the scenario will be executed
+    """
+    def scenario_run_with_accuracy(context, scenario_run, scenario, *args, **kwargs):
+        # Execute the scenario multiple times and count passed executions
+        passed_executions = 0
+        for retry in range(1, retries+1):
+            if not scenario_run(*args, **kwargs):
+                passed_executions += 1
+                status = "PASSED"
+            else:
+                status = "FAILED"
+            print(f"ACCURACY SCENARIO {status}: retry {retry}/{retries}")
+            context.logger.info(f"Accuracy scenario {status} (retry {retry}/{retries})")
+
+        # Calculate scenario accuracy
+        scenario_accuracy = passed_executions / retries
+        has_passed = scenario_accuracy >= accuracy
+        final_status = 'PASSED' if has_passed else 'FAILED'
+        print(f"\nACCURACY SCENARIO {final_status}: {retries} retries, accuracy {scenario_accuracy} >= {accuracy}")
+        final_message = (f"Accuracy scenario {final_status} after {retries} retries with"
+                         f" accuracy {scenario_accuracy} >= {accuracy}")
+
+        # Set final scenario status
+        if has_passed:
+            context.logger.info(final_message)
+            scenario.set_status(Status.passed)
+        else:
+            context.logger.error(final_message)
+            scenario.set_status(Status.failed)
+        return not has_passed  # Run method returns true when failed
+
+    scenario_run = scenario.run
+    scenario.run = functools.partial(scenario_run_with_accuracy, context, scenario_run, scenario)
+
+
+def patch_scenario_from_tags(context, scenario):
+    """Patch scenario with accuracy method when accuracy tags are present in scenario.
+
+    :param context: behave context
+    :param scenario: behave scenario
+    """
+    accuracy_data = get_accuracy_and_retries_from_tags(scenario.effective_tags)
+    if accuracy_data:
+        patch_scenario_with_accuracy(context, scenario, accuracy=accuracy_data['accuracy'],
+                                     retries=accuracy_data['retries'])
+
+
+def patch_feature_scenarios_with_accuracy(context, feature):
+    """Patch feature scenarios with accuracy method when accuracy tags are present in scenarios.
+
+    :param context: behave context
+    :param feature: behave feature
+    """
+    try:
+        for scenario in feature.scenarios:
+            if isinstance(scenario, ScenarioOutline):
+                for outline_scenario in scenario.scenarios:
+                    patch_scenario_from_tags(context, outline_scenario)
+            else:
+                patch_scenario_from_tags(context, scenario)
+    except Exception as e:
+        # Log error but do not fail the execution to avoid errors in before feature method
+        context.logger.error(f"Error applying accuracy policy: {e}")
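
The monkey-patching approach in this file can be tried without behave. The following is a simplified sketch, not the committed implementation: `FakeScenario` and `patch_with_accuracy` are hypothetical stand-ins, logging is omitted, and the only behave convention assumed is the one stated in the code comment above (`run()` returns true when the scenario failed).

```python
class FakeScenario:
    """Hypothetical stand-in for a behave scenario; run() returns True on failure."""

    def __init__(self, outcomes):
        self._outcomes = iter(outcomes)  # each item: True if that execution passes
        self.status = None

    def run(self):
        return not next(self._outcomes)


def patch_with_accuracy(scenario, accuracy=0.9, retries=10):
    """Replace scenario.run with a wrapper that runs it `retries` times and checks the pass ratio."""
    original_run = scenario.run  # keep the bound method before it is shadowed

    def run_with_accuracy(*args, **kwargs):
        # Count executions where the original run reported no failure
        passed = sum(1 for _ in range(retries) if not original_run(*args, **kwargs))
        has_passed = passed / retries >= accuracy
        scenario.status = 'passed' if has_passed else 'failed'
        return not has_passed  # same convention: True means failed

    scenario.run = run_with_accuracy


# 8 passing executions out of 10 against an 80% threshold
scenario = FakeScenario([True] * 8 + [False] * 2)
patch_with_accuracy(scenario, accuracy=0.8, retries=10)
failed = scenario.run()
print(scenario.status, failed)
```

The sketch uses a closure instead of `functools.partial`; both keep a reference to the original `run` so the wrapper can call it repeatedly.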

toolium/utils/ai_utils/text_similarity.py

Lines changed: 2 additions & 2 deletions
@@ -106,8 +106,8 @@ def get_text_similarity_with_openai(text, expected_text, azure=False):
         explanation = response['explanation']
     except (KeyError, ValueError, TypeError) as e:
         raise ValueError(f"Unexpected response format from OpenAI: {response}") from e
-    logger.info(f"OpenAI LLM similarity: {similarity} between '{text}' and '{expected_text}'."
-                f" LLM explanation: {explanation}")
+    logger.info(f"OpenAI LLM similarity: {similarity} between '{text}' and '{expected_text}'")
+    logger.info(f"OpenAI LLM explanation: {explanation}")
     return similarity
 
 
