
Commit f30c921

Flexible execution and easier analysis with autogenerated Marks (#2609)
#### Reference Issues/PRs

#### What does this implement or fix?

Currently our tests lack metadata that would let us quickly filter tests or analyse the existing suite in depth:

- how many tests we have for each of the storage types we test against
- how many tests we have for each test type/level (unit, integration, ...)
- how many tests we have in different cross-sections of marks

This PR introduces several things:

- dynamic assignment of marks based on the physical directory structure, from which we can tell which tests sit at which level
- dynamic assignment of marks based on fixture usage: each test is marked according to the storage fixture it uses (lmdb, s3, real_s3, etc.)
- dynamic assignment of marks based on library options such as dynamic_schema, dynamic_strings, etc., including the Arctic encoding type

This allows quick queries over our tests, for example:

```
pytest -s --co -m "(lmdb and unit) or (lmdb and integration)"
```

to get a better understanding of where our tests are and what they cover. These marks can also be used later for test-execution selection.

Overall this approach adds important metadata to the tests with very little effort. That metadata can be enhanced significantly when needed, e.g. by taking information from external sources (databases, xls, GitHub) to dynamically mark tests as flaky, quarantined, etc. In other words, marks no longer need to be added and maintained by the team all the time: the effective marks on a test are the combination of those added explicitly to the test and those assigned to it from external resources.

Most important files to review:

- conftest.py (this is where the dynamic assignment happens)
- marking.py (a small helper class for better handling of marks in large-scale projects)

Additionally, this PR introduces small improvements to mark management (a rough sketch of such a helper is shown at the end of this description):

- the ability to assign many marks to a test on a single line
- the ability to group marks, and protection against misspelled marks, through the Mark and Marks classes

As there is currently no way to obtain a list of only the unique tests, a small command-line utility is also included:

```
$ . ../build_tooling/list_pytests.sh
Usage: -bash <pytest_mark_expression>
Example: -bash "pipeline and real_s3"
$ . ../build_tooling/list_pytests.sh pipeline and real_s3
2025-08-25 16:03:18,353 - client_utils - INFO - VERSION with AZURE and GCP
240/16608 tests collected (16368 deselected) in 3.67s
python/tests/integration/arcticdb/test_arctic.py::test_read_with_read_request_form
python/tests/integration/arcticdb/test_arctic_batch.py::test_delete_version_with_snapshot_batch
python/tests/integration/arcticdb/test_arctic_batch.py::test_read_batch_overall_query_builder
python/tests/integration/arcticdb/test_arctic_batch.py::test_read_batch_overall_query_builder_and_per_request_query_builder_raises
python/tests/integration/arcticdb/test_arctic_batch.py::test_read_batch_per_symbol_query_builder
python/tests/integration/arcticdb/test_arctic_batch.py::test_read_batch_query_builder_missing_keys
python/tests/integration/arcticdb/test_arctic_batch.py::test_read_batch_query_builder_symbol_doesnt_exist
python/tests/integration/arcticdb/test_arctic_batch.py::test_read_batch_query_builder_version_doesnt_exist
python/tests/integration/arcticdb/test_read_batch_more.py::test_read_batch_multiple_symbols_all_types_data_query_metadata
python/tests/integration/arcticdb/test_read_batch_more.py::test_read_batch_multiple_wrong_things_at_once
python/tests/integration/arcticdb/test_read_batch_more.py::test_read_batch_query_and_columns
python/tests/integration/arcticdb/test_read_batch_more.py::test_read_batch_query_with_and
```

IMPORTANT: this behaviour can be switched on and off. It is on by default; to switch it off, set `ARCTICDB_EXTENDED_MARKS=0`.

#### Any other comments?

#### Checklist

<details>
  <summary>Checklist for code changes...</summary>

- [ ] Have you updated the relevant docstrings, documentation and copyright notice?
- [ ] Is this contribution tested against [all ArcticDB's features](../docs/mkdocs/docs/technical/contributing.md)?
- [ ] Do all exceptions introduced raise appropriate [error messages](https://docs.arcticdb.io/error_messages/)?
- [ ] Are API changes highlighted in the PR description?
- [ ] Is the PR labelled as enhancement or bug so it appears in autogenerated release notes?

</details>

---------

Co-authored-by: Georgi Rusev <Georgi Rusev>
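marking.py is listed above as a file to review but is not included in this excerpt. For orientation only, here is a minimal sketch of the shape its `Mark` helper and `mark` decorator appear to have, inferred from their usage in `conftest.py` below (the real implementation may well differ):

```python
# Minimal sketch, NOT the actual tests/util/marking.py (which is not shown here).
# Inferred from usage in conftest.py: Mark("lmdb").name, @Marks.lmdb.mark,
# and @mark([Marks.abc, Marks.cde]) to apply several marks on a single line.
import pytest


class Mark:
    """Wraps a named pytest mark so a misspelled mark fails at attribute lookup."""

    def __init__(self, name: str):
        self.name = name
        # pytest.mark.<name>, usable directly as a decorator: @Marks.lmdb.mark
        self.mark = getattr(pytest.mark, name)


def mark(marks):
    """Apply a list of Mark objects to a single test in one line."""

    def decorator(func):
        for m in marks:
            func = m.mark(func)
        return func

    return decorator
```

With a helper of this shape, `@mark([Marks.lmdb, Marks.priority0])` attaches both marks on one line, and a typo such as `Marks.lmbd` raises an `AttributeError` instead of silently creating a new, unregistered mark.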
1 parent bcf2b2f commit f30c921

17 files changed: +450 -34 lines

build_tooling/list_pytests.sh

Lines changed: 27 additions & 0 deletions
```
#!/bin/bash

# Script: list_pytests.sh
# Description: Lists unique pytest test names (without parameterized fixture values)
# for the given pytest -m marker expression(s).

if [ $# -eq 0 ]; then
    echo "Usage: $0 <pytest_mark_expression>"
    echo "Example: $0 \"pipeline and real_s3\""
else
    # Join all arguments into a single marker expression
    MARK_EXPR="$*"

    # Collect and deduplicate test names
    tests=$(pytest --co -q -m "$MARK_EXPR" \
        | sed 's/\[.*\]//' \
        | sort -u)

    # Print tests
    echo "$tests"

    # Count them
    count=$(echo "$tests" | grep -c '^')
    echo "Total unique tests: $count"
fi
```
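The deduplication step simply strips pytest's parametrization suffix before sorting. A rough Python equivalent of the `sed 's/\[.*\]//' | sort -u` pipeline, for illustration only (not part of the commit; the node ids in the comment are invented):

```python
import re


def unique_tests(node_ids):
    """Strip parametrization suffixes such as '[lmdb-v1]' and deduplicate,
    mirroring the sed/sort pipeline in list_pytests.sh."""
    return sorted({re.sub(r"\[.*\]", "", nid) for nid in node_ids})


# unique_tests(["tests/unit/test_read.py::test_read[lmdb]",
#               "tests/unit/test_read.py::test_read[mem]"])
# -> ["tests/unit/test_read.py::test_read"]
```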

pyproject.toml

Lines changed: 33 additions & 2 deletions
```
@@ -53,9 +53,40 @@ exclude = '''
 [tool.pytest.ini_options]
 markers = [
     "storage: marks a test as a test against real storage (deselect with: -m 'not storage')",
+    "dedup: marks deduplication tests",
     "authentication: marks a test for authentication group (deselect with: -m 'not authentication')",
     "pipeline: Pipeline tests (deselect with: -m 'not pipeline')",
     "skip_fixture_params: will instruct fixture that supports excluding fixture values, which values to be excluded",
     "only_fixture_params: will instruct fixture supporting that to include only parameters from the list",
-    "bug_ids: allows specifying bug ids list the tests is based on or depends"
-]
+    "bug_ids: allows specifying bug ids list the tests is based on or depends",
+    "priority0: Most important tests group",
+    "compat: Mark from physical folder",
+    "integration: Mark from physical folder",
+    "unit: Mark from physical folder",
+    "stress: Mark from physical folder",
+    "nonreg: Mark from physical folder",
+    "hypothesis: Mark from physical folder",
+    "arcticdb: Mark from physical folder",
+    "version_store: Mark from physical folder",
+    "toolbox: Mark from physical folder",
+    "lmdb: Mark from test usage for execution against LMDB storage",
+    "mem: Mark from test usage for execution against In-memory storage",
+    "s3: Mark from test usage for execution against Simulated S3 storage",
+    "gcp: Mark from test usage for execution against Simulated GCP storage",
+    "azurite: Mark from test usage for execution against Simulated Azurite storage",
+    "nfs: Mark from test usage for execution against Simulated NFS S3 storage",
+    "mongo: Mark from test usage for execution against Mongo storage",
+    "real_s3: Mark from test usage for execution against AWS S3 storage",
+    "real_azure: Mark from test usage for execution against Azure storage",
+    "real_gcp: Mark from test usage for execution against GCP storage",
+    "dynamic_schema: marks test using dynamic_schema=True",
+    "empty_types: marks test using empty_types=True",
+    "delayed_deletes: marks test using delayed_deletes=True",
+    "sync_passive: marks test using sync_passive=True",
+    "use_tombstones: marks test using use_tombstones=True",
+    "segment_size: marks test using any of library segment size settings",
+    "dynamic_strings: marks tests using dynamic_strings=True",
+    "bucketize_dynamic: marks tests using bucketize_dynamic=True",
+    "prune_previous: marks tests using prune_previous_version=True",
+    "encoding_v2: marks tests that use V2 encoding"
+]
```
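Because every name above is registered, explicitly declared marks combine freely with the autogenerated ones in `-m` selection expressions. A hypothetical example (the test name and body are invented; `lmdb_version_store_v1` is an existing fixture):

```python
import pytest


@pytest.mark.priority0  # explicit mark from the registered list above
def test_roundtrip_smoke(lmdb_version_store_v1):
    # With extended marks enabled, this test is additionally auto-marked
    # 'lmdb' (from the fixture name) and, depending on the directory it
    # lives in, 'unit' or 'integration', so it would be selected by e.g.
    #   pytest -m "priority0 and lmdb"
    ...
```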

python/arcticdb/util/logger.py

Lines changed: 0 additions & 5 deletions
```
@@ -81,8 +81,3 @@ def __init__(self, message: str):
         # Sanitize the message
         sanitized_message = GitHubSanitizingHandler.sanitize_message(message)
         super().__init__(sanitized_message)
-
-
-sanitized_message = " fgy 54654 ARCTICDB_REAL_S3_SECRET_KEY=AwsB1YWasZBtonDiBcsqtz36M3m4yPl9EsiTS57w"
-sanitized_message = re.sub(r"(.*SECRET_KEY=).*$", r"\1***", sanitized_message, flags=re.IGNORECASE)
-print(sanitized_message)
```

python/tests/conftest.py

Lines changed: 280 additions & 1 deletion
```
@@ -7,7 +7,7 @@
 """

 import enum
-from typing import Callable, Generator, Union
+from typing import Callable, Generator, Iterable, Union
 from arcticdb.util.logger import get_logger
 from arcticdb.version_store._store import NativeVersionStore
 from arcticdb.version_store.library import Library
@@ -54,7 +54,9 @@
 from arcticdb.version_store._normalization import MsgPackNormalizer
 from arcticdb.util.test import create_df
 from arcticdb.arctic import Arctic
+from tests.util.marking import Mark
 from .util.mark import (
+    EXTENDED_MARKS,
     LMDB_TESTS_MARK,
     LOCAL_STORAGE_TESTS_ENABLED,
     MACOS_WHEEL_BUILD,
```
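The new `EXTENDED_MARKS` import comes from `python/tests/util/mark.py`, which is not part of this diff. Given the `ARCTICDB_EXTENDED_MARKS=0` switch described above, the flag presumably looks something like the following (an assumption, not the actual code):

```python
import os

# Assumed shape of the toggle in python/tests/util/mark.py (not shown in this diff):
# extended marks are on by default and disabled by setting ARCTICDB_EXTENDED_MARKS=0.
EXTENDED_MARKS = os.getenv("ARCTICDB_EXTENDED_MARKS", "1") != "0"
```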
```
@@ -1541,3 +1543,280 @@ def clear_query_stats():
     yield
     query_stats.disable()
     query_stats.reset_stats()
+
+
+# region Pytest special xfail handling
+
+
+def pytest_runtest_makereport(item, call):
+    from tests.pytest_xfail import pytest_runtest_makereport
+
+    return pytest_runtest_makereport(item, call)
+
+
+def pytest_terminal_summary(terminalreporter, exitstatus):
+    from tests.pytest_xfail import pytest_terminal_summary
+
+    pytest_terminal_summary(terminalreporter, exitstatus)
+
+
+# endregion
+
+# region =================================== Pytest plugins&hooks ====================================
+
+
+class Marks:
+    """Central Marks Registry
+    Usage:
+        @mark([Marks.abc, Marks.cde])
+        def test_first():
+            ....
+        @Marks.abc.mark
+        def test_two():
+            ....
+    """
+
+    storage = Mark("storage")
+    dedup = Mark("dedup")
+    authentication = Mark("authentication")
+    pipeline = Mark("pipeline")
+    compat = Mark("compat")
+    dynamic_schema = Mark("dynamic_schema")
+    encoding_v2 = Mark("encoding_v2")
+    empty_types = Mark("empty_types")
+    delayed_deletes = Mark("delayed_deletes")
+    use_tombstones = Mark("use_tombstones")
+    sync_passive = Mark("sync_passive")
+    segment_size = Mark("segment_size")
+    dynamic_strings = Mark("dynamic_strings")
+    prune_previous = Mark("prune_previous")
+    bucketize_dynamic = Mark("bucketize_dynamic")
+    lmdb = Mark("lmdb")
+    mem = Mark("mem")
+    nfs = Mark("nfs")
+    mongo = Mark("mongo")
+    azurite = Mark("azurite")
+    s3 = Mark("s3")
+    gcp = Mark("gcp")
+    real_s3 = Mark("real_s3")
+    real_gcp = Mark("real_gcp")
+    real_azure = Mark("real_azure")
+    integration = Mark("integration")
+    unit = Mark("unit")
+    stress = Mark("stress")
+    nonreg = Mark("nonreg")
+    hypothesis = Mark("hypothesis")
+    arcticdb = Mark("arcticdb")
+    version_store = Mark("version_store")
+    toolbox = Mark("toolbox")
+    priority0 = Mark("priority0")
+
+    @classmethod
+    def list_all_marks(cls):
+        """Lists all marks in the registry"""
+        return [v for k, v in cls.__dict__.items() if isinstance(v, Mark)]
+
+
+def apply_hybrid_marks(item, source_values: Iterable[str], rules: dict):
+    """
+    Apply marks to a pytest item if any of the source_values matches a rule.
+
+    :param item: pytest.Item
+    :param source_values: values to search in (e.g., [item.name], item.fixturenames, [item.fspath])
+    :param rules: dict of mark_name -> list[str | regex]
+    """
+    for mark_name, patterns in rules.items():
+
+        # Deduplication guard
+        if item.get_closest_marker(mark_name):
+            continue
+
+        marked = False
+        for pattern in patterns:
+            if marked:
+                break
+            for value in source_values:
+                value_lower = value.lower()
+                if isinstance(pattern, str):
+                    if pattern.lower() in value_lower:
+                        item.add_marker(mark_name)
+                        marked = True
+                        break
+                elif pattern.search(value):
+                    item.add_marker(mark_name)
+                    marked = True
+                    break
+
+
+# Define how fixtures map to marks
+ALL_FIXTURES = [
+    re.compile(r"^arctic_client(?!.*lmdb).*", re.I),
+    re.compile(r"^arctic_library(?!.*lmdb).*", re.I),
+    re.compile(r"^object_and_mem_and_lmdb.*", re.I),
+]
+ALL_FIXTURES_AND_LMDB = [
+    re.compile(r"^arctic_client.*", re.I),
+    re.compile(r"^arctic_library.*", re.I),
+    re.compile(r"^object_and_mem_and_lmdb.*", re.I),
+]
+BASIC_ARCTIC_FIXTURES = [re.compile(r"^basic_arctic", re.I)]
+BASIC_STORE_FIXTURES = [re.compile(r"^(basic_store.*|basic_version_.*) ", re.I)]
+OBJECT_STORE_FIXTURES = [re.compile(r"^(object_store.*|object_version_.*)", re.I)]
+LOCAL_OBJECT_STORE_FIXTURES = [re.compile(r"^(local_object_store.*|local_object_version.*)", re.I)]
+VERSION_STORE_AND_REAL_FIXTURES = [re.compile(r"^version_store_and_real*", re.I)]
+
+FIXTURES_TO_MARK = {
+    Marks.lmdb.name: [re.compile(r"^lmdb_.*", re.I)]
+    + ALL_FIXTURES_AND_LMDB
+    + VERSION_STORE_AND_REAL_FIXTURES
+    + BASIC_STORE_FIXTURES,
+    Marks.mem.name: [re.compile(r"^(mem_.*|in_memory_.*)", re.I)] + ALL_FIXTURES + BASIC_STORE_FIXTURES,
+    Marks.s3.name: [re.compile(r"^(s3_.*|mock_s3.*)", re.I)]
+    + ALL_FIXTURES
+    + BASIC_STORE_FIXTURES
+    + LOCAL_OBJECT_STORE_FIXTURES
+    + OBJECT_STORE_FIXTURES,
+    Marks.nfs.name: [re.compile(r"^nfs_.*", re.I)] + ALL_FIXTURES + OBJECT_STORE_FIXTURES,
+    Marks.gcp.name: [re.compile(r"^gcp_.*", re.I)] + ALL_FIXTURES,
+    Marks.mongo.name: [re.compile(r"^mongo_.*", re.I)] + ALL_FIXTURES,
+    Marks.azurite.name: [re.compile(r"^(azurite_.*|azure_.*)", re.I)]
+    + ALL_FIXTURES
+    + LOCAL_OBJECT_STORE_FIXTURES
+    + OBJECT_STORE_FIXTURES
+    + OBJECT_STORE_FIXTURES,
+    Marks.real_s3.name: [re.compile(r"^real_s3_.*", re.I)]
+    + ALL_FIXTURES
+    + BASIC_STORE_FIXTURES
+    + BASIC_ARCTIC_FIXTURES
+    + VERSION_STORE_AND_REAL_FIXTURES
+    + OBJECT_STORE_FIXTURES,
+    Marks.real_azure.name: [re.compile(r"^real_azure_.*", re.I)]
+    + ALL_FIXTURES
+    + BASIC_STORE_FIXTURES
+    + BASIC_ARCTIC_FIXTURES
+    + VERSION_STORE_AND_REAL_FIXTURES
+    + OBJECT_STORE_FIXTURES,
+    Marks.real_gcp.name: [re.compile(r"^real_gcp_.*", re.I)]
+    + ALL_FIXTURES
+    + BASIC_STORE_FIXTURES
+    + BASIC_ARCTIC_FIXTURES
+    + VERSION_STORE_AND_REAL_FIXTURES
+    + OBJECT_STORE_FIXTURES,
+    Marks.dynamic_schema.name: [re.compile(r".*(dynamic_schema|dynamic(?!string)).*", re.I)],
+    Marks.empty_types.name: [
+        "empty_types",
+        "lmdb_version_store_delayed_deletes_v1",
+        "lmdb_version_store_delayed_deletes_v2",
+    ],
+    Marks.delayed_deletes.name: ["delayed_deletes"],
+    Marks.use_tombstones.name: ["tombstone", "basic_store_prune_previous", "basic_store_prune_previous"],
+    Marks.sync_passive.name: ["sync_passive"],
+    Marks.bucketize_dynamic.name: ["buckets"],
+    Marks.prune_previous.name: [
+        "prune_previous",
+        "lmdb_version_store_delayed_deletes_v1",
+        "lmdb_version_store_tombstone_and_pruning",
+        "basic_store_delayed_deletes_v1",
+        "basic_store_delayed_deletes_v2",
+    ],
+    Marks.segment_size.name: ["segment", "lmdb_version_store_no_symbol_list"],
+    Marks.dynamic_strings.name: [
+        "dynamic_strings",
+        "real_s3_version_store_dynamic_schema",
+        "real_gcp_version_store_dynamic_schema",
+        "real_azure_version_store_dynamic_schema",
+        "nfs_backed_s3_version_store_v1",
+        "nfs_backed_s3_version_store_v2",
+        "s3_version_store_v1",
+        "s3_version_store_v2",
+        "s3_version_store_dynamic_schema_v1",
+        "s3_version_store_dynamic_schema_v2",
+        "nfs_backed_s3_version_store_dynamic_schema_v2",
+        "nfs_backed_s3_version_store_dynamic_schema_v2",
+        "azure_version_store_dynamic_schema",
+        "lmdb_version_store_v1",
+        "lmdb_version_store_v2",
+        "lmdb_version_store_prune_previous",
+        "lmdb_version_store_dynamic_schema_v1",
+        "lmdb_version_store_dynamic_schema_v2",
+        "lmdb_version_store_dynamic_schema",
+        "lmdb_version_store_empty_types_v1",
+        "lmdb_version_store_empty_types_v2",
+        "lmdb_version_store_empty_types_dynamic_schema_v1",
+        "lmdb_version_store_empty_types_dynamic_schema_v2",
+        "lmdb_version_store_delayed_deletes_v1",
+        "lmdb_version_store_delayed_deletes_v2",
+        "lmdb_version_store_tombstones_no_symbol_list",
+        "lmdb_version_store_allows_pickling",
+        "lmdb_version_store_tiny_segment_dynamic_strings",
+        "basic_store_prune_previous",
+        "basic_store_dynamic_schema_v1",
+        "basic_store_dynamic_schema_v2",
+        "basic_store_dynamic_schema",
+        "basic_store_delayed_deletes_v1",
+        "basic_store_delayed_deletes_v2",
+        "basic_store_tombstones_no_symbol_list",
+        "basic_store_allows_pickling",
+    ],
+    Marks.encoding_v2.name: [
+        re.compile(
+            r".*("
+            r"arctic_client|"
+            r"nfs_backed_s3_version_store_dynamic_schema|"
+            r"lmdb_version_store_|"
+            r"lmdb_version_store_dynamic_schema|"
+            r"lmdb_version_store_empty_types_|"
+            r"lmdb_version_store_empty_types_dynamic_schema|"
+            r"lmdb_version_store_delayed_deletes|"
+            r"basic_store_dynamic_schema"
+            r").*(?!v1).*",
+            re.I,
+        )
+    ],
+}
+
+ALL_FIXTURE_NAMES = set()
+
+
+def pytest_collection_modifyitems(config, items):
+    """This hook is useful for filtering tests in/out and for modifying tests
+    as soon as pytest collects them, before execution.
+    """
+
+    def evaluate_item(item, part_string: str, mark_to_add: Mark):
+        """Check whether the item's (test's) module path contains a certain string.
+        If it does, mark the test with the specified mark.
+        """
+        doc = item.module.__file__
+        if doc and part_string in doc.lower():
+            item.add_marker(mark_to_add)
+
+    # Apply this process only when asked for
+    if not EXTENDED_MARKS:
+        return
+
+    start_time = time.time()
+    for item in items:
+        ## Add custom marks to the test depending on the file path of the test's module.
+        ## Effectively this silently marks each test with its physical location in the repo,
+        ## allowing that physical location to be used later in combination with other marks.
+        ##
+        ## Example:
+        ##     pytest -s --co -m "toolbox and storage"
+        evaluate_item(item, Marks.unit.name, Marks.unit.mark)
+        evaluate_item(item, Marks.integration.name, Marks.integration.mark)
+        evaluate_item(item, Marks.stress.name, Marks.stress.mark)
+        evaluate_item(item, Marks.hypothesis.name, Marks.hypothesis.mark)
+        evaluate_item(item, Marks.nonreg.name, Marks.integration.mark)
+        evaluate_item(item, Marks.version_store.name, Marks.version_store.mark)
+        evaluate_item(item, Marks.toolbox.name, Marks.toolbox.mark)
+
+        # --- Auto-mark by fixtures ---
+        fixtures = set(item.fixturenames)
+        ALL_FIXTURE_NAMES.update(fixtures)
+        apply_hybrid_marks(item, fixtures, FIXTURES_TO_MARK)
+
+    get_logger().info(f"Extended marks applied for: {time.time() - start_time} sec.")
+
+
+# endregion
```
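To make the fixture-to-mark matching concrete, here is a small illustration (not part of the commit) of how `apply_hybrid_marks` resolves fixture names, assuming the function from the conftest.py diff above is in scope; `FakeItem` is a hypothetical stand-in for `pytest.Item`:

```python
import re


class FakeItem:
    """Hypothetical stand-in for pytest.Item, with just enough API for apply_hybrid_marks."""

    def __init__(self, fixturenames):
        self.fixturenames = fixturenames
        self.markers = []

    def get_closest_marker(self, name):
        return name if name in self.markers else None

    def add_marker(self, name):
        self.markers.append(name)


rules = {
    "s3": [re.compile(r"^(s3_.*|mock_s3.*)", re.I)],  # regex rule, as in FIXTURES_TO_MARK
    "dynamic_strings": ["s3_version_store_v1"],  # plain-string rule: substring match
}
item = FakeItem(["s3_version_store_v1"])
apply_hybrid_marks(item, item.fixturenames, rules)
print(item.markers)  # ['s3', 'dynamic_strings']
```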
