ASV Benchmarks: Real storage tests
A real storage test requires that a single test can run against any storage supported by ArcticDB (Amazon S3, LMDB, GCP, etc.) without changing the test logic, only its parameters, such as the number of rows per dataframe. That requires a small abstraction over the Arctic class that lets us take advantage of the persistence of storages like S3 and emulate it with LMDB. That abstraction is the TestLibraryManager class.
The main purpose of the class is to give ASV test developers a way to manage the lifecycle of persistent (shared) libraries and of modifiable (private) libraries, which only the particular test can access, on any of the supported storages.
- A modifiable (private) library is isolated and reachable only by the instance of the test currently running on a given machine. Instances of the same test on other machines create their own modifiable (private) libraries, so each instance of the test runs in isolation. The main rule is that these libraries must be cleaned, either in the setup before the test executes or in the cleanup afterwards. The recommendation is to clean all libraries that may remain from a previous execution in the setup of the test, since this ensures the test runs in a clean, controlled environment on every execution.
- A persistent (shared) library is created once and read many times by a particular ASV test. Its benefit is that the library and symbol infrastructure the test needs is created once and then reused many times. An additional benefit is the ability to simulate aging over time: if each run of a test only adds data to the library's symbols without destroying them, it simulates the aging process typical of actual usage scenarios.
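The difference between the two kinds of library can be illustrated with a toy, in-memory model. This is only a sketch under stated assumptions: the storage is a plain dictionary, and the names `ToyStorage`, `simulated_run`, `shared_READ` and `private_run<N>` are hypothetical stand-ins, not part of TestLibraryManager. It shows the recommended pattern of cleaning leftover private libraries in setup, and how a shared library ages when each run only appends to it.

```python
from typing import Dict, List

# Toy storage: library name -> {symbol name -> list of appended values}.
# A hypothetical stand-in for S3/LMDB, used only to illustrate the lifecycle.
ToyStorage = Dict[str, Dict[str, List[int]]]


def simulated_run(storage: ToyStorage, run_id: int, crash: bool = False) -> None:
    """Simulate one benchmark run against the toy storage."""
    # Setup: remove private libraries left behind by earlier runs.
    for name in [n for n in storage if n.startswith("private_")]:
        del storage[name]

    # The private library is created fresh for this run only.
    private = storage.setdefault(f"private_run{run_id}", {})
    private["scratch"] = [run_id]

    # The shared library is created on first use and then only appended to,
    # which models the "aging" of a persistent library.
    shared = storage.setdefault("shared_READ", {})
    shared.setdefault("sym", []).append(run_id)

    if crash:
        return  # teardown never happens; the next setup must clean up

    # Teardown: remove the private library this run created.
    del storage[f"private_run{run_id}"]
```

If the first run crashes, its private library stays behind until the next run's setup removes it, while the shared library keeps accumulating one entry per run.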
Another important characteristic of this class is that its logic can be reused outside ASV, in other types of tests or in tests of the class's own functionality. In other words, it is not coupled to ASV at all.
The class provides the minimal set of methods needed to cover any user scenario on any of those storages. It does not implement all the methods of the Arctic class, so it can be extended when needed. Here are practical examples of using this class.
# The following code creates one persistent library
lm = TestLibraryManager(Storage.AMAZON, "FINALIZE")  # Defines the storage type and a name for the library
# NOTE: The 'FINALIZE' string will be part of the name, not the whole name, but this string should be unique
# NOTE 2: The name of a library MUST NOT be a substring of another library name.
# In other words, do not plan to have 'FINALIZE' and 'FINALIZE_MORE' tests.
# They should be 'FINALIZE_BASIC' and 'FINALIZE_MORE'
lib = lm.get_library(LibraryType.PERSISTENT)  # creates the persistent library if it does not exist
assert lm.has_library(LibraryType.PERSISTENT)  # yes, the library has been created
# Creation of several persistent libraries
lm = TestLibraryManager(Storage.AMAZON, "FINALIZE")
for name in ["first", "second", "third"]:
    lib = lm.get_library(LibraryType.PERSISTENT, name)
    assert lm.has_library(LibraryType.PERSISTENT, name)
# 'name' is the optional suffix of the actual library name that will be created:
# <something>_FINALIZE_name
# where <something> may vary, so your test must not rely on reconstructing
# the actual name. There is another method to get the full name:
lib_name = lm.get_library_name(LibraryType.PERSISTENT, 'my_suffix')
# NOTE: as seen above, the suffix is not mandatory if you plan to have one library for your test
# Use of modifiable libraries
lm = TestLibraryManager(Storage.AMAZON, "FINALIZE")
lm.clear_all_benchmark_libs()  # Clears all libraries left uncleaned by previous runs.
# Clears only modifiable libraries; persistent libraries are never cleared.
# Clears the libraries of this test/benchmark created by any process,
# so it is useful as a precondition in the setup_cache() method.
for name in ["first", "second", "third"]:
    lib = lm.get_library(LibraryType.MODIFIABLE, name)
    assert lm.has_library(LibraryType.MODIFIABLE, name)
lm.clear_all_modifiable_libs_from_this_process()  # clears all libraries created in the loop
# Unlike the other clear method, it clears only the libraries created by the current process,
# so it is useful in the teardown() method.
# From time to time the storage should be cleaned, as it may still hold traces of libraries
# that were not cleaned due to malfunctions etc. In that case the machine's private space in
# shared storages can be cleaned with:
lm = TestLibraryManager(Storage.AMAZON, "FINALIZE")
lm.remove_all_modifiable_libs_for_machine()
# Note that this can be an automated process that iterates through all tests,
# especially on GitHub.
# HOW TO CLEAR PERSISTENT LIBRARIES
# There are cases when you want to delete the persistent libraries of a test/benchmark
# so that they are recreated, because a change in parameters triggers a different symbol
# structure. In this case, and ONLY IN THIS CASE, use:
lm = TestLibraryManager(Storage.AMAZON, "FINALIZE")
lm.remove_all_persistent_libs_for_this_test()
# This process is intended to be manual and attended, never automated!
# It should not be used in tests/benchmarks.
Note that ASV runs each test of a single benchmark in a separate process on the same machine. The modifiable libraries of each process of the same test/benchmark are therefore different and live in their own space, so you do not need to worry about collisions when creating libraries in an ASV test. The methods above handle that automatically.
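One way such per-process isolation could work is to embed the machine and process id in the library name. The sketch below is an assumption for illustration only: the `MOD_<machine>_<pid>_<benchmark>_<suffix>` layout and the helpers `modifiable_name`, `libs_of_process` and `libs_of_benchmark` are hypothetical and are not the naming scheme TestLibraryManager actually uses. It also suggests a plausible reason for the rule that one benchmark name must not be a substring of another: a cleanup that matches by name fragment would catch both.

```python
import os
import socket
from typing import List, Optional


def modifiable_name(benchmark: str, suffix: str = "",
                    machine: Optional[str] = None,
                    pid: Optional[int] = None) -> str:
    """Build a hypothetical private library name unique per machine and process."""
    machine = machine or socket.gethostname()
    pid = os.getpid() if pid is None else pid
    parts = ["MOD", machine, str(pid), benchmark]
    if suffix:
        parts.append(suffix)
    return "_".join(parts)


def libs_of_process(names: List[str], machine: str, pid: int) -> List[str]:
    """Select the libraries created by one process (teardown-style cleanup)."""
    prefix = f"MOD_{machine}_{pid}_"
    return [n for n in names if n.startswith(prefix)]


def libs_of_benchmark(names: List[str], benchmark: str) -> List[str]:
    """Select libraries of a benchmark from any process by name fragment."""
    return [n for n in names if benchmark in n]
```

With names built for processes 101 and 102 of the same benchmark, `libs_of_process` separates them cleanly, whereas `libs_of_benchmark(names, "FINALIZE")` would also select libraries of a benchmark called `FINALIZE_MORE`.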
Most benchmark tests first need a structure of libraries and symbols to be prepared. The LibraryPopulationPolicy class, together with populate_library and populate_library_if_missing, helps with that preparation. These utilities are not bound to ASV, so they can also be reused outside ASV benchmark tests.
Generally there are 3 different types of structures that those utilities will help you create:
- A single library with several symbols each having same or different number of rows and fixed columns
logger = get_console_logger()
lm = TestLibraryManager(Storage.AMAZON, "FINALIZE")
# This configures the generation of 2 symbols with 10 and 20 rows. The number of rows can later be used to get the symbol name.
# Note that this also fixes the number of columns of all symbols at 5
lpp = LibraryPopulationPolicy(logger).set_parameters([10,20], 5)
populate_library(lm, lpp, LibraryType.PERSISTENT)
lib = lm.get_library(LibraryType.PERSISTENT)
symbol = lpp.get_symbol_name(10) # to access the symbol we need its name
data = lib.read(symbol).data
symbol = lpp.get_symbol_name(20) # to access the symbol we need its name
data = lib.read(symbol).data
- A single library with several symbols each having same or different number of columns and fixed rows
# Using the example above, replace the definition with:
lpp = LibraryPopulationPolicy(logger).set_parameters(3, [10,20])
# This configures the generation of 2 symbols with 10 and 20 columns. The number of columns can later be used to get the symbol name.
# Note that this also fixes the number of rows of all symbols at 3
- Populating a library with many identical symbols
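The three structures listed above can be modelled with a small, storage-free sketch. It is an illustration under stated assumptions: `plan_symbols` and `plan_identical_symbols` are hypothetical helpers, not part of LibraryPopulationPolicy, and they only compute the (rows, columns) shape each symbol would have, keyed by the value that later identifies the symbol.

```python
from typing import Dict, List, Tuple, Union

IntOrList = Union[int, List[int]]
Shape = Tuple[int, int]  # (number of rows, number of columns)


def plan_symbols(rows: IntOrList, cols: IntOrList) -> Dict[int, Shape]:
    """Map each varying parameter value to the shape of its symbol.

    Exactly one of `rows` and `cols` must be a list; the other is fixed.
    """
    if isinstance(rows, list) and isinstance(cols, int):
        return {r: (r, cols) for r in rows}   # varying rows, fixed columns
    if isinstance(rows, int) and isinstance(cols, list):
        return {c: (rows, c) for c in cols}   # fixed rows, varying columns
    raise ValueError("exactly one of rows and cols must be a list")


def plan_identical_symbols(count: int, rows: int, cols: int) -> Dict[int, Shape]:
    """Map symbol indexes 0..count-1 to one shared shape (hypothetical helper)."""
    return {index: (rows, cols) for index in range(count)}
```

For instance, `plan_symbols([10, 20], 5)` describes two symbols of 10 and 20 rows with 5 columns each, mirroring the first example, and `plan_symbols(3, [10, 20])` mirrors the second.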
ASV benchmark tests that use real storage libraries and the utilities described here should define the benchmark class by inheriting from the AsvBase class, as in this example:
class AWSReadWrite(AsvBase):
    """
    This class is for general read/write tests
    Uses 1 persistent library for read tests
    Uses 1 modifiable library for write tests
    """

    rounds = 1
    number = 3  # the test is invoked 3 times between each setup-teardown pair
    repeat = 1  # the number of times the measurement invokes setup-teardown
    min_run_count = 1
    warmup_time = 0
    timeout = 1200

    param_names = ["num_rows"]
    params = [1_000_000, 2_000_000]

    library_manager = TestLibraryManager(storage=Storage.AMAZON, name_benchmark="READ_WRITE")

    def get_logger(self) -> Logger:
        return get_console_logger(self)

    def get_library_manager(self) -> TestLibraryManager:
        return AWSReadWrite.library_manager

    def get_population_policy(self) -> LibraryPopulationPolicy:
        lpp = LibraryPopulationPolicy(self.get_logger(), AllColumnTypesGenerator()).set_parameters(AWSReadWrite.params)
        return lpp

    def setup_cache(self):
        """
        In setup_cache we only populate the persistent libraries if they are missing.
        """
        manager = self.get_library_manager()
        policy = self.get_population_policy()
        populate_library_if_missing(manager, policy, LibraryType.PERSISTENT)
        manager.log_info()  # Logs info about the Arctic URI; always call this last

    def setup(self, num_rows):
        self.population_policy = self.get_population_policy()
        self.symbol = self.population_policy.get_symbol_name(num_rows)
        ...........

    def teardown(self, num_rows):
        # We could clear the modifiable libraries we used
        self.get_library_manager().clear_all_modifiable_libs_from_this_process()
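The effect of `number` and `repeat`, as described in the comments of the class above, can be pictured with a simplified driver. This is a rough model, not how ASV's runner is implemented: real ASV also handles `setup_cache`, warm-up, rounds and process isolation, which this sketch leaves out. `CountingBenchmark`, `time_read` and `drive` are hypothetical names used only for the illustration.

```python
from typing import List


class CountingBenchmark:
    """Records the order in which the hooks are called."""

    def __init__(self) -> None:
        self.calls: List[str] = []

    def setup(self, num_rows: int) -> None:
        self.calls.append(f"setup({num_rows})")

    def time_read(self, num_rows: int) -> None:
        self.calls.append(f"time_read({num_rows})")

    def teardown(self, num_rows: int) -> None:
        self.calls.append(f"teardown({num_rows})")


def drive(bench: CountingBenchmark, params: List[int],
          number: int, repeat: int) -> None:
    """Call setup, then the timed method `number` times, then teardown,
    and do that `repeat` times for every parameter value."""
    for param in params:
        for _ in range(repeat):
            bench.setup(param)
            for _ in range(number):
                bench.time_read(param)
            bench.teardown(param)
```

With `number=3` and `repeat=1`, each parameter value sees one setup, three timed calls and one teardown, which is why libraries created in `setup` must tolerate being used by several consecutive invocations.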