diff --git a/docs/content/punpy_digital_effects_table.rst b/docs/content/punpy_digital_effects_table.rst
index c1e55f4..8d42aa5 100644
--- a/docs/content/punpy_digital_effects_table.rst
+++ b/docs/content/punpy_digital_effects_table.rst
@@ -110,7 +110,7 @@ or Law of Propagation of Uncertainties (LPU) method (see Section :ref:`LPUMethod
 
    gl = IdealGasLaw(prop=prop)
 
-If no argument is provided for prop, a MCPropagation(100,parallel_cores=0) object is used.
+If no argument is provided for prop, an MCPropagation(100,parallel_cores=1) object is used.
 The next arguments are for providing the input quantity names and the measurand name and measurand unit respectively::
 
    gl = IdealGasLaw(prop=prop, xvariables=["pressure", "temperature", "n_moles"], yvariable="volume", yunit="m^3")
diff --git a/docs/content/punpy_memory_and_speed.rst b/docs/content/punpy_memory_and_speed.rst
index 03d7ed4..0071930 100644
--- a/docs/content/punpy_memory_and_speed.rst
+++ b/docs/content/punpy_memory_and_speed.rst
@@ -137,10 +137,30 @@ In some cases, when there are multiple measurands with different shapes, it is n
 In such cases, the `refyvar` keyword should be set to the index of the measurand with the most dimensions and the repeat_dims indexes should correspond to this measurand.
 
+Processing the MC samples simultaneously
+########################################
+For simple measurement functions (e.g. that just do arithmetic operations such as +,-,*,/) with numpy arrays, the most computationally efficient way to propagate uncertainties is to
+pass all the MC samples simultaneously to the measurement function. The measurement function thus takes as arguments numpy arrays consisting of all MC steps for a given
+input quantity, and will return a numpy array with all the MC samples of the measurand. In order to use this method, set the optional `parallel_cores` keyword to 0.
+By using this method we can then benefit from the optimised array operations within numpy
+(which use all available CPUs, and rely on C code instead of Python), which is significantly faster than processing the samples one-by-one in Python.
+This method is recommended for any measurement function that allows it (i.e. if the measurement function can deal with the additional dimension in the input quantities, and through numpy operations the measurand will simply end up with the same additional dimension).
+
+When parallel_cores is set to 0, all iterations will be processed simultaneously and there will be an additional dimension for the MC iterations.
+Generally within punpy, the MC dimension in the samples is the first one (i.e. internally as well as when MC samples are returned).
+However, when processing all iterations simultaneously, in most cases it is more practical to have the MC dimension as the last dimension.
+This is because we use numpy arrays and these are compatible when the last dimensions match, following the `numpy broadcasting rules <https://numpy.org/doc/stable/user/basics.broadcasting.html>`_.
+So by default, the shape of the input quantities when using parallel_cores=0 will have the MC iterations as its last dimension.
+However, in some cases it is more helpful to have the MC iterations as the first dimension.
+If this is the case, the MC iteration dimension can be made the first dimension by setting the `MCdimlast` keyword to False::
+
+   prop = punpy.MCPropagation(1000,parallel_cores=0,MCdimlast=False)
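+
+As a minimal sketch of such a measurement function (the function and input values here are purely illustrative, not part of punpy), elementwise numpy operations broadcast over the extra trailing MC dimension without any changes to the code::
+
+   import numpy as np
+   import punpy
+
+   def volume_ideal_gas(pressure, temperature, n_moles):
+       # Elementwise numpy operations, so this works equally on inputs of
+       # shape (25,) and on MC samples of shape (25, MCsteps).
+       return n_moles * 8.314 * temperature / pressure
+
+   prop = punpy.MCPropagation(1000, parallel_cores=0)
+   pressure = np.full(25, 1.0e5)
+   temperature = np.full(25, 293.0)
+   n_moles = np.full(25, 40.0)
+   u_y = prop.propagate_random(
+       volume_ideal_gas,
+       [pressure, temperature, n_moles],
+       [1.0e3 * np.ones(25), np.ones(25), 0.4 * np.ones(25)],
+   )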
+
+
 Processing the MC samples in parallel
 ######################################
-At the start of this section we already saw that the optional `parallel_cores` keyword can be used to running the MC
-samples one-by-one through the measurement function rather than all at once as in the standard case. It is also possible
+In the previous section we already saw that the optional `parallel_cores` keyword can be used to run the MC
+samples all at once through the measurement function rather than one-by-one as in the standard case. It is also possible
 to use the same keyword to use parallel processing. Here, only the processing of the input quantities through the
 measurement function is done in parallel. Generating the samples and calculating the covariance matrix etc. is still done as normal.
 Punpy uses the multiprocessing module which comes standard with your python distribution.
diff --git a/docs/content/punpy_standalone.rst b/docs/content/punpy_standalone.rst
index d46cc6a..3038ef2 100644
--- a/docs/content/punpy_standalone.rst
+++ b/docs/content/punpy_standalone.rst
@@ -62,9 +62,9 @@ first create a prop object (object of punpy MCPropagation of LPUPropagation clas
    prop=punpy.MCPropagation(10000) # Here the number is how
                                    # many MC samples will be generated
 
-   # Or if you have a measurement function that does not accept
+   # Or if you have a measurement function that accepts
    # higher dimensional arrays as argument:
-   prop=punpy.MCPropagation(10000,parallel_cores=1)
+   prop=punpy.MCPropagation(10000,parallel_cores=0)
 
    # Alternatively it is possible to use LPU methods to
    # propagate uncertainties
@@ -82,15 +82,18 @@ GUM (Guide to the Expression of Uncertainty in Measurement) by calculating the J
 For the MC method, the number of MC samples that is used is set as the first argument when creating
 the MCPropagation object (see example above). Two approaches can be followed to propagate the MC samples of the input
 quantities to the measurand, depending on whether the measurement function can be applied to numpy arrays of arbitrary size.
-The most computationally efficient way is to pass an array consisting of all MC steps of an
-input quantity instead of the input quantity themselves. Each of the input quantities will thus get an additional dimension in the MC sample.
+By default, MC samples are processed one-by-one. This is equivalent to setting the optional
+`parallel_cores` keyword to 1. However, some measurement functions allow all the MC samples to be processed simultaneously.
+When this is possible, it is the most computationally efficient way to propagate MC uncertainties.
+In this case, a numpy array consisting of all MC steps of an input quantity is passed to the measurement function
+instead of the input quantities themselves. Each of the input quantities will thus get an additional MC dimension, and these higher
+dimensional input quantities are used as arguments to the measurement function, which will in turn return a measurand with this additional MC dimension.
 If the measurement function can deal with these higher dimensional arrays by just performing
 numpy operations, this gives the most computationally efficient MC propagation.
-This method is the default in punpy (which corresponds to setting the optional `parallel_cores` keyword to 0).
+In order to use this method, the optional `parallel_cores` keyword is set to 0.
 
-However, this is not the case for every measurement function. If the inputs to the measurement
-function are less flexible, and don't support additional dimensions, it is possible to instead run the MC samples one by one.
-In order to pass each MC sample individually to the measurement function, it is possible to set the optional
-`parallel_cores` keyword to 1. In :ref:`punpy_memory_and_speed`, we will show how the same keyword can be used to do parallel processing for such measurement functions.
+If the inputs to the measurement function are less flexible, and don't support additional dimensions,
+one can either use the default option (`parallel_cores=1`; run the MC samples one by one) or use parallel processing (`parallel_cores>1`).
+In :ref:`punpy_memory_and_speed`, we provide some more detail on processing the MC samples simultaneously or in parallel.
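+
+As a minimal sketch (the measurement function and values are illustrative only), the two approaches differ only in the `parallel_cores` keyword and in the shapes seen inside the measurement function::
+
+   import numpy as np
+   import punpy
+
+   def measurement_function(x1, x2):
+       # Elementwise, so it also accepts arrays with an extra MC dimension.
+       return x1 + x2
+
+   x = [np.ones(10), 2 * np.ones(10)]
+   u_x = [0.1 * np.ones(10), 0.2 * np.ones(10)]
+
+   # One-by-one (default): the function sees inputs of shape (10,).
+   prop = punpy.MCPropagation(10000, parallel_cores=1)
+   u_y = prop.propagate_random(measurement_function, x, u_x)
+
+   # Simultaneous: the function sees inputs of shape (10, 10000).
+   prop = punpy.MCPropagation(10000, parallel_cores=0)
+   u_y = prop.propagate_random(measurement_function, x, u_x)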
 
 For the LPU methods, the numdifftools package (used within comet_maths) is
 used to calculate the Jacobian. This package automatically determines the stepsize in the numerical
@@ -341,15 +344,3 @@ However, it is also possible to ignore all MC samples where any of the values ar
 This can be done by setting the `allow_some_nans` keyword to False.
 
 
-Shape of input quanties within the measurement function
-##########################################################
-When setting parallel_cores to 1 or more, the shape of the input quantities used for each iteration in the measurement function matches the shape of the input quantities themselves.
-However, when parallel_cores is set to 0, all iterations will be processed simultaneously and there will be an additional dimension for the MC iterations.
-Generally within punpy, the MC dimension in the samples is the first one (i.e. internally as well as when MC samples are returned).
-However, when processing all iterations simultaniously, in most cases it is more practical to have the MC dimension as the last dimension.
-This is because we use numpy arrays and these are compatible when the last dimensions match following the `numpy broadcasting rules `_.
-So as default, the shape of the input quantities when using parallel_cores will have the MC iterations as its last dimension.
-However, in some cases it is more helpful to have the MC iterations as the first dimension.
-If this is the case, the MC iteration dimension can be made the first dimension by setting the `MClastdim` keyword to False::
-
-   prop = punpy.MCPropagation(1000,MCdimlast=False)
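+
+As an illustrative sketch (`measurement_function`, `x`, `u_x` and `corr_x` are placeholders), this keyword is passed when running the samples manually::
+
+   MC_x = prop.generate_MC_sample(x, u_x, corr_x)
+   MC_y = prop.run_samples(measurement_function, MC_x, allow_some_nans=False)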
diff --git a/propagate_ds_example.nc b/propagate_ds_example.nc
new file mode 100644
index 0000000..c00d6d6
Binary files /dev/null and b/propagate_ds_example.nc differ
diff --git a/punpy/_version.py b/punpy/_version.py
index 9e604c0..6849410 100644
--- a/punpy/_version.py
+++ b/punpy/_version.py
@@ -1 +1 @@
-__version__ = "1.0.7"
+__version__ = "1.1.0"
diff --git a/punpy/digital_effects_table/tests/test_encoding.nc b/punpy/digital_effects_table/tests/test_encoding.nc
new file mode 100644
index 0000000..50dca0c
Binary files /dev/null and b/punpy/digital_effects_table/tests/test_encoding.nc differ
diff --git a/punpy/lpu/lpu_propagation.py b/punpy/lpu/lpu_propagation.py
index 4215b59..c44bd08 100644
--- a/punpy/lpu/lpu_propagation.py
+++ b/punpy/lpu/lpu_propagation.py
@@ -30,7 +30,7 @@ class LPUPropagation:
     :type verbose: bool
     """
 
-    def __init__(self, parallel_cores=0, Jx_diag=False, step=None, verbose=False):
+    def __init__(self, parallel_cores=1, Jx_diag=False, step=None, verbose=False):
         self.parallel_cores = parallel_cores
         self.Jx_diag = Jx_diag
         self.step = step
diff --git a/punpy/mc/mc_propagation.py b/punpy/mc/mc_propagation.py
index fc63527..80cb5d9 100644
--- a/punpy/mc/mc_propagation.py
+++ b/punpy/mc/mc_propagation.py
@@ -34,7 +34,7 @@ class MCPropagation:
     """
 
     def __init__(
-        self, steps, parallel_cores=0, dtype=None, verbose=False, MCdimlast=True
+        self, steps, parallel_cores=1, dtype=None, verbose=False, MCdimlast=True
     ):
        self.MCsteps = steps
        self.parallel_cores = parallel_cores
@@ -1443,6 +1443,7 @@ def run_samples(
         start=None,
         end=None,
         sli=None,
+        return_objects=False,
         allow_some_nans=True,
     ):
         """
@@ -1490,100 +1491,112 @@
             MC_y = self.pool.starmap(func, MC_x2)
 
         if output_vars == 1:
-            if allow_some_nans:
+            if return_objects:
+                # Return the raw MC samples unfiltered, as an object array.
                 MC_y_out = np.array(
-                    [
-                        MC_y[i]
-                        for i in range(len(indices))
-                        if np.any(np.isfinite(MC_y[i]))
-                    ],
-                    dtype=self.dtype,
+                    [MC_y[i] for i in range(len(indices))],
+                    dtype=object,
                 )
             else:
-                MC_y_out = np.array(
-                    [
-                        MC_y[i]
-                        for i in range(len(indices))
-                        if np.all(np.isfinite(MC_y[i]))
-                    ],
-                    dtype=self.dtype,
-                )
-        else:
-            if allow_some_nans:
-                try:
+                if allow_some_nans:
                     MC_y_out = np.array(
                         [
                             MC_y[i]
                             for i in range(len(indices))
-                            if np.all(
-                                [
-                                    np.any(np.isfinite(MC_y[i][ivar]))
-                                    for ivar in range(output_vars)
-                                    if (
-                                        hasattr(MC_y[i][ivar], "__len__")
-                                        and (len(MC_y[i][ivar]) > 0)
-                                    )
-                                ]
-                            )
+                            if np.any(np.isfinite(MC_y[i]))
                         ],
                         dtype=self.dtype,
                     )
-                except:
-                    MC_y_out = np.array(
-                        [
-                            MC_y[i]
-                            for i in range(len(indices))
-                            if np.all(
-                                [
-                                    np.any(np.isfinite(MC_y[i][ivar]))
-                                    for ivar in range(output_vars)
-                                    if (
-                                        hasattr(MC_y[i][ivar], "__len__")
-                                        and (len(MC_y[i][ivar]) > 0)
-                                    )
-                                ]
-                            )
-                        ],
-                        dtype=object,
-                    )
-
-            else:
-                try:
+                else:
                     MC_y_out = np.array(
                         [
                             MC_y[i]
                             for i in range(len(indices))
-                            if np.all(
-                                [
-                                    np.all(np.isfinite(MC_y[i][ivar]))
-                                    for ivar in range(output_vars)
-                                    if (
-                                        hasattr(MC_y[i][ivar], "__len__")
-                                        and (len(MC_y[i][ivar]) > 0)
-                                    )
-                                ]
-                            )
+                            if np.all(np.isfinite(MC_y[i]))
                         ],
                         dtype=self.dtype,
                     )
-                except:
-                    MC_y_out = np.array(
-                        [
-                            MC_y[i]
-                            for i in range(len(indices))
-                            if np.all(
-                                [
-                                    np.all(np.isfinite(MC_y[i][ivar]))
-                                    for ivar in range(output_vars)
-                                    if (
-                                        hasattr(MC_y[i][ivar], "__len__")
-                                        and (len(MC_y[i][ivar]) > 0)
-                                    )
-                                ]
-                            )
-                        ],
-                        dtype=object,
-                    )
+        else:
+            if return_objects:
+                MC_y_out = np.array(
+                    [MC_y[i] for i in range(len(indices))],
+                    dtype=object,
+                )
+            else:
+                if allow_some_nans:
+                    try:
+                        MC_y_out = np.array(
+                            [
+                                MC_y[i]
+                                for i in range(len(indices))
+                                if np.all(
+                                    [
+                                        np.any(np.isfinite(MC_y[i][ivar]))
+                                        for ivar in range(output_vars)
+                                        if (
+                                            hasattr(MC_y[i][ivar], "__len__")
+                                            and (len(MC_y[i][ivar]) > 0)
+                                        )
+                                    ]
+                                )
+                            ],
+                            dtype=self.dtype,
+                        )
+                    except Exception:
+                        MC_y_out = np.array(
+                            [
+                                MC_y[i]
+                                for i in range(len(indices))
+                                if np.all(
+                                    [
+                                        np.any(np.isfinite(MC_y[i][ivar]))
+                                        for ivar in range(output_vars)
+                                        if (
+                                            hasattr(MC_y[i][ivar], "__len__")
+                                            and (len(MC_y[i][ivar]) > 0)
+                                        )
+                                    ]
+                                )
+                            ],
+                            dtype=object,
+                        )
+
+                else:
+                    try:
+                        MC_y_out = np.array(
+                            [
+                                MC_y[i]
+                                for i in range(len(indices))
+                                if np.all(
+                                    [
+                                        np.all(np.isfinite(MC_y[i][ivar]))
+                                        for ivar in range(output_vars)
+                                        if (
+                                            hasattr(MC_y[i][ivar], "__len__")
+                                            and (len(MC_y[i][ivar]) > 0)
+                                        )
+                                    ]
+                                )
+                            ],
+                            dtype=self.dtype,
+                        )
+                    except Exception:
+                        MC_y_out = np.array(
+                            [
+                                MC_y[i]
+                                for i in range(len(indices))
+                                if np.all(
+                                    [
+                                        np.all(np.isfinite(MC_y[i][ivar]))
+                                        for ivar in range(output_vars)
+                                        if (
+                                            hasattr(MC_y[i][ivar], "__len__")
+                                            and (len(MC_y[i][ivar]) > 0)
+                                        )
+                                    ]
+                                )
+                            ],
+                            dtype=object,
+                        )
 
         if len(MC_y_out) < len(indices):
             if allow_some_nans:
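A minimal usage sketch of the new `return_objects` keyword (mirroring the test below; `prop`, `functiond` and `MC_x` as defined there), which returns the raw MC samples as a numpy object array without any NaN filtering:

    MC_y3 = prop.run_samples(functiond, MC_x, output_vars=2, return_objects=True)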
diff --git a/punpy/mc/tests/test_mc_propagation.py b/punpy/mc/tests/test_mc_propagation.py
index 91fde64..8b1a691 100644
--- a/punpy/mc/tests/test_mc_propagation.py
+++ b/punpy/mc/tests/test_mc_propagation.py
@@ -763,6 +763,11 @@ def test_separate_processing(self):
         MC_x = prop.generate_MC_sample(xsd, xerrsd, corrd)
         MC_y1 = prop.run_samples(functiond, MC_x, output_vars=2, start=0, end=10000)
         MC_y2 = prop.run_samples(functiond, MC_x, output_vars=2, start=10000, end=20000)
+        MC_y3 = prop.run_samples(
+            functiond, MC_x, output_vars=2, start=0, end=10000, return_objects=True
+        )
+        # Check the unfiltered object-array return (isinstance(MC_y3, object)
+        # would be vacuously true for any Python value).
+        assert MC_y3.dtype == object
+
         MC_y = prop.combine_samples([MC_y1, MC_y2])
 
         ufd, ucorrd, corr_out = prop.process_samples(