-
Notifications
You must be signed in to change notification settings - Fork 46
Feature: support multi-dimensional batch calculation[experimental] #1186
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
Signed-off-by: Tony Xiang <[email protected]>
|
| /** | ||
| * @brief Create a PGM_MultiDimensionalDataset from multiple PGM_ConstDataset instances | ||
| * | ||
| * @param handle | ||
| * @param const_datasets | ||
| * @param n_datasets | ||
| * @return | ||
| */ | ||
| PGM_API PGM_MultiDimensionalDataset* | ||
| PGM_dataset_create_multidimensional_from_const(PGM_Handle* handle, PGM_ConstDataset const** const_datasets, | ||
| PGM_Idx n_datasets); | ||
|
|
||
| /** | ||
| * @brief Get the array pointer from a PGM_MultiDimensionalDataset | ||
| * | ||
| * @param handle | ||
| * @param multidimensional_dataset | ||
| * @return | ||
| */ | ||
| PGM_API PGM_ConstDataset const* | ||
| PGM_get_array_pointer_from_multidimensional(PGM_Handle* handle, | ||
| PGM_MultiDimensionalDataset const* multidimensional_dataset); | ||
|
|
||
| /** | ||
| * @brief destroy the multidimensional dataset object | ||
| * | ||
| * @param multidimensional_dataset | ||
| */ | ||
| PGM_API void PGM_destroy_multidimensional_dataset(PGM_MultiDimensionalDataset* multidimensional_dataset); | ||
|
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mgovers This is one option to pass a MD datasets into the C-API. You need to have additional helper functions to create an array of datasets. We need a new opaque struct.
Another interesting approach would be using chaining. So define a function called dataset_const_chain_another, which store a pointer to the next dimension dataset into this one. Internnaly we can check if the next_dataset pointer is nullptr or not to traverse dimensions. This way is less intuitive, but still used in many other APIs. And you only need one extra function. We don't even need to change the options.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i prefer this separate ND approach, but how would we use ND-datasets, e.g. a 3-dimensional one instead of only 2-dimensional
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is not really a ND, but a list of datasets. The core will make a cross-join ND calculation of combinations in the list.
The questions, do we create separate opaque struct to store an array of datasets? This requires a lot of new functions. Or do we do the chaining, this only needs one new function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i do believe under the current implementation, sure, but it also restricts us going forward. i feel like if we are going to support 1D and have a separate object for 2D, then we might as well go for ND. otherwise, we'll fall into the same rabbit hole that C++ has been in (watch any video on std::mdspan and you'll know, they are better at explaining what i mean than i am)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current experiment is about ND, not just 2D.
The ND calculation is created by a list of datasets. Each dataset in the list represents the mutation of that dimension. The core creates a cross-join on the mutations.
| requires(!is_indptr_mutable_v<dataset_type>) | ||
| { | ||
| assert(0 <= scenario && scenario < batch_size()); | ||
| assert(0 <= scenario && scenario <= batch_size()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think this will crash because buffer.indptr[scenario + 1] is called. In python, it's also not possible to call [1, 2, 3][3] (IndexOutOfRange)
| } | ||
|
|
||
| // get slice dataset from batch | ||
| Dataset get_slice_scenario(Idx begin, Idx end) const |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
really useful 👍 If we add an additional increment here, we can actually also use this in the dispatching of multithreaded batch calculations
| using PGM_MutableDataset = power_grid_model::meta_data::Dataset<power_grid_model::mutable_dataset_t>; | ||
| using PGM_WritableDataset = power_grid_model::meta_data::Dataset<power_grid_model::writable_dataset_t>; | ||
| using PGM_DatasetInfo = power_grid_model::meta_data::DatasetInfo; | ||
| using PGM_MultiDimensionalDataset = std::vector<PGM_ConstDataset>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i feel like this should be able to use arbitrary dimensions
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
C++23 has std::mdspan, which would probably solve this
| * | ||
| * @param multidimensional_dataset | ||
| */ | ||
| PGM_API void PGM_destroy_multidimensional_dataset(PGM_MultiDimensionalDataset* multidimensional_dataset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
inconsistent PGM_dataset_create_* vs PGM_destroy_*
| * @brief Opaque struct for the multi dimensional dataset class. | ||
| * | ||
| */ | ||
| typedef struct PGM_MultiDimensionalDataset PGM_MultiDimensionalDataset; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe following the conventions of std::mdspan:
| typedef struct PGM_MultiDimensionalDataset PGM_MultiDimensionalDataset; | |
| typedef struct PGM_MDDataset PGM_MDDataset; |
| */ | ||
| PGM_API PGM_ConstDataset const* | ||
| PGM_get_array_pointer_from_multidimensional(PGM_Handle* handle, | ||
| PGM_MultiDimensionalDataset const* multidimensional_dataset); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe we can do a multi-indexing approach instead? e.g.
PGM_mddataset_get_data(PGM_Handle* handle, PGM_MDDataset const* mddataset);
PGM_mddataset_get_flat_element(PGM_Handle* handle, PGM_MDDataset const* mddataset, PGM_Idx flattened_index);
PGM_mddataset_get_scenario(PGM_Handle* handle, PGM_MDDataset const* mddataset, PGM_Idx** multi_index, PGM_Idx n_dimensions);(we can even omit n_dimensions, although it's probably safer to keep it)



This PR experiments an idea of multi-dimensional batch calculation in the C-API without introducing breaking changes. The Python API is also adjusted without breaking changes.