Commit ac169ed

Optimize VDS operations with r-tree (#5843)
Optimize VDS operations using an R-tree spatial index, adding new API functions and tests for improved performance.

Behavior: Introduces an R-tree spatial index for optimizing VDS operations in H5Dvirtual.c. Adds H5Pset_virtual_spatial_tree() and H5Pget_virtual_spatial_tree() to control R-tree usage. Default behavior uses R-tree for VDS with more than 1000 mappings.

Implementation: Adds H5RT.c, H5RTprivate.h, and H5RTpkg.h for the R-tree implementation. Updates H5Pdapl.c and H5Pdcpl.c to include R-tree properties. Modifies H5Dvirtual.c to integrate the R-tree in VDS I/O operations.

Testing: Adds rtree.c for testing R-tree creation, search, and copy operations. Tests R-tree integration with VDS in test/dsets.c. Verifies R-tree behavior with different dataset access property list settings.
1 parent 20557ae commit ac169ed

17 files changed

+2034
-305
lines changed

doxygen/examples/tables/propertyLists.dox

Lines changed: 4 additions & 0 deletions
@@ -616,6 +616,10 @@ encoding for object names.</td>
616616
<td>#H5Pset_virtual_view/#H5Pget_virtual_view</td>
617617
<td>Sets/gets the view of the virtual dataset (VDS) to include or exclude missing mapped elements.</td>
618618
</tr>
619+
<tr>
620+
<td>#H5Pset_virtual_spatial_tree/#H5Pget_virtual_spatial_tree</td>
621+
<td>Sets/gets the flag to use spatial trees when searching many VDS mappings.</td>
622+
</tr>
619623
</table>
620624
//! [dapl_table]
621625
*

fortran/src/H5Pff.F90

Lines changed: 90 additions & 0 deletions
@@ -4365,6 +4365,96 @@ END FUNCTION h5pget_chunk_cache_c
43654365

43664366
END SUBROUTINE h5pget_chunk_cache_f
43674367

4368+
!>
4369+
!! \ingroup FH5P
4370+
!!
4371+
!! \brief Retrieves the flag for whether to use/not use a spatial tree
4372+
!! during mapping operations on a Virtual Dataset. The default value is true.
4373+
!!
4374+
!! Use of a spatial tree will accelerate the process of searching through mappings
4375+
!! to determine which contain intersections with the user's selection region.
4376+
!! With the tree disabled, all mappings will simply be iterated through and
4377+
!! checked directly.
4378+
!!
4379+
!! Certain workflows may find that tree creation overhead outweighs the time saved
4380+
!! on reads. In this case, disabling this property will lead to a performance improvement,
4381+
!! though it is expected that almost all cases will benefit from the tree on net.
4382+
!!
4383+
!! \param dapl_id Target dataset access property list identifier.
4384+
!! \param use_tree Value of the setting.
4385+
!! \param hdferr \fortran_error
4386+
!!
4387+
!! See C API: @ref H5Pget_virtual_spatial_tree()
4388+
!!
4389+
SUBROUTINE h5pget_virtual_spatial_tree_f(dapl_id, use_tree, hdferr)
4390+
IMPLICIT NONE
4391+
INTEGER(HID_T) , INTENT(IN) :: dapl_id
4392+
LOGICAL , INTENT(OUT) :: use_tree
4393+
INTEGER , INTENT(OUT) :: hdferr
4394+
LOGICAL(C_BOOL) :: c_use_tree
4395+
4396+
INTERFACE
4397+
INTEGER(C_INT) FUNCTION H5Pget_virtual_spatial_tree_c(dapl_id, use_tree) &
4398+
BIND(C, NAME='H5Pget_virtual_spatial_tree')
4399+
IMPORT :: C_INT, HID_T, C_BOOL
4400+
IMPLICIT NONE
4401+
INTEGER(HID_T), INTENT(IN), VALUE :: dapl_id
4402+
LOGICAL(C_BOOL), INTENT(OUT) :: use_tree
4403+
END FUNCTION H5Pget_virtual_spatial_tree_c
4404+
END INTERFACE
4405+
4406+
hdferr = INT(H5Pget_virtual_spatial_tree_c(dapl_id, c_use_tree))
4407+
4408+
! Transfer value of C C_BOOL type to Fortran LOGICAL
4409+
use_tree = c_use_tree
4410+
4411+
END SUBROUTINE h5pget_virtual_spatial_tree_f
4412+
4413+
!>
4414+
!! \ingroup FH5P
4415+
!!
4416+
!! \brief Sets the dapl to use/not use a spatial tree
4417+
!! during mapping operations on a Virtual Dataset. The default value is true.
4418+
!!
4419+
!! Use of a spatial tree will accelerate the process of searching through mappings
4420+
!! to determine which contain intersections with the user's selection region.
4421+
!! With the tree disabled, all mappings will simply be iterated through and
4422+
!! checked directly.
4423+
!!
4424+
!! Certain workflows may find that tree creation overhead outweighs the time saved
4425+
!! on reads. In this case, disabling this property will lead to a performance improvement,
4426+
!! though it is expected that almost all cases will benefit from the tree on net.
4427+
!!
4428+
!! \param dapl_id Target dataset access property list identifier.
4429+
!! \param use_tree Value of the setting.
4430+
!! \param hdferr \fortran_error
4431+
!!
4432+
!! See C API: @ref H5Pset_virtual_spatial_tree()
4433+
!!
4434+
SUBROUTINE h5pset_virtual_spatial_tree_f(dapl_id, use_tree, hdferr)
4435+
IMPLICIT NONE
4436+
INTEGER(HID_T) , INTENT(IN) :: dapl_id
4437+
LOGICAL , INTENT(IN) :: use_tree
4438+
INTEGER , INTENT(OUT) :: hdferr
4439+
LOGICAL(C_BOOL) :: c_use_tree
4440+
4441+
INTERFACE
4442+
INTEGER(C_INT) FUNCTION h5pset_virtual_spatial_tree_c(dapl_id, use_tree) &
4443+
BIND(C, NAME='H5Pset_virtual_spatial_tree')
4444+
IMPORT :: C_INT, HID_T, C_BOOL
4445+
IMPLICIT NONE
4446+
INTEGER(HID_T), INTENT(IN), VALUE :: dapl_id
4447+
LOGICAL(C_BOOL), INTENT(IN), VALUE :: use_tree
4448+
END FUNCTION h5pset_virtual_spatial_tree_c
4449+
END INTERFACE
4450+
4451+
! Transfer value of Fortran LOGICAL to C C_BOOL type
4452+
c_use_tree = use_tree
4453+
4454+
hdferr = INT(h5pset_virtual_spatial_tree_c(dapl_id, c_use_tree))
4455+
4456+
END SUBROUTINE h5pset_virtual_spatial_tree_f
4457+
43684458
#ifdef H5_DOXYGEN
43694459
!>
43704460
!! \ingroup FH5P

fortran/src/hdf5_fortrandll.def.in

Lines changed: 2 additions & 0 deletions
@@ -420,6 +420,8 @@ H5P_mp_H5PGET_VIRTUAL_VSPACE_F
420420
H5P_mp_H5PGET_VIRTUAL_SRCSPACE_F
421421
H5P_mp_H5PGET_VIRTUAL_FILENAME_F
422422
H5P_mp_H5PGET_VIRTUAL_DSETNAME_F
423+
H5P_mp_H5PGET_VIRTUAL_SPATIAL_TREE_F
424+
H5P_mp_H5PSET_VIRTUAL_SPATIAL_TREE_F
423425
H5P_mp_H5PGET_DSET_NO_ATTRS_HINT_F
424426
H5P_mp_H5PSET_DSET_NO_ATTRS_HINT_F
425427
H5P_mp_H5PSET_VOL_F

fortran/test/tH5P.F90

Lines changed: 35 additions & 0 deletions
@@ -777,8 +777,10 @@ SUBROUTINE test_misc_properties(total_error)
777777
INTEGER, INTENT(INOUT) :: total_error
778778

779779
INTEGER(hid_t) :: fapl_id = -1 ! Local fapl
780+
INTEGER(hid_t) :: dapl_id = -1 ! Local dapl
780781
LOGICAL :: use_file_locking ! (H5Pset/get_file_locking_f)
781782
LOGICAL :: ignore_disabled_locks ! (H5Pset/get_file_locking_f)
783+
LOGICAL :: use_spatial_tree ! (H5Pset/get_virtual_spatial_tree_f)
782784
INTEGER :: error
783785

784786
! Create a default fapl
@@ -826,6 +828,39 @@ SUBROUTINE test_misc_properties(total_error)
826828
CALL H5Pclose_f(fapl_id, error)
827829
CALL check("H5Pclose_f", error, total_error)
828830

831+
! Create a dataset access property list
832+
CALL H5Pcreate_f(H5P_DATASET_ACCESS_F, dapl_id, error)
833+
CALL check("H5Pcreate_f", error, total_error)
834+
835+
! Test H5Pset/get_virtual_spatial_tree_f
836+
! true value
837+
use_spatial_tree = .TRUE.
838+
CALL h5pset_virtual_spatial_tree_f(dapl_id, use_spatial_tree, error)
839+
CALL check("h5pset_virtual_spatial_tree_f", error, total_error)
840+
use_spatial_tree = .FALSE.
841+
CALL h5pget_virtual_spatial_tree_f(dapl_id, use_spatial_tree, error)
842+
CALL check("h5pget_virtual_spatial_tree_f", error, total_error)
843+
if(use_spatial_tree .neqv. .TRUE.) then
844+
total_error = total_error + 1
845+
write(*,*) "Got wrong use_spatial_tree flag from h5pget_virtual_spatial_tree_f"
846+
endif
847+
848+
! false value
849+
use_spatial_tree = .FALSE.
850+
CALL h5pset_virtual_spatial_tree_f(dapl_id, use_spatial_tree, error)
851+
CALL check("h5pset_virtual_spatial_tree_f", error, total_error)
852+
use_spatial_tree = .TRUE.
853+
CALL h5pget_virtual_spatial_tree_f(dapl_id, use_spatial_tree, error)
854+
CALL check("h5pget_virtual_spatial_tree_f", error, total_error)
855+
if(use_spatial_tree .neqv. .FALSE.) then
856+
total_error = total_error + 1
857+
write(*,*) "Got wrong use_spatial_tree flag from h5pget_virtual_spatial_tree_f"
858+
endif
859+
860+
! Close the dapl
861+
CALL H5Pclose_f(dapl_id, error)
862+
CALL check("H5Pclose_f", error, total_error)
863+
829864
END SUBROUTINE test_misc_properties
830865

831866
!-------------------------------------------------------------------------

java/src/hdf/hdf5lib/H5.java

Lines changed: 53 additions & 0 deletions
@@ -10675,6 +10675,59 @@ public synchronized static native void H5Pset_virtual_prefix(long dapl_id, Strin
1067510675
public synchronized static native void H5Pset_efile_prefix(long dapl_id, String prefix)
1067610676
throws HDF5LibraryException, NullPointerException;
1067710677

10678+
/**
10679+
* @ingroup JH5P
10680+
*
10681+
* H5Pget_virtual_spatial_tree accesses the flag for whether to use/not use a spatial tree
10682+
* during mapping operations on a Virtual Dataset. The default value is true.
10683+
*
10684+
* Use of a spatial tree will accelerate the process of searching through mappings
10685+
* to determine which contain intersections with the user's selection region.
10686+
* With the tree disabled, all mappings will simply be iterated through and
10687+
* checked directly.
10688+
*
10689+
* Certain workflows may find that tree creation overhead outweighs the time saved
10690+
* on reads. In this case, disabling this property will lead to a performance improvement,
10691+
* though it is expected that almost all cases will benefit from the tree on net.
10692+
*
10693+
* @param dapl_id
10694+
* IN: Dataset access property list
10695+
*
10696+
* @return true if the given dapl is set to use a spatial tree, false if not.
10697+
*
10698+
* @exception HDF5LibraryException
10699+
* Error from the HDF5 Library.
10700+
**/
10701+
public synchronized static native boolean H5Pget_virtual_spatial_tree(long dapl_id)
10702+
throws HDF5LibraryException;
10703+
10704+
/**
10705+
* @ingroup JH5P
10706+
*
10707+
* H5Pset_virtual_spatial_tree sets the dapl to use/not use a spatial tree
10708+
* during mapping operations on a Virtual Dataset. The default value is true.
10709+
*
10710+
* Use of a spatial tree will accelerate the process of searching through mappings
10711+
* to determine which contain intersections with the user's selection region.
10712+
* With the tree disabled, all mappings will simply be iterated through and
10713+
* checked directly.
10714+
*
10715+
* Certain workflows may find that tree creation overhead outweighs the time saved
10716+
* on reads. In this case, disabling this property will lead to a performance improvement,
10717+
* though it is expected that almost all cases will benefit from the tree on net.
10718+
*
10719+
* @param dapl_id
10720+
* IN: Dataset access property list
10721+
*
10722+
* @param use_tree
10723+
* IN: the use_tree flag setting
10724+
*
10725+
* @exception HDF5LibraryException
10726+
* Error from the HDF5 Library.
10727+
**/
10728+
public synchronized static native void H5Pset_virtual_spatial_tree(long dapl_id, boolean use_tree)
10729+
throws HDF5LibraryException;
10730+
1067810731
// public synchronized static native void H5Pset_append_flush(long plist_id, int ndims, long[] boundary,
1067910732
// H5D_append_cb func, H5D_append_t udata) throws HDF5LibraryException;
1068010733

java/src/jni/h5pDAPLImp.c

Lines changed: 45 additions & 0 deletions
@@ -312,6 +312,51 @@ H5D_append_cb(hid_t dataset_id, hsize_t *cur_dims, void *cb_data)
312312
return (herr_t)status;
313313
} /* end H5D_append_cb */
314314

315+
/*
316+
* Class: hdf_hdf5lib_H5
317+
* Method: H5Pset_virtual_spatial_tree
318+
* Signature: (JZ)V
319+
*/
320+
JNIEXPORT void JNICALL
321+
Java_hdf_hdf5lib_H5_H5Pset_1virtual_1spatial_1tree(JNIEnv *env, jclass clss, jlong dapl_id, jboolean use_tree)
322+
{
323+
bool use_tree_val;
324+
herr_t retVal = FAIL;
325+
326+
UNUSED(clss);
327+
328+
use_tree_val = (JNI_TRUE == use_tree) ? true : false;
329+
330+
if ((retVal = H5Pset_virtual_spatial_tree((hid_t)dapl_id, (bool)use_tree_val)) < 0)
331+
H5_LIBRARY_ERROR(ENVONLY);
332+
333+
done:
334+
return;
335+
} /* end Java_hdf_hdf5lib_H5_H5Pset_1virtual_1spatial_1tree */
336+
337+
/*
338+
* Class: hdf_hdf5lib_H5
339+
* Method: H5Pget_virtual_spatial_tree
340+
* Signature: (J)Z
341+
*/
342+
JNIEXPORT jboolean JNICALL
343+
Java_hdf_hdf5lib_H5_H5Pget_1virtual_1spatial_1tree(JNIEnv *env, jclass clss, jlong dapl_id)
344+
{
345+
bool use_tree = false;
346+
jboolean bval = JNI_FALSE;
347+
348+
UNUSED(clss);
349+
350+
if (H5Pget_virtual_spatial_tree((hid_t)dapl_id, (bool *)&use_tree) < 0)
351+
H5_LIBRARY_ERROR(ENVONLY);
352+
353+
if (use_tree == true)
354+
bval = JNI_TRUE;
355+
356+
done:
357+
return bval;
358+
} /* end Java_hdf_hdf5lib_H5_H5Pget_1virtual_1spatial_1tree */
359+
315360
#ifdef __cplusplus
316361
} /* end extern "C" */
317362
#endif /* __cplusplus */

java/src/jni/h5pDAPLImp.h

Lines changed: 14 additions & 0 deletions
@@ -89,6 +89,20 @@ JNIEXPORT void JNICALL Java_hdf_hdf5lib_H5_H5Pset_1virtual_1printf_1gap(JNIEnv *
8989
*/
9090
JNIEXPORT jlong JNICALL Java_hdf_hdf5lib_H5_H5Pget_1virtual_1printf_1gap(JNIEnv *, jclass, jlong);
9191

92+
/*
93+
* Class: hdf_hdf5lib_H5
94+
* Method: H5Pset_virtual_spatial_tree
95+
* Signature: (JZ)V
96+
*/
97+
JNIEXPORT void JNICALL Java_hdf_hdf5lib_H5_H5Pset_1virtual_1spatial_1tree(JNIEnv *, jclass, jlong, jboolean);
98+
99+
/*
100+
* Class: hdf_hdf5lib_H5
101+
* Method: H5Pget_virtual_spatial_tree
102+
* Signature: (J)Z
103+
*/
104+
JNIEXPORT jboolean JNICALL Java_hdf_hdf5lib_H5_H5Pget_1virtual_1spatial_1tree(JNIEnv *, jclass, jlong);
105+
92106
#ifdef __cplusplus
93107
} /* end extern "C" */
94108
#endif /* __cplusplus */

release_docs/CHANGELOG.md

Lines changed: 21 additions & 0 deletions
@@ -25,6 +25,7 @@ For releases prior to version 2.0.0, please see the release.txt file and for mor
2525

2626
## Performance Enhancements:
2727

28+
- Up to [2500% faster](https://github.com/HDFGroup/hdf5/blob/develop/release_docs/CHANGELOG.md#rtree) Virtual Dataset read/write operations
2829
- [30% faster opening](https://github.com/HDFGroup/hdf5/blob/develop/release_docs/CHANGELOG.md#layoutcopydelay) and [25% faster closing](https://github.com/HDFGroup/hdf5/blob/develop/release_docs/CHANGELOG.md#fileformat) of virtual datasets.
2930
- [Reduced memory overhead](https://github.com/HDFGroup/hdf5/blob/develop/release_docs/CHANGELOG.md#fileformat) via shared name strings and optimized spatial search algorithms for virtual datasets.
3031

@@ -461,6 +462,26 @@ Simple example programs showing how to use complex number datatypes have been ad
461462

462463
This layout copy is now delayed until either a user requests the DCPL, or until the start of an operation that needs to read the layout from the DCPL.
463464

465+
### Virtual datasets now use a spatial tree to optimize searches<a name="rtree"></a>
466+
467+
Virtual dataset operations with many (>1,000) mappings were much slower than
468+
corresponding operations on normal datasets. This was due to the need
469+
to iterate through every source dataset's dataspace and check for an intersection
470+
with the user-selected region for a read/write in the virtual dataset.
471+
472+
Virtual datasets with many mappings now use an r-tree (defined in H5RT.c) to
473+
perform a spatial search. This allows the dataspaces that intersect the
474+
user selection to be computed with, in most cases, far fewer intersection checks,
475+
improving the speed of VDS read/write operations.
476+
477+
Virtual datasets will use the r-tree by default, since the majority of use cases
478+
should see improvements from use of the tree. However, because some workflows may
479+
find that the overhead of the tree outweighs the time saved on searches, there is
480+
a new Dataset Access Property List (DAPL) property to control use of the spatial tree.
481+
482+
This property can be set or queried with the new API functions
483+
H5Pset_virtual_spatial_tree()/H5Pget_virtual_spatial_tree().
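
As a rough illustration of why a spatial index reduces the number of intersection checks, the sketch below compares a linear scan over mapping extents with a single level of bounding-box grouping. It is not the H5RT.c implementation: `box_t`, `linear_search()`, `grouped_search()`, and the fixed `RANK` are hypothetical names invented for this sketch, and a real r-tree nests such groups recursively and builds them once rather than on every search.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define RANK 2 /* hypothetical fixed rank, chosen only for this sketch */

/* Axis-aligned box with inclusive bounds, standing in for a mapping's extent. */
typedef struct {
    long lo[RANK];
    long hi[RANK];
} box_t;

/* Two boxes intersect only if their intervals overlap in every dimension. */
static bool
boxes_intersect(const box_t *a, const box_t *b)
{
    for (int d = 0; d < RANK; d++)
        if (a->hi[d] < b->lo[d] || b->hi[d] < a->lo[d])
            return false;
    return true;
}

/* Baseline: test every mapping. *checks counts the intersection tests made. */
static size_t
linear_search(const box_t *maps, size_t n, const box_t *sel, size_t *checks)
{
    size_t hits = 0;

    for (size_t i = 0; i < n; i++) {
        (*checks)++;
        if (boxes_intersect(&maps[i], sel))
            hits++;
    }
    return hits;
}

/* Smallest box enclosing maps[0..n-1]; n must be greater than zero. */
static box_t
bounding_box(const box_t *maps, size_t n)
{
    assert(n > 0);
    box_t bb = maps[0];

    for (size_t i = 1; i < n; i++)
        for (int d = 0; d < RANK; d++) {
            if (maps[i].lo[d] < bb.lo[d])
                bb.lo[d] = maps[i].lo[d];
            if (maps[i].hi[d] > bb.hi[d])
                bb.hi[d] = maps[i].hi[d];
        }
    return bb;
}

/* One level of grouping: a whole group is skipped when its bounding box
 * misses the selection. The boxes are recomputed per call only to keep the
 * sketch short; a real tree would store them at build time. */
static size_t
grouped_search(const box_t *maps, size_t n, size_t group_size, const box_t *sel, size_t *checks)
{
    size_t hits = 0;

    assert(group_size > 0);
    for (size_t start = 0; start < n; start += group_size) {
        size_t len = (n - start < group_size) ? n - start : group_size;
        box_t  bb  = bounding_box(&maps[start], len);

        (*checks)++; /* one test against the group's bounding box */
        if (!boxes_intersect(&bb, sel))
            continue;
        hits += linear_search(&maps[start], len, sel, checks);
    }
    return hits;
}
```

For 100 mappings where mapping `i` covers rows `10*i` through `10*i + 9`, a selection of rows 250 through 269 intersects two mappings. The linear scan performs 100 checks, while the grouped search with groups of 10 performs 20 (10 group tests plus 10 tests inside the one matching group).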
484+
464485
## Parallel Library
465486

466487
### Added H5FDsubfiling_get_file_mapping() API function for subfiling VFD

src/H5Dprivate.h

Lines changed: 4 additions & 0 deletions
@@ -54,6 +54,7 @@
5454
#define H5D_ACS_VDS_PREFIX_NAME "vds_prefix" /* VDS file prefix */
5555
#define H5D_ACS_APPEND_FLUSH_NAME "append_flush" /* Append flush actions */
5656
#define H5D_ACS_EFILE_PREFIX_NAME "external file prefix" /* External file prefix */
57+
#define H5D_ACS_USE_TREE_NAME "tree" /* Whether to use spatial tree */
5758

5859
/* ======== Data transfer properties ======== */
5960
#define H5D_XFER_MAX_TEMP_BUF_NAME "max_temp_buf" /* Maximum temp buffer size */
@@ -124,6 +125,9 @@
124125
/* Default virtual dataset list size */
125126
#define H5D_VIRTUAL_DEF_LIST_SIZE 8
126127

128+
/* Threshold for use of a tree for VDS mappings */
129+
#define H5D_VIRTUAL_TREE_THRESHOLD 50
130+
127131
#ifdef H5D_MODULE
128132
#define H5D_OBJ_ID(D) (((H5D_obj_create_t *)(D))->dcpl_id)
129133
#else /* H5D_MODULE */
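
The portion of the diff shown here defines the threshold but not the code that consults it. One plausible reading, sketched below, is that the tree is used only when the DAPL property is enabled and the number of mappings reaches the threshold. The helper `vds_should_use_tree()` and the macro `SKETCH_TREE_THRESHOLD` are hypothetical, and the `>=` comparison is an assumption; the actual check lives in H5Dvirtual.c and may differ.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Same value as H5D_VIRTUAL_TREE_THRESHOLD in the diff above; renamed so the
 * sketch does not pose as library code. */
#define SKETCH_TREE_THRESHOLD 50

/* Hypothetical helper (not an HDF5 function): decide whether a spatial tree
 * is worth using for a virtual dataset.
 *
 *   dapl_use_tree  value of the DAPL spatial-tree property
 *   n_mappings     number of VDS mappings
 *
 * Assumption: the tree is used when the mapping count is at least the
 * threshold. Whether the library uses >= or > is not visible in this diff. */
static bool
vds_should_use_tree(bool dapl_use_tree, size_t n_mappings)
{
    return dapl_use_tree && n_mappings >= SKETCH_TREE_THRESHOLD;
}
```

Under this reading, disabling the property always forces the linear scan, while leaving it enabled still falls back to the linear scan for datasets with only a few mappings, where building a tree is unlikely to pay off.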
