2.0.0: Dense arrays, large dataset optimization, attribute/metric filters
Functionality Changes
- SilviMetric now writes to a dense TileDB array. Dense arrays let us take fuller advantage of what TileDB offers, with few drawbacks.
- Attribute and Metric Filters. We're now writing Attribute and Metric data with TileDB's ZstdFilter, with the level set to 7. Variable-length arrays will now take advantage of the PositiveOffsetFilter. These changes reduce the size of output data.
- Storage config now requires xsize and ysize variables to indicate how large the extents of TileDB tiles should be. This was in response to memory problems in TileDB when the tile size was unspecified.
- Updated info call:
  - added a metrics option to the CLI
  - fixed history output
  - removed unnecessary info from the concise output
- Updated extract call:
  - handle_overlaps speedup
  - removed extent indexing, which was too slow; extents can be obtained in other ways
- Added start_datetime and end_datetime to the TileDB attributes being written, in similar fashion to count
  - This allows users to query by start and end time
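
As a rough illustration of the kind of query the start_datetime and end_datetime attributes make possible, the sketch below filters plain Python records by overlap with a time window. It does not use the SilviMetric or TileDB APIs: the record layout and the `overlaps` and `query_by_time` helpers are hypothetical, and real data would be read from the TileDB array.

```python
from datetime import datetime

# Hypothetical cell records; the field names mirror the new attributes,
# but this layout is an assumption made for illustration only.
cells = [
    {"x": 0, "y": 0,
     "start_datetime": datetime(2020, 1, 1), "end_datetime": datetime(2020, 1, 31)},
    {"x": 1, "y": 0,
     "start_datetime": datetime(2021, 6, 1), "end_datetime": datetime(2021, 6, 15)},
    {"x": 2, "y": 0,
     "start_datetime": datetime(2020, 1, 20), "end_datetime": datetime(2020, 2, 10)},
]


def overlaps(cell, start, end):
    """Return True if the cell's collection window intersects [start, end]."""
    return cell["start_datetime"] <= end and cell["end_datetime"] >= start


def query_by_time(cells, start, end):
    """Keep only the cells whose collection dates overlap the query window."""
    return [c for c in cells if overlaps(c, start, end)]


# Cells collected at any point between 25 Jan 2020 and 1 Feb 2020.
hits = query_by_time(cells, datetime(2020, 1, 25), datetime(2020, 2, 1))
```

An overlap test is used here rather than strict containment so that collections which only partly fall inside the window are still returned; a stricter query would compare both bounds against the window instead.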
Behind the Scenes
- SilviMetric no longer writes to a specific timestamp for a write; this turns out to be a TileDB anti-pattern. We now write at the current timestamp and store start and end timestamp attributes for the collection dates of the data. These attributes can be queried with normal TileDB operations.
- Deletions will now be overwrites. TileDB dense arrays don't support deletion operations, so we'll instead be writing new data at the current timestamp over the old data.
- To operate better on larger datasets, SilviMetric will now process data in chunks matching the TileDB x and y tile sizes (see the note in Functionality Changes about StorageConfig changes). This means there is very little need to consolidate commits to the array, and it should improve speed and memory performance.
- Updated metrics:
  - added nan_value member variable
  - added nan_policy member variable
  - added logic to handle bad return values and bad dependency values depending on nan_policy
- Updated storage config to convert a relative path for the tdb dir to an absolute path
- Adjusted aad metrics to use variables that were already created
- Added NaN handling to the metrics where it was possible
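
The tile-sized chunking described above can be sketched in a few lines. This is not SilviMetric's implementation: `iter_chunks` and its parameters are hypothetical names, the grid is assumed to be indexed by integer cell coordinates, and clamping oversized tile sizes to the grid is just one way to keep chunks in bounds.

```python
from typing import Iterator, Tuple

# (x_min, x_max, y_min, y_max) in cell indices, with the max values exclusive.
Bounds = Tuple[int, int, int, int]


def iter_chunks(x_cells: int, y_cells: int, xsize: int, ysize: int) -> Iterator[Bounds]:
    """Yield chunk bounds that cover an x_cells by y_cells grid.

    Each chunk is at most xsize by ysize cells; chunks on the far edges
    are trimmed so they never extend past the grid.
    """
    if min(x_cells, y_cells, xsize, ysize) <= 0:
        raise ValueError("grid dimensions and tile sizes must be positive")

    # Clamp tile sizes so a tile can never be larger than the grid itself.
    xsize = min(xsize, x_cells)
    ysize = min(ysize, y_cells)

    for x0 in range(0, x_cells, xsize):
        for y0 in range(0, y_cells, ysize):
            yield (x0, min(x0 + xsize, x_cells), y0, min(y0 + ysize, y_cells))


# A 5 x 3 grid processed with 2 x 2 tiles.
chunks = list(iter_chunks(5, 3, 2, 2))
```

Because each chunk lines up with one tile, a write touches a single tile at a time, which is the property the note above credits for reducing the need to consolidate.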
What's Changed
- 1.4.1 by @kylemann16 in #123
- Adjustments for large datasets by @kylemann16 in #124
- fixed naming mismatch tdb, tifs by @gannon-guess in #131
- tutorial metric import module fix by @gannon-guess in #133
- Issue 126 by @gannon-guess in #135
- Finish sentence in About document by @danielrode in #137
- protect users from going oob with tile sizes by @kylemann16 in #140
- pin pandas to <3.0.0 by @kylemann16 in #148
- 141 - added offset filter and level by @gannon-guess in #142
- Issue 129 by @gannon-guess in #134
- bumping to 2.0.0 by @kylemann16 in #149
New Contributors
- @gannon-guess made their first contribution in #131 🎉
Full Changelog: 1.4.0...2.0.0