SilviMetric should switch to Dense TileDB arrays #116

@kylemann16

After receiving some advice on using TileDB, we have decided to switch to dense arrays. This is being worked on in #114.

Key points

  • TileDB from_pandas cannot handle flexible ingestion, and requires an entire row for each insertion to the database. This would probably not be feasible for any larger projects, and would require a significant rework of our chunking/tiling.
    • I opened an issue in TileDB-py about this and implemented a solution, but have received no response
  • SilviMetric's extract currently draws on every shatter that has happened in the past, combining any overlapping data and rerunning those cells
  • TileDB dense arrays will only show you the most recent information that was written to a cell
  • Sparse arrays allowed for duplication, so all cell values are essentially just an array of however many shatter processes touched that cell, and we can rerun easily with the combined point data if necessary
  • Dense arrays do not allow for duplication, so the only way to get values from separate shatter processes is to iterate through the shatter entries (time travel), combine those separate entries into one dataframe, and then apply the same logic as with the sparse arrays
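
To make the difference between the last three points concrete, here is a toy model in plain Python. It does not call TileDB at all; the `WRITES` log and the functions `sparse_read`, `dense_read`, and `dense_combined` are hypothetical names invented for illustration, and the timestamp window is only a stand-in for TileDB time travel.

```python
from collections import defaultdict

# Hypothetical shatter log: (timestamp, cell, point values written to that cell).
WRITES = [
    (1, (0, 0), [10.0, 12.0]),
    (1, (0, 1), [7.0]),
    (2, (0, 0), [11.0]),
]


def sparse_read(writes):
    """Sparse-style read: duplicates are kept, one entry per shatter write."""
    cells = defaultdict(list)
    for _, cell, points in sorted(writes, key=lambda w: w[0]):
        cells[cell].append(points)
    return dict(cells)


def dense_read(writes, start=None, end=None):
    """Dense-style read: the last write inside [start, end] wins per cell."""
    cells = {}
    for ts, cell, points in sorted(writes, key=lambda w: w[0]):
        if (start is None or ts >= start) and (end is None or ts <= end):
            cells[cell] = points
    return cells


def dense_combined(writes):
    """Read one shatter timestamp at a time, then pool the points per cell."""
    cells = defaultdict(list)
    for ts in sorted({w[0] for w in writes}):
        for cell, points in dense_read(writes, ts, ts).items():
            cells[cell].extend(points)
    return dict(cells)
```

In this sketch `dense_read(WRITES)` only sees the second shatter's points for cell `(0, 0)`, while `dense_combined(WRITES)` recovers the same pooled points that `sparse_read(WRITES)` keeps as separate entries.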

Conclusions:

  • Because from_pandas is not flexible when dealing with dense arrays and we've received no response from the TileDB team, we'll be switching back to the previous usage, which unfortunately means having to work around TileDB's array requirements, documented here

Question still to be answered:

Should the value of a cell when we do extract be...
1. The value that was most recently shattered, or
2. The value that is the combination of all processes that touched this cell?
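
As a small illustration of how the two options can diverge, the sketch below computes a mean for one cell both ways. The `history` data and the helper names `most_recent` and `combined` are made up for this example and are not part of SilviMetric.

```python
from statistics import mean

# Hypothetical history for one cell: (shatter timestamp, Z values written).
history = [(1, [10.0, 12.0]), (2, [14.0])]


def most_recent(history):
    """Option 1: use only the points from the latest shatter."""
    return max(history, key=lambda entry: entry[0])[1]


def combined(history):
    """Option 2: pool the points from every shatter that touched the cell."""
    ordered = sorted(history, key=lambda entry: entry[0])
    return [z for _, points in ordered for z in points]


print(mean(most_recent(history)))  # 14.0
print(mean(combined(history)))     # 12.0
```

With overlapping shatters the two choices give different metrics for the same cell, which is why the answer affects how extract has to read the array.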

Thoughts @bmcgaughey1 and @hobu ?
