SilviMetric should switch to Dense TileDB arrays #116

After receiving advice on using TileDB, we decided to switch SilviMetric to dense arrays. This work is in progress in #114.

### Key points
- TileDB's `from_pandas` cannot handle flexible ingestion into dense arrays: it requires an entire row for each write to the database. This would probably not be feasible for larger projects, and it would require a significant rework of our chunking/tiling.
    - I opened an [issue in TileDB-Py](https://github.com/TileDB-Inc/TileDB-Py/issues/2158) about this and implemented a fix, but have not received a response.
- `SilviMetric`'s `extract` currently uses the information from every past `shatter` run, combining any overlapping data and rerunning the metrics for those cells.
- `TileDB` dense arrays only return the most recent value written to a cell.
- Sparse arrays allow duplicates, so each cell's value is effectively a list of entries, one for each shatter process that touched that cell. If necessary, we can easily rerun the metrics on the combined point data.
- Dense arrays do not allow duplicates, so the only way to get values from separate shatter processes is to iterate through the shatter writes (time travel), combine those separate entries into one dataframe, and then apply the same logic as for sparse arrays.
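
The difference between the two array types can be illustrated with a toy model in plain Python. This is a sketch under stated assumptions, not TileDB's API: `SparseStore`, `DenseStore` and `read_all_fragments` are hypothetical names, each "shatter" is modeled as a dict mapping a cell index to its point values, and dense writes are modeled as timestamped fragments where the latest write to a cell wins.

```python
from collections import defaultdict


class SparseStore:
    """Toy stand-in (hypothetical) for a sparse array that allows duplicates:
    every shatter write to a cell is kept as a separate entry."""

    def __init__(self):
        self.cells = defaultdict(list)

    def write(self, batch):
        for cell, values in batch.items():
            self.cells[cell].append(list(values))

    def read(self, cell):
        # Flatten the per-shatter entries into one combined list of points.
        return [v for entry in self.cells.get(cell, []) for v in entry]


class DenseStore:
    """Toy stand-in (hypothetical) for a dense array: each write is stored as
    a timestamped fragment, and a normal read sees only the latest value."""

    def __init__(self):
        self.fragments = []  # list of (timestamp, batch)

    def write(self, timestamp, batch):
        self.fragments.append((timestamp, batch))

    def read(self, cell):
        # Newest fragment that covers the cell wins; older values are hidden.
        for _, batch in sorted(self.fragments, key=lambda f: f[0], reverse=True):
            if cell in batch:
                return list(batch[cell])
        return []

    def read_all_fragments(self, cell):
        # "Time travel": visit each fragment oldest-first and merge its values.
        combined = []
        for _, batch in sorted(self.fragments, key=lambda f: f[0]):
            combined.extend(batch.get(cell, []))
        return combined


# Two hypothetical shatter runs that both touch cell (0, 0).
shatter_1 = {(0, 0): [10.0, 12.0]}
shatter_2 = {(0, 0): [15.0]}

sparse = SparseStore()
sparse.write(shatter_1)
sparse.write(shatter_2)

dense = DenseStore()
dense.write(1, shatter_1)
dense.write(2, shatter_2)

print(sparse.read((0, 0)))                # [10.0, 12.0, 15.0]
print(dense.read((0, 0)))                 # [15.0]
print(dense.read_all_fragments((0, 0)))   # [10.0, 12.0, 15.0]
```

The sparse read returns the combined points directly, while the dense store needs the extra per-fragment pass to recover the same data.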

### Conclusions
- Because `from_pandas` is not flexible when writing to dense arrays and we have received no response from the TileDB team, we will switch back to our previous sparse-array usage. Unfortunately, this means working around TileDB's array requirements, documented [here](https://forum.tiledb.com/t/weird-behavior-with-variable-length-attributes/508).

### Question still to be answered

When we run `extract`, should the value of a cell be:

1. The value from the most recent shatter, or
2. The combination of all shatter processes that touched this cell?
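
To make the two options concrete, here is a minimal sketch for a single cell. The point values, the choice of `mean` as the metric, and the function names are all hypothetical. It also shows why option 2 needs the raw points rather than the per-shatter metric values: averaging each shatter's mean does not give the same result as recomputing the metric on the pooled points.

```python
from statistics import mean

# Hypothetical point values for one cell, one list per shatter run
# (oldest first).
shatter_runs = [[10.0, 12.0], [15.0]]


def extract_most_recent(runs, metric):
    """Option 1: compute the metric from the most recent shatter only."""
    return metric(runs[-1])


def extract_combined(runs, metric):
    """Option 2: pool the points from every shatter that touched the cell,
    then recompute the metric on the pooled points."""
    pooled = [v for run in runs for v in run]
    return metric(pooled)


latest = extract_most_recent(shatter_runs, mean)
combined = extract_combined(shatter_runs, mean)
# Combining already-computed metrics instead of raw points gives a
# different answer, so option 2 must rerun the metric on the pooled data.
naive = mean([mean(run) for run in shatter_runs])

print(latest)    # 15.0
print(combined)
print(naive)     # 13.0
```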

Thoughts @bmcgaughey1 and @hobu ?