See discussion here. We (devs and users) should never use dplyr on data.tables: likely every verb violates data.table's memory model, and some of them produce corrupt data.tables. E.g., arrange can output a data.table with incorrect metadata. as_epi_archive should either:
- Make something valid out of this invalid input (see comment A, B).
- Detect invalid metadata somehow and balk, forcing the user to correct their dplyr usage. This might be hard to do or involve peeking into data.table internals. Perhaps we could go with the first approach but also detect specific violations, like the data not actually being sorted the way the key says it is, and balk at them. Or just give up on this idea.
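To make the "detect specific violations" idea concrete, here is a rough sketch of a sortedness check that only uses exported functions. The helper name key_claim_holds is hypothetical (it is not part of epiprocess or data.table), and the check ignores NA ordering differences between base order and data.table, so treat it as an illustration rather than a complete validator.

```r
library(data.table)

# Hypothetical helper: does the key that `x` claims to have match its row order?
# Caveat: base order() puts NAs last while data.table keys put them first, so
# this sketch is only meaningful for key columns without NAs.
key_claim_holds <- function(x) {
  key_colnames <- key(x)
  if (is.null(key_colnames)) {
    return(TRUE) # no key claimed, so nothing to contradict
  }
  key_cols <- lapply(key_colnames, function(nm) x[[nm]])
  # method = "radix" is stable and sorts character columns in the C locale,
  # which is also how data.table orders key columns.
  ord <- do.call(order, c(key_cols, list(method = "radix")))
  identical(ord, seq_len(nrow(x)))
}

# Simulate corrupt metadata without dplyr: claim a key on unsorted rows.
bad <- data.table(geo_value = c("b", "a"), time_value = c(2L, 1L))
setattr(bad, "sorted", c("geo_value", "time_value"))
good <- data.table(geo_value = c("a", "b"), time_value = c(1L, 2L),
                   key = c("geo_value", "time_value"))
```

With these inputs, key_claim_holds(bad) should be FALSE and key_claim_holds(good) should be TRUE. This costs a full ordering pass per call, and it only catches the sortedness problem, not the memory model violations.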
The memory model violations mean that an input's columns could clobber columns "owned" by another data.table. If we want to address these, then the first approach may be just: if x is a data.table, convert it to a plain data.frame with as.data.frame (which should copy the columns), then use setDT to turn it into a data.table with the appropriate key. This should also fix the metadata-based issues, since it should discard the existing data.table metadata.
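A minimal sketch of that round trip, assuming the key columns are passed in by the caller. The helper name rebuild_keyed_dt and the example column names are hypothetical; how this would be wired into as_epi_archive is not decided here.

```r
library(data.table)

# Hypothetical helper: rebuild a data.table so that it shares neither columns
# nor metadata with its (possibly dplyr-mangled) input.
rebuild_keyed_dt <- function(x, key_colnames) {
  stopifnot(is.data.table(x))
  # as.data.frame() on a data.table should return a copy with the key and
  # other data.table attributes dropped.
  df <- as.data.frame(x)
  # setDT() converts the local copy by reference and sorts it by the key.
  setDT(df, key = key_colnames)
  df
}

# Simulate corrupt metadata without dplyr: claim a key on unsorted rows.
bad <- data.table(geo_value = c("b", "a"), time_value = c(2L, 1L))
setattr(bad, "sorted", c("geo_value", "time_value"))

fixed <- rebuild_keyed_dt(bad, c("geo_value", "time_value"))
```

Here fixed should be sorted by geo_value and time_value with a key that matches its row order, while bad is left untouched. The cost is one full copy of the input, which seems acceptable if it rules out both the aliasing and the stale-metadata problems.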