radar-hdfs-restructure version 0.5.0
Use a plugin architecture to specify:
- path layout: controls binning (how many hours of data go into one file) and directory organisation (e.g. project/user/topic/time.csv, topic/project/user/time.csv or project/user/topic.csv).
- file format: currently csv or json
- compression method: currently gzip or none
- storage driver: currently only local, but drivers for minio or s3 could be added.
This makes the module much more extensible for other needs or projects.
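As a rough illustration of what a path layout plugin could look like, here is a minimal sketch. The `PathLayout` interface, both implementing classes and the `yyyyMMdd_HH00` file naming are hypothetical and chosen only for this example; they are not the actual radar-hdfs-restructure API. The sketch assumes one-hour bins in UTC.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;
import java.time.temporal.ChronoUnit;

/** Hypothetical plugin interface: maps a record to a relative output path. */
interface PathLayout {
    String pathFor(String project, String user, String topic, Instant time, String extension);
}

/** Shared helper for the sketch: names a one-hour bin in UTC, e.g. 20190301_1300. */
final class HourlyBin {
    private static final DateTimeFormatter FORMAT =
            DateTimeFormatter.ofPattern("yyyyMMdd_HH'00'").withZone(ZoneOffset.UTC);

    private HourlyBin() {
    }

    static String of(Instant time) {
        // Truncating to the hour puts all records of the same hour in the same file.
        return FORMAT.format(time.truncatedTo(ChronoUnit.HOURS));
    }
}

/** Illustrative layout: project/user/topic/time.ext */
final class ProjectUserTopicLayout implements PathLayout {
    @Override
    public String pathFor(String project, String user, String topic, Instant time, String extension) {
        return project + "/" + user + "/" + topic + "/" + HourlyBin.of(time) + "." + extension;
    }
}

/** Illustrative layout: topic/project/user/time.ext */
final class TopicProjectUserLayout implements PathLayout {
    @Override
    public String pathFor(String project, String user, String topic, Instant time, String extension) {
        return topic + "/" + project + "/" + user + "/" + HourlyBin.of(time) + "." + extension;
    }
}
```

Because the rest of the application would depend only on the interface, switching the directory organisation comes down to selecting a different implementation. File format, compression and storage driver plugins could follow the same pattern.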
Other updates:
- threaded task model
- deduplication no longer changes record ordering and no longer uses an additional temporary file
- files are now moved atomically from the staging directory where possible
- bins and offsets are written from a separate thread, using a single Accountant class
- settings and factories are propagated through the application with the FileStoreFactory.
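Two of the updates above, order-preserving deduplication and atomic moves out of a staging directory, can be sketched with standard Java APIs. The `StagingSketch` class and its method names are hypothetical and only illustrate the general technique, not the project's actual implementation. Deduplication here is done in memory with a `LinkedHashSet`, which keeps the first occurrence of each line in its original position.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

/** Hypothetical helper, for illustration only. */
final class StagingSketch {
    private StagingSketch() {
    }

    /** Removes duplicate lines while keeping first occurrences in their original order. */
    static List<String> deduplicate(List<String> lines) {
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }

    /** Moves a staged file to its target, atomically if the file system supports it. */
    static void publish(Path staged, Path target) throws IOException {
        try {
            // With ATOMIC_MOVE, replacing an existing target is implementation specific.
            Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            // Fallback, e.g. when staging and target are on different file stores.
            Files.move(staged, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }

    /** Writes deduplicated lines to the staging directory, then publishes the result. */
    static void writeDeduplicated(List<String> lines, Path stagingDir, Path target)
            throws IOException {
        Path staged = Files.createTempFile(stagingDir, "staged", ".tmp");
        Files.write(staged, deduplicate(lines), StandardCharsets.UTF_8);
        publish(staged, target);
    }
}
```

Writing to a staging file first means readers of the target path see either the old file or the complete new one when the atomic move succeeds. The non-atomic fallback gives up that guarantee.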