Releases: RADAR-base/radar-output-restructure
radar-output-restructure version 1.0.0
Changes since radar-hdfs-restructure version 0.6.0:
- Added storage drivers for S3 API input and output
- Made storage configuration more consistent
- Changed synchronisation and accounting to be handled with Redis
  - With more consistent locking
  - Faster write times
  - Redis is required.
- Added integration tests
- Configurable automatic deletion after a threshold amount of time, to avoid storage filling up.
- Reprocesses files that were modified after the last processed modification time.
- Customizable time bin format
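To illustrate what a customizable time bin format can mean in practice, the sketch below turns a record timestamp into a bin label with java.time.format.DateTimeFormatter. The class name TimeBinExample, the two patterns, and the choice of UTC are assumptions made for this example; they are not taken from the project's configuration or defaults.

```java
import java.time.Instant;
import java.time.ZoneOffset;
import java.time.format.DateTimeFormatter;

// Hypothetical helper, not part of radar-output-restructure.
final class TimeBinExample {
    /** Formats a timestamp as a time bin label, using the given pattern in UTC. */
    static String timeBin(Instant time, String pattern) {
        DateTimeFormatter formatter = DateTimeFormatter.ofPattern(pattern)
                .withZone(ZoneOffset.UTC);
        return formatter.format(time);
    }

    public static void main(String[] args) {
        Instant time = Instant.parse("2020-03-05T14:37:12Z");
        // Hourly bins: prints 20200305_1400
        System.out.println(timeBin(time, "yyyyMMdd_HH'00'"));
        // Monthly bins: prints 202003
        System.out.println(timeBin(time, "yyyyMM"));
    }
}
```

A coarser pattern puts more records in each output file, a finer pattern produces more and smaller files.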
radar-hdfs-restructure version 0.6.0
Changes since version 0.5.7:
- A new configuration file has been introduced. This allows for easier and more flexible configuration of the converter.
- Fixes an issue when using the output generator as a one-time application instead of a service.
- Converted Java code to Kotlin
- Refactored package names, including change to org.radarbase
- Track offsets in a per-topic offsets file in the offsets directory.
- Added an S3 storage driver.
- Use per-topic locking, to allow multiple restructure services to run simultaneously.
- Simplified parallelism.
Upgrade instructions:
- Write configuration file to match settings used with 0.5.7
- If needed, move all entries of offsets.csv to their own file in offsets/<topic>.csv. First go to the output directory, then run the bin/migrate-offsets-to-0.6.0.sh script.
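The idea behind the offsets migration can be sketched as follows. This is not the migration script itself: it only shows how rows of a combined offsets file could be grouped per topic. It assumes that the file is a CSV with a header row containing a column named topic; that layout and the class name OffsetSplitExample are assumptions for illustration only. Use the provided script for the actual migration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of radar-hdfs-restructure.
final class OffsetSplitExample {
    /**
     * Groups the rows of a combined offsets CSV by topic. Each resulting list
     * starts with the original header, so it could be written out as
     * offsets/<topic>.csv. Assumes the first line is a header that contains
     * a "topic" column.
     */
    static Map<String, List<String>> splitByTopic(List<String> lines) {
        Map<String, List<String>> perTopic = new LinkedHashMap<>();
        if (lines.isEmpty()) {
            return perTopic;
        }
        String header = lines.get(0);
        int topicColumn = Arrays.asList(header.split(",")).indexOf("topic");
        if (topicColumn < 0) {
            throw new IllegalArgumentException("No topic column in header: " + header);
        }
        for (String line : lines.subList(1, lines.size())) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String topic = line.split(",")[topicColumn];
            // Start each per-topic list with the header, then append the row.
            perTopic.computeIfAbsent(topic, t -> {
                List<String> rows = new ArrayList<>();
                rows.add(header);
                return rows;
            }).add(line);
        }
        return perTopic;
    }
}
```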
radar-hdfs-restructure version 0.5.7
Changes since version 0.5.6:
- Run the restructure script as a service
- Updated Gradle
radar-hdfs-restructure version 0.5.6
Changes since version 0.5.5:
- Corrects snappy decompression (fixes #43)
radar-hdfs-restructure version 0.5.5
Changes since version 0.5.4:
- Added per-month directory key path factory org.radarcns.hdfs.MonthlyObservationKeyPathFactory
- Added --exclude-topic option to exclude certain topics from extraction
- Added ZIP compression
- Fixed integer overflow in record counts
- Updated unit testing to JUnit 5
- Updated Gradle
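For context on the integer overflow fix listed above: a 32-bit counter silently wraps around once it passes Integer.MAX_VALUE (2147483647). The release notes do not say how the fix was made; widening the counter to a 64-bit long, as sketched below, is one common approach and is shown here as an assumption. The class RecordCountExample is hypothetical.

```java
// Hypothetical illustration, not the project's actual counting code.
final class RecordCountExample {
    /** Adds to a 32-bit counter; the result wraps around past Integer.MAX_VALUE. */
    static int addInt(int count, int added) {
        return count + added;
    }

    /** Adds to a 64-bit counter; large record counts stay positive. */
    static long addLong(long count, long added) {
        return count + added;
    }

    public static void main(String[] args) {
        // Prints -2147483648: the int counter has wrapped around.
        System.out.println(addInt(Integer.MAX_VALUE, 1));
        // Prints 2147483648: the long counter holds the correct value.
        System.out.println(addLong(Integer.MAX_VALUE, 1));
    }
}
```

Where an int must be kept, Math.addExact throws an ArithmeticException on overflow instead of wrapping.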
radar-hdfs-restructure version 0.5.4
Changes since version 0.5.3:
- Added setting for maximum number of files to process per topic: --max-files-per-topic
radar-hdfs-restructure version 0.5.3
Changes since version 0.5.2:
- Parse more types of time fields
- Updated Gradle version
radar-hdfs-restructure version 0.5.2
Changes since version 0.5.1:
- Specify output file user and group
- Added copyright statements in files added since version 0.5.x
- Organised imports according to style guide.
radar-hdfs-restructure version 0.5.1
Changes since version 0.5.0:
- Fixed ETA for large tasks
- Do not store all bins in memory simultaneously
- Skip CSV cleaning when it is unnecessary, with an accompanying test
- Store hash for classes frequently used as HashMap keys
radar-hdfs-restructure version 0.5.0
Use a plugin architecture to specify:
- path layout: for binning (how many hours for a file) and organisation (project/user/topic/time.csv or topic/project/user/time.csv, project/user/topic.csv, etc.).
- file format: currently csv or json
- compression method: currently gzip or none
- storage driver: currently local, but could be minio or s3.
This makes the module much more extensible for other needs or projects.
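As a rough sketch of how a pluggable path layout could work, the example below defines a small interface with two interchangeable implementations, matching two of the organisations named above. The interface PathLayout, its method signature, and the constant names are hypothetical and are not the module's actual plugin API.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration of a path layout plugin, not the module's API.
final class PathLayoutExample {
    /** Maps record metadata to a relative output path. */
    interface PathLayout {
        Path relativePath(String project, String user, String topic,
                String timeBin, String extension);
    }

    /** Layout project/user/topic/time.ext */
    static final PathLayout BY_PROJECT = (project, user, topic, timeBin, extension) ->
            Paths.get(project, user, topic, timeBin + extension);

    /** Layout topic/project/user/time.ext */
    static final PathLayout BY_TOPIC = (project, user, topic, timeBin, extension) ->
            Paths.get(topic, project, user, timeBin + extension);

    public static void main(String[] args) {
        // The calling code stays the same; only the selected layout differs.
        for (PathLayout layout : new PathLayout[] {BY_PROJECT, BY_TOPIC}) {
            System.out.println(layout.relativePath(
                    "projectA", "user1", "heart_rate", "20200305_1400", ".csv"));
        }
    }
}
```

With this shape, a file format or compression plugin would only need to supply the extension (for example .csv or .csv.gz), while the layout decides the directory structure.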
Other updates:
- threaded task model
- deduplication now does not change ordering, and does not use another temporary file
- files are now atomically moved from staging directory if possible
- bins and offsets are written from a separate thread, using a single Accountant class
- settings and factories are propagated through the application with the FileStoreFactory.
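Two of the updates above, order-preserving deduplication and atomic moves from the staging directory "if possible", can be sketched with standard Java APIs. This is a minimal illustration under assumptions, not the module's implementation: the class StagingExample is hypothetical, and deduplicating in memory with a LinkedHashSet is only one way to avoid a second temporary file.

```java
import java.io.IOException;
import java.nio.file.AtomicMoveNotSupportedException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical illustration, not the module's actual code.
final class StagingExample {
    /** Removes duplicate lines, keeping the first occurrence of each in its original position. */
    static List<String> deduplicate(List<String> lines) {
        // LinkedHashSet drops duplicates while preserving insertion order.
        return new ArrayList<>(new LinkedHashSet<>(lines));
    }

    /**
     * Moves a staged file into place atomically when the file system supports
     * it, and falls back to a regular replacing move otherwise.
     */
    static void moveIntoPlace(Path staged, Path target) throws IOException {
        try {
            Files.move(staged, target, StandardCopyOption.ATOMIC_MOVE);
        } catch (AtomicMoveNotSupportedException e) {
            Files.move(staged, target, StandardCopyOption.REPLACE_EXISTING);
        }
    }
}
```

An atomic move means readers never observe a half-written output file. The in-memory deduplication shown here holds all lines of one file in memory at once, which may not suit very large files.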