Releases: RADAR-base/radar-output-restructure

radar-output-restructure version 1.0.0

26 May 10:46
b168d12

Changes since radar-hdfs-restructure version 0.6.0:

  • Added storage drivers for S3 API input and output
  • Storage configuration is more consistent
  • Synchronisation and accounting are now handled by Redis
    • More consistent locking
    • Faster write times
    • Redis is now required
  • Added integration tests
  • Added configurable automatic deletion of files after a time threshold, to avoid storage filling up.
  • Reprocesses files that were modified after the last processed modification time.
  • Customizable time bin format
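
To illustrate what a customizable time bin format means in practice, here is a minimal Python sketch. It is not the project's implementation: the function name `time_bin` and the `strftime`-style patterns are assumptions made for this example, and the actual configuration syntax may differ.

```python
from datetime import datetime, timezone


def time_bin(timestamp_seconds, bin_format="%Y%m%d_%H00"):
    """Format a UTC epoch timestamp as a time bin label.

    Hypothetical helper: the default pattern produces hourly bins,
    and any other strftime pattern changes the bin granularity.
    """
    moment = datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)
    return moment.strftime(bin_format)


# The same record timestamp lands in different bins depending on the format.
hourly = time_bin(1590489960.0)             # hourly bin label
daily = time_bin(1590489960.0, "%Y-%m-%d")  # daily bin label
```

A coarser pattern yields fewer, larger output files; a finer one yields more, smaller files.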

radar-hdfs-restructure version 0.6.0

14 Nov 12:42
779d34c

Changes since version 0.5.7:

  • A new configuration file has been introduced. This allows for easier and more flexible configuration of the converter.
  • Fixes an issue when using the output generator as a one-time application instead of a service.
  • Converted Java code to Kotlin
  • Refactored package names, including change to org.radarbase
  • Track offsets in a per-topic offsets file in the offsets directory.
  • Added an S3 storage driver.
  • Use per-topic locking, to allow multiple restructure services to run simultaneously.
  • Simplified parallelism.

Upgrade instructions:

  • Write a configuration file that matches the settings used with 0.5.7
  • If needed, move all entries of offsets.csv to their own file in offsets/<topic>.csv. First go to the output directory, then run the bin/migrate-offsets-to-0.6.0.sh script.
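
The bundled bin/migrate-offsets-to-0.6.0.sh script is the supported way to do this migration. The Python sketch below only illustrates the idea of splitting one offsets file into per-topic files. It assumes, without confirmation from the release notes, that offsets.csv has a header row with a `topic` column; the function name `split_offsets` and the column names in the usage example are hypothetical.

```python
import csv
from collections import defaultdict
from pathlib import Path


def split_offsets(offsets_csv, offsets_dir, topic_column="topic"):
    """Write each topic's rows from offsets_csv to offsets_dir/<topic>.csv.

    Assumption: the input has a header row containing topic_column.
    Returns the sorted list of topics that were written.
    """
    rows_by_topic = defaultdict(list)
    with Path(offsets_csv).open(newline="") as source:
        reader = csv.DictReader(source)
        fieldnames = reader.fieldnames or []
        for row in reader:
            rows_by_topic[row[topic_column]].append(row)

    target_dir = Path(offsets_dir)
    target_dir.mkdir(parents=True, exist_ok=True)
    for topic, rows in rows_by_topic.items():
        with (target_dir / f"{topic}.csv").open("w", newline="") as target:
            writer = csv.DictWriter(target, fieldnames=fieldnames, lineterminator="\n")
            writer.writeheader()
            writer.writerows(rows)
    return sorted(rows_by_topic)
```

Run from the output directory, a call such as `split_offsets("offsets.csv", "offsets")` would leave the original file untouched and create one CSV per topic.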

radar-hdfs-restructure version 0.5.7

17 Jun 13:02
5fa5b2a

Changes since version 0.5.6:

  • Run the restructure script as a service
  • Updated Gradle

radar-hdfs-restructure version 0.5.6

20 May 07:12
7b6f49e

Changes since version 0.5.5:

  • Corrects snappy decompression (fixes #43)

radar-hdfs-restructure version 0.5.5

06 Mar 08:27
cdeffd2

Changes since version 0.5.4:

  • Added per-month directory key path factory org.radarcns.hdfs.MonthlyObservationKeyPathFactory
  • Added --exclude-topic option to exclude certain topics from extraction
  • Added ZIP compression
  • Fixed integer overflow in record counts
  • Updated unit testing to JUnit 5
  • Updated Gradle
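
As a rough illustration of a per-month directory layout, the Python sketch below groups hourly output files under a month directory. The exact path produced by org.radarcns.hdfs.MonthlyObservationKeyPathFactory is not stated in these notes, so the layout, the function name `monthly_path`, and the file naming are all assumptions.

```python
from datetime import datetime, timezone
from pathlib import PurePosixPath


def monthly_path(project, user, topic, timestamp_seconds):
    """Build a path with a per-month directory above the hourly file.

    Assumed layout: project/user/topic/YYYYMM/YYYYMMDD_HH00.csv
    """
    moment = datetime.fromtimestamp(timestamp_seconds, tz=timezone.utc)
    month_dir = moment.strftime("%Y%m")
    file_name = moment.strftime("%Y%m%d_%H00.csv")
    return PurePosixPath(project, user, topic, month_dir, file_name)


example = monthly_path("projectA", "user1", "heart_rate", 1590489960)
```

Grouping by month keeps the number of entries per directory bounded for long-running studies.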

radar-hdfs-restructure version 0.5.4

23 Jan 14:39
af1d14b

Changes since version 0.5.3:

  • Added the --max-files-per-topic setting, which limits the number of files to process per topic

radar-hdfs-restructure version 0.5.3

23 Oct 07:46
f8e0a5c

Changes since version 0.5.2:

  • Parse more types of time fields
  • Updated Gradle version

radar-hdfs-restructure version 0.5.2

06 Sep 07:09
f9a9877

Changes since version 0.5.1:

  • Specify output file user and group
  • Added copyright statements in files added since version 0.5.x
  • Organised imports according to style guide.

radar-hdfs-restructure version 0.5.1

31 Jul 07:28
c9cd842

Changes since version 0.5.0:

  • Fixed ETA for large tasks
  • Do not store all bins in memory simultaneously
  • Skip CSV cleaning when it is unnecessary, and added a test for this
  • Store hash for classes frequently used as HashMap keys

radar-hdfs-restructure version 0.5.0

26 Jul 14:54
04583dc

Use a plugin architecture to specify:

  • path layout: for binning (how many hours for a file) and organisation (project/user/topic/time.csv or topic/project/user/time.csv, project/user/topic.csv, etc.).
  • file format: currently csv or json
  • compression method: currently gzip or none
  • storage driver: currently local, but could be minio or s3.

This makes the module much more extensible for other needs or projects.
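
The plugin idea can be sketched as a pair of lookup tables, one per concern, so that a file format and a compression method are chosen independently by name. This Python sketch is not the project's code: the names `FORMATS`, `COMPRESSIONS`, and `encode` are hypothetical, and only the option names (csv, json, gzip, none) come from the list above.

```python
import csv
import gzip
import io
import json


def to_csv(records):
    """Serialize non-empty records as CSV, using the first record's keys as header."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(records[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(records)
    return buffer.getvalue()


def to_json(records):
    """Serialize records as one JSON object per line."""
    return "".join(json.dumps(record) + "\n" for record in records)


# Each registry maps a configured name to an interchangeable implementation.
FORMATS = {"csv": to_csv, "json": to_json}
COMPRESSIONS = {"gzip": gzip.compress, "none": lambda data: data}


def encode(records, file_format, compression):
    """Look up the format and compression plugins by name and apply them in turn."""
    text = FORMATS[file_format](records)
    return COMPRESSIONS[compression](text.encode("utf-8"))
```

Supporting a new format or compression method then means registering one more entry, without touching the code that calls `encode`.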

Other updates:

  • threaded task model
  • deduplication no longer changes ordering and no longer uses another temporary file
  • files are now atomically moved from staging directory if possible
  • bins and offsets are written from a separate thread, using a single Accountant class
  • settings and factories are propagated through the application with the FileStoreFactory.
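
Two of these updates, order-preserving deduplication and atomic moves out of a staging directory, can be sketched in a few lines of Python. This is an illustration under assumptions, not the project's implementation: the function names are hypothetical, the sketch keeps the first occurrence of each duplicate (the notes do not say which occurrence is kept), and the move is only atomic when staging and destination are on the same filesystem.

```python
import os
from pathlib import Path


def deduplicate(lines):
    """Drop repeated lines while keeping the first occurrence in its original position."""
    seen = set()
    unique = []
    for line in lines:
        if line not in seen:
            seen.add(line)
            unique.append(line)
    return unique


def publish(staged, destination):
    """Move a finished file from the staging directory to its final location.

    os.replace renames in a single step, so readers see either the old
    file or the complete new one, never a partially written file.
    """
    destination = Path(destination)
    destination.parent.mkdir(parents=True, exist_ok=True)
    os.replace(staged, destination)
```

Writing to staging first and renaming afterwards is what makes the "if possible" qualifier necessary: a rename across filesystems cannot be atomic and would need a copy instead.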