This Python script generates an inventory of files in a specified directory, including file metadata such as extension, file type, creation date, size, checksums, and more. The script can be configured to update existing inventories, compress JSON files, and utilize multi-threading for faster processing.
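As a rough illustration of the kind of per-file record described above, here is a minimal sketch using only the standard library. The function name `describe_file` and the field names are hypothetical and are not taken from manifest-builder.py. The sketch records the modification time rather than a creation date, because a true creation time is not portably available through `os.stat`.

```python
import hashlib
import os
from datetime import datetime, timezone


def describe_file(path, compute_checksums=True, chunk_size=1 << 20):
    """Return a dict of basic metadata for one file (hypothetical field names)."""
    stat = os.stat(path)
    record = {
        "filename": os.path.basename(path),
        "extension": os.path.splitext(path)[1],
        "size": stat.st_size,
        # Modification time in UTC; the real script may record a different timestamp.
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
    }
    if compute_checksums:
        # Read in chunks so large files are never loaded into memory at once.
        sha256 = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(chunk_size), b""):
                sha256.update(chunk)
        record["sha256"] = sha256.hexdigest()
    return record
```

Passing `compute_checksums=False` skips the hashing step, which is the expensive part for large files.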
- Python 3
- Required Python packages can be installed using the following command:
pip install numpy pandas tabulate tqdm compress_json numpyencoder magic pandarallel
There are two main scripts in this repository: update-summary.py and manifest-builder.py.
Running update-summary.py generates a daily report on the status of the database. The script also writes the file today.json, which is stored in /bil/data.
A cron job runs this script every night.
python manifest-builder.py -d <directory_path> [-n <number_of_cores>] [--update] [--compress] [--rebuild] [--avoid-checksums] [--multi-threading]
-d, --directory Specify the target directory for inventory.
-n, --number-of-cores Number of CPU cores to use for parallel processing.
--update Update existing inventories.
--compress Compress JSON files.
--rebuild Rebuild the inventory, removing existing TSV file.
--avoid-checksums Skip checksum computation.
--multi-threading Enable multi-threading for faster processing.
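The options above could be declared with `argparse` roughly as follows. This is a sketch reconstructed from the usage line only, not the actual parser in manifest-builder.py. The `build_parser` name, the default of 1 core, and the help strings are assumptions.

```python
import argparse


def build_parser():
    # Hypothetical reconstruction of the CLI; the real script may differ.
    parser = argparse.ArgumentParser(prog="manifest-builder.py")
    parser.add_argument("-d", "--directory", required=True,
                        help="Target directory for the inventory.")
    parser.add_argument("-n", "--number-of-cores", type=int, default=1,
                        help="Number of CPU cores (default of 1 is an assumption).")
    parser.add_argument("--update", action="store_true",
                        help="Update existing inventories.")
    parser.add_argument("--compress", action="store_true",
                        help="Compress JSON files.")
    parser.add_argument("--rebuild", action="store_true",
                        help="Rebuild the inventory, removing the existing TSV file.")
    parser.add_argument("--avoid-checksums", action="store_true",
                        help="Skip checksum computation.")
    parser.add_argument("--multi-threading", action="store_true",
                        help="Enable multi-threading.")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a real directory.
    args = build_parser().parse_args(["-d", "/tmp/example", "-n", "4", "--update"])
    print(args.directory, args.number_of_cores, args.update, args.compress)
```

Note that `argparse` converts dashes in long option names to underscores, so `--avoid-checksums` is read back as `args.avoid_checksums`.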
python ./manifest-builder.py -d "/bil/data/91/aa/91aad194ce577ebe/E15.5_BB0610/LSFM/stitched_01" -n 4 --update --compress
To compute the inventory for the first 100 datasets in the report generated by update-summary.py, run:
grep -v bildirectory summary_metadata.tsv | cut -d$'\t' -f9 | grep "/bil/data/" | head -n 100 | xargs -P 1 -I {} python ./manifest-builder.py -d {} -n 16 --compress --update
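For readers who prefer to do the selection in Python, the shell pipeline above can be approximated as shown below. This is a sketch under the same assumptions as the shell command: the dataset path is in the ninth tab-separated column, and the header line is the one containing `bildirectory`. The function names `first_dataset_dirs` and `build_commands` are hypothetical.

```python
import csv
from itertools import islice


def first_dataset_dirs(tsv_path, limit=100, column=8, prefix="/bil/data/"):
    """Return up to `limit` dataset paths from the given zero-based column."""
    def candidates():
        with open(tsv_path, newline="") as handle:
            # QUOTE_NONE treats quote characters literally, like `cut` does.
            reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
            for row in reader:
                # Skip any line mentioning "bildirectory" (the header row).
                if any("bildirectory" in cell for cell in row):
                    continue
                if len(row) > column and prefix in row[column]:
                    yield row[column]
    return list(islice(candidates(), limit))


def build_commands(directories, cores=16):
    """Build one manifest-builder.py argument list per dataset directory."""
    return [
        ["python", "./manifest-builder.py", "-d", directory,
         "-n", str(cores), "--compress", "--update"]
        for directory in directories
    ]
```

Each argument list returned by `build_commands` can be passed to `subprocess.run` one at a time, which matches the sequential behavior of `xargs -P 1`.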
Copyright © 2020-2024 Pittsburgh Supercomputing Center. All Rights Reserved.
The Biomedical Applications Group at the Pittsburgh Supercomputing Center in the Mellon College of Science at Carnegie Mellon University.
