
brain-image-library/py-inventory-builder


Inventory Script(s)


Description

This Python script generates an inventory of the files in a specified directory, recording per-file metadata such as extension, file type, creation date, size, and checksums. It can optionally update existing inventories, compress the output JSON files, and use multi-threading for faster processing.
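A minimal sketch of what building such an inventory could look like. It assumes SHA-256 checksums, uses the standard-library `mimetypes` module as a stand-in for `magic` file-type detection, and records modification time in place of creation date (POSIX `stat` does not portably expose a creation time). The function and field names are hypothetical and are not the script's actual schema; `max_workers` only loosely mirrors the `-n` option.

```python
import hashlib
import mimetypes
from concurrent.futures import ThreadPoolExecutor
from datetime import datetime, timezone
from pathlib import Path


def sha256sum(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large images are never loaded fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def file_record(path: Path, checksums: bool = True) -> dict:
    """Build one metadata record (hypothetical field names)."""
    stat = path.stat()
    mime, _ = mimetypes.guess_type(path.name)  # stand-in for python-magic detection
    record = {
        "path": str(path),
        "extension": path.suffix.lstrip(".").lower(),
        "filetype": mime or "unknown",
        # Modification time: true creation time is not portable across filesystems.
        "modified": datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc).isoformat(),
        "size": stat.st_size,
    }
    if checksums:  # passing False is similar in spirit to --avoid-checksums
        record["sha256"] = sha256sum(path)
    return record


def build_inventory(directory: str, max_workers: int = 4, checksums: bool = True) -> list:
    """Walk the directory recursively and build records concurrently in a thread pool."""
    files = sorted(p for p in Path(directory).rglob("*") if p.is_file())
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda p: file_record(p, checksums), files))
```

Threads are a reasonable fit here because the work is dominated by file reads, and CPython's `hashlib` releases the GIL while hashing large buffers.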

Prerequisites

  • Python 3
  • Required Python packages can be installed using the following command:
pip install numpy pandas tabulate tqdm compress_json numpyencoder magic pandarallel
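A quick sketch for confirming that the packages from the pip command above are importable before running either script. It assumes each import name matches its pip name; if `magic` is still reported missing after installation, the module may come from a differently named distribution such as `python-magic`.

```python
import importlib.util

# Assumption: import names match the pip package names in the install command.
REQUIRED_MODULES = [
    "numpy", "pandas", "tabulate", "tqdm",
    "compress_json", "numpyencoder", "magic", "pandarallel",
]


def missing_modules(names=REQUIRED_MODULES):
    """Return the top-level modules the current interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("All dependencies found." if not missing else f"Missing: {', '.join(missing)}")
```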

Usage

There are two main scripts in this repository:

update-summary.py

Running update-summary.py generates a daily report on the status of the database. It also writes the file today.json to /bil/data.

A cron job runs this script every night.

manifest-builder.py

python manifest-builder.py -d <directory_path> [-n <number_of_cores>] [--update] [--compress] [--rebuild] [--avoid-checksums] [--multi-threading]

Options

-d, --directory        Target directory to inventory.
-n, --number-of-cores  Number of CPU cores to use for parallel processing.
--update               Update existing inventories.
--compress             Compress the output JSON files.
--rebuild              Rebuild the inventory, removing the existing TSV file.
--avoid-checksums      Skip checksum computation.
--multi-threading      Enable multi-threading for faster processing.
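For readers extending the script, the options above map naturally onto `argparse`. This is a hypothetical reconstruction of the documented interface, not the script's actual parser; in particular, the default of 1 for `-n` is an assumption.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented manifest-builder.py options."""
    parser = argparse.ArgumentParser(prog="manifest-builder.py")
    parser.add_argument("-d", "--directory", required=True,
                        help="Target directory to inventory.")
    # Assumption: the real default for -n is not documented.
    parser.add_argument("-n", "--number-of-cores", type=int, default=1,
                        help="Number of CPU cores to use for parallel processing.")
    parser.add_argument("--update", action="store_true",
                        help="Update existing inventories.")
    parser.add_argument("--compress", action="store_true",
                        help="Compress the output JSON files.")
    parser.add_argument("--rebuild", action="store_true",
                        help="Rebuild the inventory, removing the existing TSV file.")
    parser.add_argument("--avoid-checksums", action="store_true",
                        help="Skip checksum computation.")
    parser.add_argument("--multi-threading", action="store_true",
                        help="Enable multi-threading for faster processing.")
    return parser
```

`argparse` converts hyphenated long options into attribute names, so the flags are read back as `args.number_of_cores`, `args.avoid_checksums`, and `args.multi_threading`.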

Example

python ./manifest-builder.py -d "/bil/data/91/aa/91aad194ce577ebe/E15.5_BB0610/LSFM/stitched_01" -n 4 --update --compress
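The example above passes `--compress`. The prerequisites list `compress_json`, which the script presumably uses for this, but its exact output layout is not documented here, so this sketch substitutes the standard-library `gzip` and `json` modules. The `.json.gz` naming and the function names are hypothetical.

```python
import gzip
import json
from pathlib import Path


def write_inventory(records: list, output: Path, compress: bool = False) -> Path:
    """Write records as JSON, gzip-compressed when compress=True (hypothetical layout)."""
    if compress:
        output = output.with_name(output.name + ".gz")
        with gzip.open(output, "wt", encoding="utf-8") as handle:
            json.dump(records, handle, indent=2)
    else:
        output.write_text(json.dumps(records, indent=2), encoding="utf-8")
    return output


def read_inventory(path: Path) -> list:
    """Load an inventory written by write_inventory, compressed or not."""
    opener = gzip.open if path.suffix == ".gz" else open
    with opener(path, "rt", encoding="utf-8") as handle:
        return json.load(handle)
```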

Another example

To compute the inventory for the first 100 datasets in the report generated by update-summary.py, run the following (change the awk range to process a different slice):

cat summary_metadata.tsv | grep -v bildirectory | cut -d$'\t' -f9 | grep "/bil/data/" | awk 'NR >= 1 && NR <= 100' | xargs -P 1 -I {} python ./manifest-builder.py -d {} -n 16 --compress --update
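The same selection can be expressed in Python, which may be easier to adapt (for example, to log failures). This sketch assumes, as the command above does, that column 9 of summary_metadata.tsv holds the dataset directory and that lines containing "bildirectory" are headers. The function names are hypothetical, and rows with fewer than nine fields are simply skipped rather than reproducing every edge case of `cut`.

```python
import subprocess
import sys


def select_directories(tsv_path: str, start: int = 1, stop: int = 100) -> list:
    """Mirror the shell pipeline and return dataset directories start..stop (1-based)."""
    selected = []
    with open(tsv_path, encoding="utf-8") as handle:
        for line in handle:
            if "bildirectory" in line:       # grep -v bildirectory
                continue
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 9:
                continue
            path = fields[8]                 # cut -f9 (fields are 1-based in cut)
            if "/bil/data/" in path:         # grep "/bil/data/"
                selected.append(path)
    return selected[start - 1:stop]          # awk 'NR >= start && NR <= stop'


def run_inventories(directories, cores: int = 16) -> list:
    """Run manifest-builder.py one directory at a time (like xargs -P 1); return failures."""
    failed = []
    for directory in directories:
        result = subprocess.run(
            [sys.executable, "./manifest-builder.py", "-d", directory,
             "-n", str(cores), "--compress", "--update"]
        )
        if result.returncode != 0:
            failed.append(directory)
    return failed
```

Like `xargs`, this keeps going after a failed run; unlike the shell pipeline, it also reports which directories failed.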

Copyright © 2020-2024 Pittsburgh Supercomputing Center. All Rights Reserved.

The Biomedical Applications Group at the Pittsburgh Supercomputing Center in the Mellon College of Science at Carnegie Mellon University.

About

Inventory script that generates a JSON file with file-level statistics. Replaces the manifest scripts.
