Utilities and scripts used by the Pittsburgh Supercomputing Center (PSC) / Brain Image Library (BIL) to query Spectra Logic StorCycle job status via the StorCycle OpenAPI endpoint and produce:
- A daily TSV report named `YYYYMMDD.tsv` with columns: `bildid`, `backup_idx`, `state`, `percentComplete`, `start`, `completion`, `totalFiles`, `directory`
- A status TSV report named `status-YYYYMMDD.tsv` (subset view)
- An Excel workbook `spectrabrainz-report.xlsx` with one sheet per day, sorted and color-formatted, plus a "Histogram of states" chart sheet
- Optional upload of the Excel workbook to Google Drive via `rclone`
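The daily TSV can be loaded back with pandas for ad-hoc analysis. A minimal sketch, using the column names from the report description above (the sample rows here are invented for illustration):

```python
import io

import pandas as pd

# Sample rows in the daily report's layout (tab-separated, header included).
SAMPLE = (
    "bildid\tbackup_idx\tstate\tpercentComplete\tstart\tcompletion\ttotalFiles\tdirectory\n"
    "ace-cop\t1\tCompleted\t100\t2025-01-02\t2025-01-02\t42\t/bil/data/ace-cop\n"
    "bad-egg\t2\tFailed\t10\t2025-01-02\t\t7\t/bil/data/bad-egg\n"
)

def load_daily_report(buffer) -> pd.DataFrame:
    """Load a YYYYMMDD.tsv daily report into a DataFrame."""
    return pd.read_csv(buffer, sep="\t")

df = load_daily_report(io.StringIO(SAMPLE))
print(df[["bildid", "state"]])
```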
- `spectrabrainz.py` — Python module that authenticates to StorCycle, fetches `jobStatus`, and generates daily TSVs.
- `daily.py` — thin wrapper that runs `spectrabrainz.daily()`.
- `upload_to_gdrive.py` — builds/updates `spectrabrainz-report.xlsx` from `YYYYMMDD.tsv` files and uploads it via `rclone`.
- `daily.sh` — daily pipeline runner (generate → upload → compressed rsync backups → end-of-month archiving).
- Python 3
- `requests`, `pandas`, `openpyxl`, `matplotlib`, `tqdm`, `pandarallel`, `brainimagelibrary`
- `rclone` (only if using the upload step)
- Network access to `https://storcycle.bil.psc.edu/openapi/...`
Authentication uses a simple key-value file at `~/.SPECTRA` with this format:

```
# StorCycle credentials for SpectraBrainz scripts
USERNAME=your_username
PASSWORD=your_password
```

- Token caching — authentication tokens are cached in memory for 15 minutes to reduce login calls.
- Parallel enrichment — uses `pandarallel` (16 workers) to fetch `workingDirectory` for each dataset concurrently.
- Job filtering — system/maintenance jobs are excluded from reports (matches: `Daily-Storcycle-Database-Backup`, `test`, `Scan`, `Daily`, `Restore`).
- Latest backup only — for datasets with multiple backup runs, only the most recent backup index is kept.
- State ordering — rows are sorted: Failed → Canceled → Completed → Active.
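The "latest backup only" and state-ordering rules can be expressed in a few lines of pandas. A sketch, assuming `backup_idx` is numeric and higher means newer (not necessarily how the module implements it):

```python
import pandas as pd

# Sort order for report rows: failures surface first.
STATE_ORDER = ["Failed", "Canceled", "Completed", "Active"]

def latest_sorted(df: pd.DataFrame) -> pd.DataFrame:
    """Keep the most recent backup_idx per bildid, then sort by state order."""
    # Keep only the row with the highest backup_idx for each dataset.
    latest = df.loc[df.groupby("bildid")["backup_idx"].idxmax()]
    # An ordered categorical makes Failed sort first and Active last.
    latest = latest.assign(
        state=pd.Categorical(latest["state"], categories=STATE_ORDER, ordered=True)
    )
    return latest.sort_values("state").reset_index(drop=True)

df = pd.DataFrame(
    {
        "bildid": ["a", "a", "b", "c"],
        "backup_idx": [1, 2, 1, 1],
        "state": ["Failed", "Completed", "Active", "Failed"],
    }
)
out = latest_sorted(df)
print(out)
```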
Public functions:

| Function | Description |
|---|---|
| `login()` | Request a fresh token from the StorCycle API |
| `exists(dataset_id)` | Check whether a dataset project exists |
| `get(dataset_id)` | Retrieve a single dataset object |
| `get_projects()` | Retrieve all active ScanAndArchive projects |
| `jobStatus()` | Fetch all job status rows with pagination |
| `get_status()` | Write `status-YYYYMMDD.tsv` and return a DataFrame |
| `create(name, description, directory)` | Create a ScanAndArchive project |
| `scan(name, description, directory)` | Create a Scan project |
| `daily()` | Generate (or load) the daily `YYYYMMDD.tsv` report |
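`jobStatus()` pages through results until the API is exhausted. A generic offset-based pagination loop in that spirit (the `fetch_page` callable and page-size handling are hypothetical, not StorCycle's actual query parameters):

```python
from typing import Callable, Iterator

def fetch_all(fetch_page: Callable[[int, int], list], page_size: int = 100) -> Iterator[dict]:
    """Yield every row by requesting fixed-size pages until one comes back short."""
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:  # last (possibly empty) page reached
            break
        offset += page_size

# Demo against an in-memory "API" of 250 rows.
rows = [{"id": i} for i in range(250)]

def fake_page(offset, limit):
    return rows[offset:offset + limit]

all_rows = list(fetch_all(fake_page, page_size=100))
print(len(all_rows))  # → 250
```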
- Scans the working directory for `YYYYMMDD.tsv` files and writes one sheet per date into `spectrabrainz-report.xlsx`.
- Rows are sorted by `completion` date (descending) within each sheet.
- Row color coding:

  | State | Color |
  |---|---|
  | Completed | Green (`#228B22`) |
  | Failed | Red (`#B22222`) |
  | Canceled | Yellow (`#FFD700`) |
  | Queued/other | Gray (`#808080`) |

- A "Histogram of states" sheet is inserted as the first sheet, containing a stacked bar chart showing state counts per day over time.
- After formatting, the workbook is uploaded to Google Drive via `rclone` to `PSC:Brain_Image_Library/spectrabrainz/`.
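Row coloring with `openpyxl` amounts to applying a `PatternFill` per state. A minimal sketch using the hex colors listed above (it mirrors the described behavior, not the script's exact code):

```python
from openpyxl import Workbook
from openpyxl.styles import PatternFill

# State → fill color, matching the report's color coding.
FILLS = {
    "Completed": "228B22",  # green
    "Failed": "B22222",     # red
    "Canceled": "FFD700",   # yellow
}
DEFAULT = "808080"          # gray for Queued / anything else

def color_rows(ws, state_col: int = 3) -> None:
    """Fill each data row based on the value in its state column."""
    for row in ws.iter_rows(min_row=2):  # skip the header row
        color = FILLS.get(row[state_col - 1].value, DEFAULT)
        fill = PatternFill(start_color=color, end_color=color, fill_type="solid")
        for cell in row:
            cell.fill = fill

wb = Workbook()
ws = wb.active
ws.append(["bildid", "backup_idx", "state"])
ws.append(["a", 1, "Completed"])
ws.append(["b", 1, "Failed"])
color_rows(ws)
```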
- Runs `daily.py` to generate today's TSV.
- Runs `upload_to_gdrive.py` to update and upload the Excel report.
- Compresses each `YYYYMMDD.tsv` as a `.tar.gz` and rsyncs it to `/bil/users/icaoberg/backups/spectranbrainz/`.
- Compresses and rsyncs `spectrabrainz-report.xlsx` to the same backup location.
- On the last day of the month, copies the Excel report as `spectrabrainz-report.YYYYMM.xlsx` and removes the original.
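The end-of-month archiving step needs an "is today the last day of the month?" test. One portable way to express it in Python (an assumption about the logic, not necessarily how `daily.sh` implements it):

```python
from datetime import date, timedelta

def is_last_day_of_month(d: date) -> bool:
    """True when tomorrow falls in a different month than d."""
    return (d + timedelta(days=1)).month != d.month

def archive_name(d: date) -> str:
    """Month-stamped archive name used on the last day of the month."""
    return f"spectrabrainz-report.{d:%Y%m}.xlsx"

print(is_last_day_of_month(date(2026, 2, 28)))  # → True (2026 is not a leap year)
print(archive_name(date(2026, 2, 28)))          # → spectrabrainz-report.202602.xlsx
```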
Copyright © 2026 Pittsburgh Supercomputing Center. All Rights Reserved.
Developed by the Biomedical Applications Group at the Pittsburgh Supercomputing Center in the Mellon College of Science at Carnegie Mellon University.
