This repository contains tools and scripts for managing data transfer for Vizgen projects. It is designed to streamline and automate copying run data (raw data, analysis, and output) from the Analysis PC, or from an external hard disk, to Isilon storage.
From Windows:
- Default Windows Python version (Tested on Python 3.9.13)
From Linux (for testing/debugging):
- Python 3.11 or higher
- Clone the repository:
  git clone https://github.com/EI-CoreBioinformatics/vizgen_data_transfer.git
  cd vizgen_data_transfer
- Build and install using Poetry (from Linux):
  version=0.1.0
  poetry build
  pip install --prefix=/path/to/vizgen_data_transfer/${version}/x86_64 -U dist/*whl
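- Set up the environment (from Linux):
  export PATH=/path/to/vizgen_data_transfer/${version}/x86_64/bin:$PATH
  export PYTHONPATH=/path/to/vizgen_data_transfer/${version}/x86_64/lib/python3*/site-packages:$PYTHONPATH
  If your shell does not expand the python3* wildcard inside the assignment, replace it with the actual directory name (for example, python3.11).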
- Run the script:
From Windows:
Copy the script vizgen_data_transfer.py to a location such as the Windows F: Drive, then run:
  $ cd F:
  $ python .\vizgen_data_transfer.py --help
  usage: vizgen_data_transfer.py [-h] [--copy_type COPY_TYPE [COPY_TYPE ...]] [--threads THREADS] [--disk] [--debug] run_id

  Script for Vizgen data transfer

  positional arguments:
    run_id                Provide run name, for example: 202310261058_VZGEN1_VMSC10202

  options:
    -h, --help            show this help message and exit
    --copy_type COPY_TYPE [COPY_TYPE ...]
                          Provide copy type, for example: raw_data, analysis, output (default: ['raw_data', 'analysis', 'output'])
    --threads THREADS     Number of threads to use for copying (default: 8)
    --disk                Enable this option if run has to be copied from the Windows external Hard disk 'G:\Vizgen data Z drive' instead of the default Z: Drive on the analysis machine [default:False]
    --debug               Enable this option for debugging [default:False]

  Contact: Gemy George Kaithakottil ([email protected])
From Linux (for testing/debugging):
  $ vizgen_data_transfer --help
  usage: vizgen_data_transfer [-h] [--copy_type COPY_TYPE [COPY_TYPE ...]] [--threads THREADS] [--disk] [--vizgen_config VIZGEN_CONFIG] [--debug] run_id

  Script for Vizgen data transfer

  positional arguments:
    run_id                Provide run name, for example: 202310261058_VZGEN1_VMSC10202

  options:
    -h, --help            show this help message and exit
    --copy_type COPY_TYPE [COPY_TYPE ...]
                          Provide copy type, for example: raw_data, analysis, output (default: ['raw_data', 'analysis', 'output'])
    --threads THREADS     Number of threads to use for copying (default: 8)
    --disk                Enable this option if run has to be copied from the Windows external Hard disk 'G:\Vizgen data Z drive' instead of the default Z: Drive on the analysis machine [default:False]
    --vizgen_config VIZGEN_CONFIG
                          Path to vizgen config file [default:/path/to/vizgen_data_transfer/dev/x86_64/lib/python3*/site-packages/vizgen_data_transfer/etc/.vizgen_config.toml]
    --debug               Enable this option for debugging [default:False]

  Contact: Gemy George Kaithakottil ([email protected])
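The options in the help output above map naturally onto Python's standard argparse module. The parser below is a hypothetical reconstruction of the Linux variant, written only from that help text; it is not the repository's code. Restricting --copy_type to the three documented values and using None as the --vizgen_config default are assumptions (the installed tool defaults to a bundled .vizgen_config.toml). The Windows script's help lists the same options without --vizgen_config.

```python
import argparse

# Copy types listed in the documented help text.
COPY_TYPES = ["raw_data", "analysis", "output"]


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical reconstruction of the documented CLI (Linux variant)."""
    parser = argparse.ArgumentParser(
        prog="vizgen_data_transfer",
        description="Script for Vizgen data transfer",
    )
    parser.add_argument(
        "run_id",
        help="Provide run name, for example: 202310261058_VZGEN1_VMSC10202",
    )
    parser.add_argument(
        "--copy_type",
        nargs="+",
        metavar="COPY_TYPE",
        choices=COPY_TYPES,  # assumption: the real tool may not restrict values
        default=list(COPY_TYPES),
        help="Provide copy type (default: %(default)s)",
    )
    parser.add_argument(
        "--threads",
        type=int,
        default=8,
        help="Number of threads to use for copying (default: %(default)s)",
    )
    parser.add_argument(
        "--disk",
        action="store_true",
        help="Copy from the external hard disk 'G:\\Vizgen data Z drive' instead of Z:",
    )
    parser.add_argument(
        "--vizgen_config",
        default=None,  # assumption: the installed tool points at a bundled .vizgen_config.toml
        help="Path to vizgen config file",
    )
    parser.add_argument("--debug", action="store_true", help="Enable debugging output")
    return parser


if __name__ == "__main__":
    print(build_parser().parse_args())
```

With this sketch, parsing `RUN_FOLDER --copy_type output --disk` yields copy_type=['output'] and disk=True, while omitting --copy_type falls back to all three folders.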
From Windows (Windows PowerShell (x86)):
Additional details are given in the section 'Vizgen Data Transfer Script'.
- Configure the settings in F:\.vizgen_config.json, using vizgen_config.json as a template.
- Run the main script:
  cd F:
  python .\vizgen_data_transfer.py 202310261058_VZGEN1_VMSC10202
From Linux (for testing/debugging):
- Configure the settings in .vizgen_config.toml, using vizgen_config.toml as a template.
- Run the main script:
  vizgen_data_transfer --vizgen_config /path/to/.vizgen_config.toml 202310261058_VZGEN1_VMSC10202
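Windows uses a JSON configuration and Linux a TOML one. As a minimal sketch (not the repository's own loader), a single helper can read either format with the standard library, picking the parser from the file extension. `load_config` is a hypothetical name, and the keys inside the file are whatever the vizgen_config.json / vizgen_config.toml templates define. tomllib only exists from Python 3.11, which matches the Linux requirement; on the Windows Python 3.9 setup only the JSON branch would work.

```python
import json
from pathlib import Path
from typing import Any


def load_config(path: str) -> dict[str, Any]:
    """Load a transfer config from JSON (Windows) or TOML (Linux).

    Sketch only: key names are defined by the repository's templates.
    """
    config_path = Path(path).expanduser()
    suffix = config_path.suffix.lower()  # ".vizgen_config.json" -> ".json"
    if suffix == ".json":
        with config_path.open(encoding="utf-8") as handle:
            return json.load(handle)
    if suffix == ".toml":
        import tomllib  # standard library in Python 3.11+ only

        with config_path.open("rb") as handle:  # tomllib requires binary mode
            return tomllib.load(handle)
    raise ValueError(f"Unsupported config format: {config_path.name}")
```

Keeping the format decision in one place means the same calling code could serve both the Windows script and the Linux CLI.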
Vizgen Windows PC
- The Vizgen Windows PC runs experiments and stores raw data on the Windows D: Drive. After completing an experiment, users initiate the analysis process, which automatically copies raw data to the Analysis PC. The Analysis PC carries out the analysis, generating analysis and output files.
- Use the vizgen_data_transfer.py script with the JSON configuration file on this Windows system.
Analysis PC
- The Analysis PC runs the analysis and stores raw data, analysis, and output files on the Windows Z: Drive.
Isilon Storage
- Isilon storage is the post-transfer destination for the Analysis PC data and is located on the Windows F: Drive.
External Hard Disk
- The External Hard Disk is a temporary storage location that currently holds data from the Analysis PC. This drive is located on the Windows G: Drive.
Note:
The CLI tool vizgen_data_transfer, which uses the TOML configuration (vizgen_config.toml), has only been tested on Linux systems. On Windows, please use vizgen_data_transfer.py with the JSON configuration file (vizgen_config.json).
Below are the steps to transfer Vizgen run data from the Analysis PC to the Isilon storage.
- Open Windows PowerShell (x86) from the Start menu on the Vizgen Windows PC.
- Type in the following commands.
  Change to the F: Drive:
    cd F:
  Initiate the transfer:
    python .\vizgen_data_transfer.py RUN_FOLDER
  Replace RUN_FOLDER with the full run name, for example, 202310261058_VZGEN1_VMSC10202. Example command:
    python .\vizgen_data_transfer.py 202310261058_VZGEN1_VMSC10202
- I have also added an option (--disk) to transfer the data from the external hard disk when it is plugged into the Vizgen Windows PC. With this option, the script copies RUN_FOLDER from the external hard disk instead of the Analysis PC. An example command is below:
    python .\vizgen_data_transfer.py --disk 202310261058_VZGEN1_VMSC10202
- Once the data transfer is complete, the Python script notifies users via email (using the recipient list in the configuration file).
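This README does not describe how the notification email is sent. A minimal sketch using the standard library's email and smtplib modules might look like the following. The function names, sender address, SMTP host, port, and message wording are all hypothetical, and the recipient list stands in for the list kept in the configuration file.

```python
import smtplib
from email.message import EmailMessage


def build_notification(run_id: str, copy_types: list[str],
                       recipients: list[str], sender: str) -> EmailMessage:
    """Compose a 'transfer complete' message (hypothetical wording)."""
    msg = EmailMessage()
    msg["Subject"] = f"Vizgen data transfer complete: {run_id}"
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)
    msg.set_content(
        f"Run {run_id} has been copied to the Isilon Storage.\n"
        f"Copied folders: {', '.join(copy_types)}\n"
    )
    return msg


def send_notification(msg: EmailMessage, smtp_host: str, port: int = 25) -> None:
    """Send via an SMTP relay; host and port are assumptions."""
    with smtplib.SMTP(smtp_host, port) as smtp:
        smtp.send_message(msg)


if __name__ == "__main__":
    message = build_notification(
        "202310261058_VZGEN1_VMSC10202",
        ["raw_data", "analysis", "output"],
        ["user@example.org"],  # hypothetical; would come from the config file
        "vizgen-transfer@example.org",  # hypothetical sender
    )
    print(message)  # send_notification(message, "smtp.example.org") would deliver it
```

Separating message construction from sending keeps the composition testable without a mail server.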
The Python script vizgen_data_transfer.py is designed to copy data from the Analysis PC (or from the external hard disk) and write it to the Isilon Storage.
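The folder layout in the examples that follow can be summarised as: for each copy type, <source root>\merfish_<type>\<run> is copied to F:\<run>\<type>, where the source root is Z:\ by default or G:\Vizgen data Z drive with --disk. The snippet below is a sketch of that mapping only, not the script's code; `transfer_plan` and the constant names are hypothetical, and it performs no copying.

```python
from pathlib import PureWindowsPath

# Source and destination roots as described in this README.
ANALYSIS_PC_ROOT = PureWindowsPath("Z:/")
EXTERNAL_DISK_ROOT = PureWindowsPath("G:/Vizgen data Z drive")
ISILON_ROOT = PureWindowsPath("F:/")

# Each copy type lives under its own merfish_* folder on the source side.
SOURCE_DIRS = {
    "raw_data": "merfish_raw_data",
    "analysis": "merfish_analysis",
    "output": "merfish_output",
}


def transfer_plan(run_id, copy_types=("raw_data", "analysis", "output"), disk=False):
    """Return (source, destination) path pairs for one run.

    PureWindowsPath only manipulates path strings, so this runs on any OS
    and never touches the file system.
    """
    root = EXTERNAL_DISK_ROOT if disk else ANALYSIS_PC_ROOT
    return [
        (root / SOURCE_DIRS[copy_type] / run_id, ISILON_ROOT / run_id / copy_type)
        for copy_type in copy_types
    ]


if __name__ == "__main__":
    for src, dst in transfer_plan("202310261058_VZGEN1_VMSC10202"):
        print(f"{src} -> {dst}")
```

In this sketch the split-location case maps onto two calls: transfer_plan(run, ["raw_data", "analysis"]) for the Analysis PC folders, then transfer_plan(run, ["output"], disk=True) for the external disk.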
For example, when you execute the following command:
  python .\vizgen_data_transfer.py 202310261058_VZGEN1_VMSC10202
the script performs the following data transfer:
  Raw data:
    Z:\merfish_raw_data\202310261058_VZGEN1_VMSC10202 to F:\202310261058_VZGEN1_VMSC10202\raw_data
  Analysis:
    Z:\merfish_analysis\202310261058_VZGEN1_VMSC10202 to F:\202310261058_VZGEN1_VMSC10202\analysis
  Output:
    Z:\merfish_output\202310261058_VZGEN1_VMSC10202 to F:\202310261058_VZGEN1_VMSC10202\output
When you specify the --disk option, the script copies the data from the external hard disk (if connected) to the Isilon Storage:
  python .\vizgen_data_transfer.py --disk 202310261058_VZGEN1_VMSC10202
  Raw data:
    G:\Vizgen data Z drive\merfish_raw_data\RUN_FOLDER to F:\RUN_FOLDER\raw_data
  Analysis:
    G:\Vizgen data Z drive\merfish_analysis\RUN_FOLDER to F:\RUN_FOLDER\analysis
  Output:
    G:\Vizgen data Z drive\merfish_output\RUN_FOLDER to F:\RUN_FOLDER\output
Note:
The script uses 8 threads for data transfer by default. To increase this, for example to 10 threads, use the option --threads 10. While I have not done specific tests to measure the impact of increasing the thread count, this option allows you to adjust it if required.
For example:
  python .\vizgen_data_transfer.py --threads 10 RUN_FOLDER
If the run data are in two different locations, i.e.,
- raw_data and analysis folders located on the Analysis PC, and
- output folder located on an External Hard Disk,
then you need to run the transfer in two steps, as below:
First, transfer the raw_data and analysis folders from the Analysis PC to the Isilon Storage:
  python .\vizgen_data_transfer.py RUN_FOLDER --copy_type raw_data analysis
Once the above command completes, transfer the output folder from the External Hard Disk to the Isilon Storage:
  python .\vizgen_data_transfer.py RUN_FOLDER --copy_type output --disk
At the end of the transfer, the Isilon Storage will have the following folder structure:
F:\RUN_FOLDER\
├── raw_data  - Transferred from Analysis PC
├── analysis  - Transferred from Analysis PC
└── output    - Transferred from External Hard Disk

Contributions are welcome! Please fork the repository and submit a pull request.
This project is licensed under the GNU General Public License. See the LICENSE file for details.
For questions or support, please contact [[email protected]] or [[email protected]].