2 changes: 1 addition & 1 deletion docs/cloud/04_dataproc/02_data_management.md
@@ -4,7 +4,7 @@ HDFS stands for Hadoop Distributed File System. HDFS is a highly fault-tolerant

### File Permissions and Access Control Lists

You can share files with others using [access control lists (ACLs)](../../hpc/03_storage/09_sharing_data_on_hpc.md). An ACL gives you per-file, per-directory and per-user control over who has permission to access files. You can see the ACL for a file or directory with the getfacl command:
You can share files with others using [access control lists (ACLs)](../../hpc/03_storage/08_sharing_data_on_hpc.md). An ACL gives you per-file, per-directory and per-user control over who has permission to access files. You can see the ACL for a file or directory with the getfacl command:
```sh
hdfs dfs -getfacl /user/<net_id>_nyu_edu/testdir
```
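As a hedged illustration (the collaborator NetID below is a placeholder, not from this page), access can be granted with the companion `setfacl` command and then verified:

```sh
# Grant a collaborator read and traverse access to a directory, then
# print the resulting ACL to confirm the new entry was added.
hdfs dfs -setfacl -m user:<collaborator_net_id>_nyu_edu:r-x /user/<net_id>_nyu_edu/testdir
hdfs dfs -getfacl /user/<net_id>_nyu_edu/testdir
```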
134 changes: 36 additions & 98 deletions docs/hpc/03_storage/01_intro_and_data_management.mdx

Large diffs are not rendered by default.

39 changes: 0 additions & 39 deletions docs/hpc/03_storage/02_available_storage_systems.md

This file was deleted.

@@ -1,14 +1,14 @@
# Data Transfers

:::tip Globus
Globus is the recommended tool to use for large-volume data transfers due to the efficiency, reliability, security and ease of use. Use other tools only if you really need to. Detailed instructions available at [Globus](./04_globus.md)
Globus is the recommended tool to use for large-volume data transfers due to the efficiency, reliability, security and ease of use. Use other tools only if you really need to. Detailed instructions available at [Globus](./03_globus.md)
:::

## Data-Transfer nodes
Attached to the NYU HPC cluster Torch, the Torch Data Transfer Node (gDTN) are nodes optimized for transferring data between cluster file systems (e.g. scratch) and other endpoints outside the NYU HPC clusters, including user laptops and desktops. The gDTNs have 100-Gb/s Ethernet connections to the High Speed Research Network (HSRN) and are connected to the HDR Infiniband fabric of the HPC clusters. More information on the hardware characteristics is available at [Torch spec sheet](../10_spec_sheet.md).

### Data Transfer Node Access
The HPC cluster filesystems include `/home`, `/scratch`, `/archive` and the [HPC Research Project Space](./05_research_project_space.mdx) are available on the gDTN. The Data-Transfer Node (DTN) can be accessed in a variety of ways
The HPC cluster filesystems include `/home`, `/scratch`, `/archive` and the [HPC Research Project Space](./04_research_project_space.mdx) are available on the gDTN. The Data-Transfer Node (DTN) can be accessed in a variety of ways
- From NYU-net and the High Speed Research Network: use SSH to the DTN hostname `dtn011.hpc.nyu.edu` or `dtn012.hpc.nyu.edu`

:::info
@@ -42,12 +42,12 @@ where username would be your user name, project1 a directory to be copied to the

### Windows Tools
#### File Transfer Clients
Windows 10 machines may have the Linux Subsystem installed, which will allow for the use of Linux tools, as listed above, but generally it is recommended to use a client such as [WinSCP](https://winscp.net/eng/docs/tunneling) or [FileZilla](https://filezilla-project.org/) to transfer data. Additionally, Windows users may also take advantage of [Globus](./04_globus.md) to transfer files.
Windows 10 machines may have the Linux Subsystem installed, which will allow for the use of Linux tools, as listed above, but generally it is recommended to use a client such as [WinSCP](https://winscp.net/eng/docs/tunneling) or [FileZilla](https://filezilla-project.org/) to transfer data. Additionally, Windows users may also take advantage of [Globus](./03_globus.md) to transfer files.
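Where the Linux Subsystem is installed, the same command-line tools apply; a minimal sketch (the directory name and remote path are illustrative, not from this page):

```sh
# Copy a local directory to scratch through a data-transfer node.
scp -r project1 <net_id>@dtn011.hpc.nyu.edu:/scratch/<net_id>/
```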

### Globus
Globus is the recommended tool to use for large-volume data transfers. It features automatic performance tuning and automatic retries in cases of file-transfer failures. Data-transfer tasks can be submitted via a web portal. The Globus service will take care of the rest, to make sure files are copied efficiently, reliably, and securely. Globus is also a tool for you to share data with collaborators, for whom you only need to provide the email addresses.

The Globus endpoint for Torch is available at `nyu#torch`. Detailed instructions available at [Globus](./04_globus.md)
The Globus endpoint for Torch is available at `nyu#torch`. Detailed instructions available at [Globus](./03_globus.md)

### rclone
rclone - rsync for cloud storage, is a command line program to sync files and directories to and from cloud storage systems such as Google Drive, Amazon Drive, S3, B2 etc. rclone is available on DTNs. [Please see the documentation for how to use it.](https://rclone.org/)
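As a hedged example of what a transfer from a DTN might look like, assuming a remote named `mygdrive` has already been configured with `rclone config`:

```sh
# Copy a scratch directory to a previously configured cloud remote,
# printing progress while the transfer runs.
rclone copy $SCRATCH/project1 mygdrive:backups/project1 --progress
```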
File renamed without changes.
@@ -1,10 +1,10 @@
# Research Project Space (RPS)

## Description
Research Project Space (RPS) volumes provide working space for sharing data and code amongst project or lab members. RPS directories are built on the same parallel file system (GPFS) like HPC Scratch. They are mounted on the cluster Compute Nodes, and thus they can be accessed by running jobs. RPS directories are backed up and there is no old file purging policy. These features of RPS simplify the management of data in the HPC environment as users of the HPC Cluster can store their data and code on RPS directories and they do not need to move data between the HPC Scratch and the HPC Archive file systems.
Research Project Space (RPS) volumes provide working space for sharing data and code amongst project or lab members. RPS directories are built on the same parallel file system (VAST) like HPC Scratch. They are mounted on the cluster Compute Nodes, and thus they can be accessed by running jobs. RPS directories are backed up and there is no old file purging policy. These features of RPS simplify the management of data in the HPC environment as users of the HPC Cluster can store their data and code on RPS directories and they do not need to move data between the HPC Scratch and the HPC Archive file systems.

:::note
- Due to limitations of the underlying parallel file system, ***the total number of RPS volumes that can be created is limited***.
:::info
- Due to limitations of the underlying parallel file system, the total number of RPS volumes that can be created is limited.
- There is an annual cost associated with RPS.
- The disk space and inode usage in RPS directories do not count towards quota limits in other HPC file systems (Home, Scratch, and Archive).
:::
66 changes: 66 additions & 0 deletions docs/hpc/03_storage/05_best_practices.mdx
@@ -0,0 +1,66 @@
# Best Practices on HPC Storage
## User Quota Limits and the myquota command
All users have quota limits set on HPC file systems. There are several types of quota limits, such as limits on the amount of disk space (disk quota), the number of files (inode quota), etc. The default user quota limits on HPC file systems are listed [on our Data Management page](./01_intro_and_data_management.mdx#hpc-storage-mounts-comparison-table).

:::warning[Home directory inode quotas]
_One of the most common issues users report is running out of inodes in their home directory._ This usually happens during software installation, for example when installing a conda environment under the home directory. Running out of quota causes a variety of problems, such as running jobs being interrupted or package installations in the home directory failing to complete.
:::

Users can check their current quota utilization with the `myquota` command, which reports the quota limits on each mounted file system, the user's current usage, and the percentage of quota used.

In the following example, the user inode quota limit on the `/home` file system is **30.0K inodes** (shown as 0.03M); a user holding **33,000 inodes** would be at **110%** of the inode quota limit and out of inodes in their home directory.
```sh
$ myquota
Quota Information for NetID
Hostname: torch-login-2 at 2025-12-09 17:18:24

Filesystem Environment Backed up? Allocation Current Usage
Space Variable /Flushed? Space / Files Space(%) / Files(%)

/home $HOME YES/NO 0.05TB/0.03M 0.0TB(0.0%)/54(0%)
/scratch $SCRATCH NO/YES 5.0TB/5.0M 0.0TB(0.0%)/1(0%)
/archive $ARCHIVE YES/NO 2.0TB/0.02M 0.0TB(0.0%)/1(0%)
```
You can use the following commands to print the number of inodes used within each sub-folder of a given directory:
```sh
$ cd $HOME
$ du --inodes -h --max-depth=1
6 ./.ssh
88 ./.config
2 ./.vnc
2 ./.aws
3 ./.lmod.d
5.3K ./.local
3 ./.dbus
408 ./ondemand
2 ./.virtual_documents
6 ./.nv
6.7K ./.pixi
33 ./workshop_scripts
5 ./.cupy
6 ./.gnupg
1 ./.emacs.d
194 ./.nextflow
6 ./.terminfo
2 ./.conda
2 ./.singularity
3 ./.vast-dev
1 ./custom
185 ./genai-workshop
6 ./.atuin
1 ./.apptainer
9 ./.subversion
4 ./packages
1.4K ./.cache
15K .
```
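To drill down further, one option (a sketch, not part of the captured output above) is to sort sub-folders by inode count so the heaviest consumers appear first:

```sh
# List inode counts two levels deep, numerically sorted, largest first.
du --inodes --max-depth=2 $HOME 2>/dev/null | sort -rn | head -20
```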

## Large number of small files
If your dataset or workflow requires a large number of small files, this can create a bottleneck due to read/write rates. Please refer to [our page on working with a large number of files](./06_large_number_of_small_files.md) to learn about the options we recommend considering.
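One common mitigation, sketched here under the assumption of a local `dataset/` directory, is to bundle many small files into a single archive before placing them on shared storage:

```sh
# Pack many small files into one compressed archive (one inode instead of thousands),
# then list its contents without unpacking.
tar -czf dataset.tar.gz dataset/
tar -tzf dataset.tar.gz | head
```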

## Installing Python packages
:::warning
Your home directory is limited to a relatively small number of inodes (30,000). Creating conda/Python environments in your home directory can easily exhaust your inode quota.
:::

Please review the [Package Management section](../06_tools_and_software/01_intro.md#package-management-for-r-python--julia-and-conda-in-general) of the [Torch Software Page](../06_tools_and_software/01_intro.md).
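As a hedged sketch of that advice (the paths and environment name are illustrative, and note that /scratch is subject to the 60-day purge policy), conda environments can be created outside of `$HOME` with the `--prefix` option:

```sh
# Create and activate a conda environment under $SCRATCH instead of $HOME,
# so its many small files do not count against the 30,000-inode home quota.
conda create --prefix $SCRATCH/conda-envs/myenv python=3.12 -y
conda activate $SCRATCH/conda-envs/myenv
```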
51 changes: 0 additions & 51 deletions docs/hpc/03_storage/06_best_practices.md

This file was deleted.

@@ -1,7 +1,11 @@
# Transferring Cloud Storage Data with rclone

:::tip Globus
Globus is the recommended tool to use for large-volume data transfers due to the efficiency, reliability, security and ease of use. Use other tools only if you really need to. Detailed instructions available at [Globus](./03_globus.md)
:::

## Transferring files to and from Google Drive with RCLONE
Having access to Google Drive from the HPC environment provides an option to archive data and even share data with collaborators who have no access to the NYU HPC environment. Other options to archiving data include the HPC Archive file system and using [Globus](./04_globus.md) to share data with collaborators.
Having access to Google Drive from the HPC environment provides an option to archive data and even share data with collaborators who have no access to the NYU HPC environment. Other options to archiving data include the HPC Archive file system and using [Globus](./03_globus.md) to share data with collaborators.

Access to Google Drive is provided by [rclone](https://rclone.org/drive/) - rsync for cloud storage - a command line program to sync files and directories to and from cloud storage systems such as Google Drive, Amazon Drive, S3, B2 etc. [rclone](https://rclone.org/drive/) is available on Torch cluster as a module, the module versions currently available (March 2025) are:
- **rclone/1.68.2**
@@ -344,7 +348,7 @@ Please enter 'q' and we're done with configuration.

### Step 4: Transfer
:::warning
Please be sure to perform data transfers on a data transfer node (DTN). It can degrade performance for other users to perform transfers on other types of nodes. For more information please see [Data Transfers](./03_data_transfers.md)
Please be sure to perform data transfers on a data transfer node (DTN). It can degrade performance for other users to perform transfers on other types of nodes. For more information please see [Data Transfers](./02_data_transfers.md)
:::

Sample commands:
@@ -33,7 +33,7 @@ Thus you can consider the following options:
- Reinstall your packages if some of the files get deleted
- You can do this manually
- You can do this automatically. For example, within a workflow of a pipeline software like [Nextflow](https://www.nextflow.io/)
- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/05_research_project_space.mdx)
- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/04_research_project_space.mdx)
:::
</details>

2 changes: 1 addition & 1 deletion docs/hpc/06_tools_and_software/05_r_packages_with_renv.mdx
@@ -36,7 +36,7 @@ Thus you can consider the following options:
- Reinstall your packages if some of the files get deleted
- You can do this manually
- You can do this automatically. For example, within a workflow of a pipeline software like [Nextflow](https://www.nextflow.io/)
- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/05_research_project_space.mdx)
- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/04_research_project_space.mdx)
- Use Singularity and install packages within a corresponding overlay file - Details available at [Squash File System and Singularity](../07_containers/04_squash_file_system_and_singularity.md)
:::
</details>
2 changes: 1 addition & 1 deletion docs/hpc/06_tools_and_software/06_conda_environments.mdx
@@ -41,7 +41,7 @@ Thus you can consider the following options:
- Reinstall your packages if some of the files get deleted
- You can do this manually
- You can do this automatically. For example, within a workflow of a pipeline software like [Nextflow](https://www.nextflow.io/)
- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/05_research_project_space.mdx)
- Pay for "Research Project Space" - for details see [Research Project Space](../03_storage/04_research_project_space.mdx)
- Use Singularity and install packages within a corresponding overlay file - Details available at [Squash File System and Singularity](../07_containers/04_squash_file_system_and_singularity.md)
:::
</details>
3 changes: 1 addition & 2 deletions docs/hpc/12_tutorial_intro_shell_hpc/03_moving_looking.mdx
@@ -24,7 +24,6 @@ The NYU HPC clusters have multiple file systems for user’s files. Each file sy
| /home | $HOME | Program development space; storing small files you want to keep long term, e.g. source code, scripts. | NO | 20 GB |
| /scratch | $SCRATCH | Computational workspace. Best suited to large, infrequent reads and writes. | YES. Files not accessed for 60 days are deleted. | 5 TB |
| /archive | $ARCHIVE | Long-term storage | NO | 2 TB |
| /vast | $VAST | Flash memory for high I/O workflows | YES. Files not accessed for 60 days are deleted. | 2 TB |

Please see [HPC Storage](../03_storage/01_intro_and_data_management.mdx) for more details.
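To see where these file systems live from a cluster shell, a quick check (sketch) is:

```sh
# Print the per-user paths behind the environment variables,
# then show free space on the scratch file system.
echo $HOME $SCRATCH $ARCHIVE
df -h $SCRATCH
```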

@@ -374,4 +373,4 @@ The directories are listed alphabetical at each level, the files/directories in
- To view files, use `ls`.
- You can view help for a command with `man command` or `command --help`.
- Hit `tab` to autocomplete whatever you’re currently typing.
:::
:::