...
Note that this example assumes one GPU per task and requests CPUs and memory in proportion to each node's total resources, that is, an equal share of the node's CPUs and RAM for each GPU. You can increase or decrease the number of CPUs and the amount of RAM requested based on your experience with your own workload.
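As a worked example of the "equal share per GPU" rule, the hypothetical helper `per_gpu_share` below divides a node's CPUs and memory by its number of GPUs and prints matching sbatch options. It is a sketch that assumes you take the node totals from the partition descriptions on this page (for example, an ARIES MI50 node with 48 CPUs, 512GB of RAM and 8 GPUs).

```bash
# Hypothetical helper: split a node's CPUs and memory evenly across its GPUs,
# so a one-GPU job asks for 1/N of the node (N = GPUs per node).
per_gpu_share() {
  local node_cpus="$1" node_mem_gb="$2" node_gpus="$3"
  if [ "$node_gpus" -le 0 ]; then
    echo "node must have at least one GPU" >&2
    return 1
  fi
  # Integer division rounds down, so the request never exceeds 1/N of the node.
  printf '%s\n' "--cpus-per-task=$((node_cpus / node_gpus)) --mem=$((node_mem_gb / node_gpus))G"
}

# ARIES MI50 node: 48 CPUs, 512GB RAM, 8 GPUs
per_gpu_share 48 512 8   # prints: --cpus-per-task=6 --mem=64G
```

For a NOTS commons node (80 CPUs, 182GB, 2 GPUs) the same helper gives `--cpus-per-task=40 --mem=91G`. Treat the result as an upper bound; requesting less usually shortens queue times.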
...
Requesting access
To request access to the clusters, use the following form:
https://www.crc.rice.edu/app/rice_signup.php
Slurm configuration
To list the nodes in a partition together with their number of CPUs, memory (reported in MB), features and GPUs, use the following command, replacing your_partition with the partition name:
```bash
sinfo -o "%N %c %m %f %G" -p your_partition
```
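The %G column reports GPUs as Slurm generic resources, typically in forms such as gpu:2 or gpu:a40:8, sometimes followed by a socket annotation like (S:0-1), and (null) on nodes without GPUs. Assuming those formats, the hypothetical helper `gpus_from_gres` below turns that string into a plain GPU count, which is handy when combining sinfo output with the per-GPU share calculation.

```bash
# Hypothetical helper: count the GPUs in a Slurm GRES string from `sinfo -o %G`.
gpus_from_gres() {
  local total=0 item count
  # Drop "(S:0-1)"-style socket annotations (and "(null)"), then split on commas.
  for item in $(printf '%s\n' "$1" | sed 's/([^)]*)//g' | tr ',' ' '); do
    case "$item" in
      gpu:*)
        count="${item##*:}"              # the last field is the count
        case "$count" in
          ''|*[!0-9]*) ;;                # not a number: ignore it
          *) total=$((total + count)) ;;
        esac
        ;;
    esac
  done
  echo "$total"
}

# Usage on a login node (your_partition is a placeholder):
# sinfo -h -o "%N %G" -p your_partition | while read -r nodes gres; do
#   echo "$nodes: $(gpus_from_gres "$gres") GPUs per node"
# done
```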
NOTS (commons)
This partition includes 16 Volta GPU nodes, each equipped with 80 CPUs and 182GB of RAM. In addition, each node includes 2 NVIDIA GPUs.
...
```bash
#SBATCH --account=ctbp-common
#SBATCH --partition=ctbp-common
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=64G
#SBATCH --gres=gpu:1
ml gomkl/2021a OpenMM/7.7.0-CUDA-11.4.2
```
NOTS (ctbp-onuchic)
This partition includes one GPU node, equipped with an AMD EPYC chip featuring 16 CPUs and 512GB of RAM. In addition, the node includes 8 NVIDIA A40 GPUs with 48GB of memory each.
```bash
#SBATCH --account=ctbp-onuchic
#SBATCH --partition=ctbp-onuchic
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=2
#SBATCH --mem=64G
#SBATCH --gres=gpu:1
ml gomkl/2021a OpenMM/7.7.0-CUDA-11.4.2
```
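To confirm inside a running job that Slurm actually gave you the GPU(s) you requested with --gres, you can inspect CUDA_VISIBLE_DEVICES, which Slurm typically sets to the allocated NVIDIA devices (this depends on the cluster's GRES configuration, so treat it as an assumption). The hypothetical helper `count_visible_gpus` is a minimal sketch of that check; `nvidia-smi -L` lists the same devices by name.

```bash
# Hypothetical helper: count the GPUs listed in CUDA_VISIBLE_DEVICES (e.g. "0,1" -> 2).
count_visible_gpus() {
  local devices="${CUDA_VISIBLE_DEVICES:-}"
  if [ -z "$devices" ]; then
    echo 0
    return
  fi
  # One device index per comma-separated entry.
  printf '%s\n' "$devices" | tr ',' '\n' | grep -c .
}

# Example line for a job script:
# echo "Allocated GPUs: $(count_visible_gpus) (${CUDA_VISIBLE_DEVICES:-none})"
```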
OpenMM on NOTS
You can also deploy and run your own version of OpenMM in a conda environment. To do so, load the Anaconda and CUDA modules already installed on NOTS and install OpenMM inside a new conda environment. Note that to run on NVIDIA GPUs, OpenMM has to be built against CUDA, so install a cudatoolkit version that matches the CUDA/<version> module you load.
```bash
# Load conda and GPU modules
module load Anaconda3/2022.05 CUDA/11.4.2
# Create the openmm environment
conda create --prefix $HOME/openmm
# Activate the new environment
source /opt/apps/software/Anaconda3/2022.05/bin/activate
conda activate $HOME/openmm
# Then install OpenMM. You can also install your favorite MD wrapper at this point
conda install -c conda-forge openmm cudatoolkit=11.4.2 h5py openmichrom opensmog
```
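One way to keep the cudatoolkit pin in step with the CUDA module is to derive it from the module name. The helper `cudatoolkit_spec` below is hypothetical and assumes module names of the form CUDA/<version>, as in CUDA/11.4.2 on NOTS; it only builds the package spec and does not check that conda-forge provides that exact version.

```bash
# Hypothetical helper: turn a module name such as CUDA/11.4.2 into "cudatoolkit=11.4.2".
cudatoolkit_spec() {
  local module_name="$1"
  local version="${module_name#CUDA/}"   # strip the "CUDA/" prefix
  if [ "$version" = "$module_name" ] || [ -z "$version" ]; then
    echo "expected a module name like CUDA/11.4.2, got: $module_name" >&2
    return 1
  fi
  echo "cudatoolkit=$version"
}

# Usage sketch, with the same module for loading and installing:
# CUDA_MODULE=CUDA/11.4.2
# module load Anaconda3/2022.05 "$CUDA_MODULE"
# conda install -c conda-forge openmm "$(cudatoolkit_spec "$CUDA_MODULE")"
```

After installing, running `python -m openmm.testInstallation` inside the activated environment on a GPU node lists the platforms OpenMM can use; CUDA should appear there if the build matches the loaded module.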
The following is an example Slurm script that runs OpenMM from this environment:
```bash
#!/bin/bash -l
#SBATCH --account=ctbp-common
#SBATCH --partition=ctbp-common
#SBATCH --job-name=Template-OPENMM
#SBATCH --ntasks=1
#SBATCH --threads-per-core=1
#SBATCH --cpus-per-task=1
#SBATCH --mem-per-cpu=2G
#SBATCH --gres=gpu:1
#SBATCH --time=00:05:00
#SBATCH --export=ALL
module purge
module load Anaconda3/2022.05 CUDA/11.4.2
source /opt/apps/software/Anaconda3/2022.05/bin/activate
conda activate $HOME/openmm
python your_script.py
```
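To submit a job script such as the one above and keep its job ID for later monitoring, you can use `sbatch --parsable`, which prints only the job ID (followed by ";cluster" on multi-cluster setups). The wrapper `submit_job` and the file name openmm_job.slurm are hypothetical; `squeue` and `scancel` are the standard Slurm commands for checking and cancelling the job.

```bash
# Hypothetical wrapper: submit a batch script and print only its job ID.
submit_job() {
  local out
  out=$(sbatch --parsable "$@") || return 1
  echo "${out%%;*}"   # drop an optional ";cluster" suffix
}

# Usage on a login node:
# jobid=$(submit_job openmm_job.slurm)
# squeue -j "$jobid"    # ST column: PD = pending, R = running
# scancel "$jobid"      # cancel the job if needed
```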
ARIES
This partition includes 22 GPU nodes and 2 high-memory CPU nodes:
- 19x MI50 nodes (gn01-gn19): 1x AMD EPYC 7642 processor (96 CPUs), 512GB RAM, 2TB storage, HDR InfiniBand, 8x AMD Radeon Instinct MI50 32GB GPUs.
- 3x MI100 nodes (gn20-gn22): 2x AMD EPYC 7V13 processors (128 CPUs), 512GB RAM, 2TB storage, HDR InfiniBand, 8x AMD Radeon Instinct MI100 32GB GPUs.
- 2x large-memory nodes (hm01-02): 2x AMD EPYC 7302 processors (64 CPUs), 4TB RAM, 4TB storage, HDR InfiniBand.
Each MI50 GPU node has an AMD EPYC chip with 48 CPU cores (96 hardware threads), 512GB of RAM and 8 AMD MI50 GPUs with 32GB of memory each. To submit a job to this queue, it is necessary to launch 8 processes in parallel, one per GPU, each with a similar runtime. This keeps all of the GPUs busy and minimizes waiting time, because the node stays allocated until the slowest process finishes.
```bash
#SBATCH --account=commons
#SBATCH --partition=commons
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --export=ALL
#SBATCH --time=06:00:00
#SBATCH --gres=gpu:8
module load foss/2020b OpenMM
```
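With a single task and all 8 GPUs allocated, one way to launch the 8 processes is to start them in the background and wait for all of them. The helper `run_per_gpu` below is a hypothetical sketch: it gives each copy one GPU through ROCR_VISIBLE_DEVICES (assuming the ROCm runtime on the node honors that variable) and also exports GPU_INDEX, a made-up name your script could read if you prefer to select the device in OpenMM itself.

```bash
# Hypothetical helper: start one copy of a command per GPU and wait for all of them.
run_per_gpu() {
  local ngpus="$1"; shift
  local i=0 pids="" status=0 pid
  while [ "$i" -lt "$ngpus" ]; do
    # Each copy sees only GPU $i and writes to its own log file.
    ROCR_VISIBLE_DEVICES="$i" GPU_INDEX="$i" "$@" > "run_gpu${i}.log" 2>&1 &
    pids="$pids $!"
    i=$((i + 1))
  done
  # Wait for every run; report failure if any of them failed.
  for pid in $pids; do
    wait "$pid" || status=1
  done
  return "$status"
}

# Last line of the job script above (your_script.py is a placeholder):
# run_per_gpu 8 python your_script.py
```

Because the job ends only when the last background process exits, giving the 8 runs similar amounts of work matters as much as launching them in parallel.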
...
```bash
#SBATCH --account=commons
#SBATCH --partition=commons
#SBATCH --ntasks=8
#SBATCH --cpus-per-task=6
#SBATCH --threads-per-core=1
#SBATCH --mem-per-cpu=3G
#SBATCH --gres=gpu:8
#SBATCH --time=06:00:00
#SBATCH --export=ALL
module load foss/2020b OpenMM
```
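With --ntasks=8, `srun` starts one copy of a command per task, and each copy receives its node-local task index in SLURM_LOCALID (0 to 7 here). A minimal sketch, assuming the ROCm runtime honors ROCR_VISIBLE_DEVICES, is a hypothetical function that pins each task to the GPU with the same index before running the real command.

```bash
# Hypothetical helper: run a command on the GPU matching this task's SLURM_LOCALID.
run_on_local_gpu() {
  if [ -z "${SLURM_LOCALID:-}" ]; then
    echo "SLURM_LOCALID is not set; launch this through srun" >&2
    return 1
  fi
  ROCR_VISIBLE_DEVICES="$SLURM_LOCALID" "$@"
}

# Usage sketch: save the function plus the line
#   run_on_local_gpu "$@"
# as a wrapper script (e.g. gpu_wrap.sh, a made-up name) and end the job script with
#   srun ./gpu_wrap.sh python your_script.py
```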
PODS
This partition includes 80 GPU nodes, each equipped with an AMD EPYC chip featuring 48 CPUs and 512GB of RAM. In addition, each node includes 8 AMD MI50 GPUs with 32 GB of memory each.
...
Checking usage
To check whether your job is running correctly, connect with ssh to the compute node on which it is running while the job is still active. Then use top to check CPU and memory usage, rocm-smi to check GPU usage on AMD Radeon GPUs, and nvidia-smi to check GPU usage on NVIDIA GPUs.
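On a login node, `squeue -u "$USER" -h -o "%i %N"` prints each of your job IDs with the node(s) it runs on, so you know where to ssh. Once on the node, the hypothetical helper `gpu_tool` below is a small sketch that picks whichever of the two GPU monitoring tools is installed, so the same command works on NVIDIA and AMD nodes.

```bash
# Hypothetical helper: print the GPU monitoring tool available on this node.
gpu_tool() {
  if command -v nvidia-smi >/dev/null 2>&1; then
    echo nvidia-smi        # NVIDIA GPUs (e.g. NOTS)
  elif command -v rocm-smi >/dev/null 2>&1; then
    echo rocm-smi          # AMD Radeon GPUs (e.g. ARIES)
  else
    echo "no GPU monitoring tool found" >&2
    return 1
  fi
}

# On the compute node:
# top -u "$USER"     # CPU and memory usage of your processes
# "$(gpu_tool)"      # GPU usage
```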
Remote access to the clusters
...
```bash
ssh-keygen -t rsa
```
Save the key in the default location and leave the passphrase empty. This generates a key pair with the default file names ~/.ssh/id_rsa (private key) and ~/.ssh/id_rsa.pub (public key). Do not share or expose your private key.
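If you prefer to skip the interactive prompts, `ssh-keygen` accepts the file name and passphrase as options. The helper `make_key` below is a hypothetical sketch that does this with an empty passphrase, as suggested above, and refuses to overwrite an existing key; the 4096-bit size is an illustrative choice, not a cluster requirement.

```bash
# Hypothetical helper: create an RSA key pair without prompts.
make_key() {
  local key="${1:-$HOME/.ssh/id_rsa}"
  local dir
  dir=$(dirname "$key")
  if [ -e "$key" ]; then
    echo "$key already exists; not overwriting it" >&2
    return 1
  fi
  # ssh expects the key directory to be private to you.
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  # -N "": empty passphrase, -f: output file, -q: no progress output
  ssh-keygen -t rsa -b 4096 -N "" -f "$key" -q
}

# Usage on your local machine:
# make_key    # creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
```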
...
To make it easier to connect to the remote machines in the future, create or edit your ssh config file, ~/.ssh/config. This file lets you specify connection settings and aliases for different remote machines. Open ~/.ssh/config in a text editor and enter the following, replacing user_id with your username on the remote machines:
```
Host crc
    User user_id
    HostName gw.crc.rice.edu
    IdentityFile ~/.ssh/id_rsa

Host aries
    User user_id
    HostName aries.rice.edu
    ProxyJump crc
    Port 22
    IdentityFile ~/.ssh/id_rsa

Host nots
    User user_id
    HostName nots.rice.edu
    ProxyJump crc
    Port 22
    IdentityFile ~/.ssh/id_rsa
```
Save the file as ~/.ssh/config.
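Before connecting, you can check that an alias resolves as intended: `ssh -G` evaluates the config file and prints the resulting settings without opening a connection. The helper `resolve_alias` below is a hypothetical sketch that extracts just the host name. Keep the file private with `chmod 600 ~/.ssh/config`, since ssh refuses a config file that other users can write to.

```bash
# Hypothetical helper: print the host name an ssh alias resolves to, without connecting.
resolve_alias() {
  local alias="$1" config="${2:-$HOME/.ssh/config}"
  # ssh -G prints the evaluated options as lowercase "keyword value" lines.
  ssh -F "$config" -G "$alias" | awk '$1 == "hostname" { print $2 }'
}

# Usage on your local machine:
# resolve_alias nots    # expected: nots.rice.edu
```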
Test the connection from your local machine to the gateway using its alias. The gateway will be accessible without a password. Enter the following command in your terminal:
```bash
ssh crc
```
Press Ctrl+D to exit back to your local machine.
Copy the keys to the compute servers
To log in to the compute servers without a password, your public key must be added to ~/.ssh/authorized_keys on each compute server. To display your public key, run the following command on your local machine:
```bash
cat ~/.ssh/id_rsa.pub
```
Next, try connecting from your local machine to a compute server with the following command. ssh uses the settings and alias specified in your ssh config file to jump through the gateway, which does not ask for a password:
```bash
ssh nots
```
The compute server, however, will still prompt you for a password. To connect to the compute servers without a password, you need to copy your public key to them as well.
The easiest way to copy the public key is to run ssh-copy-id from your local machine with each alias defined in your ssh config file; you will be asked for your password one last time. Alternatively, log in to the compute server with your password and create or edit the ~/.ssh/authorized_keys file with a text editor such as vi, creating the ~/.ssh folder first if it does not exist:
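```bash
# Option 1: run on your local machine, once per compute server alias
ssh-copy-id aries
ssh-copy-id nots
```

```bash
# Option 2: run on the compute server after logging in with your password
mkdir -p ~/.ssh
chmod 700 ~/.ssh
vi ~/.ssh/authorized_keys
```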
ssh-copy-id copies your public key, id_rsa.pub, to the remote machine and appends it as a new line to ~/.ssh/authorized_keys there. If you edit the file manually instead, paste the contents of your local ~/.ssh/id_rsa.pub as a new line in authorized_keys, save the file and exit the text editor (:wq), and then press Ctrl+D to return to your local machine.
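If you add the key manually, it helps to avoid duplicate lines and wrong permissions, since sshd ignores an authorized_keys file that others can write to. The function `add_authorized_key` below is a hypothetical sketch of an idempotent append, meant to be run on the compute server with the key text printed by `cat ~/.ssh/id_rsa.pub` on your local machine.

```bash
# Hypothetical helper: append a public key to authorized_keys only if it is not there yet.
add_authorized_key() {
  local pubkey="$1" auth="${2:-$HOME/.ssh/authorized_keys}"
  local dir
  dir=$(dirname "$auth")
  # Private directory and file, as sshd expects.
  mkdir -p "$dir" && chmod 700 "$dir" || return 1
  touch "$auth" && chmod 600 "$auth" || return 1
  # -x: match the whole line, -F: fixed string (keys contain '+' and '/')
  grep -qxF -- "$pubkey" "$auth" || printf '%s\n' "$pubkey" >> "$auth"
}

# Usage on the compute server (paste your own key between the quotes):
# add_authorized_key 'ssh-rsa AAAA... user@laptop'
```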
Test the connection by entering the following command in your terminal:
```bash
ssh aries   # press Ctrl+D to exit, then try the other server
ssh nots
```
You should now be able to connect to the compute servers without being prompted for a password.
Repeat these steps for each additional compute server you want to connect to.
More Information