
...

One way to regulate the number of jobs running simultaneously is to use a job array and append a percent sign to limit how many array tasks run at the same time; for example, #SBATCH --array=0-39%10 submits 40 jobs, but only 10 run at once.
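
As a minimal sketch (the job name, run script, and input/output names are placeholders, not taken from this page):

Code Block (bash): Job array with a concurrency limit (sketch)
#!/bin/bash -l
#SBATCH --job-name=array_example
#SBATCH --array=0-39%10        # 40 array tasks, at most 10 running at the same time

# Each array task selects its own input via SLURM_ARRAY_TASK_ID (placeholder file names)
python run_code.py input_${SLURM_ARRAY_TASK_ID} output_${SLURM_ARRAY_TASK_ID}.log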

Another way to regulate the number of jobs running simultaneously is to add dependencies to the jobs with the #SBATCH --dependency=afterany:previous_job_id directive. A sample script to submit multiple jobs with dependencies can be found here.
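
As a rough command-line sketch of the same idea (job.slurm and the number of jobs are placeholders; the downloadable sample script remains the reference):

Code Block (bash): Chaining jobs with dependencies (sketch)
#!/bin/bash
# Submit the first job; --parsable makes sbatch print only the job ID
jobid=$(sbatch --parsable job.slurm)

# Submit 4 more jobs, each starting only after the previous one has finished
for i in $(seq 2 5); do
    jobid=$(sbatch --parsable --dependency=afterany:${jobid} job.slurm)
done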

Important note: set up SSH communication keys between nodes, as explained in the PDF at the end of this page (ARIES_Quick_Start_wl52_20220406.pdf).
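
The PDF is the authoritative guide; on a system where the home directory is shared across nodes, the setup typically reduces to something like the following sketch:

Code Block (bash): Passwordless SSH between nodes (sketch)
# Generate a key pair without a passphrase (skip if one already exists)
ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa

# Authorize your own key; the shared home directory makes it visible on every node
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys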

Updated policies effective 8/19/2024

The intention of these policies is to ensure that all CTBP users are able to effectively use this resource.

The policies will be adjusted if we find that they are not sufficient to ensure smooth operation of the resources.

If you believe users are not being good citizens (i.e., trying to work around or ignore the rules), or if you are simply having trouble getting your jobs to go through, don't hesitate to let us know (email Paul Whitford) so that we can address any issues before they become problems.

Aries Usage Policies for GPU nodes: Effective 8/19/2024
1) Only 1-node (8 GPUs) jobs are allowed.
2) The time limit for each job is 6 hours.
3) Each user may have no more than a total of 10 jobs in the queue at a time. This includes all jobs, regardless of state (running, pending, held, etc.). Also, a job array with N jobs counts as N towards this limit. A quick way to check your current count is shown after this list.
4) Exceptions to these rules may be requested and considered on a case-by-case basis. (email Paul Whitford)
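
To check how many jobs you currently have in the queue (with each array element counted individually), you can run:

Code Block (bash): Counting your queued jobs
# -r lists each job array element on its own line; -h suppresses the header
squeue -u $USER -r -h | wc -l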

Partitions

Partition commons

...

For submitting OpenMM jobs, a Singularity container with OpenMM pre-installed is available in:

Code Block (bash): Singularity container for OpenMM
container=/home/pcw2/bin/openmm-ctbp.sif
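
As a usage sketch (the run script name run_openmm.py is a placeholder), a script can be executed inside the container with GPU support enabled via the --nv flag:

Code Block (bash): Running a script inside the container (sketch)
container=/home/pcw2/bin/openmm-ctbp.sif

# --nv exposes the host NVIDIA driver and GPUs inside the container
singularity exec --nv $container python3 run_openmm.py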

This container does not include OpenSMOG or other CTBP-specific tools; they will need to be installed with pip3. For example, to install OpenSMOG:

Code Block (bash)
pip3 install OpenSMOG

If you cannot find a container with suitable options, you may contact Prof. Whitford, or you may be able to install what you need with pip3.

A usage example with a bash submission script, an OpenMM Python run script, and input files can be downloaded below.

...

Code Block (bash): Job submission script
#!/bin/bash -l
#SBATCH --job-name=ctbpexample
#SBATCH --nodes=1
#SBATCH --cpus-per-task=96        #set to 96 if not using MPI (OpenMM does not use MPI)
#SBATCH --tasks-per-node=1
#SBATCH --export=ALL
#SBATCH --mem=0                   #each GPU assigned 32 GB by default
#SBATCH --gres=gpu:8 
#SBATCH --time=06:00:00           #max run time is 6 hours
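
# The lines below are an illustrative run step, not part of the original example;
# the container path matches the OpenMM section above, and run_openmm.py is a placeholder.
container=/home/pcw2/bin/openmm-ctbp.sif
singularity exec --nv $container python3 run_openmm.py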

Launcher_GPU

Another possibility for submitting jobs is to use the Launcher_GPU module and bind each run/task to one GPU. For this, set '--cpus-per-task' to the total number of CPUs divided by the number of '--tasks-per-node'. In the example below, as required on Aries, the job uses all 8 GPUs on the node simultaneously by running 8 simulations, each on a single GPU. In this example, OpenMM is loaded as a module.

...

Code Block (bash): Job submission script Launcher_GPU
#!/bin/bash -l

#SBATCH --job-name=my_job
#SBATCH --account=commons
#SBATCH --partition=commons
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8
#SBATCH --cpus-per-task=6
#SBATCH --threads-per-core=2
#SBATCH --mem-per-cpu=3G
#SBATCH --gres=gpu:8
#SBATCH --time=06:00:00
#SBATCH --export=ALL

echo "Submitting simulations..."

module purge

# Using Launcher_GPU on ARIES 
module load GCC/10.2.0 OpenMPI/4.0.5 OpenMM/7.5.0 foss/2020b Launcher_GPU

# This is for controlling Launcher
export LAUNCHER_WORKDIR=`pwd`
export LAUNCHER_JOB_FILE=$PWD/launcher_jobs_sim
export LAUNCHER_BIND=1

echo "Job started on " `date`
echo "Running on hostname" `hostname`
echo "Job $SLURM_JOB_ID is running on: $SLURM_NODELIST"
echo "Job SLURM_SUBMIT_DIR is $SLURM_SUBMIT_DIR"
echo "Running on $SLURM_NNODES nodes"
echo "Running on $SLURM_NPROCS processors"
echo "CPUS per task is $SLURM_CPUS_PER_TASK"
echo "LAUNCHER_WORKDIR: $LAUNCHER_WORKDIR"
echo "Number of replicas is $max_replicas"
df -h

# This will adjust the total number of runs to nodes*8
max_replicas=$((SLURM_NNODES*8))

rm $LAUNCHER_WORKDIR/launcher_jobs_sim &> /dev/null

# Create Launcher_job_file needed by $LAUNCHER_DIR/paramrun
for i in `seq 1 $max_replicas`
do
	echo "python run_code.py input_$i output_${i}.log" >> $LAUNCHER_WORKDIR/launcher_jobs_sim
done

# This line launches the jobs in the launcher_jobs_sim file
$LAUNCHER_DIR/paramrun

echo "My job finished at:" `date`

...