
Advanced Job Submitting (Torque/Slurm)

Below are some advanced examples of submitting jobs to the various scheduling systems available at NRAO.

As of Oct. 2019, mpicasa is not aware of Torque, Slurm, HTCondor, or any other cluster scheduling system. This means there is no cooperation between CASA and the scheduler for things like cgroups or other resource containers for multi-node jobs; it works, but on a sort-of honor system. For example, imagine a script requesting 2 nodes, each with 4 cores. Torque will return the list of hostnames and mpicasa will launch processes on those hostnames, but only the mother superior (the first node) has any resource limits. The processes on the other hosts are not bound to the cgroup created by Torque, which limits the number of cores via cpuset. So as long as they are well behaved, it all works.

 

Torque

Serial Job

You can use #PBS directives in the script you submit via qsub.  These directives are the same as command-line options to qsub.  For example, if you wanted to use the -V command-line option to qsub, you could instead include it in your script with the line #PBS -V.  See below for more examples.

The default walltime for batch jobs is 100 days. Your job will be killed if it is still running after 100 days unless you have set a walltime. Also, setting a walltime shorter than 100 days will increase the odds of your job starting when resources are scarce.

Jobs are not restarted if there is a node failure.  Also, any reservations are removed from a node if that node reboots.
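Equivalently, the same options can be passed on the qsub command line instead of as #PBS directives. A minimal sketch using option values taken from the script below (the script name run_casa.sh is a hypothetical filename):

qsub -V -l pmem=16gb -d /lustre/aoc/observers/nm-4386 -m bea run_casa.sh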

#!/bin/sh

# Set PBS Directives
# Lines starting with "#PBS", before any shell commands are
# interpreted as command line arguments to qsub.
# Don't put any commands before the #PBS options or they will not work.
#
#PBS -V                               # Export all environment variables from the qsub command environment to the batch job.
#PBS -l pmem=16gb                     # Amount of memory needed by each process (ppn) in the job.
#PBS -d /lustre/aoc/observers/nm-4386 # Working directory (PBS_O_WORKDIR)
#PBS -m bea                           # Send email on begin, end, and abort of job

# Because these start with "##PBS", they are not read by qsub.
# These are here as examples.
##PBS -l mem="16gb"       # physmem used by job. Ignored if NUM_NODES > 1. Won't kill job.
##PBS -l pmem="16gb"      # physmem used by any process. Won't kill job.
##PBS -l vmem="16gb"      # physmem + virtmem used by job. Kills job if exceeded.
##PBS -l pvmem="16gb"     # physmem + virtmem used by any process. Kills job if exceeded.
##PBS -l nodes=1:ppn=1    # default is 1 core on 1 node
##PBS -M nm-4386@nrao.edu # default is submitter
##PBS -W umask=0117       # default is 0077
##PBS -l walltime=1:0:0:0 # default is 100 days. This sets it to 1 day

# casa's python requires a DISPLAY for matplotlib, so create a virtual X server
xvfb-run -d casa --nogui -c /lustre/aoc/observers/nm-4386/run_casa.py
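Run job (assuming the script above was saved as run_casa.sh, a hypothetical filename):

qsub run_casa.sh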

 

Parallel Single-node Job

The procedure for submitting parallel batch jobs is very similar to submitting serial jobs.  The differences are setting the ppn qsub option to something other than 1 and how casa is executed.

The qsub option ppn specifies the number of cores per node requested by the job.  If this option is not set, it defaults to 1.  It is used in conjunction with the -l nodes option.  For example, to request one node with 8 cores you would type -l nodes=1:ppn=8.

The scheduler creates a file containing the requested node and core count assigned to the job.  The location of this file is stored in the environment variable PBS_NODEFILE.  This file can tell mpicasa on which nodes to run.
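For example, with a request of -l nodes=1:ppn=8 the node file lists the assigned host once per requested core. A quick way to inspect it from within the job (the hostname shown is hypothetical):

cat $PBS_NODEFILE    # prints the assigned hostname 8 times, e.g. nmpost042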

#!/bin/sh

# Set PBS Directives
# Lines starting with "#PBS", before any shell commands are
# interpreted as command line arguments to qsub.
# Don't put any commands before the #PBS options or they will not work.
#
#PBS -V    # Export all environment variables from the qsub command environment to the batch job.
#PBS -l pmem=16gb        # Amount of memory needed by each process (ppn) in the job.
#PBS -d /lustre/aoc/observers/nm-4386 # Working directory (PBS_O_WORKDIR)
#PBS -l nodes=1:ppn=8 # Request one node with 8 cores

CASAPATH=/home/casa/packages/RHEL7/release/current

xvfb-run -d mpicasa -machinefile $PBS_NODEFILE $CASAPATH/bin/casa --nogui -c /lustre/aoc/observers/nm-4386/run_mpicasa.py

For more information regarding how to set memory requests see the Memory Options section of the documentation.

 

Parallel Multi-node Job

For multi-node jobs we recommend using the -L options instead of the -l options.

#!/bin/sh

# Set PBS Directives
# Lines starting with "#PBS", before any shell commands are
# interpreted as command line arguments to qsub.
# Don't put any commands before the #PBS options or they will not work.
#
#PBS -V    # Export all environment variables from the qsub command environment to the batch job.
#PBS -d /lustre/aoc/observers/nm-4386 # Working directory (PBS_O_WORKDIR)

#PBS -L tasks=2:lprocs=4:memory=10gb
# tasks is the number of nodes
# lprocs is the number of cores per node
# memory is the amount of memory per node

CASAPATH=/home/casa/packages/RHEL7/release/current

xvfb-run -d mpicasa -machinefile $PBS_NODEFILE $CASAPATH/bin/casa --nogui -c /lustre/aoc/observers/nm-4386/run_mpicasa.py
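For comparison, a rough equivalent of the -L request above using the older -l syntax might look like the sketch below. This is an approximation only: with -l, memory is specified per process via pmem rather than per node, so 10gb per node spread across 4 cores is written here as 2500mb per process.

#PBS -l nodes=2:ppn=4
#PBS -l pmem=2500mb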

 

Slurm

Serial Job

You can use #SBATCH directives in the script you submit via sbatch. These directives are the same as command-line options to sbatch. For example, if you wanted to use the --mem=2G command-line option to sbatch, you could instead include it in your script with the line #SBATCH --mem=2G. See below for more examples.

The default TimeLimit for batch jobs is 100 days. Your job will be killed if it is still running after 100 days unless you have set a TimeLimit.

Jobs are not restarted if there is a node failure. Also, any reservations are removed from a node if that node reboots.
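Equivalently, the same options can be passed on the sbatch command line. A minimal sketch using option values from the script below (the script name run_casa.sh matches the Run job example further down):

sbatch --export=ALL --mem=16G -D /lustre/aoc/observers/nm-4386 --mail-type=BEGIN,END,FAIL run_casa.sh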

#!/bin/sh

# Set SBATCH Directives
# Lines starting with "#SBATCH", before any shell commands are
# interpreted as command line arguments to sbatch.
# Don't put any commands before the #SBATCH directives or they will not work.
#
#SBATCH --export=ALL                     # Export all environment variables to job
#SBATCH --mem=16G                        # Amount of memory needed by the whole job.
#SBATCH -D /lustre/aoc/observers/nm-4386 # Working directory
#SBATCH --mail-type=BEGIN,END,FAIL       # Send email on begin, end, and fail of job

# Because these start with "##", they are not read by Slurm.
# These are here as examples.
##SBATCH --mem=16g                    # Amount of memory needed by the whole job.
##SBATCH --mem-per-cpu=2G             # Amount of memory needed by each core
##SBATCH --ntasks=8                   # Request 8 cores. Default is 1 core
##SBATCH --mail-user=nm-4386@nrao.edu # Default is submitter
##SBATCH --time=1-2:3:4               # Request 1 day, 2 hours, 3 minutes and 4 seconds. Default is 100 days

# casa's python requires a DISPLAY for matplotlib, so create a virtual X server
xvfb-run -d casa --nogui -c /lustre/aoc/observers/nm-4386/run_casa.py

Run job

sbatch run_casa.sh

 

Parallel Single-node Job

Using the --ntasks option can end up requesting more than one node if you ask for enough tasks. Using --ntasks-per-node ensures that the job stays on one node (see the variant sketched after the example below).

#!/bin/sh

# Set SBATCH Directives
# Lines starting with "#SBATCH", before any shell commands are
# interpreted as command line arguments to sbatch.
# Don't put any commands before the #SBATCH directives or they won't work.

#SBATCH --export=ALL                     # Export all environment variables to job
#SBATCH --mem=128G # Amount of memory for the whole job
#SBATCH --ntasks=8                       # Number of tasks for the whole job
#SBATCH -D /lustre/aoc/observers/nm-4386 # Working directory

CASAPATH=/home/casa/packages/RHEL7/release/current

xvfb-run -d mpicasa $CASAPATH/bin/casa --nogui -c /lustre/aoc/observers/nm-4386/run_mpicasa.py

# mpicasa should be able to detect the number of nodes and cores
# defined by Slurm, so a machinefile shouldn't be necessary.
# But if you still want one, here is how to create it and use it.
#srun hostname > /tmp/machinefile.$$
#xvfb-run -d mpicasa -machinefile /tmp/machinefile.$$ $CASAPATH/bin/casa --nogui -c /lustre/aoc/observers/nm-4386/run_mpicasa.py
#rm -f /tmp/machinefile.$$
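If you want to guarantee that the job stays on a single node, a variant of the directives above (a sketch; same total core count) is to request the cores per node explicitly:

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=8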

Parallel Multi-node Job

The procedure for submitting parallel batch jobs is very similar to submitting serial jobs. The differences are setting the --ntasks option and how casa is executed. The --ntasks option specifies the number of cores requested by the job. If this option is not set, it defaults to 1. The scheduler sets several environment variables describing the allocation, but unlike Torque/Moab it does not create a file containing the assigned nodes (there is no Slurm equivalent of PBS_NODEFILE). If you want such a file to give to mpicasa, you will need to create it yourself.
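A minimal sketch of inspecting the allocation from inside the batch job; the variable names and the scontrol subcommand are standard Slurm, and the echoed values are only illustrative:

echo $SLURM_JOB_NODELIST   # compact list of allocated nodes
echo $SLURM_JOB_NUM_NODES  # number of allocated nodes
echo $SLURM_NTASKS         # total number of tasks
scontrol show hostnames "$SLURM_JOB_NODELIST"  # expanded list, one hostname per line

The expanded hostname list can be redirected to a file and handed to mpicasa with -machinefile, as in the commented example in the single-node script above.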

#!/bin/sh

# Set SBATCH Directives
# Lines starting with "#SBATCH", before any shell commands are
# interpreted as command line arguments to sbatch.
# Don't put any commands before the #SBATCH directives or they will not work.

#SBATCH --export=ALL                     # Export all environment variables to job
#SBATCH -D /lustre/aoc/observers/nm-4386 # Working directory
#SBATCH --nodes=2 # Request 2 nodes
#SBATCH --ntasks=8 # Request 8 cores total (4 per node)
#SBATCH --mem=10G # Amount of memory per node
CASAPATH=/home/casa/packages/RHEL7/release/current

xvfb-run -d mpicasa $CASAPATH/bin/casa --nogui -c /lustre/aoc/observers/nm-4386/run_mpicasa.py

Run job

sbatch run_casa.sh

 
