
Advanced Job Submitting (Slurm)

Below are some advanced examples of submitting jobs to the Slurm scheduling system available at NRAO.

mpicasa is aware of the Slurm scheduling system because of the version of Open MPI that mpicasa uses. This means that all processes on all nodes in the job will be contained in the proper cgroups. It also means that you no longer need to use the -n or the -machinefile options with mpicasa, because mpicasa gets this information from Slurm-created environment variables.
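As a minimal sketch of the difference (run_mpicasa.py is a placeholder script name): under Torque/Moab the layout had to be passed explicitly, while under Slurm the same invocation needs neither option:

# Old style (Torque/Moab): layout passed by hand
#   mpicasa -n 8 -machinefile $PBS_NODEFILE casa --nogui -c run_mpicasa.py

# Under Slurm: Open MPI reads the allocation from SLURM_* variables
mpicasa casa --nogui -c run_mpicasa.py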


Slurm

You can use #SBATCH directives in the script you submit via sbatch. These directives are the same as command-line options to sbatch. For example, instead of passing the --mem=2G command-line option to sbatch, you could include the line #SBATCH --mem=2G in your script. See below for more examples. If possible, please set a time limit with the --time option. Your job will be killed after this amount of runtime, but setting it can also allow your job to start sooner because the scheduler knows how much time the job needs. If you do not set a time limit, your job will be killed after 100 days. Jobs are also killed if the node reboots.
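For example, the following two ways of requesting resources are equivalent (the values shown are illustrative):

# On the command line:
#   sbatch --mem=2G --time=4:00:00 run_casa.sh

# Or as directives inside run_casa.sh:
#SBATCH --mem=2G       # Memory for the whole job
#SBATCH --time=4:00:00 # Kill the job after 4 hours of runtime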

Serial Job

Save the following example to a file called run_casa.sh and edit as needed.

#!/bin/sh

# Set SBATCH Directives
# Lines starting with "#SBATCH", before any shell commands, are
# interpreted as command-line arguments to sbatch.
# Don't put any commands before the #SBATCH directives or they will not work.
#
#SBATCH --export=ALL                          # Export all environment variables to job.
#SBATCH --mail-type=BEGIN,END,FAIL            # Send email on begin, end, and fail of job.
#SBATCH --chdir=/lustre/aoc/observers/nm-4386 # Working directory
#SBATCH --time=1-2:3:4                        # Request 1 day, 2 hours, 3 minutes, and 4 seconds.
#SBATCH --mem=16G                             # Memory needed by the whole job.

# casa's python requires a DISPLAY for matplotlib, so create a virtual X server
xvfb-run -d casa --nogui -c /lustre/aoc/observers/nm-4386/run_casa.py

Run job

sbatch run_casa.sh
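After submitting, you can monitor or cancel the job with standard Slurm commands (the job ID below is a placeholder; sbatch prints the real one at submission):

squeue -u $USER # List your queued and running jobs
scancel 12345   # Cancel a job by its job ID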


Parallel Single-node Job

Because CASA uses one process as an "MPI Client", requesting 8 cores, for example, will produce 7-way parallelization. If you actually want 8-way parallelization, you have two options:

1. Request N + 1 cores, e.g. --ntasks-per-node=9. This is less efficient, since the extra core sits mostly idle: the "MPI Client" usually uses very few resources (see the sketch after the script below).
2. Add the --oversubscribe option to mpicasa along with -n 9, which forces the "MPI Client" to run on one of the 8 processing cores. This should not affect performance in most cases.

#!/bin/sh

# Set SBATCH Directives
# Lines starting with "#SBATCH", before any shell commands, are
# interpreted as command-line arguments to sbatch.
# Don't put any commands before the #SBATCH directives or they won't work.

#SBATCH --export=ALL                          # Export all environment variables to job
#SBATCH --chdir=/lustre/aoc/observers/nm-4386 # Working directory
#SBATCH --time=8-0:0:0                        # Request 8 days
#SBATCH --mem=128G                            # Memory for the whole job
#SBATCH --nodes=1                             # Request 1 node
#SBATCH --ntasks-per-node=8                   # Request 8 cores

# Use a specific version of CASA
CASAPATH=/home/casa/packages/RHEL7/release/current

xvfb-run -d ${CASAPATH}/bin/mpicasa ${CASAPATH}/bin/casa --nogui -c run_mpicasa.py

# mpicasa should be able to detect the number of nodes and cores
# defined by Slurm, so a machinefile shouldn't be necessary.
# But if you still want one, here is how to create it and use it.
#srun hostname > /tmp/machinefile.$$
#xvfb-run -d ${CASAPATH}/bin/mpicasa -machinefile /tmp/machinefile.$$ ${CASAPATH}/bin/casa --nogui -c run_mpicasa.py
#rm -f /tmp/machinefile.$$

# If you actually want 8-way parallelization instead of 7-way, use --oversubscribe
#xvfb-run -d ${CASAPATH}/bin/mpicasa --oversubscribe -n 9 ${CASAPATH}/bin/casa --nogui -c run_mpicasa.py
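Option 1 from above is just a one-line change to the directives. A minimal sketch, to be used in place of --ntasks-per-node=8 in the script:

#SBATCH --ntasks-per-node=9 # 1 core for the "MPI Client" + 8 processing cores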

Parallel Multi-node Job

The procedure for submitting parallel batch jobs is very similar to submitting serial jobs. The differences are setting the --nodes and --ntasks-per-node options, and how casa is executed. The --nodes and --ntasks-per-node options specify the number of nodes and the number of cores on each node requested by the job; both default to 1. Since the version of Open MPI that CASA uses is aware of Slurm, a machinefile is not necessary as it was with Torque/Moab. If you still want to use a machinefile, create it with the srun command, as shown in the comments in the example script below.
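As a quick illustration of what Open MPI reads, Slurm exposes the allocation to the job through environment variables (values shown match the 2-node example below):

echo $SLURM_JOB_NUM_NODES # Number of nodes allocated to the job (2)
echo $SLURM_NTASKS        # Total tasks across all nodes (12)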


#!/bin/sh

# Set SBATCH Directives
# Lines starting with "#SBATCH", before any shell commands, are
# interpreted as command-line arguments to sbatch.
# Don't put any commands before the #SBATCH directives or they will not work.

#SBATCH --export=ALL                          # Export all environment variables to job
#SBATCH --chdir=/lustre/aoc/observers/nm-4386 # Working directory
#SBATCH --time=30-0:0:0                       # Request 30 days, after which the job will be killed
#SBATCH --mem=64G                             # Memory per node
#SBATCH --nodes=2                             # Request exactly 2 nodes
#SBATCH --ntasks-per-node=6                   # Request 12 cores total (6 per node)

# Use a specific version of CASA
CASAPATH=/home/casa/packages/RHEL7/release/casa-6.4.0-16

xvfb-run -d ${CASAPATH}/bin/mpicasa ${CASAPATH}/bin/casa --nogui -c /lustre/aoc/observers/nm-4386/run_mpicasa.py

# mpicasa should be able to detect the number of nodes and cores
# defined by Slurm, so a machinefile shouldn't be necessary.
# But if you still want one, here is how to create it and use it.
#srun hostname > /tmp/machinefile.$$
#xvfb-run -d ${CASAPATH}/bin/mpicasa -machinefile /tmp/machinefile.$$ ${CASAPATH}/bin/casa --nogui -c /lustre/aoc/observers/nm-4386/run_mpicasa.py
#rm -f /tmp/machinefile.$$

# If you actually want 12-way parallelization instead of 11-way, use --oversubscribe
#xvfb-run -d ${CASAPATH}/bin/mpicasa --oversubscribe -n 13 ${CASAPATH}/bin/casa --nogui -c run_mpicasa.py

Run job

sbatch run_casa.sh

