
Translating between Torque, Slurm, and HTCondor

This page describes how common job options, commands, and environment variables map between the Torque, Slurm, and HTCondor cluster scheduling systems.  Torque and Slurm are very similar in usage: with either one you can specify the requirements of a job through command-line arguments.  HTCondor is somewhat different: you need to create a Submit Description File that specifies the requirements and defines the script to execute.

These translations are not meant to be exact, and a basic understanding of each system is expected.  There are some examples at the end of this document.

 

Submit Options

Description              Torque/Moab                                Slurm                                      HTCondor
Script directive         #PBS                                       #SBATCH                                    NA
Queue/Partition          -q <queue>                                 -p <partition>                             requirements = (<partition> == True)
                                                                                                               +partition = "<partition>"
Node count               -l nodes=<count>                           -N <min>[-<max>]                           NA
Core count               -l ppn=<count>                             -n <count> OR -c <count>                   request_cpus = <count>
Wall clock limit         -l walltime=<hh:mm:ss>                     -t <min> OR -t <days-hh:mm:ss>             periodic_remove = (time() - JobStartDate) > (<seconds>)
Stdout                   -o <filename>                              -o <filename>                              output = <filename>
Stderr                   -e <filename>                              -e <filename>                              error = <filename>
Copy environment         -V                                         --export=ALL                               getenv = true
Email notification       -m [a|b|e]                                 --mail-type=[ALL, END, FAIL, BEGIN, NONE]  notification = [Always, Complete, Error, Never]
Email address            -M <user_list>                             --mail-user=<user_list>                    notify_user = <user_list>
Job name                 -N <name>                                  -J <name> OR --job-name=<name>             batch_name = <name>
Working directory        -d <path> OR -w <path>                     -D <path>                                  initialdir = <path>
Memory per node          -l mem=<count[kb, mb, gb, tb]>             --mem=<count[K, M, G, T]>                  request_memory = <count> G
Memory per core          -l pmem=<count[kb, mb, gb, tb]>            --mem-per-cpu=<count[K, M, G, T]>          NA
Virtual memory per node  -l vmem=<count[kb, mb, gb, tb]>            NA                                         NA
Virtual memory per core  -l pvmem=<count[kb, mb, gb, tb]>           NA                                         NA
Memory per job           -L tasks=1:memory=<count[kb, mb, gb, tb]>  --mem=<count[K, M, G, T]>                  request_memory = <count> G
Job arrays               -t <arrayspec>                             --array=<arrayspec>                        queue from seq <first> [<increment>] <last> |
Variable list            -v <var>=<val>[,<var>=<val>]               --export=<var>=<val>[,<var>=<val>]         environment = "<var>=<val> [<var>=<val>]"
Script args              -F "<arg1> [<arg2> ...]"                   sbatch script <arg1> [<arg2> ...]
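
The wall clock row above mixes units: Torque takes hh:mm:ss, while the HTCondor periodic_remove expression compares against a number of seconds.  The following is a minimal sketch of that conversion in POSIX shell.  The function name walltime_to_seconds is a hypothetical helper introduced here for illustration; it is not part of any of the three schedulers.

```shell
#!/bin/sh
# Hypothetical helper (not part of Torque, Slurm, or HTCondor):
# convert a Torque-style hh:mm:ss walltime into seconds.
# The body runs in a subshell so the IFS change does not leak out.
walltime_to_seconds() (
    IFS=:
    # Split the argument on ':' into the positional parameters.
    set -- $1
    # Drop one leading zero so fields such as 08 are not parsed as octal.
    h=${1#0}
    m=${2#0}
    s=${3#0}
    echo "$(( ${h:-0} * 3600 + ${m:-0} * 60 + ${s:-0} ))"
)

# Build the HTCondor submit line for a 2 hour 30 minute limit.
# prints: periodic_remove = (time() - JobStartDate) > (9000)
echo "periodic_remove = (time() - JobStartDate) > ($(walltime_to_seconds 2:30:00))"
```

The sketch only handles the plain hh:mm:ss form; Slurm's days-hh:mm:ss form would need the day count split off first.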

 

 

 

Commands

Description                 Torque/Moab             Slurm                                           HTCondor
Job alter                   qalter                  scontrol update                                 condor_qedit
Job connect to              NA                      srun --jobid <jobid> --pty bash -l              condor_ssh_to_job <jobid>
Job delete                  qdel <jobid>            scancel <jobid>                                 condor_rm <jobid>
Job delete all user's jobs  qdel all                scancel --user=<user>                           condor_rm <user>
Job info detailed           qstat -f <jobid>        scontrol show job <jobid>                       condor_q -long <jobid>
Job info detailed           qstat -f <jobid>        scontrol show job <jobid>                       condor_q -analyze -verbose <jobid>
Job info detailed           qstat -f <jobid>        scontrol show job <jobid>                       condor_q -better-analyze -verbose <jobid>
Job info detailed           qstat -f <jobid>        scontrol show job <jobid>                       condor_q -better-analyze -reverse -verbose <jobid>
Job show all                qstat -1n               squeue                                          condor_q -global -all
Job show all verbose        qstat -1n               squeue -all                                     condor_q -global -all -nobatch
Job show all verbose        qstat -1n               squeue -all                                     condor_q -global -all -nobatch -run
Job show DAGs               NA                      NA                                              condor_q -dag -nobatch
Job submit                  qsub                    sbatch                                          condor_submit
Job submit simple           echo "sleep 27" | qsub  srun sleep 27                                   condor_run "sleep 27" &
Job submit interactive      qsub -I                 srun --pty bash                                 condor_submit -i
Node show free nodes        nodesfree               sinfo --states=idle --partition=<partition> -N  condor_status -const 'PartitionableSlot && Cpus == TotalCpus'
Node show resources         qstat -q                sjstat -c
Node show state             pbsnodes -l all         sinfo -Nl                                       condor_status -state
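
When the same script has to work on more than one of these systems, the rows above can be turned into a small lookup.  The following is only a sketch: job_info_command is a hypothetical function name, and it prints the first "Job info detailed" command for each scheduler rather than running it.

```shell
#!/bin/sh
# Hypothetical helper: print the "Job info detailed" command from the
# table above for a given scheduler and job ID, without running it.
job_info_command() {
    case "$1" in
        torque)   echo "qstat -f $2" ;;
        slurm)    echo "scontrol show job $2" ;;
        htcondor) echo "condor_q -long $2" ;;
        *)        echo "unknown scheduler: $1" >&2; return 1 ;;
    esac
}

# prints: scontrol show job 12345
job_info_command slurm 12345
```

Printing the command instead of executing it keeps the sketch usable on a machine where none of the schedulers is installed.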

 

 

 

Variables

 

Description                   Torque/Moab    Slurm                  HTCondor
Job Name                      PBS_JOBNAME    SLURM_JOB_NAME
Job ID                        PBS_JOBID      SLURM_JOBID
Tasks per node                PBS_NUM_PPN    SLURM_NTASKS_PER_NODE
Cores per step on this node   PBS_NUM_PPN    SLURM_CPUS_ON_NODE
Queue/Partition submitted to  PBS_O_QUEUE    SLURM_JOB_PARTITION
Queue/Partition running on    PBS_QUEUE      SLURM_JOB_PARTITION
User                          PBS_O_LOGNAME  SLURM_JOB_USER
Number of nodes in job        PBS_NUM_NODES  SLURM_NNODES
Number of nodes in job        PBS_NUM_NODES  SLURM_JOB_NUM_NODES
Submit Host                   PBS_O_HOST     SLURM_SUBMIT_HOST
Working dir                   PBS_O_WORKDIR  PWD
Machine file                  PBS_NODEFILE   NA
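
A job script that may run under either Torque or Slurm can read whichever variable is set.  The following is a minimal sketch using the variable names listed above; the function names are hypothetical, and HTCondor is left out because this table lists no HTCondor equivalents.

```shell
#!/bin/sh
# Hypothetical helpers (names are illustrative): read the same job facts
# under either Torque or Slurm, using the variable names from the table.

# Print which scheduler started this script, or "none" outside a job.
detect_scheduler() {
    if [ -n "${PBS_JOBID:-}" ]; then
        echo torque
    elif [ -n "${SLURM_JOBID:-}" ]; then
        echo slurm
    else
        echo none
    fi
}

# Each helper prefers the Torque variable and falls back to the Slurm one.
job_id()    { echo "${PBS_JOBID:-${SLURM_JOBID:-}}"; }
job_name()  { echo "${PBS_JOBNAME:-${SLURM_JOB_NAME:-}}"; }
num_nodes() { echo "${PBS_NUM_NODES:-${SLURM_JOB_NUM_NODES:-}}"; }

echo "scheduler=$(detect_scheduler) id=$(job_id) name=$(job_name) nodes=$(num_nodes)"
```

Outside of a job all of these variables are unset, so the helpers print empty strings and detect_scheduler prints "none".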

 

 

 

Example Commands

 

Torque

 

qsub -V -N casatest01 -l pmem=16gb,pvmem=16gb -d /lustre/aoc/observers/nm-4386 -l walltime=2:30:00 -m ae run_casa.sh

 

Slurm

 

sbatch --export=ALL -J casatest01 --mem=16G -D /lustre/aoc/observers/nm-4386 -t 0-2:30:00 --mail-type=END,FAIL run_casa.sh
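
The Torque and Slurm examples spell the memory size differently (16gb versus 16G), following the kb/mb/gb/tb and K/M/G/T suffixes in the Submit Options table.  A minimal sketch of that rewrite, assuming the size always ends in one of those four Torque suffixes; torque_mem_to_slurm is a hypothetical helper name:

```shell
#!/bin/sh
# Hypothetical helper: rewrite a Torque size such as 16gb in Slurm's form (16G).
# Assumes the input ends in kb, mb, gb, or tb.
torque_mem_to_slurm() {
    # Drop the trailing "b", then upper-case the remaining unit letter.
    echo "$1" | sed 's/[bB]$//' | tr 'kmgt' 'KMGT'
}

# prints: --mem=16G
echo "--mem=$(torque_mem_to_slurm 16gb)"
```

This only changes the spelling of the size.  Whether the limit applies per core, per node, or per job still depends on which option from the table is used.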


HTCondor

Create a Submit Description File (e.g. run_casa.htc):

executable = run_casa.sh
batch_name = casatest01
request_memory = 16 G
notification = Always
environment = "CASA_HOME=/home/casa/packages/RHEL7/release/current PPR_FILENAME=PPR.xml"
initialdir = /lustre/aoc/observers/nm-4386

log = condor.$(ClusterId).log
output = condor.$(ClusterId).log
error = condor.$(ClusterId).log

queue

Then submit that file:

condor_submit run_casa.htc

 

While you can set a wall clock limit for an HTCondor job, it isn't advised in most cases.

 
