Cluster Scheduler - NM

Requesting Interactive Nodes

Users may want to request an interactive node if they plan to run CASA, or some other software, interactively. This is the least efficient use of the cluster, so please don't reserve nodes for longer than you need.

There are two clusters available: one in New Mexico named nmpost, which hosts the VLA and VLBA archives, and one in Virginia named cvpost, which hosts the ALMA archive.

Login to the NRAO

If you are outside the NRAO, you must first login to the appropriate ssh gateway.

  • For the nmpost cluster
ssh ssh.aoc.nrao.edu
  • For the cvpost cluster
ssh ssh.cv.nrao.edu

Login to the Head Node

Once on the NRAO network, login to the appropriate head node.

  • For the nmpost cluster
ssh nmpost-master
  • For the cvpost cluster
ssh cvpost-master

Run Nodescheduler

Once on the head node, run the nodescheduler program. You are allowed to request up to two weeks of time on a single node. For example:

nodescheduler --request 14 1

This requests 14 days of time on one node. The nodescheduler command will print a job id like 3001.nmpost-master.aoc.nrao.edu to your terminal. The system uses this id to uniquely identify the job; you will need it to terminate your job early if you finish with the node, or to request an extension.

When your node is ready you will receive an email from the system with the subject "Cluster interactive: time begun" which will tell you the name of the cluster machine available to you.

After approximately 75% of your time has passed, the system will send you an email warning. This is a good time to request an extension if you need one; DO NOT wait until the last minute. You will receive another email warning approximately one hour before the end of your requested time. When your time expires, the system will kill all processes you have running on that node.

If you finish your work before your allocation ends, please release the node. The original email you received has the terminate command in it. The argument to the --terminate option is the unique job id for your job. For example:

nodescheduler --terminate 3001
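The job id printed at request time includes the head-node suffix, while --terminate takes just the numeric part. A small shell sketch for stripping the suffix (the id value is the example from above):

```shell
# Strip the host suffix from a nodescheduler/qsub-style job id.
# The request step returns something like "3001.nmpost-master.aoc.nrao.edu";
# the --terminate option takes just the leading number.
full_id="3001.nmpost-master.aoc.nrao.edu"
job_num="${full_id%%.*}"    # delete everything from the first "." onward
echo "$job_num"             # prints: 3001
```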

All nodes are the same and have the same access to Lustre. It's best to release a node when you're through and then request another when ready, rather than locking one up for weeks at a time and leaving it idle. Only one user per node is allowed for interactive time.

To Submit a CASA Job to Torque

Users may want to use this method if they don't need to interact with CASA.  NOTE: nm-* and cv-* type accounts have their home directories on the Lustre filesystem.  NRAO staff home directories are on a different filesystem, so staff should set WORK_DIR to their Lustre area.

Create a Submit Script

Create a cluster.req file like the following (this is an example for the nmpost cluster):

# This is a config file for submitting jobs to the cluster scheduler.
# The COMMAND is expected to be a script or binary.
# This config file happens to be for running casa.

#
# These are required
#
WORK_DIR="/lustre/aoc/observers/nm-4386"
COMMAND="/lustre/aoc/observers/nm-4386/run_casa.sh"
# Please use at least one of MEM, PMEM, VMEM, PVMEM. Or use MEMORY.
#MEM="16gb"    # physmem used by job. Ignored if NUM_NODES > 1. Won't kill job.
PMEM="16gb"    # physmem used by any process. Won't kill job.
#VMEM="16gb"    # physmem + virtmem used by job. Kills job if exceeded.
PVMEM="16gb"    # physmem + virtmem used by any process. Kills job if exceeded.
#MEMORY="16gb"    # sets MEM and PMEM (Deprecated).
#
# These are optional
#
#NUM_NODES="2"     # default is 1
#NUM_CORES="4"     # default is 1
#MAILTO="nm-4386"  # default is the user submitting the job
#QUEUE="batch"     # default is the batch queue
#STDOUT="my_out"   # file relative to WORK_DIR. default is no output
#STDERR="my_err"   # file relative to WORK_DIR. default is no output
#UMASK="0117"      # default is 0077
# MAIL_OPTIONS:
#   n  no mail will be sent.
#   a  mail is sent when the job is aborted by the batch system.
#   b  mail is sent when the job begins execution.
#   e  mail is sent when the job terminates.
MAIL_OPTIONS="abe"    # default is "n" therefore no email
# JOB_NAME: <= 15 non-whitespace characters. First character alphabetic.
#JOB_NAME="testing1"    # default is _qsub.

Create a Run Script

Create a run_casa.sh file like the following (this is an example for the nmpost cluster):

#!/bin/sh

# This script is meant to be set in the COMMAND variable
# in the config file passed to submit.  That submit script will create the
# clusterspec file for us in the WORK_DIR we specified in the config file.

WORK_DIR="/lustre/aoc/observers/nm-4386/scheduler"
cd ${WORK_DIR}

# casa's python requires a DISPLAY for matplot so create a virtual X server
xvfb-run -d casa --nogui -c /lustre/aoc/observers/nm-4386/ParallelScript.py

Login to the Head Node

  • For the nmpost cluster
ssh nmpost-master
  • For the cvpost cluster
ssh cvpost-master

Run submit

submit -f cluster.req

FAQ

  • How do I see all the reservations on the cluster?
qstat -n1
  • How can I see which node I reserved?

If you have a running reservation, you can find out which node it is on by logging into the head node and running something like this (user nm-4386 is an example):

qstat -1nu nm-4386
  • My node was rebooted.  What happened to my reservation?

Persistent reservations cause problems within the Torque job scheduler, which would try to restart your batch process after a reboot; that is usually not what you want. If your node is rebooted, your reservation is released. Please request another node.

  • Why is my job exiting unexpectedly?

Your job may have exceeded the requested amount of RAM.  The scheduler will kill your job if you set VMEM and/or PVMEM and exceed either of those limits.  If MAIL_OPTIONS includes at least "e", you should receive an email message when the job ends.  If it reports Exit_status as -10, your job exceeded VMEM.  If it reports Exit_status as 255, a process in your job exceeded PVMEM.
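The Exit_status values above can be checked mechanically. A small sketch that maps the value reported in the completion email to the likely cause (the example value is illustrative):

```shell
# Map a Torque Exit_status (from the job-completion email) to the
# memory-limit behavior described in this FAQ entry.
exit_status="-10"    # example: the value reported in the email
case "$exit_status" in
  0)   reason="job completed normally" ;;
  -10) reason="job exceeded VMEM" ;;
  255) reason="a single process exceeded PVMEM" ;;
  *)   reason="not a memory-limit exit; check the job's stderr" ;;
esac
echo "$reason"
```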

  • What are the different memory options?

We recommend setting both PMEM and PVMEM.  PMEM allows the scheduler to put your job on nodes with enough memory without using swap, while PVMEM allows the scheduler to kill your job if any one process uses more memory than requested.
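Following that recommendation, the memory section of a cluster.req would contain both settings (the 16gb values are illustrative; size them to your job):

```shell
# Recommended: set both PMEM and PVMEM in cluster.req.
PMEM="16gb"     # lets the scheduler place the job on a node with enough physical memory
PVMEM="16gb"    # kills the job if any one process exceeds this virtual-memory limit
```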

MEM

Maximum amount of physical memory used by the job. Ignored on Linux if the number of nodes is greater than 1.

Scheduler will not kill the job if any process or the whole job exceeds MEM.

Sets "data seg size" in ulimit, a.k.a. "Max data size" in /proc/$$/limits.
Sets "max memory size" in ulimit, a.k.a. "Max resident set" in /proc/$$/limits.

PMEM

Maximum amount of physical memory used by any single process of the job.  If asking for multiple virtual processors (via ppn or NUM_CORES) then the scheduler will multiply NUM_CORES by PMEM and look for that much available physical memory.

Scheduler will not kill the job if any process or the whole job exceeds PMEM.

Sets "data seg size" in ulimit, a.k.a. "Max data size" in /proc/$$/limits.

Sets "max memory size" in ulimit, a.k.a. "Max resident set" in /proc/$$/limits.

VMEM

Maximum amount of virtual memory (physical + swap) used by all concurrent processes in the job.

Scheduler will kill the job if the whole job exceeds VMEM.  Exit_status=-10.

Sets "virtual memory" in ulimit a.k.a. "Max address space" in /proc/$$/limits.

PVMEM

Maximum amount of virtual memory (physical + swap) used by any single process in the job.  If asking for multiple virtual processors (via ppn or NUM_CORES) then the scheduler will multiply NUM_CORES by PVMEM and look for that much available virtual memory.

Scheduler will kill the job if any one process exceeds PVMEM.  Exit_status=255.

Sets "virtual memory" in ulimit a.k.a. "Max address space" in /proc/$$/limits.
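You can verify which of these limits were actually applied from inside a running job by reading /proc/$$/limits on the compute node; a sketch:

```shell
# Show the limits named above ("Max data size", "Max resident set",
# "Max address space") for the current shell process.
grep -E 'Max (data size|resident set|address space)' /proc/$$/limits
```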
