Preparing scripts for batch processing (Torque/Slurm)
CASA can execute Python scripts either from the prompt of an interactive session, with execfile('myScript.py'), or in a non-interactive session that runs the script, started with the command casa -c myScript.py. This section covers the latter option and shows how to create a Python script for calibrating and imaging VLA data, along with a convenient way to pass arguments to the script.
The script and the command used to submit it differ depending on the scheduler you are using.
Torque
Once the scripts are ready to run, submission is done using qsub as discussed in Access and Running Jobs. Log in to nmpost or cvpost, and run the command:
qsub runVLAcal-torque.sh
Slurm
Once the scripts are ready to run, submission is done using sbatch as discussed in Access and Running Jobs. Log in to nmpost or cvpost, and run the command:
sbatch runVLAcal-slurm.sh
VLACalibration.py
The script presented here is derived from the VLA Continuum Tutorial 3C391-CASA5.0.0, which discusses the calibration and imaging parameters in detail. To make the script reusable and scalable, it is useful to pass arguments to the many CASA tasks it executes through variables. One of the many possible ways to assign values to those variables is through environment variables.
To read the environment variables inside the Python script:
# Extracted from VLACalibration.py
import os
visfile = os.getenv('MS_FILE')
workDir = os.getenv('WORK_DIR')
imageDir = os.getenv('IMAGE_DIR')
infoDir = os.getenv('INFO_DIR')
imageID = os.getenv('IMAGE_ID')
This approach is a convenient way to pass arguments from a shell script to a Python script without adding them to the command line and having to keep track of argument order and indexes. In this example, the variables are set inside the shell script using export:
# Extracted from runVLAcal.sh
export MS_FILE=3c391_ctm_mosaic_10s_spw0.ms
export WORK_DIR=WORK
export IMAGE_DIR=IMAGES
export INFO_DIR=INFO
export IMAGE_ID=St_I
These environment variables can also be assigned manually from the command line, using the same command.
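Because os.getenv returns None for a variable that has not been exported, a forgotten export only shows up later, when a task receives None. The following is a minimal sketch of a more defensive way to read the same five variables. The helper name read_settings and its fail-fast behavior are illustrative assumptions and are not part of VLACalibration.py.

```python
import os

# Names of the environment variables used in this section.
REQUIRED_VARS = ('MS_FILE', 'WORK_DIR', 'IMAGE_DIR', 'INFO_DIR', 'IMAGE_ID')


def read_settings(environ=None):
    """Return a dict with the required variables, or raise if any is unset.

    read_settings is a hypothetical helper, not part of VLACalibration.py.
    """
    if environ is None:
        environ = os.environ
    # Treat both unset and empty variables as missing.
    missing = [name for name in REQUIRED_VARS if not environ.get(name)]
    if missing:
        raise RuntimeError('Unset environment variables: ' + ', '.join(missing))
    return dict((name, environ[name]) for name in REQUIRED_VARS)
```

Inside the script, settings = read_settings() followed by visfile = settings['MS_FILE'] would then stop the job at the first line if the shell script did not export everything it should.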
After the variables are assigned, the Python calls to the CASA tasks use them, for example:
flagdata(vis = visfile,
         flagbackup = True,
         mode = 'manual',
         scan = '1')
Here, visfile is the variable that holds the name of the MS (visibility) file in which the data are stored.
The example script VLACalibration.py has a series of calls to the tasks gaincal and applycal, most of them with the same or similar inputs. For the inputs that stay the same through a series of calls, we declare variables at the beginning of the script:
# Extracted from VLACalibration.py
# Calibration parameters
REFANT = 'ea21'
SPW = '0:5~58'
GainTables = []
We then use them when calling the tasks:
gaincal(vis = visfile,
        caltable = caltable,
        field = 'J1331+3030',
        refant = REFANT,
        spw = SPW,
        gaintype = 'K',
        solint = 'inf',
        combine = 'scan',
        minsnr = 5,
        gaintable = GainTables)
Notice the argument caltable = caltable, and that GainTables is a list that was initially declared as empty. The value of caltable is different for each call to gaincal. It is nevertheless assigned outside the task call so that it exists as a variable in the Python environment: once the table has been created, we can simply append the value of caltable to the GainTables list before the next call to gaincal.
So, in our script, we add an explicit caltable declaration before running gaincal, and append the new calibration table to GainTables after it has been created by gaincal. The block that frequently appears in the script is:
caltable = basename + '.K0' # name of the next calibration table to be generated
gaincal(vis = visfile,
        caltable = caltable,
        field = 'J1331+3030',
        refant = REFANT,
        spw = SPW,
        gaintype = 'K',
        solint = 'inf',
        combine = 'scan',
        minsnr = 5,
        gaintable = GainTables) # GainTables holds the calibration tables generated by the previous calls
GainTables.append(caltable) # adds the new calibration table to the end of the list
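Since this assign, solve, append block recurs throughout the script, it could be factored into a small helper. The following is only a sketch: solve_and_register is a hypothetical name that does not appear in VLACalibration.py, and the task is passed in as an argument so that the function can also be exercised outside CASA.

```python
def solve_and_register(task, caltable, gain_tables, **task_args):
    """Run a solving task, then record the table it wrote.

    task        -- a callable that accepts keyword arguments, e.g. gaincal
    caltable    -- name of the calibration table the task will write
    gain_tables -- list of tables from earlier calls; extended in place
    task_args   -- remaining keyword arguments, forwarded unchanged

    Hypothetical helper, not part of VLACalibration.py.
    """
    # Pass a copy so the task sees only the tables from previous calls.
    task(caltable=caltable, gaintable=list(gain_tables), **task_args)
    # Register the new table only after the task has returned.
    gain_tables.append(caltable)
    return caltable
```

Within the script, the block above would then read solve_and_register(gaincal, basename + '.K0', GainTables, vis=visfile, field='J1331+3030', refant=REFANT, spw=SPW, gaintype='K', solint='inf', combine='scan', minsnr=5). Appending after the call means that a task that raises an error leaves GainTables unchanged.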
When applying the calibration to the calibrator sources, it is convenient to use a loop rather than repeating the same Python code N times, where N is the number of calibrator sources. We create a list containing the calibrator sources and iterate through it.
fields = ['J1331+3030', 'J0319+4130', 'J1822-0938']
for item in fields:
    applycal(vis = visfile,
             field = item,
             gaintable = GainTables,
             gainfield = ['',item,'','','','',''],
             interp = ['','nearest','','','','',''],
             calwt = [False],
             parang = True)
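The gainfield and interp lists in the loop are written out by hand, with seven slots each. Assuming each slot corresponds to one entry of GainTables, and that only the second slot needs a non-default value as in the excerpt, such lists could be generated instead of typed. per_table_list is a hypothetical helper used only for illustration.

```python
def per_table_list(n_tables, overrides):
    """Return a list of n_tables empty strings with selected slots filled.

    overrides maps a slot index (0-based) to the value for that slot.
    Hypothetical helper, not part of VLACalibration.py.
    """
    values = [''] * n_tables
    for index, value in overrides.items():
        values[index] = value
    return values


fields = ['J1331+3030', 'J0319+4130', 'J1822-0938']
for item in fields:
    # Seven slots and slot index 1 are taken from the excerpt above.
    gainfield = per_table_list(7, {1: item})
    interp = per_table_list(7, {1: 'nearest'})
```

In the script, the two lists would be passed to applycal as gainfield = gainfield and interp = interp; replacing the literal 7 with len(GainTables) would keep them in step with the table list if that assumption holds.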