Lustre Quick Guide - CV
Minimalist Instructions
If outside the NRAO network, connect to the Bastion host: ssh.cv.nrao.edu.
Next,
- Connect via ssh to cvpost-master
- Reserve one node for two days (adjust to your preference)
nodescheduler --request 2 1
When it is ready you will receive an email indicating the node you have reserved.
- Connect via ssh from the Bastion host to your reserved node.
Lustre Access and Use Instructions
This guide is open to the public as we have many remote clients accessing our computer storage.
Performance
If you want Lustre access to be fast, follow these guidelines:
- Always access NAASC Lustre from the compute cvpostNNN nodes.
- For the few machines with desktop access to NAASC Lustre, only use those for casual browsing, never intense data reduction; the connection between the desktop and Lustre is much slower than between a cluster node and Lustre.
- Avoid performing recursive searches such as ls -R or find, especially at the top level
- Ensure every file is at least 1 MB in size (CASA, by and large, does this)
- Access files in parallel as much as possible (CASA does this)
- CV Lustre is much slower than NAASC Lustre and data reduction on it is not recommended
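To act on the 1 MB guideline, it can help to see which files in a working directory fall below that size. The following is a minimal sketch, not an official NRAO tool: the function name list_small_files is hypothetical, and it assumes a find that supports -maxdepth (GNU and BSD find do). It looks at a single directory only, in keeping with the advice above to avoid recursive searches.

```shell
#!/bin/sh
# list_small_files [DIR]
# Hypothetical helper: print regular files directly inside DIR (default: the
# current directory) that are smaller than 1 MB (1048576 bytes).
# -maxdepth 1 keeps the search non-recursive.
list_small_files() {
    dir=${1:-.}
    find "$dir" -maxdepth 1 -type f -size -1048576c
}

# Example use, from inside one of your own data directories:
#   list_small_files .
```

Files reported this way could be combined into a single archive (for example with tar) before being left on Lustre.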
The "Lustres"
There are two physically and logically distinct Lustre installations at NRAO's Charlottesville office. One is called CV Lustre, the other NAASC Lustre. The distinction is functional:
- NAASC Lustre: high performance scratch area for NAASC related data reduction; found at /lustre/naasc/ on cluster nodes, NAASC visitor workstations, and the Bastion host (polaris, aka ssh.cv.nrao.edu)
- CV Lustre: general storage for CV staff up to 2 TB; found under /lustre/cv/
These are distinct Lustre filesystems. While NAASC Lustre is optimized for I/O speed, CV Lustre has focused on providing raw disk space, even at the expense of performance. NAASC Lustre is for ALMA-related data reduction and analysis; for any other data (VLA, GBT, EVLA, etc.), you should use CV Lustre instead.
Storage Limits
Note: A new policy on quotas is being developed as of 2015-10-05 which may supersede the advice here.
NAASC Lustre: (/lustre/naasc) is limited to 2 TB of storage per user; larger quotas will be made available for projects and for pipeline-related work.
CV Lustre: (/lustre/cv) is limited to 2 TB or 100,000 inodes (whichever comes first) of storage per user.
Note: quotas will be enforced in the /lustre/naasc/observers and /lustre/naasc/projects areas by Group ID.
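As a rough illustration of the CV Lustre limits stated above, the sketch below checks a pair of usage figures against them. The function name check_cv_usage is hypothetical, and treating 2 TB as 2 x 1024^3 KB is an assumption; the limits themselves come from this guide. On a Lustre client, lfs quota is the standard command for reading your current usage.

```shell
#!/bin/sh
# Limits taken from this guide. Binary units (1 TB = 1024^3 KB) are an assumption.
CV_LIMIT_KB=$((2 * 1024 * 1024 * 1024))
CV_LIMIT_INODES=100000

# check_cv_usage USED_KB USED_INODES
# Hypothetical helper: print a one-line verdict and return 0 only if both
# figures are below the CV Lustre per-user limits.
check_cv_usage() {
    used_kb=$1
    used_inodes=$2
    if [ "$used_kb" -ge "$CV_LIMIT_KB" ]; then
        echo "over space limit"
        return 1
    fi
    if [ "$used_inodes" -ge "$CV_LIMIT_INODES" ]; then
        echo "over inode limit"
        return 1
    fi
    echo "within limits"
}

# On a Lustre client, current usage can be read with:
#   lfs quota -u "$USER" /lustre/cv
# and the reported kbytes and files figures passed to check_cv_usage.
```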
Lustre Access
Lustre can be accessed via the Bastion host (for copying data offsite via scp), from select visitor workstations, and from all the compute "cvpost" nodes.
For data reduction, please use /lustre/naasc/.
Copying Between CV and NAASC Lustre
As NAASC Lustre is scratch space, you may want to use space on CV Lustre for longer-term storage. You should use one of the "cvpostNNN" nodes to move data from one to the other, e.g.,
mv mybigfile.fits /lustre/cv/users/pmurphy/
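For large files, a more cautious variant of the mv above is to copy first, confirm the copy matches, and only then remove the original. The following is a minimal sketch under that assumption; the function name move_verified is hypothetical and is not part of any NRAO tooling.

```shell
#!/bin/sh
# move_verified SRC DEST_DIR
# Hypothetical helper: copy SRC into DEST_DIR, compare the two files byte for
# byte, and delete SRC only if they are identical.
move_verified() {
    src=$1
    dest_dir=$2
    dest="$dest_dir/$(basename "$src")"
    cp -p "$src" "$dest" || return 1
    if cmp -s "$src" "$dest"; then
        rm "$src"
    else
        echo "copy of $src does not match; source kept" >&2
        return 1
    fi
}

# Example use, run on one of the cvpostNNN nodes:
#   move_verified mybigfile.fits /lustre/cv/users/pmurphy/
```

The comparison reads both files in full, so this takes longer than a plain mv; the trade-off is that the original is kept if anything goes wrong during the copy.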
Where to Crunch Data
Where you access the Lustre filesystem from will in part determine the performance.
For best performance, please reserve access to one of the compute "cvpost" nodes as described in this document.
Any other access to Lustre will use a much slower network.
How to Crunch Data
See the CASA Guides: http://casaguides.nrao.edu
Please be sure to put all your data in your Lustre area.