Data Storage and Retrieval
Data Storage
The NRAO has three Lustre file systems: two at the NAASC (/lustre/naasc for operations and /lustre/cv for all other uses) and one at the NMASC (/lustre/aoc). Lustre is a parallel distributed filesystem used in many large-scale computing facilities. It allows NRAO desktops, public machines and clusters at a given site to share a large file space, removing the need to copy data repeatedly between systems for processing. Lustre file systems are designed primarily for performance, which they achieve by aggregating the throughput of a large number of individual disks. As a side effect, the resulting storage volume is typically large compared to desktop storage.
Observer accounts like nm-4386 reside on the NMASC Lustre filesystem in /lustre/aoc/observers/<account name>, while observer accounts like cv-4386 reside on the CV Lustre filesystem in /lustre/cv/observers/<account name>. Observers should store all data products and scratch files in their account's area. NRAO staff will need an area such as /lustre/aoc/users/ or /lustre/naasc/sciops/ set up for them by the local IT Helpdesk. Lustre is a resource shared among staff and observers, so we ask that everyone keep their usage as far below the 5 TB limit as possible.
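To check how close an area is to the limit, a small wrapper around the standard du command can help. This is a minimal sketch: used_kib and usage_report are hypothetical names, and it assumes the limit means 5 TiB (du -k counts 1024-byte blocks), so pass a different limit as the second argument if your site counts otherwise. Note that du walks the whole tree, so it can take a while on a large data area.

```shell
#!/usr/bin/env bash
# Sketch: report how much of an assumed 5 TiB allowance a directory uses.
# used_kib and usage_report are hypothetical helper names.
set -euo pipefail

# Default limit: 5 TiB expressed in KiB (assumption; the guide says "5 TB").
DEFAULT_LIMIT_KIB=$((5 * 1024 * 1024 * 1024))

# Print the disk usage of a directory tree in KiB.
# du -s summarizes the whole tree; -k reports 1024-byte blocks.
used_kib() {
  du -sk -- "$1" | cut -f1
}

# Print "DIR: N KiB used, P% of limit" (integer percentage, rounded down).
usage_report() {
  local dir=$1
  local limit_kib=${2:-$DEFAULT_LIMIT_KIB}
  local used
  used=$(used_kib "$dir")
  printf '%s: %s KiB used, %d%% of limit\n' "$dir" "$used" $(( used * 100 / limit_kib ))
}

# Run only when given a directory argument, e.g. ./usage.sh /lustre/aoc/observers/nm-4386
if [[ $# -gt 0 ]]; then
  usage_report "$@"
fi
```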
NRAO staff: please do not use your space on the filer (e.g. /users/krowe) to store large data sets, since your quota there is probably measured in GBs, and do not use it for processing either, since it can be around ten times slower than Lustre. Exceeding your quota will break many things.
For more information on Lustre see the Lustre FAQ in the Appendix.
Data Retrieval
The NRAO supports the following methods for securely transferring data to remote facilities, including XSEDE's Globus Connect platform, which is currently available at the NMASC only. In the examples below, replace <account name> with your nm-* or cv-* account name.
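Because the hostnames and Lustre paths in the examples below follow from the account prefix (nm-* at the NMASC, cv-* at the NAASC), a small helper can fill them in. This is a sketch: site_for and endpoints_for are hypothetical names, and it assumes every account starts with one of those two prefixes.

```shell
#!/usr/bin/env bash
# Sketch: derive hostnames and the Lustre area from an observer account name,
# following the nm-*/cv-* convention described in this guide.
set -euo pipefail

# Map an account prefix to the site component used in hostnames and paths.
site_for() {
  case $1 in
    nm-*) echo aoc ;;   # NMASC (nmpost)
    cv-*) echo cv ;;    # NAASC (cvpost)
    *) echo "unrecognized account prefix: $1" >&2; return 1 ;;
  esac
}

# Print "<ssh host> <sftp host> <observer directory>" for an account.
endpoints_for() {
  local account=$1 site
  site=$(site_for "$account") || return 1
  printf 'ssh.%s.nrao.edu sftp.%s.nrao.edu /lustre/%s/observers/%s\n' \
    "$site" "$site" "$site" "$account"
}
```

For example, read -r ssh_host sftp_host obs_dir < <(endpoints_for nm-4386) sets ssh_host to ssh.aoc.nrao.edu, sftp_host to sftp.aoc.nrao.edu and obs_dir to /lustre/aoc/observers/nm-4386.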
SFTP
SFTP (SSH File Transfer Protocol) transfers files over an encrypted SSH connection. Once connected, sftp behaves much like any ftp client.
NMASC (nmpost)
sftp <account name>@sftp.aoc.nrao.edu
NAASC (cvpost)
sftp <account name>@sftp.cv.nrao.edu
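sftp can also run unattended from a batch file passed with -b. The sketch below writes such a file and runs it; write_sftp_batch and sftp_fetch are hypothetical names. Because batch mode cannot answer a password prompt, it assumes you have key-based (non-interactive) SSH login set up for your account, which may not be available for every observer account.

```shell
#!/usr/bin/env bash
# Sketch: unattended SFTP download with an OpenSSH batch file (sftp -b).
# Assumes key-based SSH login; batch mode cannot answer a password prompt.
set -euo pipefail

# Write a batch file that recursively fetches REMOTE_DIR into the current
# local directory. "get -R" copies directories recursively; "bye" ends the session.
write_sftp_batch() {
  local remote_dir=$1 batch_file=$2
  printf 'get -R "%s" .\nbye\n' "$remote_dir" > "$batch_file"
}

# Usage: sftp_fetch ACCOUNT SFTP_HOST REMOTE_DIR
sftp_fetch() {
  local account=$1 host=$2 remote_dir=$3
  local batch rc=0
  batch=$(mktemp)
  write_sftp_batch "$remote_dir" "$batch"
  # In batch mode sftp aborts with a non-zero status if a command such as get fails.
  sftp -b "$batch" "${account}@${host}" || rc=$?
  rm -f -- "$batch"
  return "$rc"
}
```

For example, sftp_fetch nm-4386 sftp.aoc.nrao.edu /lustre/aoc/observers/nm-4386/data would create ./data locally.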
SCP
SCP is an encrypted copy command that runs over SSH and can transfer files between hosts. The format is scp <user>@<remotemachine>:/<remote path> <local path>.
The example below would copy all files ("*") in <account name>'s data sub-directory to the current directory (".") on your local machine.
NMASC (nmpost)
scp <account name>@ssh.aoc.nrao.edu:/lustre/aoc/observers/<account name>/data/* .
NAASC (cvpost)
scp <account name>@ssh.cv.nrao.edu:/lustre/cv/observers/<account name>/data/* .
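To copy subdirectories as well, scp needs -r; without it, scp refuses to copy any directories that "*" matches. Passing the remote path as a single quoted argument also keeps your local shell from trying to expand "*" itself (zsh, for instance, stops with "no matches found"). The sketch below applies -r to the data directory itself, so it creates ./data locally instead of copying its contents into the current directory. scp_pull and the PRINT_ONLY switch are hypothetical; by default the function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: recursive scp of an observer data directory.
# PRINT_ONLY=1 (the default here) prints the command; PRINT_ONLY=0 runs it.
set -euo pipefail

PRINT_ONLY=${PRINT_ONLY:-1}

# Print the command (arguments joined by single spaces) or execute it.
run() {
  if [[ $PRINT_ONLY == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: scp_pull ACCOUNT SSH_HOST REMOTE_DIR [LOCAL_DEST]
scp_pull() {
  local account=$1 host=$2 remote_dir=$3 dest=${4:-.}
  # -r recurses into subdirectories; the remote spec is passed as one
  # quoted argument, so the local shell does not glob it.
  run scp -r "${account}@${host}:${remote_dir}" "$dest"
}
```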
LFTP
LFTP is a more sophisticated file transfer client than the classic ftp program; among other things, it can use multiple connections in parallel to speed up transfers.
NMASC (nmpost)
lftp -u <account name> sftp://sftp.aoc.nrao.edu
NAASC (cvpost)
lftp -u <account name> sftp://sftp.cv.nrao.edu
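lftp's parallelism is exposed through commands such as mirror --parallel=N, which downloads a directory tree N files at a time (pget -n N instead splits a single large file across several connections). The sketch below hands mirror to lftp with -e and ends with quit. lftp_mirror, the default of 4 parallel transfers, and the PRINT_ONLY switch are hypothetical choices, and the paths are assumed to contain no spaces because they are embedded in an lftp command string.

```shell
#!/usr/bin/env bash
# Sketch: parallel directory download with lftp's mirror command.
# PRINT_ONLY=1 (the default here) prints the command; PRINT_ONLY=0 runs it.
set -euo pipefail

PRINT_ONLY=${PRINT_ONLY:-1}

# Print the command (arguments joined by single spaces) or execute it.
run() {
  if [[ $PRINT_ONLY == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: lftp_mirror ACCOUNT SFTP_HOST REMOTE_DIR [LOCAL_DIR] [PARALLEL]
lftp_mirror() {
  local account=$1 host=$2 remote_dir=$3 local_dir=${4:-data} jobs=${5:-4}
  # -u with no password makes lftp prompt for it; -e runs the commands
  # after connecting, and "quit" exits once mirror has finished.
  run lftp -u "$account" -e "mirror --parallel=${jobs} ${remote_dir} ${local_dir}; quit" "sftp://${host}"
}
```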
RSYNC
RSYNC is a versatile file-copying tool that copies only what is necessary, that is, the files that are missing from or differ in your local copy. This is useful if, for example, you have deleted some files from your local copy and want to copy back just those missing files.
The example below copies all the files in <account name>'s data sub-directory to the current directory (".") on your local machine. Without the trailing "/" on the source path, rsync copies the directory and its contents; with the trailing "/", it copies only the contents of the directory. Adding "--delete" to the argument list keeps the two areas exactly in sync by removing files from your local machine if they have been removed from the remote copy.
NMASC (nmpost)
rsync -vaz <account name>@ssh.aoc.nrao.edu:/lustre/aoc/observers/<account name>/data/ .
NAASC (cvpost)
rsync -vaz <account name>@ssh.cv.nrao.edu:/lustre/cv/observers/<account name>/data/ .
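Because --delete removes local files, it is worth previewing a sync with rsync's --dry-run (-n) option first. The sketch below normalizes the source path to end in exactly one "/", so it always copies the directory's contents as described above, and passes any extra options through to rsync. rsync_pull and the PRINT_ONLY switch are hypothetical; by default the function only prints the command it would run.

```shell
#!/usr/bin/env bash
# Sketch: rsync an observer data directory, previewing deletions first.
# PRINT_ONLY=1 (the default here) prints the command; PRINT_ONLY=0 runs it.
set -euo pipefail

PRINT_ONLY=${PRINT_ONLY:-1}

# Print the command (arguments joined by single spaces) or execute it.
run() {
  if [[ $PRINT_ONLY == 1 ]]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: rsync_pull ACCOUNT SSH_HOST REMOTE_DIR LOCAL_DEST [RSYNC_OPTIONS...]
rsync_pull() {
  local account=$1 host=$2 remote_dir=$3 dest=$4
  shift 4
  # ${remote_dir%/}/ strips one trailing "/" and adds one back, so rsync
  # copies the directory's contents rather than the directory itself.
  run rsync -vaz "$@" "${account}@${host}:${remote_dir%/}/" "$dest"
}
```

For example, rsync_pull nm-4386 ssh.aoc.nrao.edu /lustre/aoc/observers/nm-4386/data . --dry-run --delete lists what would be transferred and deleted; repeat it without --dry-run once the list looks right.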
Globus
See "How To Log In and Transfer Files with Globus" for a step-by-step guide to using Globus.
NMASC (nmpost)
The endpoint for NMASC is nrao#nm. To authenticate, use your observer login if you are an observer or your regular Linux account if you are NRAO staff.
NAASC (cvpost)
The NAASC doesn't yet have an endpoint.