PDSF Data transfer notes
Eric Hjort has extensive notes on the data transfer methods developed for PDSF (http://drupal.star.bnl.gov/STAR/comp/grid/documentation/srm-instructions-for-bulk-file-transfer-to-pdsf). However, until the full chain is working (SRM transfer from RCF:HPSS to PDSF:HPSS, together with fileCatalog updates), the transfer has to be done by hand. These notes describe that process.
1) Identify a disk cache
I assume that the HPSS SRM transfer isn't working, so you'll need to identify ~1 TB of disk space to use as a cache at both RCF and PDSF.
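Before staging anything, it can help to confirm that a candidate cache area really has the space. The sketch below is a hypothetical helper (has_cache_space is not part of any STAR tool), assuming a Linux/POSIX df; it counts GB as 1024^3 bytes and returns success only if enough space is free:

```shell
#!/bin/sh
# Hypothetical helper: succeed only if DIR has at least NEED_GB of free space.
# Assumes POSIX df output (-P) in 1K blocks (-k); "GB" here means 1024^3 bytes.
has_cache_space() {
    dir=$1
    need_gb=$2
    # Second line of df -P output is the filesystem holding DIR; column 4 is "Available".
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    [ -n "$avail_kb" ] || return 2    # df failed, e.g. DIR does not exist
    avail_gb=$((avail_kb / 1024 / 1024))
    echo "$dir: ${avail_gb} GB free, ${need_gb} GB needed"
    [ "$avail_gb" -ge "$need_gb" ]
}
```

For example, run `has_cache_space /path/to/disk/area 1000` on both the RCF and PDSF cache areas before starting.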
2) Identify files to grab
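I usually grab about one day's worth of MuDSTs at a time. First build the file list from the file catalog with get_file_list.pl (NNN is the day number, P0Yxx the production tag, Whatever the collision system), then stage those files from HPSS into the RCF disk cache with hpss_user.pl: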
get_file_list.pl -keys "path,filename" -delim "/" -cond "daynumber=NNN,production=P0Yxx,storage=HPSS,filetype=daq_reco_MuDst,collision=Whatever" -limit 0 > P0Yxx.Whatever.NNN.list
hpss_user.pl -r /path/to/disk/area/ -f P0Yxx.Whatever.NNN.list
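Because staging finishes unevenly, it's handy to check how much of a day's list is actually on disk before starting the transfer; the same check works at PDSF afterwards to spot files that didn't arrive. This is a hypothetical sketch (check_staged is not an existing STAR tool), assuming the list has one "path/filename" entry per line, as produced by the get_file_list.pl command above, and that files keep their original basenames on disk. It prints present/total on stdout and the missing names on stderr:

```shell
#!/bin/sh
# Hypothetical helper: compare a get_file_list.pl list against files under DIR.
# Prints "present/total" on stdout and one "missing: NAME" line per absent file
# on stderr. Matches on basename only, searching DIR recursively.
check_staged() {
    list=$1
    dir=$2
    work=$(mktemp -d) || return 1
    # Expected basenames: drop blank lines, strip everything up to the last "/".
    grep -v '^$' "$list" | sed 's|.*/||' | sort -u > "$work/expected"
    # Basenames of regular files actually present anywhere under DIR.
    find "$dir" -type f | sed 's|.*/||' | sort -u > "$work/present"
    # comm -23 keeps lines only in the first file, i.e. expected but not present.
    comm -23 "$work/expected" "$work/present" > "$work/missing"
    total=$(wc -l < "$work/expected")
    missing=$(wc -l < "$work/missing")
    sed 's/^/missing: /' "$work/missing" >&2
    echo "$((total - missing))/$((total))"
    rm -rf "$work"
}
```

For example, `check_staged P0Yxx.Whatever.NNN.list /path/to/disk/area/`. A file that is still being written counts as present, so this only tracks which files exist, not whether each one is complete; comparing file sizes would be the natural next step.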
3) Transfer files
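I usually pull the data from the PDSF side, but this is not necessary. First, load the Globus tools and store a proxy on the NERSC MyProxy server (aarose is my MyProxy username; use your own):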
module load globus
myproxy-init -s myproxy.nersc.gov -l aarose
Then, as starofl (preferably on pdsf2), retrieve the delegated proxy and run the recursive copy from the RCF cache into the PDSF cache:
myproxy-get-delegation -s myproxy.nersc.gov -l aarose
cd /path/to/disk/at/pdsf
nohup globus-url-copy -r -vb -p 12 gsiftp://stargrid04.rcf.bnl.gov/path/to/disk/at/rcf/ file://$(pwd)/ &
And that's it. Some effort is needed to keep things efficient: the hpss_user.pl staging takes ~1 day to complete, and its rate is not constant. You'll get the bulk of the data quickly, but stragglers can appear 10 hours later. I usually have one day's data transferring to PDSF while I'm pulling the next day's data from HPSS. Transferring one day's worth of MuDSTs takes 5-6 hours at 40 MB/s, so start the transfer once the bulk of the data is on disk at RCF.
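As a rough consistency check on those numbers (assuming the 40 MB/s rate is sustained for the whole transfer and using decimal units), one day's MuDSTs come to roughly 0.7-0.9 TB, the same order as the ~1 TB cache suggested in step 1:

```latex
V \approx 40\ \mathrm{MB/s} \times 3600\ \mathrm{s/h} \times (5\ \text{to}\ 6)\ \mathrm{h} = 720\ \text{to}\ 864\ \mathrm{GB}
```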
- andrewar's blog