PDSF Data transfer notes

Eric Hjort has extensive notes on the data transfer methods developed for PDSF (http://drupal.star.bnl.gov/STAR/comp/grid/documentation/srm-instructions-for-bulk-file-transfer-to-pdsf). However, until the full chain is working (SRM transfer from RCF:HPSS to PDSF:HPSS, together with fileCatalog updates), the transfer has to be done by hand. These notes detail that process.

 

1) Identify a disk cache

I assume that the HPSS SRM transfer isn't working. You'll need to identify ~1 TB of disk space to use as a cache at both RCF and PDSF.
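
A quick way to confirm that a candidate area really has the space is to ask df. The following is only a sketch: has_free_kb is a hypothetical helper name, and the threshold assumes 1 TB means 10^12 bytes.

```shell
# has_free_kb DIR MIN_KB (hypothetical helper)
# Succeeds if the file system holding DIR has at least MIN_KB kilobytes free.
has_free_kb() {
    # df -P gives one portable line per file system; -k reports 1024-byte blocks.
    # Column 4 of the second line is the available space.
    avail=$(df -Pk "$1" | awk 'NR==2 {print $4}')
    [ "$avail" -ge "$2" ]
}

# Example (10^12 bytes / 1024 = 976562500 KB):
#   has_free_kb /path/to/disk/area/ 976562500 || echo "less than ~1 TB free"
```

Run the same check on both sides, since the cache is needed at RCF and at PDSF.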

 

2) Identify files to grab

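I usually grab about one day's worth of MuDSTs at a time. The first command writes the list of files to fetch, and the second pulls those files from HPSS into the disk cache: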

 

get_file_list.pl -keys "path,filename" -delim "/" -cond "daynumber=NNN,production=P0Yxx,storage=HPSS,filetype=daq_reco_MuDst,collision=Whatever" -limit 0 > P0Yxx.Whatever.NNN.list

hpss_user.pl -r /path/to/disk/area/ -f P0Yxx.Whatever.NNN.list
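
Since the files arrive from HPSS over many hours, it helps to know how much of a list is already on disk before starting the transfer. The sketch below counts that. staged_count is a hypothetical helper, and it assumes that each line of the list ends in the file name and that staged files keep that name somewhere below the cache directory; how hpss_user.pl actually lays out the files is not checked here.

```shell
# staged_count LIST CACHE_DIR (hypothetical helper)
# Prints "found/total": how many file names from LIST exist below CACHE_DIR.
staged_count() {
    list=$1
    cache=$2
    wanted=$(mktemp)
    # Expected base names: strip everything up to the last "/", drop empty lines.
    sed -e 's|.*/||' -e '/^$/d' "$list" | sort -u > "$wanted"
    total=$(wc -l < "$wanted" | tr -d ' ')
    # Base names present on disk, intersected with the expected ones.
    found=$(find "$cache" -type f | sed 's|.*/||' | sort -u \
            | comm -12 - "$wanted" | wc -l | tr -d ' ')
    rm -f "$wanted"
    echo "$found/$total"
}

# Example:
#   staged_count P0Yxx.Whatever.NNN.list /path/to/disk/area/
```

Files are matched by name only, so a partially written file still counts as present.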

 

3) Transfer files

I usually pull from PDSF, but this is not necessary.

module load globus

myproxy-init -s myproxy.nersc.gov -l aarose

Then, as starofl (preferably on pdsf2), retrieve the proxy and start the copy:

myproxy-get-delegation -s myproxy.nersc.gov -l aarose

cd /path/to/disk/at/pdsf

nohup globus-url-copy -r -vb -p 12 gsiftp://stargrid04.rcf.bnl.gov/path/to/disk/at/rcf/ file:///`pwd`/
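
Because a transfer runs for hours, it is worth checking that the proxy will outlive it before starting. This is a sketch under assumptions: proxy_covers is a hypothetical helper, and the usage line relies on grid-proxy-info -timeleft printing the remaining lifetime in seconds.

```shell
# proxy_covers SECONDS_LEFT HOURS_NEEDED (hypothetical helper)
# Succeeds if SECONDS_LEFT is at least HOURS_NEEDED hours.
proxy_covers() {
    [ "$1" -ge $(( $2 * 3600 )) ]
}

# Example, on a node with the globus module loaded:
#   proxy_covers "$(grid-proxy-info -timeleft)" 6 || echo "proxy too short, get a new one"
```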

 

And that's it. Some effort is needed to get the timing right, because the hpss_user.pl command takes ~1 day to complete. The rate at which files arrive from HPSS is not constant: you'll get the bulk of the data quickly, and stragglers will appear 10 hours later. I usually have one day's data transferring while I'm pulling the next day's data from HPSS. It takes 5-6 hours at 40 MB/s to transfer one day's worth of MuDSTs, so start the transfer when the bulk of the data is available.
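
As a sanity check on the numbers above, the volume moved in 5-6 hours at 40 MB/s can be worked out directly. transfer_gb is a hypothetical helper, and 1 GB is taken as 1000 MB.

```shell
# transfer_gb RATE_MB_PER_S HOURS (hypothetical helper)
# Prints the data volume in GB moved at a steady rate for the given time.
transfer_gb() {
    echo $(( $1 * $2 * 3600 / 1000 ))
}

transfer_gb 40 5   # prints 720
transfer_gb 40 6   # prints 864
```

So one day's worth of MuDSTs comes to roughly 720-864 GB, which is consistent with the ~1 TB cache requested in step 1.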