The online Web server front page is available here. This Drupal section will hold complementary information.
A list of all operation manuals (beyond detector sub-systems) is available at Operations.
Please use it as a startup page.
To get access to the STAR SSH gateways (which will also allow access to the generic Online Linux Pool) please follow the steps below:
The online gatekeepers are named stargw.starp.bnl.gov.
ssh -AX username@stargw.starp.bnl.gov
Logging In Via SSH
Linux Users:
If you would like to be able to use EVO in the 1006 trailer, there is a conference PC set up for use. There is a generic account on the computer for everyone to share.
The account credentials are:
Username: rhicstar
Password: (See below)
Log On To: Conference (This computer)
I will not post the password anywhere that is not encrypted for security purposes, so please come see me in my office (Building 510 Room 1-179) or send me an e-mail containing your GPG public key. If you do not have a GPG public key, please bring your laptop, (for desktop users, call me, and I'll come to see you) and I'll help you set it up. It is quite useful.
This page provides an overview of the Online Linux Pool (OLP). The OLP is a cluster of computers made available to STAR collaborators with the primary intent of allowing real-time and near real-time run support activities, but with general usage and various computing development and testing projects envisioned as resources permit.
The OLP currently consists of 60 Penguin Altus 1300 rack-mount computers physically located in the DAQ Room, plus two servers that provide home directories (over NFS), user authentication (NIS), and Condor pool management. The "worker" nodes are named onl01, onl02, ..., onl60.starp.bnl.gov. These 60 pool nodes have 64-bit Scientific Linux 5.8 (with 32-bit libraries). Any user with access to the stargw.starp.bnl.gov SSH gateways has access to these 60 nodes. Users of the RACF will recognise the "rterm" command, which if executed on a stargw host will attempt to connect to one of the nodes with relatively low load.
Remote filesystems:
All nodes have access to several remote filesystems that may be useful to online computing:
Additionally, onl01-onl06 are configured to access trigger data at:
Condor
A Condor pool is set up on these nodes. Currently onl01-30 are in the pool (modulo a few specialized nodes not accepting jobs), serving as execute hosts.
rterm is available on the gateway hosts (see "Accessing The STAR Protected Network") to select the least-loaded system for login. Only a subset of nodes is tagged as interactive for rterm; that list is currently onl01-10.
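The selection idea behind rterm can be illustrated with a minimal sketch. This is not the rterm implementation: the function name `pick_least_loaded` and the load mapping are hypothetical, and only the interactive node range (onl01-10) comes from the text above.

```python
# Hypothetical sketch: the function and the load values are illustrative only.
# Only the interactive node range (onl01-10) is taken from the page.
INTERACTIVE_NODES = [f"onl{i:02d}.starp.bnl.gov" for i in range(1, 11)]


def pick_least_loaded(loads):
    """Return the interactive node with the lowest reported load.

    `loads` maps node name -> load average. Nodes that are not tagged as
    interactive, or that did not report a load, are ignored.
    """
    candidates = {n: loads[n] for n in INTERACTIVE_NODES if n in loads}
    if not candidates:
        raise RuntimeError("no interactive node reported a load")
    return min(candidates, key=candidates.get)
```

In this sketch a node outside onl01-10 is never chosen, even if it reports the lowest load.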
Cron
Cron jobs are accepted and can run only on onl11, onl12, and onl13. To access the exported Web directories in write mode, you need to be part of the onlweb group. Every year before the run, a list of points of contact is compiled and used to determine who should be granted access (this is not given by default).
General system details (hardware, OS, etc):
The Penguin nodes have 64-bit Scientific Linux 5.8 installations (with 32-bit libraries), with these basic hardware specs:
2 x Dual Core AMD Opteron Processor 265, 1800MHz (4 cores per system, no HT)
8GB RAM (PC3200 DDR 400MHz ECC)
4 SATA disk bays
Usage suggestions and miscellaneous notes for users:
To reduce the burden on the network and the home directory NFS file server, it is advisable for heavy users of distributed jobs (i.e. Condor jobs) to avoid unnecessary access to their individual home directories. As much as possible, please consolidate access to your home directories, and use the local disks as needed for storage. Small, short-term needs (up to the order of 100MB or so) can use subdirectories under /tmp, while larger demands should use directories under /scratch on each individual node. We expect at some point in the future to provide a shared file system (other than the home directories) of some significant size, but are not there yet.
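The local-disk guidance above could be applied in a job wrapper along the following lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the 100 MB threshold is a literal reading of "up to the order of 100MB or so", not an enforced limit.

```python
import os
import tempfile

# Assumption: "order of 100MB" from the text, taken here as a hard threshold.
SMALL_LIMIT_BYTES = 100 * 1024 * 1024


def local_work_base(expected_bytes):
    """Pick /tmp for small, short-term needs and /scratch for larger ones."""
    return "/tmp" if expected_bytes <= SMALL_LIMIT_BYTES else "/scratch"


def make_work_dir(expected_bytes, base=None):
    """Create a private per-user working directory on the node-local disk."""
    base = base or local_work_base(expected_bytes)
    user = os.environ.get("USER", "user")
    return tempfile.mkdtemp(prefix=f"{user}_", dir=base)
```

A job would write its intermediate files there and copy only the final results back to the home directory.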
The OLP nodes only allow access based on SSH keys. If you have access to the stargw SSH gateways, you will also automatically have access to the OLP. To make it most convenient, it is suggested that you familiarize yourself with SSH key agents and SSH key forwarding, which can (nearly) eliminate all need for typing passwords/passphrases.
This page will list, by year, action items, run plans and open questions. It will serve as a repository for the documents that form the basis for drawing up the requirements. To see documents in this tree, you must belong to the Software and Computing OG (the pages are not public).
This tree will contain information pertaining to run 9.
Run preparation meetings are held at the usual time, i.e. on Fridays between 3 and 5 PM (room reserved; we will try to keep the meeting to one hour weekly). The following groups are invited to join:
The goal of the meetings is to discuss any issues with the infrastructure, networking, code readiness, resources and associated needs, as well as any other computing-related issues relevant to the smooth running of online operations. The forum and meeting also serve as a vehicle for passing information on time constraints and requirements to the diverse groups in a structured and cohesive manner.
None so far.
This tree will contain information pertaining to run 8.
Run preparation meetings are held on Friday between 3-4 PM (room reserved up to 5 PM). The following groups are invited to join:
The goal of the meetings is to discuss any issues with the infrastructure, networking, code readiness, resources and needs, or any other computing and related issues relevant to the smooth running of online operations. The forum and meeting also serve as a vehicle for passing information on time constraints and requirements to and through the diverse groups in a structured and cohesive manner.
In Run VII, the forum was used to discuss the security plan and several key reshapes of the online computing structure to achieve minimum cyber-security accreditation.
The experts on call for software related run support are:
Role | Name | Primary phone | Office Phone | Other |
Offline QA + FastOffline production | Jerome Lauret | (631) 786-0479 | (631) 344-2450 | |
 | Gene Van Buren | (631) 312-4324 | (631) 344-7953 | (631) 775-6620 |
Online QA, PPlots | Paul Sorensen | (510) 375-5582 | (631) 344-2420 | |
 | David Kettler | (206) 218-3885 | (206) 616-8141 | |
Hardware support, online tools | Wayne Betts | (631) 804-6897 | (631) 344-3285 | |
Database | Micheal DePhillips | (631) 356-2257 | (631) 344-2499 | (631) 744-3295 |
When multiple choices are available, the name in bold indicates the current on-call expert. Please, consult this page prior to calling the expert.
Facing the new paradigm of introducing DOE CyberSecurity regulations into our infrastructure, several action items were presented at the 2006 run critique meeting. The presentation is attached below as STAR-Critique-06.pdf. The urgent and immediate items, some of which required deep restructuring, were:
The run preparation will be established within the following guidelines
The following table is a first cut to understanding the inter-connections between online hardware.
The main front-end Web interface starts at https://www.star.bnl.gov/starkeyw/ (see step by step instructions in the next section). This SSH public key management system has been designed in STAR to address the following requirements:
Such a system was developed for STAR and named the "SSH Key Management system", aka SKM. More information can be found in this publication. A side benefit for users is a reduction in the number of passwords to remember and type.
You should use your RCF username and Kerberos password (credentials) to enter this interface.
Here is a typical scenario of the system usage:
* Current admins are Wayne Betts and Jerome Lauret.
At this point, John Doe has key-based access to JDOE@FOO. Simple enough? But wait, there's more! Now John Doe realizes that he also needs access to the group account named "operator" on host BAR. Since his key is already in the key management system he has only to request that his key be added to operator@BAR, and voila (subject to administrator approval), he can now login with his key to both JDOE@FOO and operator@BAR. And if Mr. Doe should leave STAR, then an administrator simply removes (disables) him from the system and his keys are removed from both hosts.
There are three things to keep track of -- people (and their SSH keys of course), host (client) systems, and user accounts on those hosts:
People want access to specific user accounts at specific hosts.
The system maintains a list of user accounts for each host system, and a list of people associated with each user account at each host. To be clear: while the Web interface allows any user to log in, the system does not have any automatic user account detection mechanism at this time. Each "{user-}account" has to be added by hand by an administrator for that account to be listed as a possible association for node FOO or BAR.
Behind the scenes, each host that is participating in the system has a keyservices client installed that runs as a system service. The keyservices_client periodically (at five minute intervals by default) polls a central service for its information. In other words, the back-end database is consulted for the list of approved associations and the appropriate keys are downloaded by the client and added to the related account's authorized_keys files accordingly.
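One polling pass of such a client can be pictured with the sketch below. It is not the keyservices_client code: `fetch_associations` and `write_keys` are hypothetical callables standing in for the query to the central service and for the update of each account's authorized_keys file. Only the five-minute default interval comes from the text.

```python
# Default polling interval quoted in the text (five minutes).
POLL_INTERVAL_SECONDS = 5 * 60


def render_authorized_keys(approved_keys):
    """Build authorized_keys content from approved public keys.

    Blank entries and duplicates are dropped; order is preserved.
    """
    seen = []
    for key in approved_keys:
        key = key.strip()
        if key and key not in seen:
            seen.append(key)
    return "".join(k + "\n" for k in seen)


def sync_once(fetch_associations, write_keys):
    """Run one polling pass (hypothetical interfaces).

    fetch_associations() -> {account: [public_key, ...]} for this host
    write_keys(account, content) stores the rendered key list
    """
    associations = fetch_associations()
    for account, keys in associations.items():
        write_keys(account, render_authorized_keys(keys))
    return sorted(associations)
```

A real service would call `sync_once` in a loop, sleeping `POLL_INTERVAL_SECONDS` between passes. Whether the actual client preserves keys it does not manage is not stated in the text, and this sketch does not model it.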
In our case, orion.star.bnl.gov hosts all the server services (starkeyw and starkeyd via Apache, and a MySQL database), but they could all be on separate servers if desired.
Only RHEL and Scientific Linux with i386 and x86_64 kernels have been tested, but there is no reason to believe that the client couldn't be built on other Linux distributions or Solaris. Please contact one of the admins (Wayne Betts, Jerome Lauret) if you'd like to volunteer and add your sub-system node to SKM or if you have any questions.
User access to the Web interface is currently based on the RCF Kerberos authentication. You will hence need a valid BNL/RCF account to access the Web interface and manage key associations for your account.
In 2012, SKM was extended to implement volatile key associations (a lifetime and expiration may be set for each key association). This feature allows granting a given user access to a privileged account on a temporary basis, for debugging needs as one example. It has also been used for group accounts of an operational nature whose teams rotate and change at each new run (in such cases, the list of people associated with the account needs to be re-assessed yearly, and the associations would be set, for example, to expire after one year). This feature is optional: by default, associations have no expiration.
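The expiration rule can be expressed in a few lines. This is a hypothetical sketch, not the SKM schema: the `expires` field name and the dictionary representation of an association are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def is_active(association, now):
    """Return True if a key association should still grant access.

    Assumption: an association is a dict whose optional "expires" entry is
    a datetime. A missing or None value means "no expiration" (the default).
    """
    expires = association.get("expires")
    return expires is None or now < expires


def one_year_association(person, account, host, start):
    """Example of an operational group-account association valid one year."""
    return {
        "person": person,
        "account": account,
        "host": host,
        "expires": start + timedelta(days=365),
    }
```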
Active feedback
Sub-system | Coordinator | Calibration POC | Online monitoring POC |
MTD | Rongrong Ma | - same - | - same - |
EMC | Raghav Kunnawalkam Elayavalli, Nick Lukow | - same - | Note: L2algo, bemc and bsmdstatus |
EPD | Prashanth Shanmuganathan | N/A | - same - |
BTOF | Frank Geurts | - same - | Frank Geurts, Zaochen Ye |
ETOF | Florian Seck | - same - | Florian Seck, Philipp Weidenkaff |
HLT | Hongwei Ke | - same - | - same - |
Other software coordinators
sub-system | Coordinator |
iTPC (TPC?) | Irakli Chakaberia |
Trigger | Akio Ogawa |
DAQ | Jeff Landgraf |
... |
In RUN: EEMC, EMC, EPD, ETOF, GMT, TPC, MTD, TOF
Test: FST, FCS, STGC (no tables)
Desired init dates were announced to all software coordinators:
- Geometry tag has a timestamp of 20191120
- Simulation timeline [20191115,20191120[
- DB initialization for real data [20191125,...]
Please initialize your table content appropriately, i.e. sim flavor initial values are entered at 20191115 up to 20191119 (please exclude the edge), and ofl initial values at 20191125 (with the run starting on the 1st of December, even tomorrow's cosmic and commissioning data would pick up the proper values).
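The date windows above can be checked mechanically. The helper below is a hypothetical sketch, not part of any STAR database tool. It assumes that "ofl initial values at 20191125" means exactly that date, so a later initialization date is flagged.

```python
from datetime import date

SIM_START = date(2019, 11, 15)
SIM_END_EXCLUSIVE = date(2019, 11, 20)  # geometry tag timestamp; edge excluded
OFL_INIT = date(2019, 11, 25)           # start of the real-data window


def init_date_ok(flavor, init_date):
    """Check an initial-value date against the announced windows.

    sim: within [20191115, 20191120[ (upper edge excluded)
    ofl: exactly 20191125 (assumption: later dates would leave early runs
         without initial values, so they are reported as not OK)
    """
    if flavor == "sim":
        return SIM_START <= init_date < SIM_END_EXCLUSIVE
    if flavor == "ofl":
        return init_date == OFL_INIT
    raise ValueError(f"unknown flavor: {flavor!r}")
```

With these assumptions, an ofl table initialized at 2019-11-25 passes, while one initialized at 2019-12-15 is flagged.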
EMC = ready
ETOF = ready - initialized at 2019-11-25, no sim (confirming)
TPC = NOT ready [look at year 19 for comparison]
MTD = ready
TOF = Partially ready? INL correction, T0, TDC, status and alignment tables initialized
EPD = gain initialized at 2019-12-15 (!?), status not initialized, no sim
EEMC = ready? (*last init at 2017-12-20)
GMT = ready (*no db tables)
EMC = ready
ETOF = ready? initialized at 2019-11-25, no sim
TPC = NOT ready
MTD = ready
TOF = NOT ready
EPD = gain initialized at 2019-12-15 (!?), status not initialized, no sim
EEMC = ready? (*last init at 2017-12-20)
GMT = ready (*no db tables)
Sub-system | Coordinator | Calibration POC | Online monitoring POC |
MTD | Rongrong Ma | - same - | - same - |
EMC, EEMC | Raghav Kunnawalkam Elayavalli, Nick Lukow | - same - | Note: L2algo, bemc and bsmdstatus |
EPD | [ TBC] | - same - | - same - |
BTOF | Frank Geurts | - same - | Frank Geurts, Zaochen Ye |
ETOF | Florian Seck | - same - | Florian Seck, Philipp Weidenkaff |
HLT | Hongwei Ke | - same - | - same - |
TPC | Irakli Chakaberia | - same - | Flemming Videbaek |
Trigger detectors | Akio Ogawa | - same - | - same - |
DAQ | Jeff Landgraf | N/A | |
Sub-system | Coordinator | Calibration POC | Online monitoring POC |
MTD | Rongrong Ma | - same - | - same - |
EMC, EEMC | Raghav Kunnawalkam Elayavalli, Nick Lukow | - same - | Note: L2algo, bemc and bsmdstatus |
EPD | Prashanth Shanmuganathan (TBC) | Skipper Kagamaster | - same - |
BTOF | Zaochen | - same - | Frank Geurts, Zaochen Ye |
ETOF | Philipp Weidenkaff | - same - | Philipp Weidenkaff |
HLT | Hongwei Ke | - same - | - same - |
TPC | Yuri Fisyak | - same - | Flemming Videbaek |
Trigger detectors | Akio Ogawa | - same - | - same - |
DAQ | Jeff Landgraf | N/A | |
Forward Upgrade | Daniel Brandenburg | - same - | FCS - Akio Ogawa; sTGC - Daniel Brandenburg; FST - Shenghui Zhang/Zhenyu Ye |
---
Status - 2021/10/13
Sub-system | Coordinator | Calibration POC | Online monitoring POC |
MTD | Rongrong Ma | - same - | - same - |
EMC, EEMC | Raghav Kunnawalkam Elayavalli | - same - | Note: L2algo, bemc and bsmdstatus |
EPD | Prashanth Shanmuganathan (TBC) | Skipper Kagamaster | - same - |
BTOF | Zaochen | - same - | Frank Geurts, Zaochen Ye |
ETOF | Philipp Weidenkaff | - same - | Philipp Weidenkaff |
HLT | Hongwei Ke | - same - | - same - |
TPC | Yuri Fisyak | - same - | Flemming Videbaek |
Trigger detectors | Akio Ogawa | - same - | - same - |
DAQ | Jeff Landgraf | N/A | |
Forward Upgrade | Daniel Brandenburg | - same - | FCS - Akio Ogawa; sTGC - Daniel Brandenburg; FST - Shenghui Zhang/Zhenyu Ye |
---
Below are the related meetings:
TPC Software – Richard Witt | NO |
GMT Software – Richard Witt | NO |
EMC2 Software - Alice Ohlson | Yes |
FGT Software - Anselm Vossen | Yes |
FMS Software - Thomas Burton | Yes |
TOF Software - Frank Geurts | Yes |
Trigger Detectors - Akio Ogawa | ?? |
HFT Software - Spyridon Margetis | NO (no DB interface, hard-coded values in preview codes) |
Coordinator | Possible POC |
TPC Software – Richard Witt | |
GMT Software – Richard Witt | |
EMC2 Software - Alice Ohlson | Alice Ohlson |
FGT Software - Anselm Vossen | |
FMS Software - Thomas Burton | Thomas Burton |
TOF Software - Frank Geurts | |
Trigger Detectors - Akio Ogawa | |
HFT Software - Spyridon Margetis | Hao Qiu |
Directories we inferred are being used (as reported in the RTS Hypernews) | |||
scaler | Len Eun and Ernst Sichtermann (LBL) | This directory usage was indirectly reported | |
SlowControl | James F Ross (Creighton) | ||
HLT | Qi-Ye Shou | The 2012 directory had a recent timestamp but was owned by mnaglis. Aihong Tang contacted 2013/02/12. Answer from Qi-Ye Shou 2013/02/12: will be POC. |
fmsStatus | Yuxi Pan (UCLA) | This was not requested, but the 2011 directory is being overwritten by user=yuxip. FMS software coordinator contacted for confirmation 2013/02/12. Yuxi Pan confirmed 2013/02/13 as POC for this directory. |
Spin PWG monitoring related directories follow | |||
L0trg | Pibero Djawotho (TAMU) | ||
L2algo | Maxence Vandenbroucke (Temple) | ||
cdev | Kevin Adkins (UKY) | ||
zdc | Len Eun and Ernst Sichtermann (LBL) | ||
bsmdStatus | Keith Landry (UCLA) | ||
emcStatus | Keith Landry (UCLA) | ||
fgtStatus | Xuan Li (Temple) | This directory is also being written by user=akio, causing protection access and possible clash problems. POC contacted on 2013/02/08; both Akio and POC contacted again 2013/02/12 -> confirmed as OK. |
bbc | Prashanth (KSU) |
Sub-system | Coordinator | Check done |
DAQ | Jeff Landgraf | |
TPC | Richard Witt | |
GMT | Richard Witt | |
EMC2 | Mike Skoby, Kevin Adkins | |
FMS | Thomas Burton | |
TOF | Daniel Brandenburg | |
MTD | Rongrong Ma | |
HFT | Spiros Margetis | (not known) |
Trigger | Akio Ogawa | |
FGT | Xuan Li |
Sub-system | Coordinator | Calibration POC |
DAQ | Jeff Landgraf | - |
TPC | Richard Witt | - |
GMT | Richard Witt | - |
EMC2 | Mike Skoby, Kevin Adkins | - |
FMS | Thomas Burton | - |
TOF | Daniel Brandenburg | - |
MTD | Rongrong Ma | Bingchu Huan |
HFT | Spiros Margetis | Jonathan Bouchet |
Trigger | Akio Ogawa | - |
FGT | Xuan Li | N/A |
Not needed 2013/11/25 | ||
SlowControl | Chanaka DeSilva | OKed on second Run preparation meeting |
HLT | Zhengquia Zhang | Learned incidentally on 2014/01/28 |
HFT | Shusu Shi | Learned about it on 2014/02/26 |
Not needed 2013/11/25 | ||
L0trg | Zilong Chang, Mike Skoby | Informed 2013/11/10 and created 2013/11/15 |
L2algo | Nihar Sahoo | Informed 2013/11/25 |
Not needed 2013/11/25 | ||
zdc | may not be used (TBC) | |
bsmdStatus | Janusz Oleniacz | Info will be passed from Keith Landry 2014/01/20 Possible backup, Leszek Kosarzewski 2014/03/26 |
emcStatus | Janusz Oleniacz | Info will be passed from Keith Landry 2014/01/20 Possible backup, Leszek Kosarzewski 2014/03/26 |
Not needed 2013/11/25 | ||
bbc | Akio Ogawa | Informed 2013/11/15, created same day |
Run 15 was prepared essentially by discussing with individuals, and a comprehensive page was not maintained.
scaler | ||
SlowControl | ||
HLT | Zhengqiao | Feedback 2015/11/24 |
HFT | Guannan Xie | Spiros: Feedback 2015/11/24 |
Akio: Possibly not needed (TBC). 2016/01/13 noted this was not used in Run 15 and will probably never be used again. | ||
fmsTrg | Confirmed needed 2016/01/13 | |
fps | Akio: Not needed in Run 16? Perhaps later. | |
L0trg | Zilong Chang | Zilong: Feedback 2015/11/24 |
L2algo | Kolja Kauder | Kolja: will be POC - 2015/11/24 |
cdev | Chanaka DeSilva | |
zdc | ||
bsmdStatus | Kolja Kauder | Kolja: will be POC - 2015/11/24 |
bemcTrgDb | Kolja Kauder | Kolja: will be POC - 2015/11/24 |
emcStatus | Kolja Kauder | Kolja: will be POC - 2015/11/24 |
Not needed since Run 14 ... May drop from the list | ||
bbc | Akio Ogawa | Feedback 2015/11/24, needed |
rp |
Sub-system | Coordinator | Calibration POC |
DAQ | Jeff Landgraf | - |
TPC | Richard Witt, Yuri Fisyak | - |
GMT | Richard Witt | - |
EMC2 | Kolja Kauder, Ting Lin | - |
FMS | Oleg Eysser | - |
TOF | Daniel Brandenburg | - |
MTD | Rongrong Ma | (same confirmed 2015/11/24) |
HFT | Spiros Margetis | Xin Dong |
HLT | Hongwei Ke | (same confirmed 2015/11/24) |
Trigger | Akio Ogawa | - |
RP | Kin Yip | - |
This is to serve as a repository of information about networking in the online environment.
The network layout at the STAR experiment has grown from a base laid over ten years ago, with a number of people working on it and adding devices over time with little coordination or standardization. As a result, we have, to put it bluntly, a huge mess of a network, with a mix of hardware vendors and media, cables going all over the place, many of which are unlabelled and now buried to the point of untraceability. We have SOHO switches all over the place, of various brands, ages and capabilities. (It was only about one year ago all hubs were at least replaced with switches, or so I think – I haven’t found any hubs since then.) There are a handful of “managed” switches, but they are generally lower-end switches and we have not taken advantage of even their limited monitoring capabilities. (In the case of the LinkSys switches purchased one year ago, I found their management web interface poor – slow, buggy and not very helpful.)
In addition to the general messiness, a big (and growing) concern has been that during each of the past several years, there have been a handful of periods of instability in the starp network, typically lasting from a few minutes to hours (or even possibly indefinitely in the most recent cases which were resolved hastily with switch hardware replacements in the middle of RHIC runs). The cause(s) of these instabilities has never been understood. The instabilities have typically manifested as slow communications or complete lack of communication with devices on the South Platform (historically, most often VME processors). Speculation has tended to focus on ITD security scanning. While this has been shown to be potentially disruptive to some individual devices and services, broad effects on whole segments of the network have never been conclusively demonstrated, nor has there been a testable, plausible explanation for the mechanism of such instability.
The past year included the two most significant episodes of instability yet on starp, in which LinkSys SLM 2048 switches (after weeks or months of stability) developed problems that appeared to be similar to prior issues, only more severe. The two had been purchased as a replacement (plus spare) for a Catalyst 1900 on the South Platform. When the first started showing signs of trouble, it was replaced by the second, which failed spectacularly later in the run, becoming completely unresponsive through its web interface and pings, and was only occasionally transmitting any packets at all, it seemed. (After all devices were removed, and the switch rebooted, it returned to normal on the lab bench, but has not been put back into service.)
At this point, all devices were removed from the LinkSys switch and sent through a pair of unmanaged SOHO switches, which themselves each link to an old 3Com switch on the first floor. Since then, no more instabilities have been noted, but it has left a physical cabling mess and a network layout that is quite awkward. (Further adding to the trouble, at least one of the SOHO switches has a history of sensitivity to power fluctuations, every once in a while needing to be power-cycled after power dips or outages.)
In addition, there have been superficially similar episodes of problems on the DAQ/TRG network, which shares no networking hardware with the starp network. As far as I know, these episodes spontaneously resolved themselves. (Is this true?) Speculation has been on “odd” networked devices (such as oscilloscopes) generating unusual traffic, but here too there is no conclusive evidence of the cause. Having no explanation, it seems likely this behavior will be encountered again.
There are several “core” pieces currently. Core is defined somewhat vaguely as connecting lots of devices or requiring relatively high performance:
1. ITD’s main switch in the DAQ room
2. DAQ’s event builder switch in the DAQ room
3. the starp switch on the South Platform
4. the DAQ/TRG switch on the South Platform
5. the Force 10 switches for the HPSS network in the DAQ room
It seems likely that any reshape will have to include those same core components, though perhaps some combinations are possible at the hardware level using VLANs or other technologies (combining starp and DAQ/TRG on the platform onto a single large switch, for instance).
This switch chassis is in the networking rack in the northwest corner of the DAQ room. It is managed by ITD. STAR has no way to interact with this switch at the software/configuration level.
Slot 1: WS-X4013 (Supervisor II Engine, fiber uplink to 515 and local management port)
Slot 2: WS-X4548-GB-RJ45 (48 1Gb/s copper ports @8:1 oversubscription) port 43 is 162 subnet, rest are subnet 60.
Slot 3: WS-X4232-RJ-XX (32 copper ports at 100 Mb/s, plus a WS-U5404-FX-MT daughter card with 4 MTRJ fiber ports at 100 Mb/s)
Slot 4: WS-4148-RJ (48 copper 100Mb/s) - mix of subnets 60 and 162?
Slot 5: WS-4148-RJ (48 copper 100Mb/s) - all subnet 60?
Slot 6: WS-X4306-GB (6 GBIC (not mini!) ports, 3 of which have 1000-SX modules with SC connectors)
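To put the 8:1 oversubscription quoted for Slot 2 in concrete terms, here is a small arithmetic sketch. It assumes the ratio means total front-panel capacity divided by the bandwidth available between the line card and the switch fabric. The function names are hypothetical.

```python
def fabric_bandwidth_gbps(ports, port_speed_gbps, oversubscription):
    """Bandwidth shared by all ports of a line card toward the fabric."""
    return ports * port_speed_gbps / oversubscription


def per_port_share_gbps(port_speed_gbps, oversubscription):
    """Worst-case share per port when every port is saturated at once."""
    return port_speed_gbps / oversubscription


# Slot 2 figures from the list above: 48 ports at 1 Gb/s, 8:1 oversubscribed.
slot2_total = fabric_bandwidth_gbps(48, 1.0, 8)
slot2_per_port = per_port_share_gbps(1.0, 8)
```

Under that assumption, the 48 ports share 6 Gb/s, or 0.125 Gb/s each if all are busy simultaneously.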
Here we can keep miscellaneous files documenting the state of the network.
First, I have attached an image showing the current (late 2009/early 2010) switch layout and links in the WAH. ("WAH_switches.pdf")
Then there is an "after" picture with a rough idea of the patch panel placement to replace most of the unmanaged switches. ("WAH_patch_panels.pdf")
For the South Platform, a more refined patch panel plan was put together in June 2010 ("Network Plan for South Platforms.doc")
There is an attachment with general guidelines for installing UTP ("Cat5e_Network_cable.ppt")
WAH: (starp and DAQ/TRG devices are scattered throughout these locations. I am going to use the term “satellite racks” to include all locations within the C-AD PASS system that are NOT on the South Platform. Also, note that the satellite racks are semi-mobile, and the entire detector platform (North and South) can move into the Assembly Building.):
- PMD racks: ~3 devices on starp and ~3 on DAQ/TRG
- FMS/FPD east side: Handful of devices on DAQ/TRG and on starp
- Southwest corner work area: rarely more than two systems here, but might want starp, “trailers” and DAQ/TRG networks here for use as needed
- EEMC racks, west side: Handful of devices on DAQ/TRG and on starp
- FPD/FMS west racks: Handful of devices on DAQ/TRG and on starp
- PP2PP east and west: at least one VME processor on DAQ/TRG on each side - these are in the RHIC tunnel, technically not in the WAH.
- South platform – (IMPORTANT NOTE: The south platform must remain electrically isolated from the rest of the facility – there can be no conducting cables running from the South Platform to other locations)
o First floor: Three rows of 8-9 racks each (volatile, in that subsystems and components are installed or removed each year)
o Second floor: Three rows of 8-9 racks each (volatile)
- North platform: currently unoccupied, but has had devices in the past and a switch on the starp network is still present there, with a fiber link back to the South Platform (somewhere!)
Control Room:
- Perimeter (~3 dozen PCs), almost all on starp, but
o 2-3 on DAQ/TRG
o 4-5 on C-AD 108
o 1-2 on C-AD 90 network?
o Numerous small unmanaged switches in this room currently
DAQ Room: (Highest performance of the entire facility is needed in rack row DA, including a minimum 56-port switch with non-blocking/line rate 1Gb inter-links on the DAQ/TRG network)
- three “rows” plus two networking racks:
o the “old” network rack and the “new network rack” near the northwest corner
o rack row “DA” on west side (nearest the Control Room)
o shelf row in middle with a rack at each end.
o East row: ~6 stand-alone starp servers (one of which has a DAQ/TRG connection as well), along with a handful of VME devices on starp. DAQ or trigger might have a device or two here. The rack space is primarily occupied by devices on a C-AD network.
GMR:
- 3 PCs – generally stable area.
Clean room:
- several jacks needed, network use may vary between starp, daq/trg and the 130.199.162 subnet depending on the active use at any time
1006C and 1006D (trailers):
- typically only subnet 130.199.162 is needed here.
Online network reshape notes from the week of Oct. 18, 2009
During this week, three meetings were held to discuss the STAR online networking reshape plans.
The first meeting included Jeff Landgraf, Wayne Betts, Dan Orsatti (ITD) and Frank Burstein (ITD). At this meeting the ITD network engineers presented two proposals for core network components based on information previously provided to them by STAR. The two options were Force-10 based and Cisco-based, with costs of approximately $150,000 and $100,000 respectively. They included a shared infrastructure for the DAQ/TRG and STARP networks, including a switch redundancy in the DAQ room to handle the two networks and meet DAQ’s relatively high performance needs in the DAQ room. These ITD options are generally smart, expandable, highly configurable and well-supported by ITD, and meet the initial requirements.
However, in informal discussions since then, Bill Christie suggested that we should consider the possibility of radiation damage and/or errors in any electronic equipment in the WAH. While this had been mentioned as a possibility in the past, it was not generally taken seriously by those of us in STAR looking after the networks. Nor is there any way for us to test this to a standard of “beyond reasonable doubt” (or any other standard really). At Bill’s suggestion, we (Jeff L., Wayne B., Jack E., Yuri G. and Bill C.) met with three members of C-AD’s networking group, who stated they were certain that radiation could impair switches and strongly suggested that ITD’s suggested equipment was inappropriate for a radiation area. They also provided some feedback from individuals at two other laboratories that networking equipment in radiation areas is subject to upsets. One explanation pointed to effects on metal-oxide semiconductors, which at face value would suggest that newer (thus generally smaller) electronic components would be less susceptible. However, my intuition is that smaller electronics are denser and more easily upset by a smaller deposited charge, and thus might be more susceptible.
Here are excerpts from the other labs:
From JLab: "The flash memory loses its ability to hold data, making it useless. We have worked around the problem by pulling cable or fiber back to lower radiation areas wherever we can. Because we made these cabling changes when we were only using cisco fixed-configuration 100Mbit switches ( 29XX models), I have no data for Gigabit switches. Since our experience is that it's the flash memory that fails, I'd expect no better performance from any other switches. All of our switches that use modular supervisor modules are outside of radiation areas."
From FermiLab: "The typical devices used employ metal oxide semiconductors and the lock up happens when ionizing radiation is trapped in the gate region of the devices. We see this happen at our two detectors (CDF and DZero) when losses go up and power supplies circuits latch up. The other thing working in the positive direction is that when IC feature sizes go down, there is less likelihood for the charge to get trapped so they are more radiation tolerant. Having said all that I can't answer your specific question because we don't put switches or routers in the tunnel at all."
All this said, the general consensus was that we should move as much “intelligence” as far away from the beam line as reasonably possible. (Until now, the “big” switches on the platform have actually been about as close to the beam line as possible!) This means putting any switches in rack rows 1C. Given both the cost and the radiation concern, we (the STAR personnel) agreed to investigate less expensive switches than ITD’s suggestion, while trying to provide some level of intelligence for monitoring. We also have a consensus that the DAQ/TRG and STARP networks should try to use common hardware whenever possible, and that we should work to remove as many SOHO-type unmanaged switches as possible as time permits (replacing them with well-documented and labelled patch panels feeding back to core switches). The C-AD personnel also recommended Cisco’s 2950, 2960 and 3750 switches and Garrett products in general. One more miscellaneous tidbit from Jack: we should avoid LanCast media converters.
The final meeting of the week included Jerome, Wayne and Matt Ahrenstein. Jerome was briefed on the two prior meetings and generally agreed with the direction we are taking. At this meeting, we selected an additional area to try to clean up before the run, specifically the racks on the west side, where there are at least four 8-port unmanaged switches (3 on DAQ/TRG and one on STARP). He also suggested we consult with Shigeki from the RACF about the whole affair, and is trying to arrange such a meeting as soon as possible.
In addition to this, Jeff has also stated that while either ITD solution would meet DAQ’s needs for several years, he believes he can obtain adequate performance for far less money with lower end equipment. Here is Jeff's latest on the DAQ needs for the network:
"My target is 20Gb/sec network capability across switches. In likely scenarios, the network capability would be significantly higher than this because hi bandwidth nodes would all be on the same switch (ironically, the cheaper switches mostly seem to be line-speed switches internally, unlike the big cisco switches...) However, in the current year, I'll have a hard limit of 12 gigabit ethernet cards incoming on EVBs for a hard max of 12Gb/sec. The projected desired data, according to the trigger board is around 6Gb/sec (600MB/sec). I don't expect much more than a factor of two through the EVBs above this 600MB/sec in the lifetime of STAR (meaning current TPC + HFT + FGT), although there are big uncertainties particularly for the HFT. The one lump in the planning involves potential L3 farms - and I don't know how this will play out. There are many scenarios some of which would not impact the network (ie... specialized hardware plugged into the TPX machines...), but my current approach is that the network needs will have to be incorporated in the L3 farm design plan..."
Where does this leave us? We need to quickly evaluate options for the “big” switches for the DAQ room and the South Platform. The DAQ and Trigger groups have 3(?) similar managed switches that might be adequate for the South platform (including a spare), and we should look into the Cisco models suggested by C-AD. We also should let ITD make another round of suggestions based on our discussions to date, and especially focus with them on what to do with the large ITD switch in the DAQ room that currently has the link to the rest of the campus “public” network. And we need to do this rather hastily.
Do we support multiple networks on single switches with VLANs, switch port segmentation or other means? For instance, at remote spots, like PMD’s racks, can we put in a single switch and have it handle both starp and DAQ/TRG? Daniel Orsatti's most recent advice was leaning towards having a few large switches in four or five core places with VLANs and installing patch panels at or near the various locations needing network connections.
Is there a single brand/line of switch equipment that meets most or all of our goals? Can we get a line of switch products that includes a range from small (~8 port) switches up to the large switches required for DAQ’s event builders or ITD’s main switch, such that they can interoperate and be part of shared monitoring? (If we go with a patch-panels-to-big-switches approach, then the small switches would not be necessary.)
What kind of monitoring can we expect and how much effort will it take for it to be useful? SNMP-based? Nagios? Etc…
Can we setup a shared but “private” monitoring network for the managed switches, such that starp and DAQ/TRG monitoring share the same infrastructure? (Most likely, yes.)
Can fiber connectors be easily changed/replaced/repaired? STAR apparently does not have the tools to terminate fibers at this point. Do we want to acquire the tools and know-how to do this, or continue to rely on ITD and/or folks like Frank Naase (C-AD) who have done most of our fiber termination to date?
The goal of the online networking reshape is to provide a stable and well-understood networking environment with the possibility of future expansion to meet STAR’s foreseeable needs over time. The physical layout needs to be well understood, with elements of redundancy and/or easily swapped parts on hand as much as possible. The devices on the network should be known, including their location, what other systems they are expected to interact with and traffic volumes. Significant networking errors should be detected at the switch level and allow for troubleshooting without significant disruption to large parts of the network.
Along the way, it will be very useful to increase the availability of knowledge and sources of assistance related to the network. Naturally this calls for a well documented network in any case. Consolidating networking hardware into a common brand or line for the multiple online networks (which are currently a hodgepodge) may reduce the number of errors encountered, improve the ability of STAR's personnel to understand more facets of the networking environment and allow for better monitoring of the network performance. Our network should mesh well with existing ITD infrastructure so that their expertise can be brought to bear as needed. However, ITD expertise cannot be the sole source of support for the online networks – at least two individuals in STAR (but not much more than that) should have broad access to realtime network data and configuration. STAR’s 24-hour on-call experts (DAQ and online computing in particular) need to be able to respond quickly to incidents and gather clues and information from all sources.
I think we need to start from the core and work outwards. This will allow us to finish as much as possible before the run starts and start to see the most benefits as early as possible. The two big pieces at the core (in order of importance) are:
1. DAQ’s event builder switch, which calls for 56 (let’s say 64) non-blocking/line speed 1Gb/s ports. No matter what, this piece needs to be put in place before the run starts. We can probably limp by with everything else as it exists now if we have to, but this has to be a new piece of hardware in place before December 1 (is this a reasonable deadline?).
2. Whatever ITD wants to replace the current Catalyst 4000-series chassis and blades in the DAQ room.
After this, the next items for consideration/replacement are the starp and DAQ/TRG switches on the South Platform.
Then it is on to the satellite racks in the WAH with their relatively small number of devices.
Then the DAQ room, cleaning up the handful of unmanaged switches that exist for both starp and DAQ/TRG.
Control Room clean-up. The available wall jacks in the Control Room are insufficient for the number of devices, and many of the jacks are inaccessible behind the west side console, but at least this area is always accessible and has had few problems, so it isn’t a high priority.
April 1, 2025 (no, not an April Fools!) - THIS PAGE IS OBSOLETE
Instead, please refer to https://drupal.star.bnl.gov/STAR/public/operations/WAH-Network-Switch-NPS-details
This documents the Network Power Switch plugs used to remotely power cycle STAR's network switches in the Wide Angle Hall.
Updated February 8, 2019 (Ideally, STAR's RackTables would be the definitive source for this information, but it is far from complete.)
*ID | Location | Switch IP name | NPS IP name | NPS plug | NPS access method | NPS type |
SW22 | east racks | east-trg-sw.trg.bnl.local | pxl-nps.starp.bnl.gov | 8 | telnet, http (ssh and https available, but not enabled) | APC AP7901 (August 2015) |
SW56 | east racks | east-s60.starp.bnl.gov | eastracks-nps.trg.bnl.local | 8 | ssh (slow to respond to initial connection) | APC AP7901 (August 2012) |
SW59 | SP 1C4 | splat-s60.starp.bnl.gov | netpower1.starp.bnl.gov | 3 | telnet, http | APC |
SW2 | SP 1C4 | splat-trg2.trg.bnl.local | netpower1.starp.bnl.gov | 1 | telnet, http | APC |
SW27 | SP 1C4 | switch1.trg.bnl.local | netpower1.starp.bnl.gov | 2 | telnet, http | APC |
SW60 | SP 1C4 | splat-s60-2.starp.bnl.gov | netpower2.starp.bnl.gov | A1 | ssh (has key for wbetts) | WTI NPS-8 |
SW28 | SP 1C4 | switchplat.scaler.bnl.local | netpower2.starp.bnl.gov | A2 | ssh (has key for wbetts) | WTI NPS-8 |
SW55 | west racks | west-s60.starp.bnl.gov | westracks-nps.trg.bnl.local | 1 | ssh, http | APC |
SW30 | west racks | switch2.trg.bnl.local | eemc-pwrs1.starp.bnl.gov | A4 | telnet | old WTI |
SW51 | NP 1st floor | nplat-s60.starp.bnl.gov | north-nps1.starp.bnl.gov | 1 | telnet, ssh, http | APC AP7900B (January 2019) |
A. Only use managed switches and have each networked device plug directly into a managed switch port.
- Eliminate all “dumb” consumer/SOHO/desktop switches – they are not robust, add to confusion when troubleshooting and prevent isolation of individual devices
- allow the blocking of any single device at any time through its nearest switch’s management interface
- block the addition of any new, unknown nodes and/or be informed of anything showing up unexpectedly
- ability to monitor individual ports for traffic volumes, link settings, errors, major links going down, preferably with some history/logging.
- allow real-time monitoring and alerts for unusual events (capabilities will be hardware/vendor dependent and subject to available time to develop monitoring tools and become familiar with capabilities)
B. All devices should be within 10-15 feet of a “core” patch panel or network switch.
- Individuals working on detector subsystems should not have to install network cables that cross rack rows, go from one floor (or room) to another, etc.
- Piecemeal additions of network segments by subsystems should not be done – that is to say, no one should be adding switches to the network other than core personnel using “approved” devices consistent with the rest of the network components.
- This calls for cabled and labeled patch panels and/or switches liberally placed throughout the WAH, the Control Room and the DAQ Room.
C. Some degree of “commonality” between the infrastructures of the starp and DAQ/TRG networks. Same line of hardware, media convertors (when needed), switches, monitoring tools, possibly even shared switches with VLANs. This is a big question – are VLANs viable for sharing switch hardware between starp and DAQ/TRG? A shared “private” management network for the switches is likely a good idea.
D. An easily extensible network, such that new locations can be added easily, and existing locations can have additional capacity added and subtracted in accord with the other goals.
E. Redundant links (fibers or copper, as appropriate) available between all linked core components (preferably with automatic failover).
F. Spares on hand for just about everything – a good reason to use as few models of hardware as possible. If we develop a plan with 10 small 8-port switches in various locations, ideally all 10 will be identical and we will have one or two spares on the shelf at all times.
G. All network components should be on UPS power so that short and/or localized power outages do not bring down portions of the network. This is not terribly important, but should be kept in mind and allowed for when feasible.
H. (Added after the initial items above) Move IC-based devices (switches) away from the beam line and attempt to reduce radiation load. Our working hypothesis, based on anecdotal evidence, is that at least some of the networking problems last year were caused by radiation-induced errors. The two "big" switches on the South Platform have historically always been in just about the WORST place for radiation load, so these need to be moved away from the beam line.
Document everything!
All hardware with an IP address should be labelled.
All installed cables should have a label on each end that is adequate to quickly locate the other end.
All patch panel ports with cables connected should be labelled appropriately to identify the other end.
All network equipment (switches, patch panels, cable runs, etc.) needs to be documented, preferably in appropriate documents in Drupal.
Copper connections:
Use Cat5e or higher graded cables.
Use yellow cables for devices connected to the STARP network (130.199.60-61.x IP addresses).
Use green cables for devices connected to the DAQ/TRG network (172.16.x.x IP addresses).
Use colors other than yellow and green for any other network connections.
Use T568A termination when adding connectors to bare cable.
Fiber Connections:
Use 50 micron multi-mode fiber.
Use 1000Base-SX fiber transceivers where possible.
“starp”: 130.199.60.0/23
“DAQ/TRG”: 172.16.0.0/16 (non-routed)
“HPSS”: RCF network for DAQ → HPSS transfers
“Alexei”: Alexei’s video camera and laser network (currently consists of a switch on the South Platform and a switch in the DAQ room connected by a fiber pair?). This includes 3-4 PCs including obsolete Windows OSes (e.g. Win 98). No devices on this network are dual-homed, so it is very isolated from everything else and is mentioned here for completeness.
“trailers”: 130.199.162. - includes wired connections for printers, visitors’ laptops and workstations not directly involved in operations and may exist outside of the trailers, such as the Control Room for visitors’ laptops while on shift.
“Wireless”: Not really relevant conceptually, but there are also three ITD wireless access points in the area.
“C-AD 108” and “C-AD 90”: C-AD has at least two networks operating in the DAQ and Control Rooms, which are left well enough alone in their hands, but are mentioned here for the sake of completeness.
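As a minimal sketch of how the subnet list and the cable color rules fit together, the following Python snippet classifies an IPv4 address using only the two subnets and two colors stated in this document. The function name `classify` and the sample addresses are hypothetical, chosen purely for illustration; the other networks listed above are left out because their address ranges are not fully specified here.

```python
import ipaddress

# Subnets and cable colors exactly as stated in this document.
NETWORKS = {
    "starp": (ipaddress.ip_network("130.199.60.0/23"), "yellow"),
    "DAQ/TRG": (ipaddress.ip_network("172.16.0.0/16"), "green"),
}


def classify(address):
    """Return (network name, cable color) for an IPv4 address string."""
    ip = ipaddress.ip_address(address)
    for name, (subnet, color) in NETWORKS.items():
        if ip in subnet:
            return name, color
    # Any other network must use a color other than yellow or green.
    return "other", "not yellow or green"


if __name__ == "__main__":
    # Hypothetical addresses, for illustration only.
    for addr in ("130.199.61.17", "172.16.5.9", "130.199.62.1"):
        print(addr, classify(addr))
```

Note that the /23 mask covers both 130.199.60.x and 130.199.61.x, which matches the "130.199.60-61.x" range given in the cabling rules.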
This page will now hold the shift accounting pages. They complement the Shift Sign-up process by documenting it.
Requests to serve additional shifts should be made PRIOR to the final calculation of the dues to the shift committee (D. Keane and D. Smirnov). Please refer to the important dates section in this document for the deadline for making such a request.
The table below shows the percentage of missed shifts over the past 4 years. This information can be used to exclude authors from the author list in Run 16.
Note: If your institution is in this table and fails again to fulfill its dues and the 4-year average is below the threshold defined by our author exclusion policy (see STAR Note 0545), its authors would be excluded.
Institution | Missed percentage, historical |
---|---|
Frankfurt Institute for Advanced Studies (FIAS) | 71% |
Institute of Modern Physics, Lanzhou | 41% |
University of Rajasthan | 31% |
Pusan National University | 25% |
STAR shifts begin January 12, 2016 with cosmic data taking shifts.
As usual, period coordinators are pre-assigned / pre-signed as selected by the Spokesperson office.
2) We've agreed to pre-assign the following QA shifts under the new family-related policy:
Sevil Salur (LBNL) FEB 16
Richard Witt (Yale) FEB 23
Juan Romero (UC Davis) MAY 31
3) Bob Tribble (TAMU) is pre-assigned to a shift during APR 12-19.
4) To correct an unusual rounding anomaly, we've agreed to subtract one week from Valparaiso U dues.
Dear STAR Collaborators: We have just received the guidance from DOE (to BNL) that there will be 20 cryo-week of RHIC run instead of the originally planned 22 weeks. Our shift sign-up was designed for 22 weeks. For those who have already signed up for the last two weeks, please try to un-sign and help to fill other open slots. By now we have 8 open slots and need to un-sign 24. For those who are not able to re-sign to other spots, we will credit your dues, but may ask for help if slots open due to unexpected events (visa etc.). I am looking forward to a successful run 16 and exciting physics from it. Happy Holidays! Zhangbu
After:
Requests to serve additional shifts should be made PRIOR to the final calculation of the dues to the shift committee (D. Keane and D. Smirnov). Please refer to the important dates section in this document for the deadline for making such a request.
The table below shows the percentage of missed shifts over the past 4 years. This information can be used to exclude authors from the author list in Run 17.
Note: If your institution is in this table and fails again to fulfill its dues and the 4-year average is below the threshold defined by our author exclusion policy (see STAR Note 0545), its authors would be excluded.
STAR shifts begin January XX, 2017 with cosmic data taking shifts.
As usual, period coordinators are pre-assigned / pre-signed as selected by the Spokesperson office.
Feb. 7-March 7 Oleg Eyser (BNL)
March 7- April 4 Sal Fazio (BNL)
April 4 – April 28 Shuai Yang (BNL)
April 28-May 23 Xiaofeng Luo (CCNU)
May 23 – June 20 Jinlong Zhang (LBL)
June 20 – July 11 Nihar Sahoo (TAMU)
0) Bob Tribble: SL, evening, beginning Mar 21
1) Pavla + Pavol: 5 shifts as below.
2) Juan Romero wants QA for 1 week, beginning May 02.
3) Sevil Salur wants QA for 1 week, beginning Mar 07.
4) Richard Witt wants QA for 1 week, beginning Mar 21.
5) Lanny Ray, as always, is pre-assigned the first QA shift.
6) Jan Rusnak wants QA for 1 week, beginning Apr 04.
7) FIAS wants pre-assigned shifts like last year:
Day, beginning Apr 4: Belousov, 2 weeks of shift crew
Evening, beginning Apr 4: Pugash, 2 weeks of shift crew
Day, beginning Apr 4: Vassiliev, 1 week DO trainee + 1 week DO
Evening, beginning Apr 4: Zyzak, 1 week DO trainee + 1 week DO
9 WEEKS PRE-ASSIGNED QA AS FOLLOWS
==================================
Lanny Ray (UT Austin) QA Mar 5
Richard Witt (USNA/Yale) QA Mar 19
Sevil Salur (Rutgers) QA Apr 16
Wei Li (Rice) QA Apr 23
Kevin Adkins (Kentucky) QA May 14
Juan Romero (UC Davis) QA May 21
Jana Bielcikova (NPI, Czech Acad of Sci) QA May 28
Yanfang Liu (TAMU) QA June 25
Yanfang Liu (TAMU) QA July 02
8 WEEKS PRE-ASSIGNED REGULAR SHIFTS AS FOLLOWS
==================================
Bob Tribble (BNL) Feb 05 SL evening
Daniel Kincses (Eotvos) Mar 12 DO Trainee Day
Daniel Kincses (Eotvos) Mar 19 DO Day
Mate Csanad (Eotvos) Mar 12 SC Day
Ronald Pinter (Eotvos) Mar 19 SC Day
Carl Gagliardi (TAMU) May 14 SL day
Carl Gagliardi (TAMU) May 21 SL day
Grazyna Odyniec (LBNL) July 02 SL evening
For the calculation of shift dues, there are two considerations.
1) The length of time of the various shift configurations (2 person, 4 person no trainees, 4 person with trainees, plus period coordinators/QA shifts)
2) The percent occupancy of the training shifts
For many years, 2) has hovered about 45%, which is what we used to calculate the dues. Since STAR gives credit for training shifts (as we should) this needs to be factored in or we would not have enough shifts.
The sum total of shifts needed is then divided by the total number of authors minus authors from Russian institutions who cannot come to BNL.
date | weeks | crew | training | PC | OFFLINE |
---|---|---|---|---|---|
11/26-12/10 | 2 | 2 | 0 | 0 | 0 |
12/10-12/24 | 2 | 4 | 2 | 1 | 0 |
12/24-6/30 | 27 | 4 | 2 | 1 | 1 |
7/02-7/16 | 2 | 4 | 0 | 1 | 1 |
Adding these together (3x a shift for crew, 3x45% for training, plus pc plus offline) gives a total of 522 shifts.
The total number of shifters is 303 - 30 Russian collaborators = 273 people
Giving a total due of 1.9 per author.
For a given institution, the load is calculated as # of authors - # of expert credits x due, then set to an integer value, as cutting collaborators into pieces is non-collegial behavior.
However, this year, this should have been:
date | weeks | crew | training | PC | OFFLINE |
---|---|---|---|---|---|
11/26-12/10 | 2 | 2 | 0 | 0 | 0 |
12/10-12/24 | 2 | 4 | 2 | 1 | 0 |
12/24-6/02 | 23 | 4 | 2 | 1 | 1 |
6/02-6/16 | 2 | 4 | 0 | 1 | 1 |
Adding these together (3x a shift for crew, 3x45% for training, plus pc plus offline) gives a total of 456 shifts for a total due of 1.7 per author.
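The arithmetic behind the two totals can be checked with a short Python sketch. It uses only the numbers quoted above (the schedule rows, the 45% training occupancy, and 303 - 30 = 273 shifters). The names `total_shifts`, `round_half_up` and `dues_per_author` are hypothetical helpers, and rounding the total to the nearest whole shift is an assumption about how the quoted figures were obtained.

```python
from fractions import Fraction
from math import floor

TRAINING_OCCUPANCY = Fraction(45, 100)  # training occupancy of about 45%
MULTIPLIER = 3                          # "3x a shift" for crew and training
SHIFTERS = 303 - 30                     # authors minus Russian collaborators

# (weeks, crew, training, PC, offline) rows copied from the two schedules.
PLANNED = [(2, 2, 0, 0, 0), (2, 4, 2, 1, 0), (27, 4, 2, 1, 1), (2, 4, 0, 1, 1)]
REVISED = [(2, 2, 0, 0, 0), (2, 4, 2, 1, 0), (23, 4, 2, 1, 1), (2, 4, 0, 1, 1)]


def total_shifts(rows):
    """Sum week-long shift slots over all schedule rows (exact arithmetic)."""
    total = Fraction(0)
    for weeks, crew, training, pc, offline in rows:
        per_week = (MULTIPLIER * crew
                    + MULTIPLIER * TRAINING_OCCUPANCY * training
                    + pc + offline)
        total += weeks * per_week
    return total


def round_half_up(value):
    """Round to the nearest integer, with halves going up (assumed convention)."""
    return floor(value + Fraction(1, 2))


def dues_per_author(rows):
    """Shifts owed per author, rounded to one decimal place."""
    return round(float(total_shifts(rows) / SHIFTERS), 1)


if __name__ == "__main__":
    for label, rows in (("planned", PLANNED), ("revised", REVISED)):
        print(label, round_half_up(total_shifts(rows)), dues_per_author(rows))
```

Under these assumptions the first schedule comes to 522 shifts (1.9 per author) and the second to 456 shifts (1.7 per author), in agreement with the figures in the text. Exact fractions are used because the second total is exactly 455.5, where floating-point rounding could go either way.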
We allowed some people to pre-sign up, for a couple of different reasons.
Family reasons, so offline QA:
James Kevin Adkins
Jana Bielčíková
Sevil Salur
Md. Nasim
Yanfang Liu
Additionally, Lanny Ray is given the first QA shift of the year as our experienced QA shifter.
This year, to add an incentive to train for shift leader, we allowed people who were doing shift leader training to sign up for both their training shift and their "real" shift early:
Justin Ewigleben
Hanna Zbroszczyk
Jan Vanek
Maria Zurek
Mathew Kelsey
Kun Jiang
Yue-Hang Leung
Both Bob Tribble and Grazyna Odyniec signed up early for a shift leader position in recognition of their schedules and contributions.
This year because of the date of Quark Matter and the STAR pre-QM meeting, several people were traveling on Tuesday during the sign up. These people I signed up early as I did not want to punish some of our most active colleagues for the QM timing:
James Daniel Brandenburg
Sooraj Radhakrishnan
3 other cases that were allowed to pre-sign up:
Panjab University had a single person who had the visa to enter the US, and had to take all of their shifts prior to the end of their contract in March. So that the shifter could have some spaces in his shifts for sanity, I signed up:
Jagbir Singh
Eotvos Lorand University stated that travel is complicated for their group, and so it would be good if they could ensure that they were all on shift at the same time. Given that they are coming from Europe I signed up:
Mate Csanad
Daniel Kincses
Roland Pinter
Srikanta Tripathy
Frankfurt Institute for Advanced Studies (FIAS) wanted to be able to bring Masters students to do shift, but given the training requirements and timing with school and travel for Europe, this leaves little availability for shift. So I signed up:
Iouri Vassiliev
Artemiy Belousov
Grigory Kozlov
This is to serve as a repository of information about various STAR tools used in experimental operations.
This section contains information about using EVO for STAR meetings.
If you would like to be able to use EVO in the 1006 trailer, there is a conference PC setup for use. There is a generic account on the computer for everyone to share.
The account credentials are:
Username: rhicstar
Password: (See below)
Log On To: Conference (This computer)
I will not post the password anywhere that is not encrypted for security purposes, so please come see me in my office (Building 510 Room 1-179) or send me an e-mail containing your GPG public key. If you do not have a GPG public key, please bring your laptop, (for desktop users, call me, and I'll come to see you) and I'll help you set it up. It is quite useful.
FUSE is a kernel module that acts as a bridge between the kernel’s built-in filesystem functions and user-space code that “understands” the (arbitrary) structure of the mounted content. It allows non-root users to add filesystems to a running system.
Typically, FUSE-mounted filesystems are (nearly) indistinguishable from any other mounted filesystem to the user.
Some examples of FUSE in action:
The Fuse project FileSystems page has a more complete list and links to individual software projects that use FUSE.
SSHFS allows a user (not necessarily root) on host A (the "client") to mount a directory on host B (the "server") using the (almost) ubiquitous SSH client-server communication protocols. Generally, no configuration changes or software installations are required on host B.
The directory on host B then looks like a local directory on host A, at a location in host A's directory structure chosen by the user (in a location where the user has adequate privileges on host A, of course).
Unlike NFS, the user on host A must authenticate as a known user on host B, and the operations performed on the mounted filesystem are performed as that known user on host B. This avoids the "classic" NFS problem of UID/GID clashes between the client and server.
Here is a sample session with some explanatory comments:
In this example, host A is "stargw1" and host B is "staruser01". The user name is wbetts on both hosts, but the user on host B could be any account that the user can access via SSH.
First, create a directory that will serve as the mountpoint:
[wbetts@stargw1 ~]$ mkdir /tmp/wbssh
[wbetts@stargw1 ~]$ ls -ld /tmp/wbssh
drwxrwxr-x 2 wbetts wbetts 4096 Oct 13 10:52 /tmp/wbssh
Second, mount the remote directory using the sshfs command:
[wbetts@stargw1 ~]$ sshfs staruser01.star.bnl.gov: /tmp/wbssh
In this example, no remote username or directory is specified, so the remote username is assumed to match the local username and the user’s home directory is selected by default. So the command above is equivalent to:
% sshfs wbetts@staruser01.star.bnl.gov:/home/wbetts /tmp/wbssh
That’s it! (No password or passphrase is required in this case, because wbetts uses SSH key agent forwarding)
Now use the remote files just like local files:
[wbetts@stargw1 ~]$ ls -l /tmp/wbssh |head -n 3
total 16000
-rw-rw-r-- 1 1003 1003 6412 Oct 19 2005 2005_Performance_Self_Appraisal.sxw
-rw-rw-r-- 1 1003 1003 10880 Oct 19 2005 60_subnet_PLUS_SUBSYS.sxc
[wbetts@stargw1 ~]$ ls -ld /tmp/wbssh
drwx------ 1 1003 1003 4096 Oct 11 15:56 /tmp/wbssh
The permissions on our mount point have been altered -- now the remote UID is shown (a source of possible confusion) and the permissions have morphed to the permissions on the remote side, but this is potentially misleading too…
[root@stargw1 ~]# ls /tmp/wbssh
ls: /tmp/wbssh: Permission denied
Even root on the local host can’t access this mount point, though root can see it in the list of mounts.
In addition to the ACL confusion, there can be some quirks in behaviour, where sshfs doesn't translate perfectly:
[wbetts@stargw1 ~]$ df /tmp/wbssh
Filesystem 1K-blocks Used Available Use% Mounted on
sshfs#staruser01.star.bnl.gov: 1048576000 0 1048576000 0% /tmp/wbssh
Ideally the user unmounts it once finished; otherwise it sits there indefinitely. (It is probably subject to the same timeouts (TCP, firewall conduit, SSH config, etc.) as an ordinary ssh connection, but in limited testing so far the connection has been long-lived.) Here is the unmount command:
[wbetts@stargw1 ~]$ fusermount -u /tmp/wbssh/
[wbetts@stargw1 ~]$ ls /tmp/wbssh
[wbetts@stargw1 ~]$
Some additional details:
By default, users other than the user who initiated the mount are not permitted access to the local mountpoint (not even root), but that can be changed by the user, IF it is permitted by the FUSE configuration (as decided by the admin of the client node). The options though are not very granular. The three possible options are:
In any case, whoever accesses the mount point will act as (and have the permissions of) the user on host B specified by the mounter. This requires careful evaluation of the options permitted and user education on the possibilities of allowing inappropriate or unnecessary access to other users.
The mount is not tied to the specific shell it is started in. It lasts indefinitely it seems – the user can log out of host A, kill remote agents, etc. and the mount remains accessible on future logins. (Interpretation: an agent of some sort is maintained on the client (host A) on the user’s behalf. (If multiple users have access to the user account on A, this could be worrisome, in the same manner as the allowance of others to access the mount point mentioned above.))
Here are some potential advantages and benefits of using SSHFS, some of which are mentioned above:
And some drawbacks:
And some final details about the configuration of the online gatekeepers that presumably are prime candidates for the use of SSHFS:
The standard installation of FUSE for Scientific Linux 4 seems to not be quite complete. A little help is required to make it work:
In /etc/rc.d/rc.local:
/etc/init.d/fuse start
/bin/chown root.fuse /dev/fuse
/bin/chmod 660 /dev/fuse
“fuse” group created – each user who will use SSHFS needs to be a member of this group (must be kept in mind if we use NIS or LDAP for user management on the gateways)
The default openssh packages from Scientific Linux 3, 4 and 5 (~openssh 3.6, 3.9 and 4.3 respectively) do not support sftp-subsystem logging. Later versions of openssh do (starting at version ~4.4). This provides the ability to log file accesses and trace them to individual (authenticated) users.
I grabbed the latest openssh source (version 5.1) and built it on an SL4 machine with no trouble:
% ./configure --prefix=/opt/openssh5.1p1 --without-zlib-version-check --with-tcp-wrappers
% make
% make install
Then in the sshd_config file, append "-f AUTHPRIV -l INFO" to the sftp subsystem line. This sets the logging level (INFO) and causes the logs to be sent to /var/log/secure. (To be tried: VERBOSE log level).
Even at the INFO level, the logs are fairly detailed. Shown below is a sample session, with the client commands on the left and the resulting log entries from the server (carradine, using port 2222 for testing) on the right. For brevity, the time stamps from the log have been removed after the first entry.
CLIENT COMMANDS | SERVER LOG (/var/log/secure) |
sshfs -p 2222 wbetts@carradine.star.bnl.gov:/home/wbetts/ carradine_home | Nov 20 14:30:29 carradine sshd[29120]: Accepted publickey for wbetts from 130.199.60.84 port 41746 ssh2 carradine sshd[29122]: subsystem request for sftp carradine sftp-server[29123]: session opened for local user wbetts from [130.199.60.84] |
ls carradine_home | carradine sftp-server[29123]: opendir "/home/wbetts/." carradine sftp-server[29123]: closedir "/home/wbetts/." |
touch carradine_home/test.txt | carradine sftp-server[29123]: sent status No such file carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE,CREATE,EXCL mode 0100664 carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 0 carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE mode 00 carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 0 carradine sftp-server[29123]: set "/home/wbetts/test.txt" modtime 20081120-14:36:36 |
cat /etc/DOE_banner >> carradine_home/test.txt | carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE mode 00 carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 1119 |
rm carradine_home/test.txt | carradine sftp-server[29123]: remove name "/home/wbetts/test.txt" |
fusermount -u carradine_home/ | carradine sftp-server[29123]: session closed for local user wbetts from [130.199.60.84] |
From these logs, we would appear to have a good record of the who/what/when of sshfs usage. But the need to build our own openssh packages puts a burden on us to track and install updated openssh versions in a timely fashion, rather than relying on the distribution maintainer and the OS's native update manager(s). The log files on a heavily utilised server may also become unwieldy and cause a performance degradation, but I've not made any estimates or tests of these issues.
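To illustrate how the who/what record could be pulled out of such logs, here is a minimal Python sketch that ties each file operation to the authenticated user by way of the sftp-server process ID. The sample lines are copied from the session shown above (without time stamps, as in the table); the function name `audit` and the choice of which operations to report (`open`, `opendir`, `remove name`) are assumptions made for illustration, not part of any existing tool.

```python
import re

# Matches the "sftp-server[PID]: message" part of a /var/log/secure entry.
ENTRY = re.compile(r'sftp-server\[(?P<pid>\d+)\]: (?P<msg>.*)')
SESSION = re.compile(
    r'session opened for local user (?P<user>\S+) from \[(?P<ip>[^\]]+)\]')
ACTION = re.compile(r'(?P<op>open|opendir|remove name) "(?P<path>[^"]+)"')

# Lines taken from the sample session in this document.
SAMPLE = [
    'carradine sftp-server[29123]: session opened for local user wbetts from [130.199.60.84]',
    'carradine sftp-server[29123]: opendir "/home/wbetts/."',
    'carradine sftp-server[29123]: closedir "/home/wbetts/."',
    'carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE,CREATE,EXCL mode 0100664',
    'carradine sftp-server[29123]: remove name "/home/wbetts/test.txt"',
]


def audit(lines):
    """Yield (user, client_ip, operation, path) for each reported operation."""
    sessions = {}  # sftp-server PID -> (user, client IP)
    for line in lines:
        entry = ENTRY.search(line)
        if not entry:
            continue
        pid, msg = entry["pid"], entry["msg"]
        opened = SESSION.match(msg)
        if opened:
            sessions[pid] = (opened["user"], opened["ip"])
            continue
        action = ACTION.match(msg)
        if action and pid in sessions:
            user, ip = sessions[pid]
            yield user, ip, action["op"], action["path"]


if __name__ == "__main__":
    for record in audit(SAMPLE):
        print(record)
```

Because the session is keyed on the process ID, entries from concurrent sessions would be attributed to the right user as long as the "session opened" line appears in the same log.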
Here are the specific relevant packages installed on the client test nodes (stargw1 and stargw2):
fuse-2.7.3-1.SL
fuse-libs-2.7.3-1.SL
fuse-devel-2.7.3-1.SL
fuse-sshfs-2.1-1.SL
kernel-module-fuse-2.6.9-78.0.1.ELsmp-2.7.3-1.SL
(Exact versions should not be terribly important, but it appears that fuse-2.5.3 included up to SL4.6 requires more tweaking after installation than fuse 2.7.3 included in SL4.7).
Concatenate the following certs into one file; in this example I call it Global_plus_Intermediate.crt.
/etc/pki/tls/certs/wildcard.star.bnl.gov.Nov.2012.cert – host cert.
/etc/pki/tls/private/wildcard.star.bnl.gov.Nov.2012.key – host key (don’t give this one out)
/etc/pki/tls/certs/GlobalSignIntermediate.crt – intermediate cert.
/etc/pki/tls/certs/GlobalSignRootCA_ExtendedSSL.crt –root cert.
/etc/pki/tls/certs/ca-bundle.crt – a big list of many certs.
cat /etc/pki/tls/certs/GlobalSignIntermediate.crt > Global_plus_Intermediate.crt
cat /etc/pki/tls/certs/GlobalSignRootCA_ExtendedSSL.crt >> Global_plus_Intermediate.crt
cat /etc/pki/tls/certs/ca-bundle.crt >> Global_plus_Intermediate.crt
openssl pkcs12 -export -in wildcard.star.bnl.gov.Nov.2012.cert -inkey wildcard.star.bnl.gov.Nov.2012.key -out mycert.p12 -name tomcat -CAfile Global_plus_Intermediate.crt -caname root -chain
keytool -list -v -storetype pkcs12 -keystore mycert.p12
<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true" maxThreads="150" scheme="https" secure="true" keystoreFile="/home/lbhajdu/certs/mycert.p12" keystorePass="changeit" keystoreType="PKCS12" clientAuth="false" sslProtocol="TLS"/>
One particular detail to be aware of: the name of the pool nodes is now onlNN.starp.bnl.gov, where 01<=NN<=14. The "onllinuxN" names were retired several years ago.
Historical page (circa 2008/9):
GOAL:
Provide a Linux environment for general computing needs in support of the experimental operations.
HISTORY (as of approximately June 2008):
A pool of 14 nodes, consisting of four different hardware classes (all circa 2001) has been in existence for several years. For the last three (or more?) years, they have had Scientific Linux 3.x with support for the STAR software environment, along with access to various DAQ and Trigger data sources. The number of significant users has probably been less than 20, with the heaviest usage related to L2. User authentication was originally based on an antique NIS server, to which we had imported the RCF accounts and passwords. Though the server is still alive, we have not maintained this NIS information. Over time, local accounts on each node became the norm, though of course this is rather tedious. Home directories come in three categories: AFS, NFS on onllinux5, and local home directories on individual nodes. Again, this gets rather tedious to maintain over time.
There are several "special" nodes to be aware of:
PLAN:
For the run starting in 2008 (2009?), we are replacing all of these nodes with newer hardware.
The basic hardware specs for the replacement nodes are:
Dual 2.4 GHz Intel Xeon processors
1GB RAM
2 x 120 GB IDE disks
These nodes should be configured with Scientific Linux 4.5 (or 4.6 if we can ensure compatibility with STAR software) and support the STAR software environment.
They should have access to various DAQ and Trigger NFS shares. Here is a starter list of mounts:
SERVER | DIRECTORY on SERVER | LOCAL MOUNT POINT | MOUNT OPTIONS |
evp.starp | /a | /evp/a | ro |
evb01.starp | /a | /evb01/a | ro |
evb01 | /b | /evb01/b | ro |
evb01 | /c | /evb01/c | ro |
evb01 | /d | /evb01/d | ro |
evb02.starp | /a | /evb02/a | ro |
evb02 | /b | /evb02/b | ro |
evb02 | /c | /evb02/c | ro |
evb02 | /d | /evb02/d | ro |
daqman.starp | /RTS | /daq/RTS | ro |
daqman | /data | /daq/data | rw |
daqman | /log | /daq/log | ro |
trgscratch.starp | /data/trgdata | /trg/trgdata | ro |
trgscratch.starp | /data/scalerdata | /trg/scalerdata | ro |
startrg2.starp | /home/startrg/trg/monitor/run9/scalers | /trg/scalermonitor | ro |
online.star | /export | /onlineweb/www | rw |
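As a rough illustration of how rows of the table above could be turned into /etc/fstab entries, here is a minimal sketch. The helper name fstab_line, the plain "nfs" filesystem type and the bare option list are assumptions made for illustration, not the configuration actually used on the pool nodes.

```shell
# Hypothetical helper: print one /etc/fstab line for a row of the mount table.
# Arguments: server, directory on server, local mount point, mount options.
# Assumption: the shares are mounted as plain "nfs" with no further options.
fstab_line() {
    printf '%s:%s\t%s\tnfs\t%s\t0 0\n' "$1" "$2" "$3" "$4"
}

# Two rows taken from the table above
fstab_line evp.starp    /a    /evp/a    ro
fstab_line daqman.starp /data /daq/data rw
```

The output could be reviewed and then appended to /etc/fstab by hand; the local mount points have to exist before mounting.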
WISHLIST Items with good progress:
WISHLIST Items still needing significant work:
An SSH public key management system has been developed for STAR (see 2008 J. Phys.: Conf. Ser. 119 072005), with two primary goals stemming from the heightened cyber-security scrutiny at BNL:
A benefit for users also can be seen in the reduction in the number of passwords to remember and type.
In purpose, this system is similar to the RCF's key management system, but is somewhat more powerful because of its flexibility in the association of hosts (client systems), user accounts on those clients, and self-service key installation requests.
Here is a typical scenario of the system usage:
At this point, John Doe has key-based access to JDOE@FOO. Simple enough? But wait, there's more! Now John Doe realizes that he also needs access to the group account named "operator" on host BAR. Since his key is already in the key management system he has only to request that his key be added to operator@BAR, and voila (subject to administrator approval), he can now login with his key to both JDOE@FOO and operator@BAR. And if Mr. Doe should leave STAR, then an administrator simply removes him from the system and his keys are removed from both hosts.
There are three things to keep track of here -- people (and their SSH keys of course), host (client) systems, and user accounts on those hosts:
People want access to specific user accounts at specific hosts.
So the system maintains a list of user accounts for each host system, and a list of people associated with each user account at each host.
(To be clear -- the system does not have any automatic user account detection mechanism at this time -- each desired "user account@host" association has to be added "by hand" by an administrator.)
This Key Management system, as seen by the users (and admins), consists simply of users' web browsers (with https for encryption) and some PHP code on a web server (which we'll call "starkeyw") that inserts uploaded keys, user requests and administrators' commands into a backend database (which could be on a different node from the web server if desired).
Behind the scenes, each host that is participating in the system has a keyservices client installed that runs as a system service. The keyservices_client periodically (at five minute intervals by default) interacts with a different web server (serving different PHP code that we'll call starkeyd). The backend database is consulted for the list of approved associations and the appropriate keys are downloaded by the client and added to the authorized_keys files accordingly.
In our case, our primary web server at www.star.bnl.gov hosts all the STAR Key Manager (SKM) services (starkeyw and starkeyd via Apache, and a MySQL database), but they could each be on separate servers if desired.
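To make the client side a little more concrete, here is a minimal sketch of the "add approved keys" half of one polling cycle. It is not the keyservices_client code: the function name sync_keys and the idea of a downloaded file with one approved key per line are assumptions, and removal of revoked keys is left out.

```shell
# Hypothetical sketch of one polling cycle (not the real keyservices_client).
# $1: file with the approved public keys, one per line (assumed to have been
#     downloaded from the key server already)
# $2: the authorized_keys file of the target account
sync_keys() {
    approved=$1
    auth=$2
    touch "$auth"
    # Append every approved key that is not yet present (exact line match),
    # so repeated cycles do not create duplicates.
    while IFS= read -r key; do
        [ -n "$key" ] || continue
        grep -qxF -- "$key" "$auth" || printf '%s\n' "$key" >> "$auth"
    done < "$approved"
}

# A service would repeat this at the polling interval, for example:
#   while true; do sync_keys /tmp/approved ~/.ssh/authorized_keys; sleep 300; done
```

Because keys are only appended when missing, running the cycle every five minutes leaves an unchanged authorized_keys file untouched.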
Perhaps a picture will help. See below for a link to an image labelled "SKMS in pictures".
We have begun using the Key Management system with several nodes and are seeking to add more (currently on a voluntary basis). Only RHEL 3/4/5 and Scientific Linux 3/4/5 with i386 and x86_64 kernels have been tested, but there is no reason to believe that the client couldn't be built on other Linux distributions or even Solaris. We do not anticipate "forcing" this tool onto any detector sub-systems during the 2007 RHIC run, but we do expect it (or something similar) to become mandatory before any future runs. Please contact one of the admins (Wayne Betts, Jerome Lauret or Mike Dephillips) if you'd like to volunteer or have any questions.
User access is currently based on RCF Kerberos authentication, but may be extended to additional authentication methods (e.g., BNL LDAP) if the need arises.
Client RPMs (for some configurations) and SRPMs are available, and some installation details are available here:
http://www.star.bnl.gov/~dmitry/skd_setup/
An additional related project is the possible implementation of a STAR ssh gateway system (while disallowing direct login to any of our nodes online), in effect acting much like the current ssh gateway systems' role in the SDCC. Though we have an intended gateway node online (stargw1.starp.bnl.gov, with a spare on hand as well), its use is not currently required.
Here you go: https://www.star.bnl.gov/starkeyw/
You can use your RCF username and Kerberos password to enter.
When uploading keys, use your SSH public keys - they need to be in OpenSSH format. If not, please consult SSH Keys and login to the SDCC.
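If you are not sure whether a key file is in OpenSSH format, a quick check along the following lines may help. The function name and the list of key types are assumptions for illustration; the check only looks at the shape of the first line and does not validate the key itself.

```shell
# Returns success if the first line of the file looks like an OpenSSH public
# key, i.e. "<key-type> <base64-data> [comment]".
# Assumption: only the common key types listed in the pattern are of interest.
looks_like_openssh_pubkey() {
    head -n 1 "$1" | grep -Eq '^(ssh-rsa|ssh-dss|ssh-ed25519|ecdsa-sha2-nistp(256|384|521)) [A-Za-z0-9+/]+=*( .*)?$'
}

# A public key exported in the SSH2 (RFC 4716) format, which starts with
# "---- BEGIN SSH2 PUBLIC KEY ----", can usually be converted with:
#   ssh-keygen -i -f key_ssh2.pub > key_openssh.pub
```

Typical use would be `looks_like_openssh_pubkey ~/.ssh/id_rsa.pub && echo "looks fine"` before uploading.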
The STAR Electronic Shiftlog (ESL) is written in JSP (JavaServer Pages) and requires a web server that can render JSP content. Unlike PHP, JSP is compiled into Java classes "just in time": a page is compiled the first time it is accessed and does not have to be compiled again for the life of the page, or until the page is modified. The forerunner of JSP is the servlet; servlets are also used in the shiftlog, mostly to stream images. The technology differs in that servlets need to be compiled before being deployed.
Our JSP server is Apache Tomcat. Documentation and newer versions can be downloaded from http://tomcat.apache.org/. Although Tomcat is a fully functional web server in itself, we prefer to let the Apache web server serve the HTML content and only require Tomcat to serve the JSP pages that Apache cannot. This is accomplished by way of the mod_jk Apache Tomcat Connector using the ajp13 protocol. Tomcat listens on port 8080. This port is blocked from the outside but can be reached from a browser started on the online web server itself.
The Tomcat server hosting the shiftlog is deployed on the online web server online.star.bnl.gov and runs under the tomcat account. In order to log on to the online web server to administer Tomcat and the ESL you will need keys mapped to the tomcat user account. Please see Wayne Betts or Jérôme Lauret about getting your keys mapped. There are multiple versions of Tomcat residing in /opt.
All versions of Tomcat are placed in the /opt folder, each in a subfolder clearly denoting the version number. (When you unzip Tomcat this is usually how it comes.) Examples are:
/opt/apache-tomcat-5.5.20/
/opt/apache-tomcat-6.0.18/
The currently used version of Tomcat is linked to /opt/tomcat/. Below is an ls of the tomcat folder:
-bash-3.00$ ls -l /opt/tomcat
lrwxrwxrwx 1 root root 22 Nov 17 11:11 /opt/tomcat -> ./apache-tomcat-6.0.18
Note that this folder is the tomcat user's home directory. It contains the .ssh folder which holds your keys, so relinking may lock you out if you do not transfer this folder in advance.
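One way to avoid the lock-out is to copy the .ssh folder into the new version before moving the link. The sketch below is only an illustration: the function name relink_tomcat is made up, and it relies on the -n option of ln as found in GNU and BSD ln.

```shell
# Hypothetical helper: point the "current" link at a new Tomcat directory,
# copying the .ssh folder across first so key-based logins keep working.
# $1: path of the symbolic link (e.g. /opt/tomcat)
# $2: directory of the new version (e.g. /opt/apache-tomcat-6.0.18)
relink_tomcat() {
    link=$1
    new=$2
    [ -d "$new" ] || return 1
    # Copy the keys only if the new directory does not have a .ssh folder yet
    if [ -d "$link/.ssh" ] && [ ! -e "$new/.ssh" ]; then
        cp -pR "$link/.ssh" "$new/.ssh"
    fi
    # -n replaces the link itself instead of creating a link inside the
    # directory the old link points to
    ln -sfn "$new" "$link"
}
```

For example, `relink_tomcat /opt/tomcat /opt/apache-tomcat-6.0.18` run as root, followed by a test login from a second terminal before closing the first one.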
After you install a new version of Tomcat you will want to configure it.
There are some environment variables whose existence you will want to verify. If they don’t exist you will want to set them, preferably in a start-up script so they will survive a server restart.
$CATALINA_HOME: /opt/tomcat
$JAVA_HOME: /usr/java/default
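A minimal sketch of what such a start-up script fragment could look like is shown below. Where the fragment lives (for example a profile script of the tomcat account) is an assumption; the values are the ones quoted above.

```shell
# Set each variable only if it is not already defined, then export both so
# that startup.sh and catalina.sh inherit them.
: "${CATALINA_HOME:=/opt/tomcat}"
: "${JAVA_HOME:=/usr/java/default}"
export CATALINA_HOME JAVA_HOME
```

The `:=` form leaves an existing value untouched, so a different setting made elsewhere is not overridden.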
Inside the Tomcat folder you will find these directories (and some others):
$CATALINA_HOME/bin/
$CATALINA_HOME/logs/
$CATALINA_HOME/webapps/
$CATALINA_HOME/conf/
$CATALINA_HOME/bin/ holds the executables (for Linux and Windows).
To startup the Tomcat server use:
% $CATALINA_HOME/bin/startup.sh
To shut it down use:
% $CATALINA_HOME/bin/shutdown.sh
You will want to modify $CATALINA_HOME/bin/catalina.sh. This script is called by startup.sh; its function is to invoke the Java process that is the Tomcat server.
Directly under the header these lines are added:
# added by Levente Hajdu #####################################
export JAVA_OPTS="$JAVA_OPTS -Xmx512M -Djava.library.path=/usr/lib64 -Djava.awt.headless=true"
#############################################################
A description of the options used follows:
-Xmx512M sets the maximum heap size of the Java VM that runs the server to 512 MB, which should be sufficient for our needs. If the server needs more heap than this, the JVM raises an OutOfMemoryError, which can bring the Tomcat process down.
-Djava.library.path sets the library path for an optional set of native (non-Java) libraries which Tomcat can use for improved performance. If this is not present you will see suggestions to set it in the Tomcat log.
-Djava.awt.headless=true prevents a particular type of crash. This server also hosts the SUMS statistics pages, which use a library (JFreeChart) to render images for display, and that library depends on X server libraries. Without this option, if Tomcat is started by a user who has X forwarding enabled but no X server running, Tomcat crashes when it tries to execute the JSP.
You will be spending a lot of time in $CATALINA_HOME/conf/. The file that controls the Tomcat context paths is $CATALINA_HOME/conf/server.xml. This file requires editing whenever software is deployed at a new context path. Before you edit this file, always make a backup. Each year of the shiftlog resides on a different context path. Here is the list:
http://online.star.bnl.gov/apps/shiftLog2003/
http://online.star.bnl.gov/apps/shiftLog2004/
http://online.star.bnl.gov/apps/shiftLog2005/
http://online.star.bnl.gov/apps/shiftLog2006/
http://online.star.bnl.gov/apps/shiftLog2007/
http://online.star.bnl.gov/apps/shiftLog2008/
http://online.star.bnl.gov/apps/shiftLog2009/
The current year is always at:
http://online.star.bnl.gov/apps/shiftLog/
If we look inside the $CATALINA_HOME/conf/server.xml file we will see an entry for each one of these paths:
<!--Shiftlog 2007-->
<Context className="org.apache.catalina.core.StandardContext"
         cachingAllowed="true"
         charsetMapperClass="org.apache.catalina.util.CharsetMapper"
         cookies="true" crossContext="false" debug="0"
         docBase="/var/tomcat/webapps/shiftLog2007.war"
         mapperClass="org.apache.catalina.core.StandardContextMapper"
         path="/apps/shiftLog2007"
         privileged="false" reloadable="true" swallowOutput="false" useNaming="true"
         wrapperClass="org.apache.catalina.core.StandardWrapper">
  <Environment description="" name="year" override="false" type="java.lang.Integer" value="2007"/>
  <Environment description="" name="isEditable" override="false" type="java.lang.Boolean" value="false"/>
  <Environment description="" name="runLogLink" override="false" type="java.lang.String" value="http://online.star.bnl.gov/RunLog/Summary.php?run="/>
  <Environment description="" name="runNumber" override="false" type="java.lang.Integer" value="7"/>
</Context>
This is the block of XML for the 2007 shiftlog. The syntax of this file can change between Tomcat versions, although it usually does not change much. Let's go over the important properties in this block:
docBase – Tomcat supports web archive (.war) files. A .war file is basically a zip file with a special internal structure. Explaining how to prepare one would take a whole Drupal page to itself.
path – This is the context path at which the site appears in your web browser. It is the part of the URL after the server name.
Environment – The Environment sub-tag makes information available to the program. The format is fairly simple. However, you have to be careful to set override="false", or else the .war file's ./WEB-INF/web.xml will overwrite these values with its own.
The environment properties for the shiftlog are:
year – this is the shiftlog year. Example: “2007”
isEditable – this is a boolean value. After the run has completed, access to the editor is turned off by setting this to false.
runLogLink – this is the URL for the run log. The shiftlog uses this to build links to the run log.
runNumber – this is almost the same as the year; it is just the run number. Examples:
run 8 = 2008
run 9 = 2009
run 10 = 2010
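Going by the examples above (and the 2007 block, where the year is 2007 and runNumber is 7), the run number appears to be the year minus 2000. A tiny helper, with a made-up name, could compute it when preparing a new Context block:

```shell
# Hypothetical helper: derive the run number from a four-digit year.
# Assumption: the pattern "run N = 2000 + N" seen in the examples holds for
# the year in question.
run_number() {
    echo $(( $1 - 2000 ))
}

run_number 2009
```

The last line prints 9, matching "run 9 = 2009" in the list above.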
The $CATALINA_HOME/webapps/ folder holds the default pages that come pre-packaged with the Tomcat server. This is also the location where Tomcat unpacks the .war files. The folder naming conventions can change from one Tomcat version to the next.
The $CATALINA_HOME/logs/ directory, as you may have guessed, holds log files. You will want to look over all files in here even if Tomcat seems to be functioning correctly. The logs can point out errors you may not be aware of. The file $CATALINA_HOME/logs/catalina.out holds the standard output stream of your JSPs (not to be confused with the HTML output stream) along with Tomcat's own standard output stream, making this a handy file for debugging.
To deploy a .war file the procedure is as follows:
1. Stop Tomcat:
% $CATALINA_HOME/bin/shutdown.sh
NOTE: If you deploy the Tomcat administrative web interface, shutting down the whole server is not strictly required because you could just shut down the context path. I prefer to shut down the whole server as a matter of habit, because the time required is so short that no one really notices.
2. If this is an upgrade of an existing .war file (else move to step 3), back up the old .war file. All .war files are located in /var/tomcat/webapps/. Here is the listing of the directory; note the naming convention for the web archive files:
-bash-3.00$ ls -1 /var/tomcat/webapps/shiftLog*.war
/var/tomcat/webapps/shiftLog2003.war
/var/tomcat/webapps/shiftLog2004.war
/var/tomcat/webapps/shiftLog2005.war
/var/tomcat/webapps/shiftLog2006.war
/var/tomcat/webapps/shiftLog2007.war
/var/tomcat/webapps/shiftLog2008t.war
/var/tomcat/webapps/shiftLog2008.war
/var/tomcat/webapps/shiftLog2009.war
When removing one of these files I move it to the /var/tomcat/webapps/old/ directory and rename it following the convention here:
shiftLog2007.Apr03.965628000.war
shiftLog2007.Apr04.288184000.war
shiftLog2007.Apr09.200079000.war
shiftLog2007.Dec03.805483000.war
shiftLog2007.Feb07.785336000.war
...
shiftLog2007.Mar27.875569000.war
shiftLog2007.Nov09.134343000.war
shiftLog2007.Nov28.320967000.war
shiftLog2007.Nov28.657299000.war
It is important to retain the backup in case there is something wrong with the new .war file. Keeping the old one allows you to roll back while the problem is being corrected.
3. Next copy over the new .war file from the node on which it resides. scp is the method I use for this. The syntax is:
% scp [username]@[nodeName]:[Path&File] /var/tomcat/webapps/shiftLog[year].war
4. If this is a new deploy and not an upgrade of an existing .war file, you will have to configure a context path in $CATALINA_HOME/conf/server.xml (else move to step 5).
5. If this is an upgrade, you will have to dump (delete) the expanded .war file in $CATALINA_HOME/webapps/. It should be a directory with a name similar to that of the .war file. You do not have to back this up because you already have the .war file backed up.
6. Start up Tomcat:
% $CATALINA_HOME/bin/startup.sh
7. Open a web browser and check that the page displays correctly.
8. Run the shift log Java Web Start application to confirm that the developer has signed his or her jar files within the .war file. If not, you will need to have the .war file rebuilt.
Because upgrades are done fairly frequently, mostly in response to requests for new features and for some bug fixes, I keep a script that carries out the upgrade process listed above. The script requires modification before each run. Its name is $CATALINA_HOME/bin/deploy_year .
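The deploy_year script itself is not reproduced on this page. The following is only a sketch of what the backup and clean-up parts of an upgrade might look like, written so it can be tried in a scratch directory. The function names are hypothetical, and the backup suffix is a guess: names such as shiftLog2007.Apr03.965628000.war look like the output of GNU date with the format %b%d.%N.

```shell
# Hypothetical sketch (not the real deploy_year script).

# Move the current .war for one year into old/ with a date-based suffix.
# $1: directory holding the .war files (e.g. /var/tomcat/webapps)
# $2: shiftlog year (e.g. 2007)
backup_war() {
    war_dir=$1
    year=$2
    war="$war_dir/shiftLog$year.war"
    [ -f "$war" ] || return 1
    mkdir -p "$war_dir/old"
    # Assumption: %N (nanoseconds) is available, as in GNU date
    stamp=$(date +%b%d.%N)
    mv "$war" "$war_dir/old/shiftLog$year.$stamp.war"
}

# Delete the expanded copy of the .war so Tomcat unpacks the new one.
# $1: Tomcat webapps directory (e.g. $CATALINA_HOME/webapps)
# $2: name of the expanded directory (check it first, naming varies by version)
drop_expanded() {
    webapps=$1
    name=$2
    [ -n "$name" ] && rm -rf "${webapps:?}/$name"
}
```

Between the two calls the new .war would be copied into place with scp, and Tomcat would be stopped before and started after, as in the procedure above.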
If you have done the upgrade but do not notice any change:
check that you dumped the expanded .war directory in $CATALINA_HOME/webapps/ (step 5)
also clear your web browser's cache
If you get the “page unavailable” message, check that the Tomcat process is running. Use the command
ps -ef | grep tomcat | grep java
Even if it is running, shut it down and try to restart it. Like an old car, Tomcat may not start the first time you try to crank it over.
STAR experts deemed absolutely essential may request to be placed on the expert editor list to edit the ShiftLog directly via the web interface. The user must provide justification for needing to edit the ShiftLog remotely and provide their Kerberos (RCF) user name.
Administrator Notes:
The Tomcat web server will authenticate the user with Kerberos and Tomcat manages the session. We have written the custom module OnlineTomcatRealm.jar to do the authentication which is configured in $CATALINA_HOME/conf/server.xml.
ssh tomcat@online.star.bnl.gov
Edit the file $CATALINA_HOME/conf/tomcat-users.xml
Note: $CATALINA_HOME may not be defined. It refers to wherever Tomcat is installed, in our case /opt/tomcat.
The file looks like this:
<tomcat-users>
  <role rolename="manager"/>
  <role rolename="logEditor"/>
  <user username="jfaustus" roles="logEditor"/>
  <user username="mephistophilis" roles="logEditor"/>
</tomcat-users>
Add a new user element with the username and with roles set to "logEditor".
Then restart the server:
$CATALINA_HOME/bin/shutdown.sh
$CATALINA_HOME/bin/startup.sh
Check that it works and you’re done.
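For those who prefer not to edit the XML by hand, the edit can be sketched as below. The function name add_log_editor is made up, and the sketch assumes the closing tomcat-users tag sits on a line of its own; it keeps a .bak copy of the file before changing it.

```shell
# Hypothetical helper: add a logEditor user to tomcat-users.xml.
# $1: path to tomcat-users.xml, $2: Kerberos (RCF) user name
# Assumption: the closing </tomcat-users> tag is on its own line.
add_log_editor() {
    file=$1
    user=$2
    # Nothing to do if the user is already listed
    grep -q "username=\"$user\"" "$file" && return 0
    cp -p "$file" "$file.bak"
    # Print the new <user> element just before the closing tag
    awk -v u="$user" '
        /<\/tomcat-users>/ { printf "  <user username=\"%s\" roles=\"logEditor\"/>\n", u }
        { print }
    ' "$file.bak" > "$file"
}
```

After something like `add_log_editor /opt/tomcat/conf/tomcat-users.xml jdoe` (the user name is only an example), Tomcat still has to be restarted as described above.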
Uninterruptible Power Supplies at the experiment:
RackTables OBJECT NAME | LOCATION | MODEL | BATTERY TYPE | LAST BATTERY CHANGE | DEVICES POWERED | NOTES |
UPS7 | Control Room, Slow Controls Terminals, floor near south west corner | APC SMT1500NC | RBC7 | 6/2017 (original battery) | sc5.starp.bnl.gov; 2 LCDs for sc5 | Serial #: AS1711333192; black tower |
| Control Room, north of Slow Controls Terminals, floor | APC BR1000G (Back-UPS Pro 1000) | RBC123 | | sc.starp.bnl.gov | Serial #: 3B1204X18919; black "tower" |
| Control Room, TPC Terminals, console shelf | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | 11/2014 (orig. battery put into service); 12/26/2024 | chaplin + 2 LCDs; sirius + LCD | Serial #: AS1431232892; rack-mount |
| Control Room, TPC terminals | APC DLA1500 (SMART-UPS 1500) | RBC7 | 11/20/2021 | gmt-ops + LCD | Serial #: AS0736230401; black |
UPS14 (not in RackTables) | Control Room, trigger systems, countertop | APC SMT1500C (Smart-UPS 1500) | RBC7 | original factory battery, May 2022 | startrg + LCD | Serial #: 352208X11667; black |
| Control Room, magnet terminals, behind the LCD for the Windows PC running magnet monitoring | APC BR1500LCD (Back-UPS RS 1500) | RBC109 | 1/24/2020 | rosas + LCD | Serial #: 3B0935X21952; gray/black; *nominally belongs to CAD*; possible contacts are John Pomaro or anyone in Collider-Accelerator Support |
| Control Room, under Shift Leader desk | APC BR1000G | RBC123 | 04/12/2023 | shift-leader + 2 LCD | Serial #: 3B1204X18994; black |
UPS1 | DAQ Room, L4 and server rack (center row, north end) | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | September 2022 | ovirt2; onldb5 (twice, redundant PS); new servers in 2022 TBC | Serial #: AS1231125008; Manuf. date: July 2012; black, rack mount; bought December 2012 |
| DAQ Room, on the floor between DB1 and DB2 (the legacy DAQ and trigger racks - southern end of the middle row) | APC SMT1500 (Smart-UPS 1500) | RBC7 | Battery (3/2011) | evp3 (bottom PS), trgscratch (top PS), sclrscratch (top PS), daqlocalmain network switch, daq-sw2 network switch; trgscratch 12 disk external storage array (bottom PS) | Serial #: AS1050221151; black; bought March 2011 |
UPS13 | DAQ Room, northeast corner, floor | APC SMT1500C (Smart UPS 1500) | RBC7 | October 2021 (original factory battery) | stargw3.starp.bnl.gov | Serial #: 3S2141X15140; Manuf. date: October 2021; black, bought April 2022; IP: 130.199.60.152; BNL tag: A111250 |
UPS2 | DAQ Room, rack DB2 (legacy DAQ rack) | APC SMT2200RM2U (Smart-UPS 2200) | RBC43 | | evp | Serial #: AS1431243644; Manuf. date: July 29, 2014; rack-mount |
UPS15 | DAQ Room floor north of shelves in center row | APC SMT2200RM2uC | RBC43 | factory original battery (February 2022) | onldb4 (right PS) | Serial #: AS2205260230; black |
UPS3 | DAQ Room DC3 | APC SUA1500RM2U (Smart-UPS 1500) | RBC24 | Dec. 6, 2019 | barbados2; softioc4; daq-sw1 | Serial #: AS0847123095; black, rack-mount |
| DAQ Room DC4 | APC SMX1500RM2U with APC SMX48RMBP2U (external battery) | RBC115; 2x RBC115 | ? | various SGIS interlock equipment | Serial #: AS1039230480; C-AD equipment |
UPS4 | DAQ Room, rack DB1, bottom | APC SMT2200RM2U | RBC43 | October 2021 | l2ana01 (bottom PS); l2ana02; PCI extension (for l2ana01) | Serial #: AS1336140512; Manuf. date: Sept. 2013 |
UPS5 | DAQ Room, northern Online Linux Pool rack | APC SMT2200RM2U | RBC43 | January 26, 2023 | onl30, onldb (x2) | Serial #: AS1430241567; Manuf. date: July 22, 2014; <DAQ Room Power Panel> |
UPS6 | DAQ Room, L4 and server rack (DB8, middle row, north end) | APC DLA1500RM2U (Smart-UPS 1500) | RBC24 | Feb. 28, 2025 | L4 network switch, | Serial #: AS0340212578; black, rack-mount |
UPS9 | DAQ Room middle row shelves | APC SMT2200RM2UTW (Smart-UPS 2200) | RBC43 (Note that the unit itself says it uses an RBC55, if one navigates through the onboard menu. This appears to be an error on the part of APC). | (original battery) | satabeast3 (left PS); onldb3.starp (left PS); onldb4.starp (right PS) | Serial #: AS1645262798; Manuf. date: November 2016 |
UPS10 | DAQ Room, middle row shelves (middle shelf) | APC SMC1500-2U | RBC132 | January 28, 2021 | stardns1.starp.bnl.gov; 24-disk SAS enclosure for trgscratch and sclrscratch; onldb2.starp (left PS) | Serial #: AS1539124741; bought December 2015; to initiate self-test: push + hold Mute, then press Display for 2 seconds |
| DAQ Room, bottom of rack "DB9" (center row, north end) | APC SMT2200RM2U | RBC43 | | cephnfs2 (left PS); dbbak (top PS); onlhome (top PS); stargw2 (in rack DB8); cephmon01 in rack DB8 (right PS); cephmon02 in rack DB8 (right PS); onlpool-s60-01 and onlpool-s60-02 (via a shared extension cord) | Serial #: AS1645260493; Manuf. date: November 3, 2016 (bought May 2017); 2U rack-mount; BNL tag: A76075 |
| DAQ Room, NW corner networking rack | APC SMX2000LV with | RBC143 | October 2020? | Various networking equipment | Serial #: AS1913351834; rack-mount |
UPS16 | WAH 1A9 | APC SMT1500RM2UC | RBC159 | 2/2023 (factory original battery) | NPSlaser.starp (Remote power switch for TPC laser PC, though the PC is NOT plugged into it, only a "picomotor multi-axis driver") | Serial #: 3S2205X11933; rack-mount, black |
| WAH 1B1 | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | 10/2021 | tofcontrol; TOF USB hub | Serial #: AS1617143314; rack-mount, black |
| WAH 1C4 | APC SMT1500RM2U | RBC133 | 11/2018 | netpower1.starp.bnl.gov (with networking equipment in 1C4); netpower2.starp.bnl.gov (with networking equipment in 1C4) (This could be moved back to UPS11 at a "convenient power outage".) | Serial #: AS1243245039; black, 2U rack-mount; Manuf. date: October 2012; bought in January 2013 |
UPS11 | WAH 1C4 | APC SMT2200RM2U | RBC43 | January 10, 2025 | netpower2 was moved to the other UPS in this rack in 2024 (?) but could be moved back if there is a "convenient" opportunity to do so. | Serial #: AS1435142781; Manuf. date: July 28, 2014 (bought December 2014); rack mount. Has overheated and shut down while in service in the DAQ Room during AC failures (with ambient room temperatures above 90 F, reaching 100 at times). So while it seems to be an otherwise reliable unit, it should not be used in an environment where temperatures may get that high, nor in the immediate vicinity of other especially warm equipment. |
| WAH 2A3 | APC SMX1500RM2U (Smart-UPS 1500) | RBC115? | unknown | gas leak detection systems in 2A2 and possibly C-AD interlock equipment in 2A1 | Serial #: AS1039230484; rack-mount, black |
| WAH 2A9 | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | | grant (Wiener/VME) | Serial #: AS143611346; Manuf. date: Sept. 2014; black, rack mount; bought March 2015 |
| WAH 2A9 | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | April 2018 | TPC interlock distribution panel surge suppressor with: -cooling water flow meters -scserv -2x interlocks equipment in 2A8 | Serial #: AS1243245306; Manuf. date: October 2012; black, rack mount; bought ~Dec. 2012 |
| (in Bldg. 510 when last seen, previously was in the WAH on the floor under the east stairs to RHIC tunnel) | APC BE750G (Back-UPS ES750) | RBC17 | original battery from ~fall 2010??? | nothing when last seen in the WAH (checked 11/20/2015) | Serial #: 5B1039T74854; black |
| WAH North Platform, 1st floor west | APC SMT1500RM2U (Smart-UPS 1500) | RBC133 | 01/2019 | north-nps1 (and thus all networking equipment on the north platform) | Serial #: AS1144220012; Manuf. date: October 2011; rack-mount, black |
| AB, near the GMR | PowerWare | | (Batteries likely were replaced at some point after that under a service contract, but details are unclear (handled by STSG)); February 2025 (nearly the whole unit was replaced) | gas system equipment | This is a large UPS for circuits in the Gas Mixing Room, under the care of the STSG group; IP: gmr-ups.starp.bnl.gov; BNL property tag 145850; bought in fall 2012 |
| AB, mezzanine top floor (northeast corner) | Mitsubishi UP7011A | | November 20, 2019 | unknown | CAD equipment, definitely not STAR's responsibility; labelled "1006 UPS1"; serial port is connected to an Ethernet console server, 130.199.41.64; installed January 2015; contacts are John Mingoia and Anh Pham |
This list is maintained as information is made available and is sporadically checked for correctness. The maintainer of this list is often not informed when STSG adds, removes or replaces UPSes and batteries. Furthermore, anyone may remove or add equipment to UPSes without informing the maintainer of this list.
Spare batteries on hand:
In a cabinet in the DAQ Room (APC RBC numbers):
7: January 2023 (2 of them)
55: October 2022
109: March 2020
132: November 2021
141: October 2020
(STSG / electronics techs may have additional spares in the Building 510 labs)
HOSTNAME | SUBSYSTEM | PRIMARY CONTACT | RT TICKET (if any) | NOTES and EXPECTED RESOLUTION PATH |
autueil.starp | S&C | Wayne Betts | 2690 | Replace with a Windows 7 machine currently named madison in 1006C |
shift-leader.starp | ops | 2689 | Dell says this model (Optiplex 745) has been successfully "Tested for Basic Windows 7 Functionality" and the Windows 7 upgrade advisor tool from MS indicates no significant problems. Nonetheless, the plan is to replace this system with a Dell Optiplex 990 (BNL barcode 151457) currently in 510/1-179 (Windows 7). |
|
tpcgas.starp and its backup machine | TPC | Jim Thomas | 2626 | Two new computers are online now as tpcgas1 and tpcgas2. Peter Kravtsov completed one, the other needs additional configuration, for which Peter provided instructions, but they cannot be completed without swapping hardware, so backup machine is not "perfect" backup yet.) |
chaplin-run09, astaire-run09, sirius-run09 | TPC | Jim Thomas | - Moving to Linux has been discussed numerous times and is still a possibility; the primary hold-up is the TPC Alarm Handler, which is currently a Windows application. Without a replacement for it within Linux, the assumption has been that at least one Windows machine will need to be available, but in discussing with Alexei, it seems this TPC Alarm Handler is redundant with Slow Controls's STAR Alarm Handler, so may not be necessary after all. (resolution TBD) - One more note, discussing this with Alexei and Jim, we all generally seem to agree that they don't need 3 computers (that was a luxury afforded to them in the early days when the Control Room wasn't so crowded) - 2 would suffice. - Nov. 21 update (WB): It turns out these computers were bought with Vista licenses. Upgrading in place is a *painfully* lengthy process, but I am attempting it on astaire (with a fallback disk with the XP installation just in case). - Nov 25 update (WB): Alexei and Jim have definitively approved a Linux trial. The astaire PC will have replacement disks installed and a fresh Linux installation (SL 6.4). Testing of TPC usage is expected to be quick - once approved, will proceed with Linux installation on chaplin. They request to keep sirius while they try to migrate the TPC alarm handler to Linux (seeking source code from Peter Kravtsov) - if successful, will eliminate sirius, otherwise will proceed with attempted upgrade to Vista. - Jan 10 update (WB): astaire had Linux installed 3-4 weeks ago and TPC MEDM screens shown made to work nicely after some font adjustments. Approval to proceed with chaplin (keeping the original disks on stand-by). Also, the TPC alarm handler (currently "assigned" to sirius) was demonstrated to run fine using Wine on a Sc.Linux 6 machine, so that no longer seems to be a hold-up - simply compy over the Alarms folder, make some fairly obvious path adjustments and firewall openings and it works. 
Final disposition: chaplin and astaire have Sc.Linux installations on them. sirius still has Windows XP, but is only on a small private network for use with the WAH video and TPC laser systems. |
|
tofgas.starp | TOF | 2627 | Was replaced during Peter Kravtsov's visit in December, 2013. | |
deneb2.starp | general use on South Platform | 2680 | Replaced with one of the recovered Vista machines. Does not need much; does not play a direct role in STAR data-taking; just used during maintenance days as a terminal and web browser. |
|
fmsled | FMS | Steve Trentalange | a laptop in the Wide Angle Hall - not sure if there is a compelling reason for it to be a laptop going forward, but if so desired, we have available a Sony VIAO with a Vista key (barcode 136278); in any case, it does not need much computing power. FMS is not expected to be present in the 2014 run, so this is a relatively low priority. Steve expressed a preference for Windows 7 over Windows Vista, but I doubt it will make any difference, other than possibly giving a longer potential lifetime to the replacement. MP 11/22: The Sony VIAO machine has a Windows Vista installation on it. All necessary BNL configurations have been made. 1/10/2014 (WB): Unfortunately, the original fmsled laptop has a serious hardware problem and will not boot at all. Hopefully the disk can be recovered, though that is complicated somewhat by having PGP WDE. Final disposition: System is removed from the WAH and the network. Steve T. says there is nothing critival to recover from it. |
|
hoosier | BEMC | Steve Trentalange/ Oleg Tsai | 2770 | WB: 10/15 - Win 7 upgrade advisor says ok for both 32-bit and 64-bit Win 7 installations. JL: 11/22, assigned to MP MP: A Dell precision desktop has been allocated for use to replace the old hoosier machine. The machine has been brought up to date and is ready for use. Steve needs to test an HV device on the old machine to ensure that it works. Once he gives the go ahead we will switch over to the new machine. The switch over will hopefully take place during the week of 1/13/14. Mp: 1/27/14 - The replacement machine has been put in place. LabVIEW 2013 evaluation has been installed for the time being and Steve's VI worked on LabVIEW 2013 on the new Windows 7 machine. The new machine has been put in place, we now just need to get licenses for a legitimate version of LabVIEW and the machine should be finished. MP: 4/17/14 - LabVIEW 2013 has been purchased and installed on the machine. The Windows XP Machine has been disconnect and is no longer in use. |
emcsc / backup emcsc | BEMC | Steve Trentalange/ Oleg Tsai | WB: 10/17 - Win 7 upgrade advisor says it needs more RAM (currently only 512MB; 1GB min for 32-bit Win 7), and does not know about the compatibility of the National Instruments RS-485 adapter card. Meanwhile, there is a newer computer (unfortunately also with Win XP) available that was configured 1-2 years ago as a backup for emcsc (including LabVIEW 6.1 and an RS-485 adapter) but it has been sitting unused since then. Steve has suggested we try putting Windows 7 on the backup machine as a test, and if it works, put it into production. WB: 1/10: tested the old PCI 232/485 card in a Windows 7 machine, and was able to download drivers from National Instruments that allow the ports to be recognized, so this might not be a show stopper. Also, found NI's LabVIEW version compatibility chart and it indicates that LabVIEW 2013 should be able to open VIs saved in version 6.1, so this too is looking positive. We need to get a version (possibly a trial version?) of the latest LabVIEW to try this out. MP: 4/17/14: A Windows 7 machine was designated as the replacement for the emcsc machine. The trial version of LabVIEW 2013 was installed along with the old PCI 232/485 card. The problem was that the LabVIEW 6 code was too old to run on LabVIEW 2013. The .vi would not run properly. I had a LabVIEW technical rep come out to the lab multiple times to troubleshoot the issue, and the conclusion was that the old code would need to be revamped in order to run under LabVIEW 2013. Fortunately, the emcsc machine does not need a network connection to operate (only the NI COM card). The XP machine has been deregistered and disconnected from the network, and will continue to be used until time allows for the LabVIEW code to be updated. |
|
videopc | ops | Alexei Lebedev | Have to evaluate the compatibility of the video capture card (and its software) with Windows Vista/7. WB: 1/10 - having looked into this, I thought it would be impossible, but Alexei informed me today that Andrei Brandin will be at BNL for the collaboration meeting in February, and he thinks he can make the current system work under Windows 7. But if not, we will move this machine to a small private network shared with the TPC Laser system control PCs. Final disposition: Andrei B. made no progress (or even any effort?) on his visit. The system still has Windows XP, but is only on a private network now. |
|
pp2pp-slow | PP2PP | Originally overlooked because it is not on a "star" subnet (it is 130.199.90.72), and the PP2PP subsystem has been inactive for some time. This is a 9.5-year-old Dell Pentium 4 system, so not likely a good candidate for Windows 7 or Vista, though it meets the minimum requirements. MP 2/25: After speaking with Wlodek Guryn and Kin Yip, this machine will not be used for Run14. The machine has been removed from the Control Room by one of Wlodek's guys and will be worked on off of the network. A PP2PP machine will be needed for next year, so a replacement machine will need to be purchased and set up down the road. |
SYSTEM NAME | CONTACT/PRIMARY USER | LOCATION | RT TICKET (if any) | RESOLUTION PLAN/SUMMARY |
JML.STAR.BNL.GOV | Jeff Landgraf | 510/1-184 | 2677 | Have discussed with Jeff - a new PC was ordered (expected to arrive by end of November). MP: The new PC has arrived and is all set up for BNL use. Jeff's profile has been set up. |
Bugrhoff (DHCP client) | Wayne Betts | 510/1-179 | old laptop - phased out in favor of newer one already in use | |
DBEAVISDT.STAR.BNL.GOV | Dana Beavis | 510/1-169 | JL: 09/27 - Ambiguity on group WB: computer has been moved to a C-AD building. MAC reg., IP address and domain group are no longer associated with STAR |
|
BCHRISTIE.STAR.BNL.GOV | Bill Christie | 510/1-180 | 2691 | JL: 09/27 - Update OK in the coming months if possible, suggest 7 (need to check) MP: 10/4 I ran the Win 7 Upgrade Advisor. The machine's hardware and software are compatible with Win 7 (currently has Win XP 32-bit) |
KEATON2.STAR.BNL.GOV | Victor Perevoztchikov | 510/1-165 | 2720 | JL: 09/27 - Machine could be replaced by a Linux node (preferred) JL: 11/22, assigned to MP (new node needs to be purchased) MP: 12/5, A Dell Precision T3610 has been ordered. The machine supports RHEL and will be set up accordingly. MP: 2/28, The machine has been replaced with the T3610, set up with Scientific Linux 6. The old machine will be retired. |
MONROE2.STAR.BNL.GOV | Lidia Didenko | 510/1-173 | 2695 | possible to upgrade to Vista? (a license key is on the case) JL: 09/27 - Update OK, is Win 7 possible? Worried about CERT being messed up (saved in IE) MP: 10/4 I ran the Win 7 Upgrade Advisor. The machine's hardware and software are compatible with Win 7 (currently has Win XP 64-bit) MP: 11/20 The machine has been upgraded to Windows 7. Refer to ticket # 2695 |
BANCROFT.STAR.BNL.GOV | nobody | 1006C | WB: 10/18 - old machine has been pulled from service (it existed solely to operate an old SCSI scanner, which has also been retired) | |
CONFERENCE.STAR.BNL.GOV | 1006C | 2687 | WB: 10/17 - Vista has been installed on a machine from the Equipment Pool, and the original conference PC has been shut down. | |
GRANT.STAR.BNL.GOV | John Hammond | 901 | This is a file server for the electronics support group. It is largely up to John to move the shared content to a different server to retire this one. JL: 11/22, assigned to MP MP: 12/5, I spoke with John; he stated that he has a Windows 7 machine and will be moving the file server to that node himself. I will be in contact with him to record when the XP machine has been taken off the network. MP: 2/28, I spoke with John this week; he stated that the GRANT machine is still on the network but he will be taking it off at the end of this month. He has a Windows 7 machine to replace the XP machine and just needs to do the switchover. WB: 4/18/2014: John was copying the final directories to the replacement today and expects to turn grant off on Monday, 4/21. |
|
PADRAZO1.STAR.BNL.GOV | John Hammond | 901 | WB: 10/17 - John purchased and installed Windows 7 for this system on a new disk and Athena T. will start using it. The original disk has been put aside in case any files from Ken Asselta turn out to be needed. | |
PKUCZEWSKIDT.PHY.BNL.GOV | Phil Kuczewski | 901 | MP: 2/25, I sent an email to Phil on 1/2/14 regarding his Windows XP machines. I never received a reply. | |
PKLAPTOP1.STAR.BNL.GOV | Phil Kuczewski | 901 | (laptop) MP: 2/25, I sent an email to Phil on 1/2/14 regarding his Windows XP machines. I never received a reply. |
|
DAGOSTINOC.STAR.BNL.GOV | John Hammond / Alex Tkatchev | 901 | WB: 10/17 - This is about 4 years old and has a Windows Vista product sticker, but the current plan is to make a fresh Linux installation and let Alex Tkatchev use the system for trigger-related development work. The original disk has been removed and a new one installed for the Linux installation. | |
PO-143966.STAR.BNL.GOV | Alex Tkatchev | 901 | WB: 10/15 - This is about 4 years old and has a Win 7 product sticker on it. WB: 2/28/14: If a fresh Win 7 install is made, I suggest adding a second disk (if it doesn't already have two) and making a RAID 1 array if possible. |
SYSTEM NAME | LOCATION | NOTE | ||
STAR-UTILITIES.STAR.BNL.GOV (on a C-AD network) | STAR Control Room | runs software provided by C-AD. |
||
ROSAS.STAR.BNL.GOV (on a C-AD network) | STAR Control Room | runs software provided by C-AD. |