Online Computing

General

The online Web server front page is available here. This Drupal section will hold complementary information.
A list of all operation manuals (beyond detector sub-systems) is available at Operations.
Please use it as a startup page.

Detector sub-systems operation procedures - updated in 2008; confirmation requested for 2009

 

Accessing The STAR Protected Network

Creating An Account

To get access to the STAR SSH gateways (which will also allow access to the generic Online Linux Pool), please follow the steps below:

  1. Obtain an RCF Account, and upload your public key to the RCF
  2. Go to the SKM page and log in with your RCF account (your AFS/Kerberos credentials)
  3. Upload your PUBLIC key (in openssh format) on the main page after logging in. Your public key should have a name like "id_rsa.pub" (see the example after this list).
  4. Send an e-mail to STAR Support containing your full name, RCF username, BNL Life Number and a brief description of your intended use of the online resources and/or the particular subsystem(s) to be supported
  5. Once you are notified that your account has been created, please follow the steps below to log in.
  6. As a user of online resources, it is suggested that you subscribe to the Run Time System mailing list and Mattermost channel for announcements about maintenance periods and configuration changes. 
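
If you do not already have an SSH key pair, a minimal sketch of creating one with the standard OpenSSH tools (the key type and file name are your choice) is:

ssh-keygen -t rsa -b 4096 -f ~/.ssh/id_rsa   # creates ~/.ssh/id_rsa and ~/.ssh/id_rsa.pub
cat ~/.ssh/id_rsa.pub                        # this is the PUBLIC key to upload; never upload the private key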

The online gatekeepers are named stargw.starp.bnl.gov.

ssh -AX username@stargw.starp.bnl.gov

Logging In Via SSH

Linux Users:

  1. You can either script this (see the sketch after this list) or perform the steps manually.
  2. You can now ssh into any of the STAR protected nodes from here. Just remember to use "ssh -AX" each time in order to forward X11 and the ssh agent. (Please keep in mind that the STAR gateways are not currently reachable directly from outside BNL. You will need to go through the RCF first.)
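
As an illustration only (the RCF gateway hostname below is an assumption - substitute whichever RCF SSH gateway you normally use - and "username" is a placeholder), a ~/.ssh/config fragment that scripts the two-hop login might look like:

# ~/.ssh/config (hypothetical example)
Host stargw
    HostName stargw.starp.bnl.gov
    User username
    ForwardAgent yes
    ForwardX11 yes
    ProxyJump username@rssh.rhic.bnl.gov

With this in place, "ssh stargw" hops through the RCF gateway automatically. ProxyJump requires a reasonably recent OpenSSH client; older clients can achieve the same with a ProxyCommand using "ssh -W".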

 

 

EVO Conference Computer

If you would like to be able to use EVO in the 1006 trailer, there is a conference PC set up for use.  There is a generic account on the computer for everyone to share.

The account credentials are:
Username: rhicstar
Password: (See below)
Log On To: Conference (This computer)

For security purposes, I will not post the password anywhere unencrypted, so please come see me in my office (Building 510, Room 1-179) or send me an e-mail containing your GPG public key.  If you do not have a GPG public key, please bring your laptop (for desktop users, call me and I'll come to see you) and I'll help you set it up.  It is quite useful.

Online Linux Pool

This page provides an overview of the Online Linux Pool (OLP).  The OLP is a cluster of computers made available to STAR collaborators with the primary intent of allowing real-time and near real-time run support activities, but with general usage and various computing development and testing projects envisioned as resources permit.

The OLP currently consists of 60 Penguin Altus 1300 rack-mount computers physically located in the DAQ Room, plus two servers that provide home directories (over NFS), user authentication (NIS), and Condor pool management.  The "worker" nodes are named onl01, onl02, ..., onl60.starp.bnl.gov.  These 60 pool nodes have 64-bit Scientific Linux 5.8 (with 32-bit libraries).  Any user with access to the stargw.starp.bnl.gov SSH gateways has access to these 60 nodes.  Users of the RACF will recognise the "rterm" command, which if executed on a stargw host will attempt to connect to one of the nodes with relatively low load. 
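
For example (an illustrative session; the exact behavior and options of rterm may vary):

ssh -AX username@stargw.starp.bnl.gov
rterm        # run on the gateway: picks one of the lightly loaded interactive nodes (onl01-10, see below) and connects there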


Remote filesystems:

All nodes have access to several remote filesystems that may be useful to online computing:

  • /evp/a (read-only access to the DAQ Event Pool)
  • /daq/RTS (read-only access to daqman's /RTS export)
  • /daq/data (read-write(!) access to daqman's /data export)
  • /daq/log (read-only access to daqman's /log export)
  • /onlineweb/www (read-write access to the online web server's space for content to be shared over the web; see the example after this list)
  • /afs (the standard AFS tree)

Additionally, onl01-onl06 are configured to access trigger data at:

  • /trg/trgdata (trgscratch's trgdata export)
  • /trg/scalerdata (startrg2's scalerdata export). 
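
For content meant to be shared over the web, a hypothetical example of using the web-export area (the subdirectory and resulting URL are made up for illustration; write access requires membership in the onlweb group, see the Cron section below):

mkdir -p /onlineweb/www/mysubsystem
cp run_summary.png /onlineweb/www/mysubsystem/
# the file is then served by the online web server (the exact URL depends on the server configuration)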


Condor

A Condor pool is set up on these nodes.  Currently onl01-30 are in the pool (modulo a few specialized nodes not accepting jobs), serving as execute hosts.

rterm is available on the gateway hosts (see "Accessing The STAR Protected Network" above) to select the least-loaded system for login.  Only a subset of nodes is tagged as interactive for rterm; that list is currently onl01-10.
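
For illustration, a minimal HTCondor submit description for this pool might look like the sketch below (file names are placeholders; the actual pool configuration may impose additional requirements):

# myjob.condor (hypothetical example)
universe   = vanilla
executable = myjob.sh
arguments  = $(Process)
output     = myjob.$(Process).out
error      = myjob.$(Process).err
log        = myjob.log
queue 5

Submit and monitor with the usual commands, e.g. "condor_submit myjob.condor" and "condor_q". To keep the load on the NFS home directories down, have the job script do its heavy I/O under /scratch on the execute node (see the usage suggestions below).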

Cron

Cron jobs are accepted and can run only on onl11, onl12 and onl13. To access the exported Web directories in write mode, you need to be part of the onlweb group. Every year before the run, a list of points of contact is compiled and used to determine who should be granted access (it is not given by default).
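
For example, to register a cron job on one of the cron-enabled nodes (the script path below is hypothetical):

ssh onl11.starp.bnl.gov
crontab -e
# then add a line such as:
# */30 * * * * $HOME/bin/update_online_page.sh >> /tmp/update_online_page.log 2>&1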


General system details (hardware, OS, etc):

The Penguin nodes have 64-bit Scientific Linux 5.8 installations (with 32-bit libraries), with these basic hardware specs:

2 x Dual Core AMD Opteron Processor 265, 1800MHz (4 cores per system, no HT)

8GB RAM (PC3200 DDR 400MHz ECC)

4 SATA disk bays

  • onl01-onl30: 4 x 500GB disks (7200RPM) in a RAID configuration providing a 1.3 TB scratch space (mounted at /scratch)
  • onl31-onl60: 4 x 1TB disks (7200RPM) in a RAID configuration providing a 2.6 TB scratch space

Usage suggestions and miscellaneous notes for users:

To reduce the burden on the network and the home directory NFS file server, it is advisable for heavy users of distributed jobs (i.e. Condor jobs) to avoid unnecessary access to their individual home directories.  As much as possible, please consolidate access to your home directories and use the local disks as needed for storage.  Small, short-term needs (up to the order of 100MB or so) can use subdirectories under /tmp, while larger demands should use directories under /scratch on each individual node.  We expect at some point in the future to provide a shared file system (other than the home directories) of some significant size, but are not there yet.
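
A sketch of this pattern inside a job script (directory and file names are placeholders):

# keep heavy I/O off NFS: work locally, copy back only the final results
WORKDIR=/scratch/$USER/myjob_$$        # or /tmp/$USER/... for small (~100MB) needs
mkdir -p "$WORKDIR" && cd "$WORKDIR"
# ... produce output files here ...
cp summary.root $HOME/results/         # copy only what must be kept
rm -rf "$WORKDIR"                      # clean up the local scratch space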

The OLP nodes only allow access based on SSH keys.  If you have access to the stargw SSH gateways, you will also automatically have access to the OLP.  For maximum convenience, it is suggested that you familiarize yourself with SSH key agents and SSH key forwarding, which can (nearly) eliminate the need for typing passwords/passphrases.
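
A typical agent-based session with the standard OpenSSH commands looks like this ("username" and the node name are placeholders):

eval $(ssh-agent)                        # start an agent, if your desktop session does not already provide one
ssh-add ~/.ssh/id_rsa                    # type the passphrase once
ssh -AX username@stargw.starp.bnl.gov    # -A forwards the agent to the gateway
ssh -AX onl07                            # from the gateway: no further passphrase prompts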

Online computing run preparation plans

This page will list, by year, action items, run plans and open questions. It will serve as a repository for documents serving as the basis for drawing up the requirements. To see documents in this tree, you must belong to the Software and Computing OG (the pages are not public).

Run IX

General

This tree will contain information pertaining to run 9.

Run preparation meetings are held at the usual time, i.e. on Friday between 3-5 PM (room reserved; we will try to keep to one hour weekly). The following groups are invited to join:

  • The S&C core support as appropriate
  • The online "Run Time System" representatives
    • DAQ - Jeff Landgraf
    • Slow Control - Yury Gorbunov
    • Trigger - Jon (Jack) Engelage
  • All software coordinators as listed on the Organization page

The goal of the meetings is to discuss any issues with the infrastructure, networking, code readiness, resources and associated needs, as well as any other computing-related issues relevant to the smooth running of online operations. The forum and meeting also serve as a vehicle for passing information on time constraints and requirements to the diverse groups in a structured and cohesive manner.

Related documents

None so far.

Related meetings

  • 08W40
  • 08W41
  • 08W43
  • 08W45
  • 08W46
  • 08W48
  • 09W01
  • 09W02
  • 09W04

 

Run VIII

General

This tree will contain information pertaining to run 8.

Run preparation meetings are held on Friday between 3-4 PM (room reserved up to 5 PM). The following groups are invited to join:

  • The S&C core support as appropriate
  • The online "Run Time System" people
    • DAQ - Jeff Landgraf
    • Slow Control - Will Waggoner
    • Trigger - Jon Engelage
  • All software coordinators as listed on the Organization page

The goal of the meetings is to discuss any issues with the infrastructure, networking, code readiness, resources and needs, or any other computing-related issues relevant to the smooth running of online operations. The forum and meeting also serve as a vehicle for passing information on time constraints and requirements to and through the diverse groups in a structured and cohesive manner.

In Run VII, the forum was used to discuss the security plan and several key reshapes of the online computing structure needed to achieve minimum cyber-security accreditation.

Related documents

Related meetings


 

 

Experts on call

The experts on call for software related run support are:

Role                                 Name                Primary phone   Office phone    Other
Offline QA + FastOffline production  Jerome Lauret       (631) 786-0479  (631) 344-2450
                                     Gene Van Buren      (631) 312-4324  (631) 344-7953  (631) 775-6620
Online QA, PPlots                    Paul Sorensen       (510) 375-5582  (631) 344-2420
                                     David Kettler       (206) 218-3885  (206) 616-8141
Hardware support, online tools       Wayne Betts         (631) 804-6897  (631) 344-3285
Database                             Michael DePhillips  (631) 356-2257  (631) 344-2499  (631) 744-3295


When multiple choices are available, the name in bold indicates the current on-call expert. Please consult this page prior to calling the expert.

Run VII

Background

Facing the new paradigm of introducing DOE cyber-security regulations into our infrastructure, several action items were presented at the 2006 run critique meeting. The presentation is attached below as STAR-Critique-06.pdf. The urgent and immediate items, some of which required deep restructuring, were:

  • We MUST establish an internal controlled perimeter to the unroutable network. This network will be accessible via a gatekeeper model. Vulnerable devices should be isolated to the internal network layer
  • All network and communication layers must be documented
  • Physical access to consoles was described as part of the Shift procedure and shift alternation. Access to the online computing infrastructure MUST be controlled
  • All systems MUST be remediated and brought up to the proper OS version and security level 
    • should exceptions be needed, the device should have the proper controls and monitoring
    • isolating nodes that cannot be upgraded due to operational needs on the private network is the other solution
  • OS flavor reduction – We propose to reduce the OS flavors to enhance and optimize support and maintenance
  • Group account access should be regulated via keys (ssh keys) and tied to individuals (not a floating password without a clear understanding of who has it)
  • root access shall be restricted
    • A list of users having root access MUST exist at any point in time. In other words, only a few (documented) users should have root access privileges.
    • We must make a best effort to implement a configuration management strategy, i.e. changes to our infrastructure shall follow a procedure and lead to updated documentation.
  • Maintenance of computing equipment will be the responsibility of the S&C, DAQ and Slow Control groups as appropriate under general guidance of the S&C group.

 

The run preparation will be established within the following guidelines

  • General
    • Assess hardware replacement and cost (display, printer, UPS, switches, ...)
    • Assess sub-system needs for resources (disk space, bandwidth, database access, ...)
  • Networking 
    • Understand and reshape the current online network spaghetti into a two-layer model, with a gatekeeper model
    • Isolate vulnerable devices on a private network
    • Provide either a routing or gatekeeper model; reduce dual- or tri-NIC connections
    • Patch all vulnerable machines and bring all equipment to the appropriate level
  • Organizational needs – root access and password 
    • Establish an in-principle layer of responsibility and accountability
    • Determine root access and generic account access and usage
    • Provide infrastructure to manage keys on a per-node basis
    • Document procedures and equipment, establish principles for configuration management
    • Require new equipment to comply with baseline controls
      • New equipment shall not be brought in randomly but integrated as part of the online infrastructure documentation
  • Software
    • Deploy a new Web server
    • Revisit all online common tools and needs – RunLog, ShiftLog, Web interfaces ...
    • Introduce technology and paradigm changes to replace the HTML-refresh "poor man's job" approach
      • the technique has spread and creates heavy load
    • Review Pplots needs and coverage
    • Introduce Scaler monitoring tool
    • Revisit Ganglia monitoring with special care on broadcast/multi-cast
  • Establish a first testbed of database consolidation for high-luminosity regime 
    • With help from Slow Control – IRMIS project

Understanding our online Network

The following table is a first cut at understanding the inter-connections between online hardware.

  • ch2connect.xls shows the NFS mounts between machines
  • Network-top level.pdf is a rough first cut of the network schematic

Patching and OS version-ing

  • July 28th 2006 
    • The matrix Old_Linux.pdf displays the list of nodes requiring attention
    • Two Windows machines (Alexei Lebedev's responsibility) require immediate attention.

 

Related meeting


 

New online web server (dean.star.bnl.gov)

New web server notes for content providers and users


There is a new web server (dean.star.bnl.gov) online to replace ch2linux.star.bnl.gov.  The "online.star.bnl.gov" alias was switched to dean.star.bnl.gov at about 2pm on Tuesday, Feb. 29, 2007.  There is perhaps as much as 24 hours of DNS propagation time for the alias change to make it around the world, during which time, there could be confusion about which system (dean or ch2linux) is actually being accessed.

We plan to keep ch2linux online for 1-2 weeks to help in debugging, and as a fallback for broken content until it is fixed.

A gotcha to watch out for is the hard-coding of the "ch2linux" name in any links.  Use of the "online.star.bnl.gov" alias is generally preferable.

For those of you with individual accounts on ch2linux, the accounts have been duplicated on the new server (if you have an account, you can immediately use the key management system ( https://www.star.bnl.gov/starkeyw ) to install openssh public keys if desired on both the current (ch2linux) and new (dean) web servers).

Some hints and suggestions for content maintainers:


Some of the configuration changes between ch2linux and dean (particularly to php) may require modifications to existing content to work properly on the new server.  With php, the change that seems most likely to bite us is "register_globals = Off".  On ch2linux, this is set to On, allowing php automatic access to variables passed in POST or GET requests.  Here is a quick primer on the effect of turning this off, taken from the php.ini file:

;     Global variables are no longer registered for input data (POST, GET, cookies,
;     environment and other server variables).  Instead of using $foo,
;     you can use $_REQUEST["foo"] (includes any variable that arrives through the
;     request, namely, POST, GET and cookie variables), or use one of the specific
;     $_GET["foo"], $_POST["foo"], $_COOKIE["foo"] or $_FILES["foo"], depending
;     on where the input originates.  Also, you can look at the
;     import_request_variables() function.
;     Note that register_globals is going to be depracated (i.e., turned off by
;     default) in the next version of PHP, because it often leads to security bugs.
;     Read http://php.net/manual/en/security.registerglobals.php for further
;     information.

A second php issue is that we'd like to keep the default setting of "display_errors = Off" in php as a security precaution.  However, since having it turned on is often useful for debugging, we can leave it on for a week or two in the initial stages, then turn it back off.  A common issue with these php settings is that you might notice mostly harmless "Notice" messages from php -- commonly about uninitialized variables -- we all know to always initialize our variables, right?

If your php code (or perl, or whatever) is encountering file access errors, the problem may be stemming from SELinux.  I have adjusted several file contexts and the local SE policy to fix problems with the RICH Scaler plots, the RunLog Browser and tomcat.  Unfortunately, content owners may have a difficult time diagnosing such problems.  One way is to log in to the server, "cause" the error and then look at the output of "dmesg | tail -n 30" (30, 40, whatever it takes) for audit messages with "avc:  denied" lines that might be related to your content.  If you see such errors, inform Wayne Betts, who can look into it further.  As a quick test, we can temporarily disable SELinux to see if it clears up any problems.
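
A minimal sketch of that check (run on the web server after reproducing the failing request; switching SELinux to permissive mode requires root and should be coordinated with Wayne Betts):

dmesg | tail -n 40 | grep -i 'avc:.*denied'   # look for denials that mention your files or scripts
setenforce 0                                  # quick test only: permissive mode; re-enable with "setenforce 1"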



Another common issue has been database access controls.  Many of our databases have fairly granular access controls, and dean may not be configured for access to everything it needs.  If that is the suspected source of any problems, Mike DePhillips can look into it.

STAR's SSH Public Key Management System

SSH Public Key Management Tool

Overview

The main front-end Web interface starts at https://www.star.bnl.gov/starkeyw/  (see step-by-step instructions in the next section). This SSH public key management system has been designed in STAR to address the following requirements:

  • Use of two-factor authentication for remote logins
  • Allow a one-to-many association: a remote user may associate his/her keys with a local domain user account and/or with one or more local so-called "group" accounts which are not tied to one individual (such an account is, for example, an "operator" account or even the "root" account)
  • Provide a simple Web front end for users to request, view and manage their own key associations (hence easily managing access to a domain)
  • Allow a set of system administrators to easily manage key associations for a domain (globally disabling users having left STAR, for example)
  • Using SSH key fingerprints, allow identification of which user is logging in to which account (a security requirement); see the example after this list
  • Be able to provide upon demand, in one click, a list of who had access to which account on what machine and when (historical records, easy access to access-grant lists)
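
For reference, the fingerprint of a public key can be displayed with the standard OpenSSH tooling, e.g.:

ssh-keygen -lf ~/.ssh/id_rsa.pub   # prints the key's length, fingerprint and comment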

Such a system was developed for STAR and named the "SSH Key Management system", aka SKM. More information can be found in this publication. A side benefit for users is the reduction in the number of passwords to remember and type.

Notes

  • In purpose, this system is similar to the RCF's key management system (full instructions here), but is more powerful because of its flexibility in the association of hosts (client systems), user accounts on those clients, and self-service key installation requests.
  • The STAR SKM system was initially used for managing online computer access and has since expanded to manage all nodes in STAR running a specialized service (offline database, web server and so on), streamlining the security model by making it consistent across nodes.
  • The system was designed to be as secure as possible (central repository of keys, information pulled only by clients and NO push, to avoid multiple points of corruption). In other words, each client has a lightweight daemon that polls and pulls the SSH key association information out of a central DB for itself and handles installing the keys. Clients are not allowed to manage keys (only the Web interface does). The client daemon creates no load.

 

Where do we start? What is a typical use example?

You should use your RCF username and Kerberos password (credentials) to enter this interface.

Here is a typical scenario of the system usage: 

  1. A sysadmin of a machine named FOO creates a user account named "JDOE" and, if not done already, installs the key_services client.
  2. A user account 'JDOE' on host 'FOO' is configured in the Key Management system by a key management administrator*.
  3. John Doe uploads (via the web) his or her public ssh key (in openssh format).
  4. John Doe requests (via the web) that his key be added to JDOE's authorized_keys file on FOO.
  5. A key management administrator approves the request, and the key_services client places the key in ~JDOE/.ssh/authorized_keys.

* Current admins are Wayne Betts and Jerome Lauret.

At this point, John Doe has key-based access to JDOE@FOO.  Simple enough?  But wait, there's more!  Now John Doe realizes that he also needs access to the group account named "operator" on host BAR.  Since his key is already in the key management system he has only to request that his key be added to operator@BAR, and voila (subject to administrator approval), he can now login with his key to both JDOE@FOO and operator@BAR.  And if Mr. Doe should leave STAR, then an administrator simply removes (disables) him from the system and his keys are removed from both hosts.

 

More details

Slightly Deeper...

There are three things to keep track of -- people (and their SSH keys of course), host (client) systems, and user accounts on those hosts:

People want access to specific user accounts at specific hosts.

The system maintains a list of user accounts for each host system, and a list of people associated with each user account at each host. To be clear: while the Web interface allows any user to log in, the system does not have any automatic user account detection mechanism at this time; each "{user-}account" has to be added by hand by an administrator for that account to be listed as a possible association for node FOO or BAR.

Behind the scenes, each host that is participating in the system has a keyservices client installed that runs as a system service.  The keyservices_client periodically (at five minute intervals by default) polls a central service for its information.  In other words, the back-end database is consulted for the list of approved associations and the appropriate keys are downloaded by the client and added to the related account's authorized_keys files accordingly.

In our case, orion.star.bnl.gov hosts all the server services (starkeyw and starkeyd via Apache, and a MySQL database), but they could all be on separate servers if desired.

Deployment Status and Future Plans

Only RHEL and Scientific Linux with i386 and x86_64 kernels have been tested, but there is no reason to believe that the client couldn't be built on other Linux distributions or Solaris. Please contact one of the admins (Wayne Betts, Jerome Lauret) if you'd like to volunteer to add your sub-system node to SKM or if you have any questions.

User access to the Web interface is currently based on the RCF Kerberos authentication. You will hence need a valid BNL/RCF account to access the Web interface and manage key associations for your account.

In 2012, SKM was extended to implement volatile key associations (a lifetime and expiration may be set for each key association). This feature allows granting a given user access to a privileged account on a temporary, debugging-need basis (as one example). It is also useful for group accounts of an operational nature with rotating and changing teams at each new run (in such a case, the list of who is associated with the account needs to be re-assessed yearly, and the associations could be set, for example, to expire after a year). This is optional - the default has no expiration.

Run 19

Feedback from software coordinators

Active feedback

Sub-system  Coordinator                    Calibration POC  Online monitoring POC
MTD         Rongrong Ma                    - same -         - same -
EMC         Raghav Kunnawalkam Elayavalli  Nick Lukow       - same - (Note: L2algo, bemc and bsmdstatus)
EPD         Prashant Shanmuganathan        N/A              - same -
BTOF        Frank Geurts                   - same -         Frank Geurts, Zaochen Ye
ETOF        Florian Seck                   - same -         Florian Seck, Philipp Weidenkaff
HLT         Hongwei Ke                     - same -         - same -

Other software coordinators

Sub-system   Coordinator
iTPC (TPC?)  Irakli Chakaberia
Trigger      Akio Ogawa
DAQ          Jeff Landgraf
...

Run 20

Status of calibration timeline initialization

In RUN: EEMC, EMC, EPD, ETOF, GMT, TPC, MTD, TOF
Test: FST, FCS, STGC (no tables)
Desired init dates were announced to all software coordinators:

- Geometry tag has a timestamp of 20191120
- Simulation timeline [20191115,20191120[
- DB initialization for real data [20191125,...]

     Please initialize your table content appropriately, i.e.
sim-flavor initial values are entered at 20191115 up to 20191119
(please exclude the edge), and ofl initial values at 20191125
(with the run starting on the 1st of December, even tomorrow's cosmic
and commissioning runs would pick up the proper values).

 

 

Status - 2019/12/10

EMC  = ready
ETOF = ready - initialized at 2019-11-25, no sim (confirming)
TPC  = NOT ready [look at year 19 for comparison]
MTD  = ready
TOF  = Partially ready? INL correction, T0, TDC, status and alignment tables initialized
EPD  = gain initialized at 2019-12-15 (!?), status not initialized, no sim

EEMC = ready? (*last init at 2017-12-20)
GMT  = ready (*no db tables)



Status - 2019/12/09

EMC  = ready
ETOF = ready? initialized at 2019-11-25, no sim
TPC  = NOT ready
MTD  = ready
TOF  = NOT ready
EPD  = gain initialized at 2019-12-15 (!?), status not initialized, no sim

EEMC = ready? (*last init at 2017-12-20)
GMT  = ready (*no db tables)

 

 

Software coordinator feedback for Run 20 - Point of Contacts

Sub-system         Coordinator                    Calibration POC  Online monitoring POC
MTD                Rongrong Ma                    - same -         - same -
EMC / EEMC         Raghav Kunnawalkam Elayavalli  Nick Lukow       - same - (Note: L2algo, bemc and bsmdstatus)
EPD                [TBC]                          - same -         - same -
BTOF               Frank Geurts                   - same -         Frank Geurts, Zaochen Ye
ETOF               Florian Seck                   - same -         Florian Seck, Philipp Weidenkaff
HLT                Hongwei Ke                     - same -         - same -
TPC                Irakli Chakaberia              - same -         Flemming Videbaek
Trigger detectors  Akio Ogawa                     - same -         - same -
DAQ                Jeff Landgraf                  N/A


---




Run 21

Status of calibration timeline initialization

- Geometry tag has a timestamp of 20201215
- Simulation timeline [20201210, 20201215]
- DB initialization for real data [20201220,...]

Status - 2020/12/10

 

Software coordinator feedback for Run 21 - Point of Contacts

Sub-system         Coordinator                     Calibration POC     Online monitoring POC
MTD                Rongrong Ma                     - same -            - same -
EMC / EEMC         Raghav Kunnawalkam Elayavalli   Nick Lukow          - same - (Note: L2algo, bemc and bsmdstatus)
EPD                Prashanth Shanmuganathan (TBC)  Skipper Kagamaster  - same -
BTOF               Zaochen Ye                      - same -            Frank Geurts, Zaochen Ye
ETOF               Philipp Weidenkaff              - same -            Philipp Weidenkaff
HLT                Hongwei Ke                      - same -            - same -
TPC                Yuri Fisyak                     - same -            Flemming Videbaek
Trigger detectors  Akio Ogawa                      - same -            - same -
DAQ                Jeff Landgraf                   N/A
Forward Upgrade    Daniel Brandenburg              - same -            FCS - Akio Ogawa; sTGC - Daniel Brandenburg; FST - Shenghui Zhang/Zhenyu Ye

---

Run 22

 

Status of calibration timeline initialization

- Geometry tag has a timestamp of 20211015
- Simulation timeline [20211015, 20211020[
- DB initialization for real data [20211025,...]

Status - 2021/10/13

 

Software coordinator feedback for Run 22 - Point of Contacts (TBC)

Sub-system         Coordinator                     Calibration POC     Online monitoring POC
MTD                Rongrong Ma                     - same -            - same -
EMC / EEMC         Raghav Kunnawalkam Elayavalli,  - same -            Note: L2algo, bemc and bsmdstatus
                   Navagyan Ghimire
EPD                Prashanth Shanmuganathan (TBC)  Skipper Kagamaster  - same -
BTOF               Zaochen Ye                      - same -            Frank Geurts, Zaochen Ye
ETOF               Philipp Weidenkaff              - same -            Philipp Weidenkaff
HLT                Hongwei Ke                      - same -            - same -
TPC                Yuri Fisyak                     - same -            Flemming Videbaek
Trigger detectors  Akio Ogawa                      - same -            - same -
DAQ                Jeff Landgraf                   N/A
Forward Upgrade    Daniel Brandenburg              - same -            FCS - Akio Ogawa; sTGC - Daniel Brandenburg; FST - Shenghui Zhang/Zhenyu Ye

---

Run X

Related meetings

Run XI


 

Run XIII

Preparation meeting minutes

Database initialization check list

TPC Software  – Richard Witt          NO
GMT Software  – Richard Witt          NO
EMC2 Software - Alice Ohlson          Yes
FGT Software  - Anselm Vossen         Yes
FMS Software  - Thomas Burton         Yes
TOF Software  - Frank Geurts          Yes
Trigger Detectors  - Akio Ogawa       ??
HFT Software  - Spyridon Margetis     NO (no DB interface, hard-coded values in preview codes)

 

Calibration Point of Contacts per sub-system

If a name is missing, the POC role falls onto the coordinator.
                Coordinator           Possible POC
                ------------          ---------------
TPC Software  – Richard Witt          
GMT Software  – Richard Witt          
EMC2 Software - Alice Ohlson          Alice Ohlson  
FGT Software  - Anselm Vossen         
FMS Software  - Thomas Burton         Thomas Burton    
TOF Software  - Frank Geurts          
Trigger Detectors  - Akio Ogawa       
HFT Software  - Spyridon Margetis     Hao Qiu

Online Monitoring POC

The final list from the Spin PWGC can be found at 2013 Run Tasks. The table below merges the Spin PWGC feedback with the other feedback received.

  Directories we inferred are being used (as reported in the RTS Hypernews):

  • scaler - Len Eun and Ernst Sichtermann (LBL) - this directory usage was indirectly reported
  • SlowControl - James F Ross (Creighton)
  • HLT - Qi-Ye Shou - the 2012 directory had a recent timestamp but was owned by mnaglis; Aihong Tang contacted 2013/02/12; answer from Qi-Ye Shou 2013/02/12 - will be POC
  • fmsStatus - Yuxi Pan (UCLA) - this was not requested, but the 2011 directory is being overwritten by user=yuxip; FMS software coordinator contacted for confirmation 2013/02/12; Yuxi Pan confirmed 2013/02/13 as POC for this directory

  Spin PWG monitoring related directories follow:

  • L0trg - Pibero Djawotho (TAMU)
  • L2algo - Maxence Vandenbroucke (Temple)
  • cdev - Kevin Adkins (UKY)
  • zdc - Len Eun and Ernst Sichtermann (LBL)
  • bsmdStatus - Keith Landry (UCLA)
  • emcStatus - Keith Landry (UCLA)
  • fgtStatus - Xuan Li (Temple) - this directory is also being written by user=akio, causing protection access and possible clash problems; POC contacted on 2013/02/08, both Akio and POC contacted again 2013/02/12 -> confirmed as OK
  • bbc - Prashanth (KSU)



Run XIV


Preparation meetings, links


Notes

  • 2013/11/15
    • Info gathering begins (directories/areas and Point of Contacts)
      Status:
      2013/11/22, directory structure, 2 people provided feedback, Renee coordinated the rest
      2013/11/25, calibration POC, 3 coordinators provided feedback - Closed 2013/12/04
      2013/12/04, geometry for Run 14,
       
    • Basic check: CERT for online is old if coming from the Wireless
      Status: fixed at ITD level, 2013/11/18 - the reverse proxy did not have the proper CERT
  • 2013/11/25

Database initialization check list

The actions suggested by this section have not started yet.

Sub-system  Coordinator               Check done
DAQ         Jeff Landgraf
TPC         Richard Witt
GMT         Richard Witt
EMC2        Mike Skoby, Kevin Adkins
FMS         Thomas Burton
TOF         Daniel Brandenburg
MTD         Rongrong Ma
HFT         Spiros Margetis           (not known)
Trigger     Akio Ogawa
FGT         Xuan Li


Calibration Point of Contacts per sub-system

"-" indicates no feedback was provided. But if a name is missing, the POC role falls onto the coordinator.

Sub-system  Coordinator               Calibration POC
DAQ         Jeff Landgraf             -
TPC         Richard Witt              -
GMT         Richard Witt              -
EMC2        Mike Skoby, Kevin Adkins  -
FMS         Thomas Burton             -
TOF         Daniel Brandenburg        -
MTD         Rongrong Ma               Bingchu Huan
HFT         Spiros Margetis           Jonathan Bouchet
Trigger     Akio Ogawa                -
FGT         Xuan Li                   N/A


Online Monitoring POC


Directory    POC                       Notes
scaler       -                         Not needed 2013/11/25
SlowControl  Chanaka DeSilva           OKed at the second Run preparation meeting
HLT          Zhengquia Zhang           Learned of it incidentally on 2014/01/28
HFT          Shusu Shi                 Learned about it on 2014/02/26
fmsStatus    -                         Not needed 2013/11/25
L0trg        Zilong Chang, Mike Skoby  Informed 2013/11/10 and created 2013/11/15
L2algo       Nihar Sahoo               Informed 2013/11/25
cdev         -                         Not needed 2013/11/25
zdc          -                         May not be used (TBC)
bsmdStatus   Janusz Oleniacz           Info will be passed from Keith Landry 2014/01/20; possible backup, Leszek Kosarzewski 2014/03/26
emcStatus    Janusz Oleniacz           Info will be passed from Keith Landry 2014/01/20; possible backup, Leszek Kosarzewski 2014/03/26
fgtStatus    -                         Not needed 2013/11/25
bbc          Akio Ogawa                Informed 2013/11/15, created same day


Run XV

Run 15 was prepared essentially by discussing with individuals, and a comprehensive page was not maintained.

Run XVI


This page will contain feedback related to the preparation of the online setup.

 

Notes



 

Online Monitoring POC

Directory    POC              Notes
scaler       -
SlowControl  -
HLT          Zhengqiao        Feedback 2015/11/24
HFT          Guannan Xie      Spiros: feedback 2015/11/24
fmsStatus    -                Akio: possibly not needed (TBC); 2016/01/13 noted this was not used in Run 15 and will probably never be used again
fmsTrg       -                Confirmed needed 2016/01/13
fps          -                Akio: not needed in Run 16? Perhaps later.
L0trg        Zilong Chang     Zilong: feedback 2015/11/24
L2algo       Kolja Kauder     Kolja: will be POC - 2015/11/24
cdev         Chanaka DeSilva
zdc          -
bsmdStatus   Kolja Kauder     Kolja: will be POC - 2015/11/24
bemcTrgDb    Kolja Kauder     Kolja: will be POC - 2015/11/24
emcStatus    Kolja Kauder     Kolja: will be POC - 2015/11/24
fgtStatus    -                Not needed since Run 14; may drop from the list
bbc          Akio Ogawa       Feedback 2015/11/24, needed
rp           -

 

Calibration Point of Contacts per sub-system

Sub-system  Coordinator                Calibration POC
DAQ         Jeff Landgraf              -
TPC         Richard Witt, Yuri Fisyak  -
GMT         Richard Witt               -
EMC2        Kolja Kauder, Ting Lin     -
FMS         Oleg Eysser                -
TOF         Daniel Brandenburg         -
MTD         Rongrong Ma                (same, confirmed 2015/11/24)
HFT         Spiros Margetis            Xin Dong
HLT         Hongwei Ke                 (same, confirmed 2015/11/24)
Trigger     Akio Ogawa                 -
RP          Kin Yip                    -

 

Database initialization check list



 

Online network documentation

This is to serve as a repository of information about networking in the online environment. 

 

Background as of fall 2009

The network layout at the STAR experiment has grown from a base laid over ten years ago, with a number of people working on it and adding devices over time with little coordination or standardization.  As a result, we have, to put it bluntly, a huge mess of a network, with a mix of hardware vendors and media, cables going all over the place, many of which are unlabelled and now buried to the point of untraceability.  We have SOHO switches all over the place, of various brands, ages and capabilities.  (It was only about one year ago all hubs were at least replaced with switches, or so I think – I haven’t found any hubs since then.)  There are a handful of “managed” switches, but they are generally lower-end switches and we have not taken advantage of even their limited monitoring capabilities.  (In the case of the LinkSys switches purchased one year ago, I found their management web interface poor – slow, buggy and not very helpful.)

In addition to the general messiness, a big (and growing) concern has been that during each of the past several years, there have been a handful of periods of instability in the starp network, typically lasting from a few minutes to hours (or even possibly indefinitely in the most recent cases which were resolved hastily with switch hardware replacements in the middle of RHIC runs).   The cause(s) of these instabilities has never been understood.  The instabilities have typically manifested as slow communications or complete lack of communication with devices on the South Platform (historically, most often VME processors).  Speculation has tended to focus on ITD security scanning.  While this has been shown to be potentially disruptive to some individual devices and services, broad effects on whole segments of the network have never been conclusively demonstrated, nor has there been a testable, plausible explanation for the mechanism of such instability. 

The past year included the two most significant episodes of instability yet on starp, in which LinkSys SLM 2048 switches (after weeks or months of stability) developed problems that appeared to be similar to prior issues, only more severe.  The two had been purchased as a replacement (plus spare) for a Catalyst 1900 on the South Platform.  When the first started showing signs of trouble, it was replaced by the second, which failed spectacularly later in the run, becoming completely unresponsive through its web interface and pings, and was only occasionally transmitting any packets at all, it seemed.   (After all devices were removed, and the switch rebooted, it returned to normal on the lab bench, but has not been put back into service.)

At this point, all devices were removed from the LinkSys switch and sent through a pair of unmanaged SOHO switches, which themselves each link to an old 3Com switch on the first floor.  Since then, no more instabilities have been noted, but it has left a physical cabling mess and a network layout that is quite awkward.  (And further adding to the trouble, at least one of the SOHO switches has a history of sensitivity to power fluctuations, every once in a while needing to be power-cycled after power dips or outages.)

In addition, there have been superficially similar episodes of problems on the DAQ/TRG network, which shares no networking hardware with the starp network.  As far as I know, these episodes spontaneously resolved themselves.  (Is this true?)  Speculation has been on “odd” networked devices (such as oscilloscopes) generating unusual traffic, but here too there is no conclusive evidence of the cause.  Having no explanation, it seems likely this behavior will be encountered again.
 

Core components

There are several “core” pieces currently.  Core is defined somewhat vaguely as connecting lots of devices or requiring relatively high performance: 

1.    ITD’s main switch in the DAQ room
2.    DAQ’s event builder switch in the DAQ room
3.    the starp switch on the South Platform
4.    the DAQ/TRG switch on the South Platform
5.    the Force 10 switches for the HPSS network in the DAQ room

It seems likely that any reshape will have to include those same core components, though perhaps some combinations are possible at the hardware level using VLANs or other technologies.  (combining starp and DAQ/TRG on the platform on to a single large switch, for instance)
 

ITD's Catalyst chassis in the DAQ room (subnets 60, 162, wireless and possibly others in 1006)

This switch chassis is in the networking rack in the northwest corner of the DAQ room.  It is managed by ITD.  STAR has no way to interact with this switch at the software/configuration level.  

 

Slot 1:  WS-X4013 (Supervisor II Engine, fiber uplink to 515 and local management port)

Slot 2:  WS-X4548-GB-RJ45 (48 1Gb/s copper ports @8:1 oversubscription)  port 43 is 162 subnet, rest are subnet 60.

Slot 3: WS-X4232-RJ-XX (32 copper 100 Mb/s) plus a WS-U5404-FX-MT daughter card with 4 MTRJ fiber ports at 100Mb/s)

Slot 4: WS-4148-RJ (48 copper 100Mb/s) - mix of subnets 60 and 162?

Slot 5: WS-4148-RJ (48 copper 100Mb/s)  - all subnet 60?

Slot 6: WS-X4306-GB (6 GBIC (not mini!) ports, 3 of which have 1000-SX modules with SC connectors)

 

Images and miscellaneous files

Here we can keep miscellaneous files documenting the state of the network.

First, I have attached an image showing the current (late 2009/early 2010) switch layout and links in the WAH. ("WAH_switches.pdf")

Then there is an "after" picture with a rough idea of the patch panel placement to replace most of the unmanaged switches. ("WAH_patch_panels.pdf")

For the South Platform, a more refined patch panel plan was put together in June 2010 ("Network Plan for South Platforms.doc")

There is an attachment with general guidelines for installing UTP ("Cat5e_Network_cable.ppt")

 

 

Locations needing network access

WAH: (starp and DAQ/TRG devices are scattered throughout these locations.  I am going to use the term “satellite racks” to include all locations within the C-AD PASS system that are NOT on the South Platform.  Also, note that the satellite racks are semi-mobile, and the entire detector platform (North and South) can move into the Assembly Building.):

-    PMD racks: ~3 devices on starp and ~3 on DAQ/TRG

-    FMS/FPD east side:  Handful of devices on DAQ/TRG and on starp

-    Southwest corner work area: rarely more than two systems here, but might want starp, “trailers” and DAQ/TRG networks here for use as needed

-    EEMC racks, west side:  Handful of devices on DAQ/TRG and on starp

-    FPD/FMS west racks:  Handful of devices on DAQ/TRG and on starp

-    PP2PP east and west:  at least one VME processor on DAQ/TRG on each side - these are in the RHIC tunnel, technically not in the WAH.

-    South platform – (IMPORTANT NOTE:  The south platform must remain electrically isolated from the rest of the facility – there can be no conducting cables running from the South Platform to other locations)
o    First floor:  Three rows of 8-9 racks each (volatile, in that subsystems and components are installed or removed each year)
o    Second floor:  Three rows of 8-9 racks each (volatile)

-    North platform:  currently unoccupied, but has had devices in the past and a switch on the starp network is still present there, with a fiber link back to the South Platform (somewhere!)

Control Room:
-    Perimeter (~3 dozen PCs), almost all on starp, but
o    2-3 on DAQ/TRG
o    4-5 on C-AD 108
o    1-2 on C-AD 90 network(?)
o    Numerous small unmanaged switches in this room currently

DAQ Room:  (Highest performance of the entire facility is needed in rack row DA, including a minimum 56-port switch with non-blocking/line rate 1Gb inter-links on the DAQ/TRG network)

-    three “rows” plus two networking racks:
o    the “old” network rack and the “new network rack” near the northwest corner
o    rack row “DA” on west side (nearest the Control Room)
o    shelf row in middle with a rack at each end.

  • Northern-most rack is ~20 nodes on “starp” – current rack has at least three unmanaged 8-port switches.
  • Remainder of row is primarily DAQ/TRG with 3-4 starp nodes - both networks go through two unmanaged switches in the rack immediately to the south of the shelves.

o    East row:  ~6 stand-alone starp servers (one of which has a DAQ/TRG connection as well), along with a handful of VME devices on starp.  DAQ or trigger might have a device or two here.  The rack space is primarily occupied by devices on a C-AD network.

GMR:
- 3 PCs – generally stable area.

Clean room:
-    several jacks needed, network use may vary between starp, daq/trg and the 130.199.162 subnet depending on the active use at any time

1006C and 1006D (trailers):
    - typically only subnet 130.199.162 is needed here.


 

 

Meeting notes for week of Oct. 19, 2009

Online network reshape notes from the week of Oct. 18, 2009

During this week, three meetings were held to discuss the STAR online networking reshape plans.

The first meeting included Jeff Landgraf, Wayne Betts, Dan Orsatti (ITD) and Frank Burstein (ITD).  At this meeting the ITD network engineers presented two proposals for core network components based on information previously provided to them by STAR.  The two options were Force-10 based and Cisco-based, with costs of approximately $150,000 and $100,000 respectively.  They included a shared infrastructure for the DAQ/TRG and STARP networks, including a switch redundancy in the DAQ room to handle the two networks and meet DAQ’s relatively high performance needs in the DAQ room.  These ITD options are generally smart, expandable, highly configurable and well-supported by ITD, and meet the initial requirements.

However, in informal discussions since then, Bill Christie suggested that we should consider the possibility of radiation damage and/or errors in any electronic equipment in the WAH.  While this had been mentioned as a possibility in the past, it was not generally taken seriously by those of us in STAR looking after the networks, nor is there any way for us to test this to a standard of "beyond reasonable doubt" (or any other standard really).  At Bill's suggestion, we (Jeff L., Wayne B., Jack E., Yuri G. and Bill C.) met with three members of C-AD's networking group, who stated they were certain that radiation could impair switches and strongly suggested that ITD's proposed equipment was inappropriate for a radiation area.  They also provided some feedback from individuals at two other laboratories confirming that networking equipment in radiation areas is subject to upsets, with one explanation involving effects on metal-oxide semiconductors; at face value this would suggest that newer (thus generally smaller) electronic components would be less susceptible.  However, my intuition is that smaller electronics are denser and more easily upset by smaller deposited charge, and thus might be more susceptible.

Here are excerpts from the other labs:

From JLab:  "The flash memory loses its ability to hold data, making it
useless. We have worked around the problem by pulling cable or fiber
back to lower radiation areas wherever we can. Because we made these
cabling changes when we were only using cisco fixed-configuration
100Mbit switches ( 29XX models), I have no data for Gigabit switches.
Since our experience is that it's the flash memory that fails, I'd
expect no better performance from any other switches. All of our
switches that use modular supervisor modules are outside of radiation
areas."

From FermiLab:  "The typical devices used employ metal oxide
semiconductors and the lock up happens when ionizing radiation is
trapped in the gate region of the devices. We see this happen at our two
detectors (CDF and DZero) when losses go up and power supplies circuits
latch up. The other thing working in the positive direction is that when
IC feature sizes go down, there is less likelihood for the charge to get
trapped so they are more radiation tolerant. Having said all that I
can't answer your specific question because we don't put switches or
routers in the tunnel at all."

All this said, the general consensus was that we should move as much "intelligence" as far away from the beam line as reasonably possible.  (Until now, the "big" switches on the platform have actually been about as close to the beam line as possible!)  This means putting any switches in rack rows 1C.  Given both the cost and the radiation concern, we (the STAR personnel) agreed to investigate less expensive switches than ITD's suggestion, while trying to provide some level of intelligence for monitoring.  We also have a consensus that the DAQ/TRG and STARP networks should try to use common hardware whenever possible, and that we should work to remove as many SOHO-type unmanaged switches as possible as time permits (replacing them with well-documented and labelled patch panels feeding back to core switches).  The C-AD personnel also recommended Cisco's 2950, 2960 and 3750 switches and Garrett products in general.  One more miscellaneous tidbit from Jack: we should avoid LanCast media convertors.

The final meeting of the week included Jerome, Wayne and Matt Ahrenstein, in which Jerome was briefed on the two prior meetings and he generally agreed with the direction we are taking.  At this meeting, we selected an additional area to try to clean-up before the run, specifically the racks on the west side, where there are at least four 8-port unmanaged switches (3 on DAQ/TRG and one on STARP).  He also suggested we consult with Shigeki from the RACF about the whole affair, and is trying to arrange such a meeting as soon as possible.

In addition to this, Jeff has also stated that while either ITD solution would meet DAQ’s needs for several years, he believes he can obtain adequate performance for far less money with lower end equipment.  Here is Jeff's latest on the DAQ needs for the network:

 

"My target is 20Gb/sec network capability across switches.   In likely 
scenarios, the network capability would be significantly higher than 
this because hi bandwidth nodes would all be on the same switch 
(ironically, the cheaper switches mostly seem to be line-speed switches 
internally, unlike the big cisco switches...)    However, in the current 
year, I'll have a hard limit of 12 gigabit ethernet cards incoming on 
EVBs for a hard max of 12Gb/sec.    The projected desired data, 
according to the trigger board is around 6Gb/sec (600MB/sec).   I don't 
expect much more than a factor of two through the EVBs above this 
600MB/sec in the lifetime of STAR (meaning current TPC + HFT + FGT), 
although there are big uncertainties particularly for the HFT.     The 
one lump in the planning involves potential L3 farms - and I don't know 
how this will play out.   There are many scenarios some of which would 
not impact the network (ie... specialized hardware plugged into the TPX 
machines...),  but my current approach is that the network needs will 
have to be incorporated in the L3 farm design plan..." 


 

Where does this leave us?  We need to quickly evaluate options for the “big” switches for the DAQ room and the South Platform.  The DAQ and Trigger groups have 3(?) similar managed switches that might be adequate for the South platform (including a spare), and we should look into the Cisco models suggested by C-AD.  We also should let ITD make another round of suggestions based on our discussions to date, and especially focus with them on what to do with the large ITD switch in the DAQ room that currently has the link to the rest of the campus “public” network.  And we need to do this rather hastily.

 

 

 

Open Questions

Do we support multiple networks on single switches with VLANs, switch port segmentation or other means?  For instance, at remote spots, like PMD’s racks, can we put in a single switch and have it handle both starp and DAQ/TRG?  Daniel Orsatti's most recent advice was leaning towards having a few large switches in four or five core places with VLANs and installing patch panels at or near the various locations needing network connections.
 
Is there a single brand/line of switch equipment that meets most or all of our goals?  Can we get a line of switch products that includes a range from small (~8 port) switches up to the large switches required for DAQ’s event builders or ITD’s main switch, such that they can interoperate and be part of shared monitoring?  (If we go with a patch-panels-to-big-switches approach, then the small switches would not be necessary.)

What kind of monitoring can we expect and how much effort will it take for it to be useful?  SNMP-based?  Nagios?  Etc…

Can we setup a shared but “private” monitoring network for the managed switches, such that starp and DAQ/TRG monitoring share the same infrastructure?  (Most likely, yes.)

Can fiber connectors be easily changed/replaced/repaired?  STAR apparently does not have the tools to terminate fibers at this point.  Do we want to acquire the tools and know-how to do this, or continue to rely on ITD and/or folks like Frank Naase (C-AD) who have done most of our fiber termination to date? 

Overview of the reshape started in 2009

The goal of the online networking reshape is to provide a stable and well-understood networking environment with the possibility of future expansion to meet STAR’s foreseeable needs over time.  The physical layout needs to be well understood, with elements of redundancy and/or easily swapped parts on hand as much as possible.  The devices on the network should be known, including their location, what other systems they are expected to interact with and traffic volumes.  Significant networking errors should be detected at the switch level and allow for troubleshooting without significant disruption to large parts of the network. 

 

Along the way, it will be very useful to increase the availability of knowledge and sources of assistance related to the network.  Naturally this calls for a well documented network in any case.  Consolidating networking hardware into a common brand or line for the multiple online networks (which are currently a hodgepodge) may reduce the number of errors encountered, improve the ability of STAR's personnel to understand more facets of the networking environment and allow for better monitoring of the network performance.  Our network should mesh well with existing ITD infrastructure so that their expertise can be brought to bear as needed.  However, ITD expertise cannot be the sole source of support for the online networks – at least two individuals in STAR (but not much more than that) should have broad access to realtime network data and configuration.  STAR's 24-hour on-call experts (DAQ and online computing in particular) need to be able to respond quickly to incidents and gather clues and information from all sources.

Plan of action / critical path items

I think we need to start from the core and work outwards.  This will allow us to finish as much as possible before the run starts and start to see the most benefits as early as possible.  The two big pieces at the core (in order of importance) are:

1. DAQ’s event builder switch, which calls for 56 (let’s say 64) non-blocking/line speed 1Gb/s ports.  No matter what, this piece needs to be put in place before the run starts.  We can probably limp by with everything else as it exists now if we have to, but this has to be a new piece of hardware in place before December 1 (is this a reasonable deadline?).

2. Whatever ITD wants to replace the current Catalyst 4000-series chassis and blades in the DAQ room.

After this, the next items for consideration/replacement are the starp and DAQ/TRG switches on the South Platform.

Then it is on to the satellite racks in the WAH with their relatively small number of devices.

Then the DAQ room, cleaning up the handful of unmanaged switches that exist for both starp and DAQ/TRG.

Control Room clean-up.  The available wall jacks in the Control Room are insufficient for the number of devices, and many of the jacks are inaccessible behind the west side console, but at least this area is always accessible and has had few problems, so it isn’t a high priority.

 

Remote power cycling network switches in the WAH

April 1, 2025 (no, not an April Fools!) - THIS PAGE IS OBSOLETE


Instead, please refer to https://drupal.star.bnl.gov/STAR/public/operations/WAH-Network-Switch-NPS-details

This documents the Network Power Switch plugs used to remotely power cycle STAR's network switches in the Wide Angle Hall.

Updated February 8, 2019  (Ideally, STAR's RackTables would be the definitive source for this information, but it is far from complete.) 

 

ID   | Location     | Switch IP name              | NPS IP name                 | NPS plug | NPS access method                                       | NPS type
SW22 | east racks   | east-trg-sw.trg.bnl.local   | pxl-nps.starp.bnl.gov       | 8        | telnet, http (ssh and https available, but not enabled) | APC AP7901 (August 2015)
SW56 | east racks   | east-s60.starp.bnl.gov      | eastracks-nps.trg.bnl.local | 8        | ssh (slow to respond to initial connection)             | APC AP7901 (August 2012)
SW59 | SP 1C4       | splat-s60.starp.bnl.gov     | netpower1.starp.bnl.gov     | 3        | telnet, http                                            | APC
SW2  | SP 1C4       | splat-trg2.trg.bnl.local    | netpower1.starp.bnl.gov     | 1        | telnet, http                                            | APC
SW27 | SP 1C4       | switch1.trg.bnl.local       | netpower1.starp.bnl.gov     | 2        | telnet, http                                            | APC
SW60 | SP 1C4       | splat-s60-2.starp.bnl.gov   | netpower2.starp.bnl.gov     | A1       | ssh (has key for wbetts)                                | WTI NPS-8
SW28 | SP 1C4       | switchplat.scaler.bnl.local | netpower2.starp.bnl.gov     | A2       | ssh (has key for wbetts)                                | WTI NPS-8
SW55 | west racks   | west-s60.starp.bnl.gov      | westracks-nps.trg.bnl.local | 1        | ssh, http                                               | APC
SW30 | west racks   | switch2.trg.bnl.local       | eemc-pwrs1.starp.bnl.gov    | A4       | telnet                                                  | old WTI
SW51 | NP 1st floor | nplat-s60.starp.bnl.gov     | north-nps1.starp.bnl.gov    | 1        | telnet, ssh, http                                       | APC AP7900B (January 2019)

Reshape design goals

A.  Only use managed switches and have each networked device plug directly into a managed switch port.
   
-    Eliminate all “dumb” consumer/SOHO/desktop switches – they are not robust, add to confusion when troubleshooting, and prevent isolation of individual devices
-    allow the blocking of any single device at any time through its nearest switch’s management interface
-    block the addition of any new, unknown nodes and/or be informed of anything showing up unexpectedly
-    ability to monitor individual ports for traffic volumes, link settings, errors, and major links going down, preferably with some history/logging (see the example query after this list of design goals)
-    allow real-time monitoring and alerts for unusual events (capabilities will be hardware/vendor dependent and subject to available time to develop monitoring tools and become familiar with those capabilities)


B.  All devices should be within 10-15 feet of a “core” patch panel or network switch.
-    Individuals working on detector subsystems should not have to install network cables that cross rack rows, go from one floor (or room) to another, etc.
-    Piecemeal additions of network segments by subsystems should not be done – that is to say, no one should be adding switches to the network other than core personnel using “approved” devices consistent with the rest of the network components.
-    This calls for cabled and labeled patch panels and/or switches liberally placed throughout the WAH, the Control Room and the DAQ Room.   



C.    Some degree of “commonality” between the infrastructures of the starp and DAQ/TRG networks.  Same line of hardware, media converters (when needed), switches, monitoring tools, possibly even shared switches with VLANs.  This is a big question – are VLANs viable for sharing switch hardware amongst starp and DAQ/TRG?  A shared “private” management network for the switches is likely a good idea. 

D.    An easily extensible network, such that new locations can be added easily, and existing locations can have additional capacity added and subtracted in accord with the other goals.

E.    Redundant links (fibers or copper, as appropriate) available between all linked core components (preferably with automatic failover).

F.    Spares on hand for just about everything – a good reason to use as few models of hardware as possible.  If we develop a plan with 10 small 8-port switches in various locations, ideally all 10 will be identical and we will have one or two spares on the shelf at all times.

G.    All network components should be on UPS power so that short and/or localized power outages do not bring down portions of the network.  This is not terribly important, but should be kept in mind and allowed for when feasible.

H.  (Added after the initial items above)  Move IC-based devices (switches) away from the beam line and attempt to reduce the radiation load.  Our working hypothesis, based on anecdotal evidence, is that at least some of the networking problems last year were caused by radiation-induced errors.  The two "big" switches on the South Platform have historically been in just about the WORST place for radiation load, so these need to be moved away from the beam line.
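As one concrete illustration of the per-port monitoring called for in goal A, a managed switch with SNMP enabled can be polled with the standard net-snmp tools.  This is only a sketch; the community string and switch name are placeholders, not actual STAR settings, and the exact capabilities depend on the hardware chosen:

# per-port inbound error counters (IF-MIB), assuming SNMP v2c read access is configured on the switch
snmpwalk -v2c -c <community> <switch-hostname> IF-MIB::ifInErrors

# per-port inbound traffic counters
snmpwalk -v2c -c <community> <switch-hostname> IF-MIB::ifInOctets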

 

Rules to Live By in Online Networking at STAR

Document everything!

All hardware with an IP address should be labelled.

All installed cables should have a label on each end that is adequate to quickly locate the other end.

All patch panel ports with cables connected should be labelled appropriately to identify the other end.

All network equipment (switches, patch panels, cable runs, etc.) need to be documented, preferably in appropriate documents in Drupal.

 

Copper connections:

Use Cat5e or higher graded cables.

Use yellow cables for devices connected to the STARP network (130.199.60-61.x IP addresses).

Use green cables for devices connected to the DAQ/TRG network (172.16.x.x IP addresses).

Use colors other than yellow and green for any other network connections.

Use T568A termination when adding connectors to bare cable.
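For reference, the standard T568A pin-out (pins 1 through 8) is: 1 white/green, 2 green, 3 white/orange, 4 blue, 5 white/blue, 6 orange, 7 white/brown, 8 brown.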

 

Fiber Connections:

Use 50 micron multi-mode fiber.

Use 1000Base-SX fiber transceivers where possible.

 

STAR networks in 1006 and their nicknames

“starp”:  130.199.60.0/23

“DAQ/TRG”: 172.16.0.0/16  (non-routed)

“HPSS”:  RCF network for DAQ → HPSS transfers

“Alexei”: Alexei’s video camera and laser network (currently consists of a switch on the South Platform and a switch in the DAQ room connected by a fiber pair?).  This includes 3-4 PCs including obsolete Windows OSes (e.g. Win 98).  No devices on this network are dual-homed, so it is very isolated from everything else and is mentioned here for completeness.

“trailers”:  130.199.162.  - includes wired connections for printers, visitors’ laptops and workstations not directly involved in operations; such connections may also exist outside of the trailers, e.g. in the Control Room for visitors’ laptops while on shift.

“Wireless”: Not really relevant conceptually, but there are also three ITD wireless access points in the area.

“C-AD 108” and “C-AD 90”:  C-AD has at least two networks operating in the DAQ and Control Rooms, which are left well enough alone in their hands, but are mentioned here for the sake of completeness.
 

Shift Accounting

This page will now hold the shift accounting pages. They complement the Shift Sign-up process by documenting it.

Admin interface access

Run 16 shift dues


Dues

Requests to serve additional shifts should be made to the shift committee (D. Keane and D. Smirnov) PRIOR to the final calculation of the dues. Please refer to the important dates section in this document for the deadline by which such requests can be made.

Past shortfalls, shift coverage by institution

The table below shows the percentage of missed shifts over the past 4 years. This information can be used to exclude authors from the author list in Run 16.

Note: if your institution is in this table, fails again to fulfill its dues, and the 4-year average is below the threshold defined by our author exclusion policy (see STAR Note 0545), its authors would be excluded.

Institution Missed percentage, historical
Frankfurt Institute for Advanced Studies (FIAS) 71%
Institute of Modern Physics, Lanzhou 41%
University of Rajasthan 31%
Pusan National University 25%

Shift sign-up - Run 16

Important dates

  • 2015/12/04 - initial shift dues calculated - council feedback requested.
  • 2015/12/04 - Shift sign-up opens for TESTING purposes only - you may exercise the interface by emulating a sign-up
  • 2015/12/13 - The shift sign-up committee needs all council feedback by 12/11. Final shift dues will then be re-computed and provided (they should not change much and will account for all reported changes by that date)
  • 2015/12/16 - The testing interface will be turned OFF that day and all test records flushed/removed. The countdown will begin.
  • 2015/12/17 - Opening will occur at 10 AM BNL time - please remember to log in beforehand and wait for the countdown before signing up

Shift Layout, Period Coordinators and special arrangements

Shift layout

STAR shifts begin January 12, 2016 with cosmic data taking shifts.

Period coordinators

As usual, period coordinators are pre-assigned / pre-signed as selected by the Spokesperson office.

Special arrangements and requests

  • UTA has requested for Lanny Ray (QA coordinator) to have the first QA shift.
    Status: the shift sign-up coordinators have had flexibility for such arrangements.
  • FIAS requested 10 shifts to catch up for unfilled dues in past years.
  • We've agreed to pre-assign the following QA shifts under the new family-related policy:
    Sevil Salur (LBNL)      FEB 16
    Richard Witt (Yale)     FEB 23
    Juan Romero (UC Davis)  MAY 31
  • Bob Tribble (TAMU) is pre-assigned to a shift during APR 12-19.
  • To correct an unusual rounding anomaly, we've agreed to subtract one week from Valparaiso U dues.

  • Dec 19, 2015. Run 16 will be shorter by two weeks than originally planned: 20 weeks total instead of 22
    Dear STAR Collaborators:
    
    We have just received the guidance from DOE (to BNL) that there will be
    20 cryo-week of RHIC run instead of the originally planned 22 weeks.
    
    Our shift sign-up was designed for 22 weeks. For those who have already
    signed up for the last two weeks, please try to un-sign and help to fill other
    open slots. By now we have 8 open slots and need to un-sign 24.
    
    For those who are not able to re-sign to other spots, we will credit your dues,
    but may ask for help if slots open due to unexpected events (visa etc.).
    
    I am looking forward to a successful run 16 and exciting physics from it.
    
    
    Happy Holidays!
    
    Zhangbu
    

    Below is a screen shot of the last two weeks of original shift schedule Run 16 as of Jan 5, 2016.
    Only those who signed up for shift before Dec 19, 2015 will be eligible for a credit:

    Anju Bhasin, University of Jammu
    Evan Finch, Brookhaven National Laboratory
    Abhinav Sharma, University of Jammu
    Yuri Panebratsev, Joint Institute for Nuclear Research
    Madan Aggarwal, Panjab University
    Isaac Upsal, Ohio State University
    Yang Wu, Kent State University
    Grazyna Odyniec, Lawrence Berkeley National Laboratory
    Kunsu Oh, Pusan National University
    Liwen Wen, University of California - Los Angeles
    Saskia Mioduszewski, Texas A&M University
    Maowu Nie, Shanghai Institute of Applied Physics
    Abhinav Sharma, University of Jammu
    Renee Fatemi, University of Kentucky
    Madan Aggarwal, Panjab University
    Subhash Singha, Kent State University
    Liang He, Purdue University
    Declan Keane, Kent State University
    Sonya Kabana, Kent State University (offline QA)

    The following shifters signed up after the announcement:

    Devika Gunarathne, Temple University
    Amani Kraishan, Temple University

    Before / After: screenshots of the last two weeks of the original schedule and of the revised schedule.

Run 17 shift dues


Dues

Requests to serve additional shifts should be made to the shift committee (D. Keane and D. Smirnov) PRIOR to the final calculation of the dues. Please refer to the important dates section in this document for the deadline by which such requests can be made.

Past shortfalls, shift coverage by institution

The table below shows the percentage of missed shifts over the past 4 years. This information can be used to exclude authors from the author list in Run 17.

Note: if your institution is in this table, fails again to fulfill its dues, and the 4-year average is below the threshold defined by our author exclusion policy (see STAR Note 0545), its authors would be excluded.

   

Shift sign-up - Run 17

Important dates

  • 2016/11/29 - initial shift dues calculated - council feedback requested.
  • 2016/11/29 - Shift sign-up opens for TESTING purposes only - you may exercise the interface by emulating a sign-up
  • 2016/12/13 - The shift sign-up committee needs all council feedback by 12/11. Final shift dues will then be re-computed and provided (they should not change much and will account for all reported changes by that date)
  • 2016/12/19 - The testing interface will be turned OFF that day and all test records flushed/removed. The countdown will begin.
  • 2016/12/20 - Opening will occur at 10 AM BNL time - please remember to log in beforehand and wait for the countdown before signing up

Shift Layout, Period Coordinators and special arrangements

Shift layout

STAR shifts begin January XX, 2017 with cosmic data taking shifts.

Period coordinators

As usual, period coordinators are pre-assigned / pre-signed as selected by the Spokesperson office.
 

Feb. 7-March 7  Oleg Eyser (BNL)
March 7- April 4 Sal Fazio (BNL)
April 4 – April 28 Shuai Yang (BNL)
April 28-May 23 Xiaofeng Luo (CCNU)
May 23 – June 20 Jinlong Zhang (LBL)
June 20 – July 11 Nihar Sahoo (TAMU)

 

Special arrangements and requests


0) Bob Tribble: SL, evening, beginning Mar 21
1) Pavla + Pavol: 5 shifts as below.
2) Juan Romero wants QA for 1 week, beginning May 02.
3) Sevil Salur wants QA for 1 week, beginning Mar 07.
4) Richard Witt wants QA for 1 week, beginning Mar 21.
5) Lanny Ray, as always, is pre-assigned the first QA shift.
6) Jan Rusnak wants QA for 1 week, beginning Apr 04.
7) FIAS wants pre-assigned shifts like last year:
Day, beginning Apr 4: Belousov, 2 weeks of shift crew;
Evening, beginning Apr 4: Pugash, 2 weeks of shift crew;
Day, beginning Apr 4: Vassiliev, 1 week DO trainee + 1 week DO;
Evening, beginning Apr 4: Zyzak, 1 week DO trainee + 1 week DO

 

Run 18 shift dues


Run 18 Shift Dues & Notes


Period coordinators

As usual, period coordinators are pre-assigned, as arranged by the Spokespersons.

Special arrangements and requests

  1. Under the family-related policy, the following 6 weeks of offline QA shifts were pre-assigned:
    MAR 27 Kevin Adkins (Kentucky)
    APR 03 Kevin Adkins
    APR 10 Sevil Salur (Rutgers)
    APR 17 Richard Witt (USNA/Yale)
    MAY 22 Juan Romero (UC Davis)
    JUN 12 Terry Tarnowsky (Michigan State)
     
  2. Lanny Ray (UT Austin), as QA coordinator, always is pre-assigned the first QA week.
     
  3. FIAS remains in “catch-up mode” and is taking extra shifts above their dues. Pre-assigned shifts can be requested in this scenario. FIAS has been pre-assigned 4 Detector Op shifts.
     
  4. Bob Tribble (TAMU) requests the evening Shift leader slot during Apr 10-17.

Run 19 special requests

The following pre-assigned slot requests were made.
    9 WEEKS PRE-ASSIGNED QA AS FOLLOWS
    ==================================
    Lanny Ray (UT Austin) QA Mar 5
    Richard Witt (USNA/Yale) QA Mar 19
    Sevil Salur (Rutgers) QA Apr 16
    Wei Li (Rice) QA Apr 23
    Kevin Adkins (Kentucky) QA May 14
    Juan Romero (UC Davis) QA May 21
    Jana Bielcikova (NPI, Czech Acad of Sci) QA May 28  
    Yanfang Liu (TAMU) QA June 25 
    Yanfang Liu (TAMU) QA July 02
    
    8 WEEKS PRE-ASSIGNED REGULAR SHIFTS AS FOLLOWS
    ==================================
    Bob Tribble (BNL) Feb 05 SL evening 
    Daniel Kincses (Eotvos) Mar 12  DO Trainee Day
    Daniel Kincses (Eotvos) Mar 19  DO Day
    Mate Csanad (Eotvos) Mar 12 SC Day
    Ronald Pinter (Eotvos) Mar 19 SC Day
    Carl Gagliardi (TAMU)  May 14  SL day
    Carl Gagliardi (TAMU)  May 21 SL day 
    Grazyna Odyniec (LBNL) July 02 SL evening
    
    

Shift Dues and Special Requests Run 20

For the calculation of shift dues, there are two considerations.
1) The length of time of the various shift configurations (2 person, 4 person no trainees, 4 person with trainees, plus period coordinators/QA shifts)
2) The percent occupancy of the training shifts

For many years, 2) has hovered around 45%, which is what we used to calculate the dues.  Since STAR gives credit for training shifts (as we should), this needs to be factored in or we would not have enough shifts.

The sum total of shifts needed is then divided by the total number of authors, minus authors from Russian institutions who cannot come to BNL.

date                  weeks           crew           training           PC           OFFLINE          
11/26-12/10    2                  2                      0                  0           0           
12/10-12/24    2                  4                      2                 1            0   
12/24-6/30      27                4                      2                 1            1   
7/02-7/16        2                  4                      0                 1            1   

Adding these together (3 shifts a day for the crew, 3 x 45% for training, plus the period coordinator plus offline QA) gives a total of 522 shifts.
The total number of shifters is 303 - 30 Russian collaborators = 273 people,
giving a total due of 1.9 per author.

For a given institution, its load is calculated as (# of authors - # of expert credits) x due, rounded to an integer value, as cutting collaborators into pieces is non-collegial behavior.

However, this year, this should have been:
date                  weeks           crew           training           PC           OFFLINE          
11/26-12/10    2                  2                      0                  0           0           
12/10-12/24    2                  4                      2                 1            0   
12/24-6/02      23                4                      2                 1            1   
6/02-6/16        2                  4                      0                 1            1   

Adding these together (3x a shift for crew, 3x45% for training, plus pc plus offline) gives a total of 456 shifts for a total due of 1.7 per author.
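For reference, a minimal sketch of the arithmetic above, evaluated with bc (this simply restates the two tables; it is not an official script, and the factor 3 is the "3x a shift" for the three daily shift blocks, with 0.45 the training occupancy):

# as originally scheduled (27 weeks of full running with trainees):
echo "2*(3*2) + 2*(3*4 + 3*0.45*2 + 1) + 27*(3*4 + 3*0.45*2 + 1 + 1) + 2*(3*4 + 1 + 1)" | bc -l
# -> 522.3, i.e. ~522 shifts; 522 / (303 - 30) authors = ~1.9 per author

# as it should have been (23 weeks of full running with trainees):
echo "2*(3*2) + 2*(3*4 + 3*0.45*2 + 1) + 23*(3*4 + 3*0.45*2 + 1 + 1) + 2*(3*4 + 1 + 1)" | bc -l
# -> 455.5, i.e. ~456 shifts; 456 / 273 = ~1.7 per author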

We allowed some people to pre-sign up, for a couple of different reasons.

Family reasons so offline QA:
James Kevin Adkins
Jana Bielčíková
Sevil Salur
Md. Nasim
Yanfang Liu

Additionally, Lanny Ray is given the first QA shift of the year as our experienced QA shifter.

This year, to add an incentive to train for shift leader, we allowed people who were doing shift leader training to sign up for both their training shift and their "real" shift early:
Justin Ewigleben
Hanna Zbroszczyk
Jan Vanek
Maria Zurek
Mathew Kelsey
Kun Jiang
Yue-Hang Leung

Both Bob Tribble and Grazyna Odyniec signed up early for a shift leader position in recognition of their schedules and contributions.

This year because of the date of Quark Matter and the STAR pre-QM meeting, several people were traveling on Tuesday during the sign up.  These people I signed up early as I did not want to punish some of our most active colleagues for the QM timing:
James Daniel  Brandenburg
Sooraj Radhakrishnan

3 other cases that were allowed to pre-sign up:
Panjab University had a single person who had the visa to enter the US, and had to take all of their shifts prior to the end of their contract in March.  So that the shifter could have some spaces in his shifts for sanity, I signed up:
Jagbir Singh
Eotvos Lorand University stated that travel is complicated for their group, and so it would be good if they could ensure that they were all on shift at the same time.  Given that they are coming from Europe I signed up:
Mate Csanad
Daniel Kincses
Roland Pinter
Srikanta Tripathy
Frankfurt Institute for Advanced Studies (FIAS) wanted to be able to bring Masters students to do shift, but given the training requirements and timing with school and travel for Europe, this leaves little availability for shift.  So I signed up:
Iouri Vassiliev
Artemiy Belousov
Grigory Kozlov

Tools

This is to serve as a repository of information about various STAR tools used in experimental operations.

EVO

This section contains information about using EVO for STAR meetings.

The conference PC and the shared account details are described in the "EVO Conference Computer" section earlier in this document.

FUSE & SSHFS - Overview and example in STAR online environment

FUSE (Filesystem in Userspace)


FUSE is a kernel module that acts as a bridge between the kernel’s built-in filesystem functions and user-space code that “understands” the (arbitrary) structure of the mounted content.  It allows non-root users to add filesystems to a running system.

Typically, FUSE-mounted filesystems are (nearly) indistinguishable from any other mounted filesystem to the user.

Some examples of FUSE in action:

  • WikipediaFS - viewing and editing Wikipedia articles as if they are local files.
  • Archive access - accessing and in some cases manipulating files in tarballs, zip archives, cpio archives, etc.
  • Encrypted filesystems
  • Union of filesystems (as is done in many live Linux boot disks and Linux installation routines to merge the read-only CD-rom filesystem with read-write space on disk)
  • Event Triggering - FUSE implementations can have triggered events.  Some possible uses might be:
    • automatically restarting a service if its configuration file is altered
    • automatically re-compiling code whenever a source file is changed
    • making a back-up after a file is changed
  • Arbitrary hardware interface
  • ... and the one we will focus on here:  SSHFS

The Fuse project FileSystems page has a more complete list and links to individual software projects that use FUSE.

 
SSHFS (Secure Shell Filesystem)


SSHFS allows a user (not necessarily root) on host A (the "client") to mount a directory on host B (the "server") using the (almost) ubiquitous SSH client-server communication protocols.  Generally, no configuration changes or software installations are required on host B.

The directory on host B then looks like a local directory on host A, at a location in host A's directory structure chosen by the user (in a location where user A has adequate privileges of course).

Unlike NFS, the user on host A must authenticate as a known user on host B, and the operations performed on the mounted filesystem are performed as known user on host B.  This avoids the "classic" NFS problem of UID/GID clashes between the client and server.

Here is a sample session with some explanatory comments:

In this example, host A is "stargw1" and host B is "staruser01".  The user name is wbetts on both hosts, but the user on host B could be any account that the user can access via SSH.
 
First, create a directory that will serve as the mountpoint:

[wbetts@stargw1 ~]$ mkdir /tmp/wbssh
[wbetts@stargw1 ~]$ ls -ld /tmp/wbssh
drwxrwxr-x  2 wbetts wbetts 4096 Oct 13 10:52 /tmp/wbssh

Second, mount the remote directory using the sshfs command:

[wbetts@stargw1 ~]$ sshfs staruser01.star.bnl.gov: /tmp/wbssh


In this example, no remote username or directory is specified, so the remote username is assumed to match the local username and the user’s home directory is selected by default.  So the command above is equivalent to:

% sshfs wbetts@staruser01.star.bnl.gov:/home/wbetts /tmp/wbssh

That’s it!  (No password or passphrase is required in this case, because wbetts uses SSH key agent forwarding) 

Now use the remote files just like local files:

[wbetts@stargw1 ~]$ ls -l /tmp/wbssh |head -n 3
total 16000
-rw-rw-r--  1 1003 1003    6412 Oct 19  2005 2005_Performance_Self_Appraisal.sxw
-rw-rw-r--  1 1003 1003   10880 Oct 19  2005 60_subnet_PLUS_SUBSYS.sxc
[wbetts@stargw1 ~]$ ls -ld /tmp/wbssh
drwx------  1 1003 1003 4096 Oct 11 15:56 /tmp/wbssh


The permissions on our mount point have been altered -- now the remote UID is shown (a source of possible confusion) and the permissions have morphed to the permissions on the remote side, but this is potentially misleading too…

[root@stargw1 ~]# ls /tmp/wbssh
ls: /tmp/wbssh: Permission denied

Even root on the local host can’t access this mount point, though root can see it in the list of mounts.
 
In addition to the ACL confusion, there can be some quirks in behaviour, where sshfs doesn't translate perfectly:

[wbetts@stargw1 ~]$ df /tmp/wbssh
Filesystem                                       1K-blocks       Used     Available        Use%     Mounted on
sshfs#staruser01.star.bnl.gov:    1048576000         0     1048576000       0%     /tmp/wbssh


Ideally the user unmounts it once finished; otherwise it sits there indefinitely.  It is probably subject to the same timeouts as an ordinary ssh connection (TCP, firewall conduit, SSH config, etc.), but in limited testing so far the connection has been long-lived.  Here is the unmount command:

[wbetts@stargw1 ~]$ fusermount -u /tmp/wbssh/
[wbetts@stargw1 ~]$ ls /tmp/wbssh
[wbetts@stargw1 ~]$

Some additional details:

By default, users other than the user who initiated the mount are not permitted access to the local mountpoint (not even root), but that can be changed by the user, IF it is permitted by the FUSE configuration (as decided by the admin of the client node).  The options though are not very granular.  The three possible options are:

  1. access for the user who mounted it (and no one else)
  2. the mounter plus root
  3. everybody

In any case, whoever accesses the mount point will act as (and have the permissions of) the user on host B specified by the mounter.  This requires careful evaluation of the options permitted and user education on the possibilities of allowing inappropriate or unnecessary access to other users.
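For illustration, here is a minimal sketch of option 3 (everybody), assuming the client admin has permitted it in /etc/fuse.conf; the hostnames are the same ones used in the example session above:

# one time, as root on the client, to allow non-root mounters to use allow_other:
echo "user_allow_other" >> /etc/fuse.conf

# then the mounting user can opt in:
sshfs -o allow_other wbetts@staruser01.star.bnl.gov: /tmp/wbssh

Option 2 (the mounter plus root) corresponds in the same way to the -o allow_root mount option.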

The mount is not tied to the specific shell it is started in.  It seems to last indefinitely: the user can log out of host A, kill remote agents, etc., and the mount remains accessible on future logins.  (Interpretation: an agent of some sort is maintained on the client, host A, on the user’s behalf.  If multiple users have access to the user account on A, this could be worrisome, in the same manner as the allowance of others to access the mount point mentioned above.)

 

Here are some potential advantages and benefits of using SSHFS, some of which are mentioned above:

  • User-initiated
  • Encrypted communications over the network
  • Authenticated (at first order) – somewhat better user tracing than NFS
  • SSH keys/forwarding can make it relatively painless (no pass{words,phrases} required for mounting)
  • Networking/firewalling is simple – if ssh works between the two nodes, then so will sshfs (unlike NFS, where port configuration and firewalls are a pain)
  • “Passthrough” mounting works -- an sshfs mount point can be mounted from another node (if host B mounts a directory on C, then A can mount B's mountpoint and have access to C's filesystem.  In this case, B acts as both a client (to C) and a server (to A).)
  • No server-side configuration is needed.
  • These mounts can be automounted by the user somewhat like autofs using afuser ( http://afuse.sourceforge.net/ ), though this is primarily for interactive use based on SSH agents.

 

And some drawbacks:

  • User initiated (they are unlikely to clean up after themselves)
  • Access controls are either very strict (by default), or very lax in the hands of users (-o allow_other or -o allow_root) -- nothing else
  • Cross-system UID overlap and ACLs can be confusing
  • Availability of FUSE for RHEL/SL 3 and other clients?
  • Use of SSHFS in scripts could entice users to create SSH keys without passphrases -- a real no-no!

And some final details about the configuration of the online gatekeepers that presumably are prime candidates for the use of SSHFS:

The standard installation of FUSE for Scientific Linux 4 seems to not be quite complete.  A little help is required to make it work:

In /etc/rc.d/rc.local:

/etc/init.d/fuse start
/bin/chown root.fuse /dev/fuse
/bin/chmod 660 /dev/fuse


A “fuse” group was created; each user who will use SSHFS needs to be a member of this group (this must be kept in mind if we use NIS or LDAP for user management on the gateways).
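For example, adding a user to that group looks something like the following (a sketch; it assumes the group is managed locally rather than via NIS/LDAP):

# append the user to the supplementary "fuse" group without touching other group memberships
/usr/sbin/usermod -a -G fuse wbetts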

 

Server Logging

The default openssh packages  from Scientific Linux 3, 4 and 5 (~openssh 3.6, 3.9 and 4.3 respectively) do not support sftp-subsystem logging.  Later versions of openssh do (starting at version ~4.4).  This provides the ability to log file accesses and trace them to individual (authenticated) users. 

I grabbed the latest openssh source (version 5.1) and built it on an SL4 machine with no trouble:

% ./configure --prefix=/opt/openssh5.1p1 --without-zlib-version-check --with-tcp-wrappers
% make
% make install

 

Then in the sshd_config file, append "-f AUTHPRIV -l INFO" to the sftp-subsystem line.  This sets the logging facility (AUTHPRIV) and level (INFO) and causes the logs to be sent to /var/log/secure.  (To be tried: the VERBOSE log level.)
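The resulting sshd_config line would look something like this (the sftp-server path is an assumption; with the --prefix used above it should sit under /opt/openssh5.1p1/libexec):

Subsystem       sftp    /opt/openssh5.1p1/libexec/sftp-server -f AUTHPRIV -l INFO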

Even at the INFO level, the logs are fairly detailed.  Shown below is a sample session, with each client command followed by the resulting log entries from the server (carradine, using port 2222 for testing).  For brevity, the time stamps from the log have been removed after the first entry.

 

SFTP LOGGING at the INFO level

Client command:
    sshfs -p 2222 wbetts@carradine.star.bnl.gov:/home/wbetts/ carradine_home
Server log (/var/log/secure):
    Nov 20 14:30:29 carradine sshd[29120]: Accepted publickey for wbetts from 130.199.60.84 port 41746 ssh2
    carradine sshd[29122]: subsystem request for sftp
    carradine sftp-server[29123]: session opened for local user wbetts from [130.199.60.84]

Client command:
    ls carradine_home
Server log:
    carradine sftp-server[29123]: opendir "/home/wbetts/."
    carradine sftp-server[29123]: closedir "/home/wbetts/."

Client command:
    touch carradine_home/test.txt
Server log:
    carradine sftp-server[29123]: sent status No such file
    carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE,CREATE,EXCL mode 0100664
    carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 0
    carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE mode 00
    carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 0
    carradine sftp-server[29123]: set "/home/wbetts/test.txt" modtime 20081120-14:36:36

Client command:
    cat /etc/DOE_banner >> carradine_home/test.txt
Server log:
    carradine sftp-server[29123]: open "/home/wbetts/test.txt" flags WRITE mode 00
    carradine sftp-server[29123]: close "/home/wbetts/test.txt" bytes read 0 written 1119

Client command:
    rm carradine_home/test.txt
Server log:
    carradine sftp-server[29123]: remove name "/home/wbetts/test.txt"

Client command:
    fusermount -u carradine_home/
Server log:
    carradine sftp-server[29123]: session closed for local user wbetts from [130.199.60.84]

 

From these logs, we would appear to have a good record of the who/what/when of sshfs usage.  But the need to build our own openssh packages puts a burden on us to track and install updated openssh versions in a timely fashion, rather than relying on the distribution maintainer and the OS's native update manager(s).  The log files on a heavily utilised server may also become unwieldy and cause a performance degradation, but I've not made any estimates or tests of these issues.

 



Here are the specific relevant packages installed on the client test nodes (stargw1 and stargw2):


fuse-2.7.3-1.SL
fuse-libs-2.7.3-1.SL
fuse-devel-2.7.3-1.SL
fuse-sshfs-2.1-1.SL
kernel-module-fuse-2.6.9-78.0.1.ELsmp-2.7.3-1.SL

(Exact versions should not be terribly important, but it appears that fuse-2.5.3 included up to SL4.6 requires more tweaking after installation than fuse 2.7.3 included in SL4.7).

 

 

Implementing SSL (https) in Tomcat using CA generated certificates

The reason for using a certificate from a CA, as opposed to a self-signed certificate, is that with a self-signed certificate the browser shows a warning screen and asks you to accept the certificate.  Since browsers already ship with a list of trusted CAs, this step is not needed with a CA-issued certificate.
 
The following list of certificates and a key are needed:

/etc/pki/tls/certs/wildcard.star.bnl.gov.Nov.2012.cert – host cert.
/etc/pki/tls/private/wildcard.star.bnl.gov.Nov.2012.key – host key (don’t give this one out)
/etc/pki/tls/certs/GlobalSignIntermediate.crt – intermediate cert.
/etc/pki/tls/certs/GlobalSignRootCA_ExtendedSSL.crt –root cert.
/etc/pki/tls/certs/ca-bundle.crt – a bundle of many CA certs.

Concatenate the following certs into one file; in this example I call it Global_plus_Intermediate.crt:
cat /etc/pki/tls/certs/GlobalSignIntermediate.crt > Global_plus_Intermediate.crt
cat /etc/pki/tls/certs/GlobalSignRootCA_ExtendedSSL.crt >> Global_plus_Intermediate.crt
cat /etc/pki/tls/certs/ca-bundle.crt >> Global_plus_Intermediate.crt

Run this command. Note that "-name tomcat" and "-caname root" should not be changed to any other values; with other values the command will still work, but the result will fail under Tomcat. If it works you will be asked for an export password, which should be set to "changeit".

 openssl pkcs12 -export -in wildcard.star.bnl.gov.Nov.2012.cert -inkey wildcard.star.bnl.gov.Nov.2012.key -out mycert.p12 -name tomcat -CAfile Global_plus_Intermediate.crt -caname root -chain

Test the new p12 output file with this command:

keytool -list -v -storetype pkcs12 -keystore mycert.p12

Note it should say: "Certificate chain length: 3"


In Tomcat’s server.xml file, add a connector that looks like this:
 

<Connector port="8443" protocol="HTTP/1.1" SSLEnabled="true"
           maxThreads="150" scheme="https" secure="true"
           keystoreFile="/home/lbhajdu/certs/mycert.p12" keystorePass="changeit"
           keystoreType="PKCS12" clientAuth="false" sslProtocol="TLS"/>


Note that keystoreFile should be set to the correct path of the certificate, and the p12 file should only be readable by the Tomcat account because it holds the host key. 
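For example, one way to restrict access (a sketch; it assumes Tomcat runs under an account named "tomcat" and uses the path from the connector above):

chown tomcat /home/lbhajdu/certs/mycert.p12
chmod 600 /home/lbhajdu/certs/mycert.p12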

Online Linux pool

March 15, 2012:

THIS PAGE IS OBSOLETE!  It was written as a guide in 2008 for documenting improvements in the online Linux pool, but has not been updated to reflect additional changes to the state of the pool, so not all details are up to date. 

One particular detail to be aware of:  the name of the pool nodes is now onlNN.starp.bnl.gov, where 01<=NN<=14.  The "onllinuxN" names were retired several years ago.

 

Historical page (circa 2008/9):

Online Linux pool for general experiment support needs

 

GOAL: 

Provide a Linux environment for general computing needs in support of the experimental operations.

HISTORY (as of approximately June 2008):

A pool of 14 nodes, consisting of four different hardware classes (all circa 2001), has been in existence for several years.  For the last three (or more?) years, they have had Scientific Linux 3.x with support for the STAR software environment, along with access to various DAQ and Trigger data sources.  The number of significant users has probably been less than 20, with the heaviest usage related to L2.  User authentication was originally based on an antique NIS server, to which we had imported the RCF accounts and passwords.  Though still alive, this NIS information has not been kept maintained.  Over time, local accounts on each node became the norm, though of course this is rather tedious.  Home directories come in three categories:  AFS, NFS on onllinux5, and local home directories on individual nodes.  Again, this gets rather tedious to maintain over time.

There are several "special" nodes to be aware of:

  1. Three of the nodes (onllinux1, 2 and 3) are in the Control Room for direct console login as needed.  (The rest are in the DAQ room.)
  2. onllinux5 has the NFS shared home directories (in /online/users).  (NB.  /online/users is being backed up by the ITD Networker backup system.)
  3. onllinux6 is (was?) used for many online database maintenance scripts (check with Mike DePhillips about this -- we had planned to move these scripts to onldb).
  4. onllinux1 was configured as an NIS slave server, in case the NIS master (starnis01) fails.

 

PLAN:

For the run starting in 2008 (2009?), we are replacing all of these nodes with newer hardware.

The basic hardware specs for the replacement nodes are:

Dual 2.4 GHZ Intel Xeon processors

1GB RAM

2 x 120 GB IDE disks

 

These nodes should be configured with Scientific Linux 4.5 (or 4.6 if we can ensure compatibility with STAR software) and support the STAR software environment.

They should have access to various DAQ and Trigger NFS shares.  Here is a starter list of mounts:

 

Shared DAQ and Trigger resources

SERVER DIRECTORY on SERVER LOCAL MOUNT POINT MOUNT OPTIONS
 evp.starp  /a  /evp/a  ro
 evb01.starp  /a  /evb01/a  ro
 evb01  /b  /evb01/b  ro
 evb01  /c  /evb01/c  ro
 evb01  /d  /evb01/d  ro
 evb02.starp  /a  /evb02/a  ro
 evb02  /b  /evb02/b  ro
 evb02  /c  /evb02/c  ro
 evb02  /d  /evb02/d  ro
 daqman.starp  /RTS  /daq/RTS  ro
 daqman  /data  /daq/data  rw
 daqman  /log  /daq/log  ro
 trgscratch.starp  /data/trgdata  /trg/trgdata  ro
 trgscratch.starp  /data/scalerdata  /trg/scalerdata  ro
 startrg2.starp  /home/startrg/trg/monitor/run9/scalers  /trg/scalermonitor  ro
 online.star  /export  /onlineweb/www  rw
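As an illustration, the first mount above could be expressed as an /etc/fstab entry along these lines (a sketch; additional NFS options such as soft/hard mounting and timeouts would still need to be decided):

# read-only NFS mount of the event pool area from evp
evp.starp.bnl.gov:/a    /evp/a    nfs    ro    0 0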

 

 

WISHLIST Items with good progress:

  • <Uniform and easy to maintain user authentication system to replace the current NIS and local account mess.  Either a local LDAP, or a glom onto RCF LDAP seems most feasible> -- An ldap server (onlldap.starp.bnl.gov) has been set-up and the 15 onllinux nodes are authenticating to it *BUT* it is using NIS!
  • <Shared home directories across the nodes with backups> -- onlldap is also hosting the home directories and sharing them via NFS.  EMC Networker is backing up the home directories and Matt A. is receiving the email notifications.
  • <Integration into SSH key management system (mechanism depends upon user authentication method(s) selected).> --  The ldap server has been added to the STAR SSH key management system, and users are able to login to the new onlXX nodes with keys now.
  • <Common configuration management system> -- Webmin is in use.
  • <Ganglia monitoring of the nodes> -- I think this is done...
  • <Osiris monitoring of the nodes> -- I think this is done - Matt A. and Wayne B. are receiving the notices...

WISHLIST Items still needing significant work:

  • None?

 

SSH Key Management

Overview 

An SSH public key management system has been developed for STAR (see D. Arkhipkin et al 2008 J. Phys.: Conf. Ser. 119 072005), with two primary goals stemming from the heightened cyber-security scrutiny at BNL:

  • Use of two-factor authentication for remote logins
  • Identification and management of remote users accessing our nodes (in particular, the users of "group" accounts which are not tied to one individual) and achieve accountability

A benefit for users also can be seen in the reduction in the number of passwords to remember and type.

 

In purpose, this system is similar to the RCF's key management system, but is somewhat more powerful because of its flexibility in the association of hosts (client systems), user accounts on those clients, and self-service key installation requests.

Here is a typical scenario of the system usage: 

  1. A sysadmin of a machine named FOO creates a user account named "JDOE" and, if not done already, installs the key_services client.
  2. A user account 'JDOE' on host 'FOO' is configured in the Key Management system by a key management administrator.
  3. John Doe uploads (via the web) his or her public ssh key (in openssh format).
  4. John Doe requests (via the web) that his key be added to JDOE's authorized_keys file on FOO.
  5. A key management administrator approves the request, and the key_services client places the key in ~JDOE/.ssh/authorized_keys.

At this point, John Doe has key-based access to JDOE@FOO.  Simple enough?  But wait, there's more!  Now John Doe realizes that he also needs access to the group account named "operator" on host BAR.  Since his key is already in the key management system he has only to request that his key be added to operator@BAR, and voila (subject to administrator approval), he can now login with his key to both JDOE@FOO and operator@BAR.  And if Mr. Doe should leave STAR, then an administrator simply removes him from the system and his keys are removed from both hosts.

Slightly Deeper...

There are three things to keep track of here -- people (and their SSH keys of course), host (client) systems, and user accounts on those hosts:

People want access to specific user accounts at specific hosts.

So the system maintains a list of user accounts for each host system, and a list of people associated with each user account at each host.
(To be clear -- the system does not have any automatic user account detection mechanism at this time -- each desired "user account@host" association has to be added "by hand" by an administrator.)

This Key Management system, as seen by the users (and admins), consists simply of users' web browsers (with https for encryption) and some PHP code on a web server (which we'll call "starkeyw") which inserts uploaded keys and user requests (and administrator's commands) to a backend database (which could be on a different node from the web server if desired). 

Behind the scenes, each host that is participating in the system has a keyservices client installed that runs as a system service.  The keyservices_client periodically (at five minute intervals by default) interacts with a different web server (serving different PHP code that we'll call starkeyd).  The backend database is consulted for the list of approved associations and the appropriate keys are downloaded by the client and added to the authorized_keys files accordingly.

In our case, our primary web server at www.star.bnl.gov hosts all the STAR Key Manager (SKM) services (starkeyw and starkeyd via Apache, and a MySQL database), but they could each be on separate servers if desired.

Perhaps a picture will help.  See below for a link to an image labelled "SKMS in pictures".

Deployment Status and Future Plans

We have begun using the Key Management system with several nodes and are seeking to add more (currently on a voluntary basis).  Only RHEL 3/4/5 and Scientific Linux 3/4/5 with i386 and x86_64 kernels have been tested, but there is no reason to believe that the client couldn't be built on other Linux distributions or even Solaris.  We do not anticipate "forcing" this tool onto any detector sub-systems during the 2007 RHIC run, but we do expect it (or something similar) to become mandatory before any future runs.  Please contact one of the admins (Wayne Betts, Jerome Lauret or Mike Dephillips) if you'd like to volunteer or have any questions.

User access is currently based on RCF Kerberos authentication, but may be extended to additional authentication methods (eg., BNL LDAP) if the need arises.

Client RPMs (for some configurations) and SRPM's are available, and some installation details are available here: 

http://www.star.bnl.gov/~dmitry/skd_setup/

An additional related project is the possible implementation of a STAR ssh gateway system (while disallowing direct login to any of our nodes online) - in effect acting much like the current ssh gateway systems' role in the SDCC.  Though we have an intended gateway node online (stargw1.starp.bnl.gov, with a spare on hand as well), its use is not currently required.

 

Anxious to get started? 

Here you go: https://www.star.bnl.gov/starkeyw/ 

You can use your RCF username and Kerberos password to enter.

When uploading keys, use your SSH public keys - they need to be in OpenSSH format. If not, please consult SSH Keys and login to the SDCC.

 
 

STAR Electronic Shiftlog (ESL) Administrator Manual

STAR Electronic Shiftlog (ESL) Administration guide

The STAR Electronic Shiftlog (ESL) is written in JSP (Java Server Pages) and requires a web server that can render JSP content. Unlike PHP, JSP is compiled into Java classes using a "just in time" approach: the page is compiled the first time it is accessed, and then it does not have to be compiled again for the life of the page, or until the page is modified. The forerunner of JSP is servlets; these are also used in the shiftlog, mostly to stream images. The technology differs in that servlets need to be compiled in advance of being deployed.

 

Our JSP server is Apache Tomcat. Documentation and newer versions can be downloaded from http://tomcat.apache.org/. Although Tomcat is a fully functional web server unto itself, we prefer to let the Apache web server serve the HTML content and only require Tomcat to serve the JSP pages that Apache cannot. This is accomplished by way of the mod_jk Apache Tomcat Connector using the ajp13 protocol. Tomcat listens on port 8080; this port is blocked from the outside, but can be reached from a browser started on the online web server itself.

 

The Tomcat server hosting the shiftlog is deployed on the online web server online.star.bnl.gov and runs under the tomcat account. In order to log on to the online web server to administer Tomcat and the ESL, you will need keys mapped to the tomcat user account. Please see Wayne Betts or Jérôme Lauret about getting your keys mapped. There are multiple versions of Tomcat residing in /opt.

 

Conventions relating to install of newer versions of Tomcat on the online web server

All versions of Tomcat are placed in the /opt folder, in a sub-folder clearly denoting the version number. (When you unzip Tomcat this is usually how it comes.) Examples are:

/opt/apache-tomcat-5.5.20/
/opt/apache-tomcat-6.0.18/


The currently used version of Tomcat is symlinked as /opt/tomcat/. Below is an ls of the tomcat folder:

-bash-3.00$ ls -l /opt/tomcat
lrwxrwxrwx 1 root root 22 Nov 17 11:11 /opt/tomcat -> ./apache-tomcat-6.0.18

Note that this folder is the tomcat user’s home directory. It contains the .ssh folder which holds the authorized keys, so relinking it may cause you to become locked out if you do not transfer this folder in advance.
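For example, switching to a newly installed version might look like the following sketch (the new version number here is hypothetical, and note the .ssh copy mentioned above):

# carry the .ssh folder over to the new installation first
cp -a /opt/tomcat/.ssh /opt/apache-tomcat-7.0.27/

# then repoint the symlink at the new version
ln -sfn ./apache-tomcat-7.0.27 /opt/tomcat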

Configuring Tomcat & The Tomcat Directory Structure

After you install a new version of Tomcat you will want to configure it.

There are some environment variables whose existence you will want to verify; if they don’t exist you will want to set them, preferably in a start-up script so they will survive a server restart.

$CATALINA_HOME: /opt/tomcat
$JAVA_HOME: /usr/java/default
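For example (a sketch using the values listed above; whether to put these in the tomcat user's ~/.bash_profile or an init script is a local choice):

export CATALINA_HOME=/opt/tomcat
export JAVA_HOME=/usr/java/default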

Inside the Tomcat folder you will find these directories (and some others):

$CATALINA_HOME/bin/
$CATALINA_HOME/logs/
$CATALINA_HOME/webapps/
$CATALINA_HOME/conf/

$CATALINA_HOME/bin/ holds the executables (for linux and windows).

To startup the Tomcat server use:

% $CATALINA_HOME/bin/startup.sh

To shut it down use:

% $CATALINA_HOME/bin/shutdown.sh

You will want to modify $CATALINA_HOME/bin/catalina.sh; this is a script called by startup.sh whose function is to invoke the java process which is the Tomcat server.

Directly under the header these lines are added:

# added by Levente Hajdu ##################################### "
export JAVA_OPTS=$JAVA_OPTS" -Xmx512M -Djava.library.path=/usr/lib64 -Djava.awt.headless=true"
############################################################# 

A description of the options used follows

  • -Xmx512M sets the memory ceiling of the Java VM which runs the server to 512 MB; this should be sufficient for our needs. Any consumption over this limit will lead to the Tomcat process being terminated.

  • -Djava.library.path sets the library path for an optional set of native (non-Java) libraries which Tomcat can utilize for improved performance. If this is not set you will see suggestions to set it in the Tomcat log.

  • -Djava.awt.headless=true prevents a particular type of crash. This server also hosts the SUMS statistics pages, which use libraries (JFreeChart) that render images for display and depend on X server libraries. Without this option, if Tomcat is started by a user that has X forwarding enabled but no X server running, Tomcat would crash as it tries to execute those JSPs.

You will be spending a lot of time in $CATALINA_HOME/conf/. The file that controls the Tomcat context paths is $CATALINA_HOME/conf/server.xml. This file requires editing whenever software is deployed at a new context path. Before you edit this file, always make a backup. Each year of the shiftlog resides on a different context path. Here is the list:

http://online.star.bnl.gov/apps/shiftLog2003/
http://online.star.bnl.gov/apps/shiftLog2004/
http://online.star.bnl.gov/apps/shiftLog2005/
http://online.star.bnl.gov/apps/shiftLog2006/
http://online.star.bnl.gov/apps/shiftLog2007/
http://online.star.bnl.gov/apps/shiftLog2008/
http://online.star.bnl.gov/apps/shiftLog2009/


The current year is always at:

http://online.star.bnl.gov/apps/shiftLog/


If we look inside the $CATALINA_HOME/conf/server.xml file we will see an entry for each one of these paths:

<!--Shiftlog 2007-->
<Context className="org.apache.catalina.core.StandardContext" cachingAllowed="true" 
 charsetMapperClass="org.apache.catalina.util.CharsetMapper" cookies="true" crossContext="false" debug="0" 
 docBase="/var/tomcat/webapps/shiftLog2007.war" mapperClass="org.apache.catalina.core.StandardContextMapper" 
 path="/apps/shiftLog2007" privileged="false" reloadable="true" swallowOutput="false" useNaming="true" 
 wrapperClass="org.apache.catalina.core.StandardWrapper">
<Environment description="" name="year" override="false" type="java.lang.Integer" value="2007"/>
<Environment description="" name="isEditable" override="false" type="java.lang.Boolean" value="false"/>
<Environment description="" name="runLogLink" override="false" type="java.lang.String" 
 value="http://online.star.bnl.gov/RunLog/Summary.php?run="/>
<Environment description="" name="runNumber" override="false" type="java.lang.Integer" value="7"/>
</Context>

This is the block of XML for the shiftlog for 2007. With different versions of Tomcat the syntax of this file can change, however it usually doesn’t change too much. Let’s go over the important properties in this block:

docBase – Tomcat supports web archive files (.war). This is basically a zip file with a special internal structure. The explanation of the preparation of one of these files would take a whole Drupal page unto itself.

Path – This is the context path at which the site will appear when you look at it over your web browser. It is the part of the URL after the server name.

Environment – The Environment sub-tag makes information available to the program. The format is fairly simple; however, you have to be careful to set override="false" or else the .war file's ./WEB-INF/web.xml will overwrite these values with its own.

The environment properties for the shiftlog are:

year – this is the shiftlog year. Example: “2007”

isEditable – this is a boolean value; after the run has completed, access to the editor is turned off by setting this to false.

runLogLink – This is the url for the run log. The shiftlog uses this to build links to the run log.

runNumber – this is almost the same as the year; it’s just the run number. Examples:

run 8 = 2008

run 9 = 2009

run 10 = 2010

The $CATALINA_HOME/webapps/ web apps folder holds the default pages that come pre-packaged with the Tomcat server. This is also the location where Tomcat unpacks the war files. The folder naming conventions can change from Tomcat version to Tomcat version.

The $CATALINA_HOME/logs/ directory, as you may have guessed, holds log files. You will want to look over all files in here even if Tomcat seems to be functioning correctly; the logs can point out errors you may not be aware of. The file $CATALINA_HOME/logs/catalina.out holds the standard output stream of your JSPs (not to be confused with the HTML output stream) along with Tomcat's own standard output stream, making this a handy file for debugging.

Deploying new war files

To deploy a war file the procedure is as follows:

  1. Stop Tomcat:

    $CATALINA_HOME/bin/shutdown.sh

    NOTE: If you have deployed the Tomcat administrative web interface, shutting down the whole server is not strictly required because you could just shut down the context path, but I prefer to shut down the whole server as a matter of habit because the time required is so short that no one really notices.
     

  2. If this is an upgrade of an existing .war file (else move to step 3), back up the old .war file. All war files are located in /var/tomcat/webapps/; here is a listing of the directory (note the naming convention for the web archive files):

    -bash-3.00$ ls -1 /var/tomcat/webapps/shiftLog*.war
    /var/tomcat/webapps/shiftLog2003.war
    /var/tomcat/webapps/shiftLog2004.war
    /var/tomcat/webapps/shiftLog2005.war
    /var/tomcat/webapps/shiftLog2006.war
    /var/tomcat/webapps/shiftLog2007.war
    /var/tomcat/webapps/shiftLog2008t.war
    /var/tomcat/webapps/shiftLog2008.war
    /var/tomcat/webapps/shiftLog2009.war

    When removing one of these files I move it to the /var/tomcat/webapps/old/ directory and rename it following the convention here:

    shiftLog2007.Apr03.965628000.war
    shiftLog2007.Apr04.288184000.war
    shiftLog2007.Apr09.200079000.war
    shiftLog2007.Dec03.805483000.war
    shiftLog2007.Feb07.785336000.war
    ...
    shiftLog2007.Mar27.875569000.war
    shiftLog2007.Nov09.134343000.war
    shiftLog2007.Nov28.320967000.war
    shiftLog2007.Nov28.657299000.war

    It is important to retain the backup in case there is something wrong with the new .war file, keeping the old one will allow you to roll back whilst the problem is being corrected.

  3. Next copy over the new .war file from the node on which it resides. Scp is the method I use for this. The syntax is:

    % scp [username]@[nodeName]:[Path&File] /var/tomcat/webapps/shiftLog[year].war
  4. If this is a new deploy and not an upgrade of an existing .war file you will have to configure a context path in $CATALINA_HOME/conf/server.xml (else move to step 6)
     

  5. If this is an upgrade you will have to dump (delete) the expanded .war file in $CATALINA_HOME/webapps/ it should be a directory having a name similar to that of the name of the .war file. You do not have to back this up because you already have the .war file backed up.

  6. Startup Tomcat

    % $CATALINA_HOME/bin/startup.sh
  7. Open up a web browser and check that the page displays correctly

  8. Run the shift log Java Web Start application to confirm that the developer has signed his or her jar files within the .war file; if not, you will need to have the .war file rebuilt.

Tips

Because upgrades are done fairly frequently, mostly for requests for new features and some bug fixes, I keep a script to do the upgrade process listed above; however, the script requires modification before running it. The name of the script is $CATALINA_HOME/bin/deploy_year .

If you have done the upgrade but do not notice any change:

  1. check that you dumped the expanded directory in $CATALINA_HOME/webapps/ (step 5)

  2. also dump your web browser's cache

If you get the “page unavailable” message, check that the tomcat process is running. Use the command

ps -ef | grep tomcat | grep java

Even if it is running, shut it down and try restarting it; like an old car, Tomcat may not start the first time you try to crank it over.


 

Adding a user account to the ShiftLog Expert Online Remote Editor

STAR experts deemed absolutely essential may request to be placed on the expert editor list to edit the ShiftLog directly via the web interface. The user must provide justification for needing to edit the ShiftLog remotely and provide their Kerberos (RCF) user name.  

 

Administrator Notes:
The Tomcat web server will authenticate the user with Kerberos and Tomcat manages the session. We have written the custom module OnlineTomcatRealm.jar to do the authentication which is configured in $CATALINA_HOME/conf/server.xml.

   ssh tomcat@online.star.bnl.gov

Edit the file  $CATALINA_HOME/conf/tomcat-users.xml

Note: that $CATALINA_HOME may not be defined. However it is wherever Tomcat is installed. In our case this /opt/tomcat   

The file looks like this:

<tomcat-users>
  <role rolename="manager"/>
  <role rolename="logEditor"/>
  <user username="jfaustus" roles="logEditor"/>
  <user username="mephistophilis" roles="logEditor"/>
</tomcat-users>

Add a new user entry with the username set and the roles attribute set to "logEditor".

Then restart the server:

$CATALINA_HOME/bin/shutdown.sh
$CATALINA_HOME/bin/startup.sh

Check that it works and you’re done.


UPS list

Uninterruptible Power Supplies at the experiment:

 

RackTables OBJECT NAME | LOCATION | MODEL | BATTERY TYPE | LAST BATTERY CHANGE | DEVICES POWERED | NOTES
             
UPS7 Control Room, Slow Controls Terminals, floor near south west corner APC SMT1500NC  RBC7 6/2017 (original battery)

sc5.starp.bnl.gov

2 LCDs for sc5

speakers for sc5

Serial #: AS1711333192
Manuf. date: March 2017

black tower

has an AP9631 network interface with an environmental monitor probe (ups7.starp.bnl.gov)

IP: 130.199.60.181

BNL tag: A76077

  Control Room, north of Slow Controls Terminals, floor APC BR1000G

(Back-UPS Pro 1000)
 RBC123

12/2017

08/2021

sc.starp.bnl.gov

2 LCDs for alh.starp

Serial #: 3B1204X18919
Manuf. date: January 2012
bought summer 2012

black "tower"

BNL tag: A073737

self-test can be initiated from the front panel by pressing and holding the power button for 6 seconds.  (Caution - pressing and holding the power button for two seconds (one beep) and releasing it will shutdown the UPS!  For the self-test, keep pressing until the 2nd beep!)

  Control Room, TPC Terminals, console shelf

 APC SMT1500RM2U

(Smart-UPS 1500)

 RBC133
11/2014 (orig. battery put into service)


10/25/2019

12/26/2024
 

chaplin + 2 LCDs

sirius + LCD

Serial#: AS1431232892
Manuf. date: July 2014

Rack-mount

BNL tag: A76065

  Control Room,
TPC terminals

APC DLA1500

(SMART-UPS 1500)

 RBC7 11/08/2018

11/20/2021
gmt-ops + LCD

Serial # AS0736230401
Manuf. date: Sept. 2007

Black

BNL tag: A068061

UPS14
(not in RackTables)

Control Room, trigger systems, countertop

 APC SMT1500C

(Smart-UPS 1500)

 RBC7 original factory battery, May 2022

startrg + LCD
 

Serial #: 352208X11667

Manuf. date: February 2022 (bought summer of 2022)

black

IP: 130.199.60.161

BNL tag: A111242

  Control Room, magnet terminals, behind the LCD for the Windows PC running magnet monitoring

 APC BR1500LCD

(Back-UPS RS 1500)

 RBC109  1/24/2020 rosas + LCD

Serial #: 3B0935X21952
Manuf. date: August 2009

gray/black

*nominally belongs to CAD* possible contacts are John Pomaro or anyone in Collider-Accelerator Support

self-test can be started by holding the power button for *2* beeps (~2-3 seconds)

BNL tag: A83567

  Control Room, under Shift Leader desk APC BR1000G  RBC123 11/30/2017 (though battery was bought in November 2014)

04/12/2023
shift-leader + 2 LCD

Serial #: 3B1204X18994
Manuf. date: January, 2012

black

BNL tag: A132278

PowerChute Personal edition installed on shift-leader system (cannot be used with PowerChute Business Edition)

self-test can be initiated from the front panel by pressing and holding the power button for 6 seconds.  (Caution - pressing and holding the power button for two seconds (one beep) and releasing it will shutdown the UPS!  For the self-test, keep pressing until the 2nd beep!)

             
UPS1  DAQ Room, L4 and server rack (center row, north end) APC SMT1500RM2U
(Smart-UPS 1500)
RBC133 March 2018 (though battery was bought in November 2015)

September 2022
ovirt2

onldb5 (twice, redundant PS)

new servers in 2022 TBC
Serial #: AS1231125008
Manuf. date: July 2012
black, rack mount

bought December 2012
 
 

DAQ Room, on the floor between DB1 and DB2 (the legacy DAQ and trigger racks - southern end of the middle row)

 APC SMT1500

(Smart-UPS 1500)

 RBC7  Original
Battery (3/2011)

February 2015

November 2021
evp3 (bottom PS),

trgscratch (top PS),

sclrscratch (top PS),

daqlocalmain network switch,

daq-sw2 network switch

trgscratch 12 disk external storage array (bottom PS)
 

Serial #: AS1050221151
Manuf. date: Dec. 2010

black

bought March 2011

UPS13 DAQ Room, northeast corner, floor  APC SMT1500C
 
 (Smart UPS 1500)
 RBC7 October 2021 (original factory battery) stargw3.starp.bnl.gov Serial #: 3S2141X15140

Manuf. date: October 2021

black, bought April 2022

IP: 130.199.60.152

BNL tag: A111250
 
UPS2 DAQ Room, rack DB2 (legacy DAQ rack)
 APC SMT2200RM2U

 (Smart-UPS 2200)
 
 RBC43  November 2014 (original battery)

November 26, 2019

February 26, 2025

evp

trgscratch (bottom PS)

sclrscratch (bottom PS)

trgscratch 12 disk external storage array (top PS)

Serial #: AS1431243644
Manuf. date: July 29, 2014

rack-mount
 
UPS15 DAQ Room floor north of shelves in center row  APC SMT2200RM2uC  RBC43 factory original battery (February 2022)

onldb4 (right PS)

onldb3 (right PS)

satabeast1 (right PS)

Serial #:AS2205260230
Manuf. date: Feb. 2022

black

IP: 130.199.50.29

BNL tag: A132277
 

UPS3 DAQ Room DC3

APC SUA1500RM2U

(Smart-UPS 1500)

 RBC24 Oct. 11, 2016

Dec. 6, 2019
barbados2

softioc4

daq-sw1

Serial #: AS0847123095
Manuf. date: Nov. 2008

black, rack-mount

  DAQ Room DC4 APC SMX1500RM2U with

APC SMX48RMBP2U (external battery)
RBC115

2x RBC115
? various SGIS interlock equipment Serial #: AS1039230480

C-AD equipment
UPS4 DAQ Room, rack DB1, bottom APC SMT2200RM2U  RBC43 July 2019

October 2021

l2ana01 (bottom PS)

l2ana02

PCI extension (for l2ana01)

Serial #: AS1336140512
Manuf. date: Sept. 2013
UPS5 DAQ Room, northern Online Linux Pool rack APC SMT2200RM2U RBC43 November 2014 (original battery)

January 23, 2018

June 7, 2019

January 26, 2023
onl30,

onldb (x2)
Serial #: AS1430241567
Manuf. date: July 22, 2014

<DAQ Room Power Panel>
UPS6 DAQ Room, L4 and server rack (DB8, middle row, north end)

APC DLA1500RM2U

(Smart-UPS 1500)

 RBC24  May 27, 2016

 July 9, 2019

 January 2023

 Feb. 28, 2025

 L4 network switch,

 dbbak (both PS)

 dashboard1 (both PS)


 

Serial #: AS0340212578
Manuf. date: Sept. 2003

black, rack-mount

UPS9 DAQ Room middle row shelves APC SMT2200RM2UTW
(Smart-UPS 2200)
RBC43 (Note that the unit itself says it uses an RBC55, if one navigates through the onboard menu.  This appears to be an error on the part of APC). 05/2017
(original battery)


12/2019

January 30, 2023
satabeast3 (left PS)

onldb3.starp (left PS)

onldb4.starp (right PS)
 
Serial #: AS1645262798
Manuf. date: November 2016
UPS10 DAQ Room, middle row shelves (middle shelf) APC SMC1500-2U  RBC132 12/2015 (original battery)

January 28, 2021
stardns1.starp.bnl.gov

24-disk SAS enclosure for  trgscratch and sclrscratch

onldb2.starp (left PS)
Serial #: AS1539124741

Bought December 2015

to initiate self-test: push + hold Mute, then press Display for 2 seconds
  DAQ Room,
bottom of rack "DB9" (center row, north end)
APC SMT2200RM2U  RBC43 5/2017 (original battery, installed at factory 11/2016)

01/2021?

08/2021

06/2022 (disturbingly frequent replacement intervals in this unit...)
cephnfs2 (left PS)

dbbak (top PS)

onlhome (top PS)

stargw2 (in rack DB8)

cephmon01 in rack DB8 (right PS)

cephmon02 in rack DB8 (right PS)

onlpool-s60-01 and onlpool-s60-02 (via a shared extension cord)
Serial #:AS1645260493
Manuf. date: November 3, 2016 (bought May 2017)

2U rack-mount
 
BNL tag: A76075
  DAQ Room, NW corner networking rack

APC SMX2000LV with
2x SMX120BP

 RBC143 October 2020? Various networking equipment

Serial #: AS1913351834

Manuf. date: Mar 2019

rack-mount

*belongs to ITD*

             
UPS16  WAH 1A9 APC SMT1500RM2UC  RBC159 2/2023 (factory original battery)

NPSlaser.starp (Remote power switch for TPC laser PC, though the PC is NOT plugged into it, only a "picomotor multi-axis driver")

Serial #: 3S2205X11933

Manuf. date: Feb. 2022

rack-mount, black

BNL tag: A132276

has a network interface (TBC, and to-be-configured)

   WAH 1B1 APC SMT1500RM2U (Smart-UPS 1500)  RBC133 10/2016 (original battery)

10/2021
tofcontrol

TOF USB hub

Serial #: AS1617143314
Manuf. date: April 2016, bought September 2016

rack-mount, black

  WAH 1C4 APC SMT1500RM2U  RBC133 11/2018 netpower1.starp.bnl.gov (with networking equipment in 1C4)

netpower2.starp.bnl.gov (with networking equipment in 1C4) (This could be moved back to UPS11 at a "convenient power outage".)
Serial #: AS1243245039
black, 2U rack-mount
Manuf. date: October 2012

bought in January 2013
UPS11 WAH 1C4 APC SMT2200RM2U  RBC43 12/2014 (original battery, installed at factory 8/2014)

November 2021

January 10, 2025
netpower2.starp.bnl.gov (with networking equipment in 1C4)

netpower2 was moved to the other UPS in this rack in 2024 (?) but could be moved back if there is a "convenient" opportunity to do so.
Serial #:AS1435142781
Manuf. date: July 28, 2014 (bought December 2014)

rack mount

Has overheated and shut down while in service in the DAQ Room during AC failures (with ambient room temperatures above 90 F, reaching 100 at times).  So while it seems to be an otherwise reliable unit, it should not be used in an environment where temperatures may reach such uncomfortably high levels, nor in the immediate vicinity of other especially warm equipment.
   WAH 2A3

APC SMX1500RM2U

(Smart-UPS 1500)

with external battery pack

 RBC115? unknown gas leak detection systems in 2A2 and possibly C-AD interlock equipment in 2A1

Serial #: AS1039230484
Manuf. date: Sept. 2010

rack-mount, black

battery pack Serial #: QS1002251184
 
*C-AD equipment?*

  WAH 2A9 APC SMT1500RM2U (Smart-UPS 1500)  RBC133 original battery (07/2014)

12/2020

Jan 30, 2023

grant (Wiener/VME) Serial #: AS143611346
Manuf. date: Sept. 2014
black, rack mount
bought March 2015
   WAH 2A9 APC SMT1500RM2U (Smart-UPS 1500)  RBC133 April 2018 TPC interlock distribution panel

surge suppressor with:
-cooling water flow
   meters
-scserv
-2x interlocks 
     equipment in 2A8
Serial #: AS1243245306
Manuf. date: October 2012
black, rack mount
bought ~Dec. 2012
  (in Bldg. 510 when last seen, previously was in the WAH on the floor under the east stairs to RHIC tunnel)

APC BE750G

(Back-UPS ES750)

 RBC17 original battery from ~fall 2010??? nothing when last seen in the WAH (checked 11/20/2015)

Serial #: 5B1039T74854
Manuf. date: Sept. 2010

black

no self-test option

  WAH North Platform, 1st floor west APC SMT1500RM2U (Smart-UPS 1500)  RBC133  01/2019 north-nps1 (and thus all networking equipment on the north platform) Serial #: AS1144220012
Manuf. date: October 2011

rack-mount, black

  AB, near the GMR  PowerWare    fall 2012?
(Batteries likely were replaced at some point after that under a service contract, but details are unclear (handled by STSG))


February 2025 (nearly the whole unit was replaced)
 gas system equipment  This is a large UPS for circuits in the Gas Mixing Room, under the care of the STSG group.

IP: gmr-ups.starp.bnl.gov
BNL property tag 145850
bought in fall 2012
  AB, mezzanine top floor (northeast corner)  Mitsubishi UP7011A   November 20, 2019 unknown CAD equipment, definitely not STAR's responsibility

labelled "1006 UPS1"

serial port is connected to an Ethernet console server, 130.199.41.64

installed January 2015

Contacts are John Mingoia and Anh Pham

 

This list is maintained as information is made available and is sporadically checked for correctness.  The maintainer of this list is often not informed when STSG adds, removes or replaces UPSes and batteries.  Furthermore, anyone may remove or add equipment to UPSes without informing the maintainer of this list.

 

Spare batteries on hand: 

In a cabinet in the DAQ Room (APC RBC numbers):

7: January 2023 (2 of them)
55: October 2022
109: March 2020
132: November 2021
141: October 2020
 
(STSG / electronics techs may have additional spares in the Building 510 labs)
 

Windows XP EOL overview

Microsoft support for Windows XP will end on April 8, 2014.  Lab and DOE cybersecurity policies (as well as general best practice) prohibit the use of unsupported operating systems.  This page will serve as an overview of STAR's migration away from Windows XP; specific details per machine (or subsystem) will generally be kept in the associated RT tickets. 

9/25/2013 note:  I have acquired 5 used Dell desktop machines with Vista license keys as potential replacements for some of the machines listed below.  All have 4GB or more of RAM and single 160GB SATA disks.  From the list below, deneb2.starp and conference.star in particular are good candidates for replacement with these machines; possibly videopc as well if the video capture card can be put into one of them.  Others TBD.


XP systems in the SDAS enclave:

HOSTNAME | SUBSYSTEM | PRIMARY CONTACT | RT TICKET (if any) | NOTES and EXPECTED RESOLUTION PATH
autueil.starp S&C Wayne Betts 2690 Replace with a Windows 7 machine currently named madison in 1006C
shift-leader.starp ops   2689 Dell says this model (Optiplex 745) has been successfully "Tested for Basic Windows 7 Functionality" and the Windows 7 upgrade advisor tool from MS indicates no significant problems. 

Nonetheless, the plan is to replace this system with a Dell Optiplex 990 (BNL barcode 151457) currently in 510/1-179 (Windows 7). 
tpcgas.starp and its backup machine TPC Jim Thomas 2626 Two new computers are online now as tpcgas1 and tpcgas2.  Peter Kravtsov completed one; the other needs additional configuration, for which Peter provided instructions, but it cannot be completed without swapping hardware, so the backup machine is not a "perfect" backup yet.
chaplin-run09, astaire-run09, sirius-run09 TPC Jim Thomas   - Moving to Linux has been discussed numerous times and is still a possibility; the primary hold-up is the TPC Alarm Handler, which is currently a Windows application.  Without a replacement for it within Linux, the assumption has been that at least one Windows machine will need to be available, but in discussing with Alexei, it seems this TPC Alarm Handler is redundant with Slow Controls's STAR Alarm Handler, so may not be necessary after all.  (resolution TBD)
- One more note, discussing this with Alexei and Jim, we all generally seem to agree that they don't need 3 computers (that was a luxury afforded to them in the early days when the Control Room wasn't so crowded) - 2 would suffice.
- Nov. 21 update (WB): It turns out these computers were bought with Vista licenses.  Upgrading in place is a *painfully* lengthy process, but I am attempting it on astaire (with a fallback disk with the XP installation just in case).
- Nov 25 update (WB):  Alexei and Jim have definitively approved a Linux trial.  The astaire PC will have replacement disks installed and a fresh Linux installation (SL 6.4).  Testing of TPC usage is expected to be quick - once approved, will proceed with Linux installation on chaplin.  They request to keep sirius while they try to migrate the TPC alarm handler to Linux (seeking source code from Peter Kravtsov) - if successful, will eliminate sirius, otherwise will proceed with attempted upgrade to Vista.
- Jan 10 update (WB): astaire had Linux installed 3-4 weeks ago and the TPC MEDM screens were made to work nicely after some font adjustments.  Approval to proceed with chaplin (keeping the original disks on stand-by).  Also, the TPC alarm handler (currently "assigned" to sirius) was demonstrated to run fine using Wine on a Sc. Linux 6 machine, so that no longer seems to be a hold-up - simply copy over the Alarms folder, make some fairly obvious path adjustments and firewall openings and it works.

Final disposition:  chaplin and astaire have Sc.Linux installations on them.  sirius still has Windows XP, but is only on a small private network for use with the WAH video and TPC laser systems.
tofgas.starp TOF   2627 Was replaced during Peter Kravtsov's visit in December, 2013.
deneb2.starp general use on South Platform   2680 Replaced with one of the recovered Vista machines. 
Does not need much; does not play a direct role in STAR data-taking; just used during maintenance days as a terminal and web browser.
fmsled FMS Steve Trentalange   a laptop in the Wide Angle Hall - not sure if there is a compelling reason for it to be a laptop going forward, but if so desired, we have available a Sony VAIO with a Vista key (barcode 136278); in any case, it does not need much computing power.  FMS is not expected to be present in the 2014 run, so this is a relatively low priority.  Steve expressed a preference for Windows 7 over Windows Vista, but I doubt it will make any difference, other than possibly giving a longer potential lifetime to the replacement.
MP 11/22: The Sony VAIO machine has a Windows Vista installation on it. All necessary BNL configurations have been made.
1/10/2014 (WB): Unfortunately, the original fmsled laptop has a serious hardware problem and will not boot at all.  Hopefully the disk can be recovered, though that is complicated somewhat by having PGP WDE.


Final disposition:  System is removed from the WAH and the network.  Steve T. says there is nothing critical to recover from it.
hoosier BEMC Steve Trentalange/ Oleg Tsai 2770 WB: 10/15 - Win 7 upgrade advisor says ok for both 32-bit and 64-bit Win 7 installations.
JL: 11/22, assigned to MP
MP: A Dell precision desktop has been allocated for use to replace the old hoosier machine. The machine has been brought up to date and is ready for use. Steve needs to test an HV device on the old machine to ensure that it works. Once he gives the go ahead we will switch over to the new machine. The switch over will hopefully take place during the week of 1/13/14.
Mp: 1/27/14 - The replacement machine has been put in place. A LabVIEW 2013 evaluation has been installed for the time being, and Steve's VI worked on LabVIEW 2013 on the new Windows 7 machine. We now just need to get licenses for a legitimate version of LabVIEW and the machine will be finished.
MP: 4/17/14 - LabVIEW 2013 has been purchased and installed on the machine. The Windows XP machine has been disconnected and is no longer in use.
emcsc / backup emcsc BEMC Steve Trentalange/ Oleg Tsai   WB: 10/17 - Win 7 upgrade advisor says it needs more RAM (currently only 512MB; 1GB min for 32-bit Win 7), and does not know about the compatibility of the National Instruments RS-485 adapter card.  Meanwhile, there is a newer computer (unfortunately also with Win XP) available that was configured 1-2 years ago as a backup for emcsc (including LabVIEW 6.1 and an RS-485 adapter) but it has been sitting unused since then.  Steve has suggested we try putting Windows 7 on the backup machine as a test, and if it works, put it into production.
WB: 1/10:  tested the old PCI 232/485 card in a Windows 7 machine, and was able to download drivers from National Instruments that allow the ports to be recognized, so this might not be a show stopper.  Also, found NI's LabVIEW version compatibility chart and it indicates that LabVIEW 2013 should be able to open VIs saved in version 6.1, so this too is looking positive.  We need to get a version (possibly a trial version?) of the latest LabVIEW to try this out.
Mp: 4/17/14: A Windows 7 machine was delegated to replace the emcsc machine. The trial version of LabVIEW 2013 was installed along with the old PCI 232/485 card. The problem was that the LabVIEW 6 code was too old to run on LabVIEW 2013; the .vi would not run properly. I had a LabVIEW technical rep come out to the lab multiple times to troubleshoot the issue, and the conclusion was that the old code would need to be revamped in order to run under LabVIEW 2013. Fortunately, the emcsc machine does not need a network connection in order to operate (only the NI COM card). The XP machine has been deregistered and disconnected from the network, and will continue to be used until time allows for the LabVIEW code to be updated.
videopc  ops  Alexei Lebedev   Have to evaluate the compatibility of the video capture card (and its software) with Windows Vista/7
WB: 1/10 - having looked into this, I thought it would be impossible, but Alexei informed me today that Andrei Brandin will be at BNL for the collaboration meeting in February, and he thinks he can make the current system work under Windows 7. But if not, we will move this machine to a small private network shared with the TPC Laser system control PCs.

 
Final disposition:  Andrei B. made no progress (or even any effort?) on his visit.  The system still has Windows XP, but is only on a private network now.
pp2pp-slow  PP2PP     Originally overlooked because it is not on a "star" subnet (it is 130.199.90.72), and the PP2PP subsystem has been inactive for some time.  This is a 9.5-year-old Dell Pentium 4 system, so it is not likely a good candidate for Windows 7 or Vista, though it meets the minimum requirements.
MP 2/25: After speaking with Wlodek Guryn and Kin Yip, this machine will not be used for Run 14. The machine has been removed from the Control Room by one of Wlodek's guys and will be worked on off the network. A PP2PP machine will be needed for next year; a replacement machine will need to be purchased and set up down the road.


STAR XP systems outside of subnet 60 (starp/SDAS):

SYSTEM NAME | CONTACT/PRIMARY USER | LOCATION | RT TICKET (if any) | RESOLUTION PLAN/SUMMARY
JML.STAR.BNL.GOV Jeff Landgraf 510/1-184 2677 Have discussed with Jeff - a new PC was ordered (expected to arrive by end of November).
MP: The new PC has come and is all set up for BNL use. Jeff's profile has been set up.
Bugrhoff (DHCP client) Wayne Betts 510/1-179   old laptop - phased out in favor of newer one already in use
DBEAVISDT.STAR.BNL.GOV Dana Beavis 510/1-169   JL: 09/27 - Ambiguity on group
WB: computer has been moved to a C-AD building.  MAC reg., IP address and domain group are no longer associated with STAR

 
BCHRISTIE.STAR.BNL.GOV Bill Christie 510/1-180 2691 JL: 09/27 - Update OK in the coming months if possible, suggest 7 (need to check)
MP: 10/4 I ran the Win 7 Upgrade Advisor. The machine's hardware and software are compatible with Win 7 (currently has Win XP 32-bit)
KEATON2.STAR.BNL.GOV Victor Perevoztchikov 510/1-165 2720 JL: 09/27 - Machine could be replaced by a Linux node (preferred)
JL: 11/22, assigned to MP (new node needs to be purchased)
MP: 12/5, A Dell Precision T3610 has been ordered. The machine supports RHEL and will be setup accordingly.
MP: 2/28, The machine has been replaced with the T3610 setup with Scientific Linux 6. The old machine will be retired.
MONROE2.STAR.BNL.GOV Lidia Didenko 510/1-173 2695 possible to upgrade to Vista? (a license key is on the case)
JL: 09/27 - Update OK, is Win 7 possible? Worried about the CERT being messed up (saved in IE)
MP: 10/4 I ran the Win 7 Upgrade Advisor. The machine's hardware and software are compatible with Win 7 (currently has Win XP 64-bit)
MP: 11/20 The machine has been upgraded to Windows 7. Refer to ticket # 2695
BANCROFT.STAR.BNL.GOV nobody 1006C   WB: 10/18 - old machine has been pulled from service (it existed solely to operate an old SCSI scanner, which has also been retired)
CONFERENCE.STAR.BNL.GOV   1006C 2687 WB: 10/17 - Vista has been installed on a machine from the Equipment Pool, and the original conference PC has been shut down. 
GRANT.STAR.BNL.GOV John Hammond 901   This is a file server for the electronics support group.  It is largely up to John to move the shared content to a different server to retire this one.
JL: 11/22, assigned to MP
MP: 12/5, I spoke with John, he stated that he has a Windows 7 machine and will be moving the file server to that node himself. I will be in contact with him to record when the XP machine has been taken off the network.
MP: 2/28, I spoke with John this week, he stated that the GRANT machine is still on the network but he will be taking it off at the end of this month. He has a Windows 7 machine to replace the XP machine, just needs to do the switch over.
WB: 4/18/2014:  John was copying the final directories to the replacement today and expects to turn grant off on Monday, 4/21.
PADRAZO1.STAR.BNL.GOV John Hammond 901   WB: 10/17 - John purchased and installed Windows 7 for this system on a new disk and Athena T. will start using it.  The original disk has been put aside in case any files from Ken Asselta turn out to be needed.
PKUCZEWSKIDT.PHY.BNL.GOV Phil Kuczewski 901   MP: 2/25, I sent an email to Phil on 1/2/14 regarding his Windows XP machines. I never received a reply.
PKLAPTOP1.STAR.BNL.GOV Phil Kuczewski 901   (laptop)
MP: 2/25, I sent an email to Phil on 1/2/14 regarding his Windows XP machines. I never received a reply.
DAGOSTINOC.STAR.BNL.GOV John Hammond
Alex Tkatchev

 
901   WB: 10/17 - This is about 4 years old and has a Windows Vista product sticker, but the current plan is to make a fresh Linux installation and let Alex Tkatchev use the system for trigger-related development work.  The original disk has been removed and a new one installed for the Linux installation.
PO-143966.STAR.BNL.GOV Alex Tkatchev 901   WB: 10/15 - This is about 4 years old and has a Win 7 product sticker on it. 
WB: 2/28/14: If a fresh Win 7 install is made, I suggest adding a second disk (if it doesn't already have two) and making a RAID 1 array if possible.



Others of possible concern:

SYSTEM NAME | LOCATION | NOTE
STAR-UTILITIES.STAR.BNL.GOV (on a C-AD network)   STAR Control Room   runs software provided by C-AD.  Used for STAR WAH video camera system control and monitoring.  We should move the components related to the video system to a starp machine in any case - there's no reason to be crossing subnets and firewalls for this.  11/15/2013 (WB):  no longer required for use with STAR video system.
ROSAS.STAR.BNL.GOV (on a C-AD network)   STAR Control Room   runs software provided by C-AD.


The prohibition on unsupported operating systems is typically only enforced for computers connected to the campus-wide LAN, though variances are possible.  Stand-alone systems and those on local networks do not typically come under scrutiny (in part because they are hard to detect and in part because they pose much less overall risk).