May 23 Downtime Changes

Tuesday, May 23, 2017 - 8:00pm
Cluster Computing
  • Batch system changes
    • Add 'pfsdir' as a node attribute so that PFSDIR can be created and removed for jobs that request the attribute. The new location of PFSDIR is /fs/scratch/USER/JOBID, where JOBID includes both the job number and the batch server name.
    • Update the submit filter to be Python 3 compatible.
    • Reorder Oakley batch queue routing so that jobs for the large-memory nodes are placed in the correct queue.
    • Restructure the GPU reservation on Ruby to segregate 'vis' jobs onto dedicated nodes.
  • License server changes
    • The license1 and license4 servers have been turned off.
    • The legacy Intel license server on license2 is no longer available (statewide users were asked to migrate to license5 long ago).
    • The Partek license server on license2 is no longer available.
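The new PFSDIR layout can be illustrated with a minimal Python sketch. It assumes the job ID has the PBS/TORQUE form `<job number>.<batch server name>` (as exported in `PBS_JOBID`) and that the batch system exposes the per-job directory to the job as a `PFSDIR` environment variable. The helper names `pfsdir_path` and `split_jobid` and the server name `oak-batch.osc.edu` are hypothetical, for illustration only.

```python
import os
import posixpath
from typing import Tuple

# Root of the per-job scratch area, per the notice: /fs/scratch/USER/JOBID
SCRATCH_ROOT = "/fs/scratch"


def pfsdir_path(user: str, jobid: str) -> str:
    """Hypothetical helper: build /fs/scratch/USER/JOBID for a user and job ID."""
    return posixpath.join(SCRATCH_ROOT, user, jobid)


def split_jobid(jobid: str) -> Tuple[str, str]:
    """Hypothetical helper: split an assumed '<number>.<server>' job ID
    into (job number, batch server name) at the first dot."""
    number, _, server = jobid.partition(".")
    return number, server


if __name__ == "__main__":
    # Inside a job that requested 'pfsdir', prefer the value the batch system
    # provides (assumed to be $PFSDIR); otherwise compute it for illustration.
    user = os.environ.get("USER", "alice")
    jobid = os.environ.get("PBS_JOBID", "1234567.oak-batch.osc.edu")
    print(os.environ.get("PFSDIR", pfsdir_path(user, jobid)))
```

For example, `pfsdir_path("alice", "1234567.oak-batch.osc.edu")` yields `/fs/scratch/alice/1234567.oak-batch.osc.edu`. Because the job ID embeds the server name, directories from jobs routed through different batch servers cannot collide even if their job numbers match.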
Research Data Storage
  • Project & Scratch
    • Perform a file system check ('mmfsck') on both the project and scratch file systems.
    • Enable DMAPI support for future use with the IBM SOBAR backup scheme and, eventually, a user-accessible archive to tape.
    • Change tuning parameters per recommendations from DDN.
    • Upgrade GPFS to 4.2.2-3 on the protocol and IME servers.
  • Home directories
    • Enable support for more than 16 groups on /users/ NFS mounts.
    • Support for more than 16 groups is still not available on /, /usr/local/, etc., because a problem with setuid failing could not be resolved before the downtime.
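To see whether an account is likely affected by the 16-group limit, a rough Python check is sketched below. It assumes the limit is the standard NFS AUTH_SYS credential cap of 16 supplementary group IDs (RFC 5531); the constant and function names are hypothetical, and `os.getgroups()` may or may not include the primary group depending on the platform, so the count is approximate.

```python
import os
from typing import Iterable

# Assumed cause: AUTH_SYS (AUTH_UNIX) NFS credentials carry at most
# 16 supplementary group IDs, so memberships beyond that are not sent.
NFS_AUTH_SYS_MAX_GIDS = 16


def exceeds_nfs_group_limit(gids: Iterable[int], limit: int = NFS_AUTH_SYS_MAX_GIDS) -> bool:
    """Hypothetical helper: True if the number of distinct group IDs exceeds the limit."""
    return len(set(gids)) > limit


if __name__ == "__main__":
    groups = os.getgroups()  # group IDs of the current process
    print(f"{len(groups)} groups; more than {NFS_AUTH_SYS_MAX_GIDS}: "
          f"{exceeds_nfs_group_limit(groups)}")
```

Users for whom this reports more than 16 groups are the ones who benefit from the /users/ change and who may still see permission problems on /, /usr/local/, and similar mounts.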