Owens

XDMOD JOB METRICS TOOL NOW AVAILABLE

XDMoD, which stands for XD Metrics on Demand, is an NSF-funded open-source tool that provides a wide range of metrics on the utilization and performance of high-performance computing (HPC) resources, and on the impact these resources have on scholarship and research. Access it at xdmod.osc.edu and log in with your OSC credentials. For more information and details on how to use this tool, see our documentation: https://bit.ly/2EhT3EI

SCARCE RESOURCES - GPU AND LARGE MEMORY NODES

Please be aware that the GPU and large memory resources on Owens and Pitzer are very busy, which can lead to long queue wait times. If your jobs do not need these resources, using the standard compute nodes will help your jobs start sooner and keep the scarcer resources available for those whose jobs require them. If you are unsure whether you need these resources for your work, please contact us at oschelp@osc.edu.

DOWNTIME FOR ALL CLUSTERS ON FEBRUARY 5, 2019

A downtime for all HPC systems is scheduled from 7 a.m. to 5 p.m., Tuesday, Feb. 5, 2019. The downtime will affect the Pitzer, Ruby and Owens Clusters, web portals and HPC file servers. Login services, including my.osc.edu, and access to storage will not be available during this time. In preparation for the downtime, the batch scheduler will begin holding jobs that cannot be completed before 7 a.m., Feb. 5, 2019.

Owens switch replacements

OSC will replace the Ethernet switches in the Owens cluster starting Dec. 14. We do not expect any user-visible impacts from the work. Owens will have slightly reduced capacity while we temporarily shut down 2 or 3 racks on the day of the replacement. For more information, see https://bit.ly/2Qkq0ct

Pitzer Production Deployment December 4

Pitzer, OSC's latest cluster, will be deployed to full production status on Tuesday, December 4. All users will have access to the cluster and will be able to submit jobs. For details on how to modify your jobs to run on Pitzer, please see https://bit.ly/2P7G4Zz. For general information about the new cluster, please visit osc.edu and see our Cluster Computing pages. If you have any questions, please contact OSC Help: https://bit.ly/29AXmdf

Services have been restored after switch failures

OSC experienced three separate major switch failures, at about 1:50 a.m. on November 14, 4:05 a.m. on November 17, and 5:00 a.m. on November 18. We restored all services after each outage and have completed the update to the NetApp appliance that provides the home directory service, which addresses a separate bug triggered by the outage. We are still working with the network switch vendor on a permanent resolution to the bug that caused these interruptions. We will continue to keep you informed.
