Emergency UPS Maintenance
A UPS in the data center requires emergency maintenance at 2PM on Oct 11, 2023. There is a very small chance that parts of Owens and of the C18 Pitzer nodes may lose power as a result.
We will have rolling reboots of the Owens and Pitzer clusters, including login and compute nodes, starting at 9am on August 18, 2021. The rolling reboot is for urgent security updates.
The rolling reboots won't affect any running jobs, but users may experience longer queue wait times than usual on the clusters. Users should also expect an outage of about 10 minutes while each login node is rebooted. Any interactive job started from a login node will be killed when that login node is rebooted.
Please do not run any Jupyter applications at OSC until further notice due to a security vulnerability.
OSC will update JupyterLab and Jupyter Notebook applications to rectify this as soon as possible.
List of versions changed:
References for more information:
Update on 14 April 2020, 0903:
Work is completed.
Original message:
There will be maintenance on cluster export services on Tuesday, April 14 from 8:00am to 10:00am. The following services will be affected by this:
We will have a reboot of the NetApp as part of an upgrade, starting at 9:30 AM on Monday, November 19, 2018.
1:40PM 4/27/2017 Update: Rolling reboots are completed.
3:10PM 4/18/2017 Update: Rolling reboots on Owens have started to address GPFS errors that occurred late Friday.
A rolling reboot of the Owens, Oakley, and Ruby clusters is scheduled to start on Wednesday morning, April 19, 2017. Highlights of the rolling reboot activities:
Update: Downtime completed at 6:30PM, June 7th.
The June 7th downtime is now expected to be completed at 6:30PM; the previous estimate was 5PM.
All systems and services will continue to be unavailable until that time.
Thank you for your cooperation.
Day One of the scheduled downtime has been completed, and HPC operations have resumed. As planned, Lustre work will extend into Day Two. Jobs using /fs/lustre or $PFSDIR cannot run until this work is completed, but all other jobs can run.
UPDATE: Performance problems with Lustre have prevented us from bringing up the filesystem. We are working on a resolution.
UPDATE: Lustre returned to service the afternoon of July 12th, 2014.