Operations
OSC is experiencing a very high support load
OSC is currently experiencing a very high customer ticket load. Please allow extra time for analysts to respond to your issue. We apologize for any inconvenience. Thank you for your continued support of OSC.
Systemic problem with the Cluster Computing service
4:20PM 6/23/2017 Update: All HPC systems are back in production. This outage may have caused failures of users' jobs. We'll update the community as more is known.
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
3:40PM 6/23/2017 Update: All HPC systems are back in production except for the scratch service (/fs/scratch). This outage may cause failures of users' jobs. We'll update the community as more is known.
All HPC systems are available
8/24/16 3:57PM: All HPC systems are available, including:
- Oakley cluster for general access
- Ruby cluster for restricted access
- Owens cluster for early users
- Home directory and scratch file systems
- OnDemand and other web portals
- Project file system (/fs/project)
All jobs held before the downtime have been released by the batch scheduler. If your jobs are still held or you have any questions, please contact oschelp@osc.edu.
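If you want to verify your own jobs' status before contacting support, a minimal check (assuming the Torque/PBS qstat client used on these clusters; the username below is a hypothetical placeholder):

    # List all of your jobs; held jobs show "H" in the state (S) column
    qstat -u your_username

If any job still shows a held state, include its job ID when you write to oschelp@osc.edu.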
Some issues remain after downtime
15 July 2016, 5:00PM update: Some additional issues we are facing:
- We are experiencing periodic hangs of the GPFS client file system software used with the new storage environment. We have an open support case with the vendor, but no solution at this time. This may affect access to the /fs/project and /fs/scratch file systems. Users have also reported transfer failures to these file systems through scp.osc.edu and sftp.osc.edu.
- Symlinks transferred from /nfs/gpfs to /fs/project were lost (fixed)
June 7th downtime to finish at 6:30PM
Update: Downtime completed at 6:30PM, June 7th.
The June 7th downtime is now slated to be completed at 6:30PM; the previous estimate was 5PM.
All systems and services will continue to be unavailable until that time.
Thank you for your cooperation.
Oakley login node instability
Oakley login nodes are experiencing some instability related to Lustre. We will reboot the nodes on Thursday, October 2nd 2014 to resolve the issue. If a login node crashes before then and we have the fix ready, we will apply the fix at that time rather than wait until the scheduled maintenance window.
Ruby is offline
The Ruby Transitional Cluster (only open to select research groups) is currently offline due to network problems. We expect it will return to service some time after the downtime.
16-core nodes on Glenn temporarily unavailable
This issue has been resolved. The 16-core nodes are online.
---------------------------------------------------------------------------------------
The 16-core nodes on Glenn are currently offline due to issues that arose during the downtime. System administrators are working on a fix.
2/26 Downtime Difficulties
All systems should be functioning normally. Please report any remaining issues to OSC Help.
----
A number of systems are still experiencing problems after yesterday's downtime. Currently, the following systems remain offline:
- Oakley (returned to service)
- ARMSTRONG (returned to service)
- license server (returned to service)
- proj11 (returned to service)
- proj12 (returned to service)