login

Oakley login nodes and ruby02 will not be accessible between 9:00-9:30am on 10/18/2016

We upgraded to RHEL 6.8 for both Oakley and Ruby clusters during the October 12th's downtime. Unfortunately, we are noticing some NFS problem that has been causing rsh, or ssh sessions to hang on Oakley and Ruby. To resolve this issue, we've downgraded the kernel version to one that is not exhibiting the NFS regression, and started to reboot compute nodes on Oakley and Ruby. It won't affect any running jobs, but users may experience longer queue wait time on Oakley and Ruby.  

Lustre bug causing Oakley login node crashes

Over the past two weeks we have experienced Oakely login node crashes potentially caused by a Lustre bug.  The bug (or issue otherwise) seems to be activated when a user does operations on a lustre directory that contains an excessive number of files (10000+ files).  

Our support contacts have been contacted and we are working with them to resolve this issue.  Updates will be posted both here.

Pages