Over the past two weeks, we have experienced Oakley login node crashes potentially caused by a Lustre bug.

Jobs requesting Lustre are rejected by qsub

Date: 
Friday, August 15, 2014 - 11:30pm
Supercomputer: 

On both production clusters, we have begun rejecting jobs that request Lustre or $PFSDIR. This reduces job failures caused by triggering a bug that crashes the filesystem. If you have a GPFS allocation, you may want to move your data there, if appropriate, to maintain productivity while the Lustre service is degraded.