Over the past two weeks we have experienced Oakley login node crashes potentially caused by a Lustre bug.
The maximum job size on Glenn has been reduced from 256 nodes (2048 cores) to 128 nodes (1024 cores). User and group limits are still at 2048 cores total.
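The new per-job limit can be checked before submitting with a small helper. This is a minimal sketch, not part of the scheduler: the function name within_glenn_job_limit is hypothetical, and the figure of 8 cores per node is an assumption inferred from the numbers above (128 nodes = 1024 cores).

```shell
# Per-job limit on Glenn as stated in the notice above.
MAX_NODES=128
MAX_CORES=1024
# Assumption: 8 cores per node, inferred from 128 nodes = 1024 cores.
CORES_PER_NODE=8

# Hypothetical helper: succeeds (exit status 0) when a request of $1 nodes
# fits within the per-job limit.
within_glenn_job_limit() {
  nodes="$1"
  cores=$((nodes * CORES_PER_NODE))
  [ "$nodes" -le "$MAX_NODES" ] && [ "$cores" -le "$MAX_CORES" ]
}

# Compare a few request sizes against the limit.
for n in 64 128 256; do
  if within_glenn_job_limit "$n"; then
    echo "$n nodes ($((n * CORES_PER_NODE)) cores): within the per-job limit"
  else
    echo "$n nodes ($((n * CORES_PER_NODE)) cores): exceeds the per-job limit"
  fi
done
```

This is only a local sanity check; the scheduler remains the authority on what is accepted, and the separate user and group limits of 2048 cores still apply.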
We have updated the default ANSYS module on both Oakley and Glenn. The default is now ANSYS 14.5.7 on Oakley and ANSYS 14.5 on Glenn. For example, module load ansys on Oakley now loads ANSYS 14.5.7 instead of ANSYS 13.0. Older versions of ANSYS remain installed.
We have updated the default FLUENT module on both Oakley and Glenn. The default is now FLUENT 15.0.7 on Oakley and FLUENT 14.5 on Glenn. For example, module load fluent on Oakley now loads FLUENT 15.0.7 instead of FLUENT 13.0. Older versions of FLUENT remain installed.
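For either package, the effect of the new defaults can be confirmed from a login shell. The following is a rough sketch: show_and_load is a hypothetical wrapper name, and it uses only the standard module avail, module load, and module list subcommands. The exact names of the older version modules are not stated here, so take them from the module avail listing rather than guessing.

```shell
# Hypothetical wrapper: list the installed versions of a package,
# load its default module, and show what ended up loaded.
show_and_load() {
  pkg="$1"
  # Guard: the module command only exists on systems with environment modules.
  if ! command -v module >/dev/null 2>&1; then
    echo "module command not found; run this on a cluster login node" >&2
    return 1
  fi
  module avail "$pkg"   # every installed version, including older ones
  module load "$pkg"    # the current default version
  module list           # confirm which version is now loaded
}
```

Calling show_and_load ansys or show_and_load fluent on Oakley should report the new default. To stay on an older version, load it by the full module name shown in the module avail output.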
222 compute nodes on Glenn have been removed from service to begin preparing for the arrival of Ruby compute nodes. There have been no other associated system changes. Jobs on Glenn may see more frequent waits in the queue than they have in the past due to a reduction of available resources, depending on scheduler load.
Amber 14 has been installed on Oakley and Glenn; usage is via the modules amber/14 on Oakley and amber14 on Glenn. For information on available executables and installation details see the software page for Amber or the output of the respective module help command, e.g.: module help amber/14
Intel compiler licenses have been updated. This should be invisible on the HPC systems. If you are a statewide user of the license, please set the LM_LICENSE_FILE environment variable to firstname.lastname@example.org before reporting issues.
On both production clusters, we have begun rejecting jobs that request Lustre or $PFSDIR, in order to reduce job failures caused by a bug that crashes the filesystem. If you have a GPFS allocation, consider moving your data there, where appropriate, to stay productive while the Lustre service is degraded.
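To find job scripts that would be affected, one option is to search them for references to $PFSDIR. This is a sketch under assumptions: find_pfs_usage is a hypothetical helper name, it matches only the literal string PFSDIR, and it will not detect jobs that request Lustre in some other way.

```shell
# Hypothetical helper: print every line under a directory (default: the
# current one) that mentions PFSDIR, as file:line-number:text.
find_pfs_usage() {
  dir="${1:-.}"
  # -r searches recursively, -n prints line numbers.
  # grep exits non-zero when nothing matches.
  grep -rn -e 'PFSDIR' "$dir"
}
```

Running find_pfs_usage on the directory that holds your job scripts lists the lines to review before resubmitting.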