Emergency maintenance in OSC’s data center Feb 10 2022
OSC will shut down significant portions of the Owens and Pitzer clusters for several hours this afternoon (Thursday, Feb. 10).
You might encounter an error while pulling a large Docker image:
ERROR: toomanyrequests: Too Many Requests.
or
We found that mpiexec/mpirun from OpenMPI cannot be used in an interactive session (launched by sinteractive) after the upgrade of Pitzer and Owens to Slurm 20.11.4. Please use only srun when running OpenMPI programs in an interactive session.
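As a minimal sketch of the srun advice above, the launch line in an interactive session might look like the following. The executable name ./my_mpi_app and the task count of 4 are hypothetical placeholders, not values from the notice.

```shell
# Hypothetical executable and task count, for illustration only.
APP=./my_mpi_app
NTASKS=4

# Build the launch command with srun instead of mpiexec/mpirun.
LAUNCH_CMD="srun -n ${NTASKS} ${APP}"
echo "${LAUNCH_CMD}"

# Launch only when srun is available (inside a Slurm session)
# and the executable actually exists.
if command -v srun >/dev/null 2>&1 && [ -x "${APP}" ]; then
    ${LAUNCH_CMD}
fi
```

Outside a Slurm session the sketch only prints the command it would run.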
Updated on Feb 25:
The StarCCM license has been restored.
Original post:
OSC's starccm software license will expire at 12 a.m., Sunday, Feb 21, 2021, making the software unavailable until the license is renewed.
Updated on March 2:
This is completed.
Original Post:
We will have rolling reboots of the Owens cluster, including login and compute nodes, starting at 9 AM on Feb 18, 2021. The rolling reboots apply BIOS updates for urgent security fixes. They won't affect any running jobs, but users may experience longer queue wait times than usual on the cluster. Users should also expect a ~10 minute outage of the login nodes while they reboot.
A partial-node MPI job may fail to start using mpiexec from intelmpi/2019.3 and intelmpi/2019.7 with error messages like
OSC is currently experiencing problems with its internal network. Interactive sessions may be slow or unresponsive, but running jobs should not be affected.
MPI jobs that use openmpi/3.1.0-hpcx on Owens and Pitzer may fail. The job stops with an error like "There are not enough slots available in the system to satisfy the slots". Please switch to openmpi/3.1.4-hpcx. The buggy version openmpi/3.1.0-hpcx will be removed on August 18, 2020.
==========
Resolved: We removed openmpi/3.1.0-hpcx on August 18, 2020.
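For the openmpi/3.1.0-hpcx notice above, switching versions in a job script or login shell might be sketched as follows. This assumes the cluster's usual module command is available. The guard is only there so the snippet does nothing harmful elsewhere.

```shell
# Version recommended in the notice above.
OPENMPI_MODULE="openmpi/3.1.4-hpcx"

# 'module' is provided by the cluster environment, so call it only
# where it is defined.
if type module >/dev/null 2>&1; then
    module load "${OPENMPI_MODULE}"
    module list
fi
echo "Using ${OPENMPI_MODULE}"
```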
The CUDA debugger, cuda-gdb, can raise a segmentation fault immediately upon execution. A workaround before executing cuda-gdb is to unload the xalt module, e.g.:
module unload xalt
This issue affects most cuda modules on Pitzer and Owens.
Users may encounter an error like 'libim_client.so: undefined reference to `uuid_unparse@UUID_1.0' while compiling MPI applications with mvapich2 in some Conda environments. We found that the pre-installed libuuid package from Conda conflicts with the system libuuid libraries. The affected Conda packages are python/2.7-conda5.2, python/3.6-conda5.2 and python/3.7-2019.10.