Pitzer

Slurm Migration Issues

This page documents the known issues for migrating jobs from Torque to Slurm.

$PBS_NODEFILE and $SLURM_JOB_NODELIST

Please be aware that $PBS_NODEFILE points to a file listing the allocated nodes, while $SLURM_JOB_NODELIST is a string variable holding the node list itself.

The Slurm analog of cat $PBS_NODEFILE is srun hostname | sort -n
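If a program still expects a node file on disk, one can be written from the Slurm variable. The following is a minimal sketch, not an OSC-provided tool: write_nodefile is a hypothetical helper name, and it assumes scontrol is on the PATH inside the job. Note that scontrol show hostnames prints one line per node, whereas srun hostname prints one line per task, so the two outputs can differ in length.

```shell
#!/bin/bash
# Sketch: build a PBS-style node file inside a Slurm job.
# write_nodefile is a hypothetical helper name, not a Slurm command.
write_nodefile() {
    # $1: path of the file to write (one hostname per line)
    # scontrol show hostnames expands a compact list such as "p[0001-0002]"
    # into one hostname per line.
    scontrol show hostnames "$SLURM_JOB_NODELIST" > "$1"
}

# Possible use inside a job script:
#   write_nodefile "nodefile.$SLURM_JOB_ID"
#   cat "nodefile.$SLURM_JOB_ID"
```

The helper only defines a function, so sourcing it outside a job has no effect until it is called.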

Environment variables are not evaluated in job script directives

Environment variables are not expanded in Slurm directives (#SBATCH lines) inside a job script.
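One common workaround is to pass the value on the sbatch command line, where the submitting shell expands it before Slurm sees it. The sketch below illustrates this under stated assumptions: submit_job is a hypothetical wrapper name and job.sh is a placeholder script name; %j is Slurm's own filename pattern for the job ID and needs no shell expansion.

```shell
#!/bin/bash
# In a job script, a directive such as
#   #SBATCH --output=$HOME/logs/job.out
# is not expanded by a shell, so $HOME is not replaced with a path.

# Sketch of a workaround: give the options on the command line instead.
# submit_job is a hypothetical wrapper; job.sh is a placeholder script name.
submit_job() {
    # $1: job name, expanded by the submitting shell before sbatch runs.
    # %j is a Slurm filename pattern (job ID), filled in by Slurm itself.
    sbatch --job-name="$1" --output="$1-%j.out" job.sh
}

# Possible use:
#   submit_job "run_$USER"
```

Options given on the command line take precedence over matching #SBATCH lines in the script, so the script itself can keep fixed defaults.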

 



Pitzer compute unavailable between 7am Aug 18 and noon Aug 20, 2020

A downtime for all OSC HPC systems is scheduled from 7 a.m. to 9 p.m., Tuesday, August 18, 2020. Pitzer login nodes will be available at the end of the normal downtime window. However, all compute nodes of the Pitzer cluster will be unavailable through noon on August 20, 2020, to allow for cooling changes for the Pitzer expansion. To stay up to date on system notices, follow @HPCNotices on Twitter. As always, you can contact us at OSC Help.

Slurm Migration

Overview

Slurm, which stands for Simple Linux Utility for Resource Management, is a widely used open-source HPC resource management and scheduling system that originated at Lawrence Livermore National Laboratory.

OSC has decided to implement Slurm for job scheduling and resource management over the course of 2020, replacing the Torque resource manager and Moab scheduling system that it currently uses.

Backup failures for Project on August 1st and 2nd

OSC experienced backup failures on our GPFS file systems (both Project file systems, /fs/project and /fs/ess) on the mornings of August 1st and 2nd. The underlying cause was identified, and backups were operating as expected on the morning of August 3rd. As a result of these failed backups, OSC will not be able to complete some file restore requests for files changed between approximately 2020-07-31 02:30 and 2020-08-02 02:30.

System Downtime August 18, 2020

A downtime for all OSC HPC systems is scheduled from 7 a.m. to 9 p.m., Tuesday, August 18, 2020. The downtime will affect the Pitzer, Ruby and Owens Clusters, web portals and HPC file servers. Login services, except for my.osc.edu, will not be available during this time. OSC clients are able to log into my.osc.edu during the downtime but no changes will take place until the downtime is completed. In preparation for the downtime, the batch scheduler will begin holding jobs that cannot be completed before 7 a.m., August 18, 2020.
