Pitzer Programming Environment (PBS)

This document is obsolete and is kept as a reference to the previous Pitzer programming environment. Please refer to the current Pitzer Programming Environment page for the latest version.

Compilers

C, C++ and Fortran are supported on the Pitzer cluster. Intel, PGI and GNU compiler suites are available. The Intel development tool chain is loaded by default. Compiler commands and recommended options for serial programs are listed in the table below. See also our compilation guide.

The Skylake processors that make up Pitzer support the Advanced Vector Extensions (AVX512) instruction set, but you must set the correct compiler flags to take advantage of it. AVX512 has the potential to speed up your code by a factor of 8 or more, depending on the compiler and options you would otherwise use. However, bear in mind that clock speeds decrease as the level of the instruction set increases, so if your code does not benefit from vectorization, it may be beneficial to use a lower instruction set.
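The factor of 8 can be estimated from register width, assuming double-precision data. One AVX512 register is 512 bits wide, so a single vector instruction can process several values at once:

```latex
\frac{512\ \text{bits}}{64\ \text{bits per double}} = 8\ \text{doubles},
\qquad
\frac{512\ \text{bits}}{32\ \text{bits per float}} = 16\ \text{floats}
```

These are per-instruction upper bounds relative to scalar code. Relative to 256-bit AVX2 code, the bound is a factor of at most 2. The lower AVX512 clock speeds and memory bandwidth limits can make the observed speedup smaller.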

In our experience, the Intel and PGI compilers do a much better job than the GNU compilers at optimizing HPC code.

With the Intel compilers, use -xHost and -O2 or higher. With the GNU compilers, use -march=native and -O3. The PGI compilers by default use the highest available instruction set, so no additional flags are necessary.

This advice assumes that you are building and running your code on Pitzer. The executables will not be portable. Of course, any highly optimized builds, such as those employing the options above, should be thoroughly validated for correctness.

LANGUAGE	INTEL EXAMPLE	PGI EXAMPLE	GNU EXAMPLE
C	icc -O2 -xHost hello.c	pgcc -fast hello.c	gcc -O3 -march=native hello.c
Fortran 90	ifort -O2 -xHost hello.f90	pgf90 -fast hello.f90	gfortran -O3 -march=native hello.f90
C++	icpc -O2 -xHost hello.cpp	pgc++ -fast hello.cpp	g++ -O3 -march=native hello.cpp

Parallel Programming

MPI

OSC systems use the MVAPICH2 implementation of the Message Passing Interface (MPI), optimized for the high-speed InfiniBand interconnect. MPI is a standard library for performing parallel processing using a distributed-memory model. For more information on building your MPI codes, please visit the MPI Library documentation.

Parallel programs are started with the mpiexec command. For example,

mpiexec ./myprog

The program to be run must either be in your path or have its path specified.

The mpiexec command will normally spawn one MPI process per CPU core requested in a batch job. Use the -n and/or -ppn option to change that behavior.

The table below shows some commonly used options. Use mpiexec -help for more information.

MPIEXEC OPTION	COMMENT
`-ppn 1`	One process per node
`-ppn procs`	procs processes per node
`-n totalprocs` or `-np totalprocs`	totalprocs processes in total, across all nodes
`-prepend-rank`	Prepend rank to output
`-help`	Get a list of available options

Caution: There are many variations on mpiexec and mpiexec.hydra. Information found on non-OSC websites may not be applicable to our installation.

The information above applies to the MVAPICH2 and IntelMPI installations at OSC. See the OpenMPI software page for mpiexec usage with OpenMPI.

OpenMP

The Intel, PGI and GNU compilers understand the OpenMP set of directives, which support multithreaded programming. For more information on building OpenMP codes on OSC systems, please visit the OpenMP documentation.

Process/Thread placement

Processes and threads are placed differently depending on the compiler and MPI implementation used to compile your code. This section summarizes the default behavior and how to modify placement.

For all three compilers (Intel, GNU, PGI), purely threaded codes do not bind to particular cores by default.

For MPI-only codes, Intel MPI first binds the first half of the processes to one socket and then the second half to the second socket, so that consecutive tasks are located near each other. MVAPICH2 first binds as many processes as possible on one socket, then allocates the remaining processes on the second socket, so that consecutive tasks are near each other. OpenMPI alternately binds processes on socket 1, socket 2, socket 1, socket 2, etc., with no particular order for the core ID.

For hybrid codes, Intel MPI first binds the first half of the processes to one socket and then the second half to the second socket, so that consecutive tasks are located near each other. Each process is allocated ${OMP_NUM_THREADS} cores, and the threads of each process are bound to those cores. MVAPICH2 allocates ${OMP_NUM_THREADS} cores for each process, and each thread of a process is placed on a separate core. By default, OpenMPI behaves the same for hybrid codes as it does for MPI-only codes, allocating a single core for each process and all threads of that process.

The following tables describe how to modify the default placements for each type of code.

OpenMP options:

OPTION	INTEL	GNU	PGI	DESCRIPTION
Scatter	KMP_AFFINITY=scatter	OMP_PLACES=cores OMP_PROC_BIND=close/spread	MP_BIND=yes	Distribute threads as evenly as possible across system
Compact	KMP_AFFINITY=compact	OMP_PLACES=sockets	MP_BIND=yes MP_BLIST="0,2,4,6,8,10,1,3,5,7,9"	Place threads as closely as possible on system

MPI options:

OPTION	INTEL	MVAPICH2	OPENMPI	DESCRIPTION
Scatter	I_MPI_PIN_DOMAIN=core I_MPI_PIN_ORDER=scatter	MV2_CPU_BINDING_POLICY=scatter	-map-by core --rank-by socket:span	Distribute processes as evenly as possible across system
Compact	I_MPI_PIN_DOMAIN=core I_MPI_PIN_ORDER=compact	MV2_CPU_BINDING_POLICY=bunch	-map-by core	Distribute processes as closely as possible on system

Hybrid MPI+OpenMP options (combine with options from OpenMP table for thread affinity within cores allocated to each process):

OPTION	INTEL	MVAPICH2	OPENMPI	DESCRIPTION
Scatter	I_MPI_PIN_DOMAIN=omp I_MPI_PIN_ORDER=scatter	MV2_CPU_BINDING_POLICY=hybrid MV2_HYBRID_BINDING_POLICY=linear	-map-by node:PE=$OMP_NUM_THREADS --bind-to core --rank-by socket:span	Distribute processes as evenly as possible across system ($OMP_NUM_THREADS cores per process)
Compact	I_MPI_PIN_DOMAIN=omp I_MPI_PIN_ORDER=compact	MV2_CPU_BINDING_POLICY=hybrid MV2_HYBRID_BINDING_POLICY=spread	-map-by node:PE=$OMP_NUM_THREADS --bind-to core	Distribute processes as closely as possible on system ($OMP_NUM_THREADS cores per process)

The above tables list the most commonly used settings for process/thread placement. Some compilers and Intel libraries may have additional options for process and thread placement beyond those mentioned on this page. For more information on a specific compiler/library, check the more detailed documentation for that library.

GPU Programming

64 NVIDIA V100 GPUs are available on Pitzer. Please visit our GPU documentation.

Supercomputer:

Pitzer
