Lead Instructors

The lead instructors at Berkeley provided all of the instructional materials used in the course, including videos of all lectures, quizzes for each lecture, and several programming assignments.  The recorded videos allow each participating institution to work through the course on its own academic schedule.  Students can watch the lectures independently or view them together in the classroom; class time can then be used to discuss the lecture material or to augment it with related discussion.

The quizzes are provided online to gauge whether the remote students are keeping up with the class and to assess their comprehension of the lecture material.  The quiz grades can be used as part of the final grading system at each participating institution.

The computer exercises focus on several strategies for optimizing parallel code, using a range of programming options and algorithms.  An autograder was created for each exercise.  The autograders run the students' code and report a score based on the best possible optimization of each program.  Students can use that score to gauge the efficiency of their own code, and instructors can use it as one measure of mastery of the programming topics when assigning grades.  The teaching assistant at Berkeley and the project coordinator track course and coding questions from students and faculty through discussion forums on the course management system.  Instructors are also given access to the optimized code so that they can better advise their own students about programming strategies.

The lecture and assignment outline for the most recent offering of the course can be found here.  An earlier version of the lectures and quizzes is available through the XSEDE training website.  The first written assignment is to describe an existing parallel application (Homework 0).  All programming assignments are completed on XSEDE resources under a classroom allocation that serves all course participants.  The first computer exercise is to optimize a matrix multiplication on a single processor.  The second assignment is to optimize a particle simulation: part 1 uses multiple processors and part 2 uses GPUs.  The third assignment uses the UPC language to optimize a graph algorithm that solves a de novo genome assembly problem.  Students also complete an independent individual or group final project under the direction of their local instructors.  Examples of past projects are provided by Berkeley, and a sketch of the kind of optimization the first exercise targets appears below.
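
To give a flavor of the first programming assignment, the sketch below shows a cache-blocked (tiled) matrix multiply, the standard single-processor optimization that exercise targets.  This is a minimal illustration only: the function name blocked_dgemm, the BLOCK_SIZE value, and the row-major layout are assumptions made for this example, not the assignment's actual interface.

/*
 * Illustrative sketch: cache-blocked (tiled) matrix multiply.
 * blocked_dgemm, BLOCK_SIZE, and the row-major layout are assumptions
 * for this example, not the assignment's actual interface.
 */
#define BLOCK_SIZE 64   /* tile edge; in practice tuned to the cache size */

/* C = C + A * B, where A, B, C are n-by-n matrices stored row-major. */
void blocked_dgemm(int n, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += BLOCK_SIZE)
        for (int jj = 0; jj < n; jj += BLOCK_SIZE)
            for (int kk = 0; kk < n; kk += BLOCK_SIZE) {
                int i_max = (ii + BLOCK_SIZE < n) ? ii + BLOCK_SIZE : n;
                int j_max = (jj + BLOCK_SIZE < n) ? jj + BLOCK_SIZE : n;
                int k_max = (kk + BLOCK_SIZE < n) ? kk + BLOCK_SIZE : n;
                /* Multiply one tile of A by one tile of B; the tiles are
                 * small enough to stay resident in cache while reused. */
                for (int i = ii; i < i_max; ++i)
                    for (int k = kk; k < k_max; ++k) {
                        double a = A[i * n + k];
                        for (int j = jj; j < j_max; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

Blocking keeps small tiles of the matrices resident in cache so that each loaded element is reused many times before being evicted; students typically combine this with loop reordering, compiler flags, and vectorization to raise their autograder score.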

Course Outline and Schedule

 

Topic | Start Date | Completion Date
Introduction | 1/16/2018 | 1/16/2018
Single Processor Machines: Memory Hierarchies and Processor Features | 1/18/2018 | 1/23/2018
Homework 0 - Describe a Parallel Application | 1/18/2018 | 1/29/2018
Parallel Machines and Programming Models | 1/23/2018 | 1/25/2018
Sources of Parallelism and Locality in Simulation - Part 1 | 1/25/2018 | 1/30/2018
Sources of Parallelism and Locality in Simulation - Part 2 | 1/30/2018 | 2/1/2018
Shared Memory Programming: Threads and OpenMP, and Tricks with Trees | 2/1/2018 | 2/6/2018
Programming Homework 1 - Optimize Matrix Multiplication | 1/26/2018 | 2/9/2018
Distributed Memory Machines and Programming | 2/6/2018 | 2/8/2018
Partitioned Global Address Space Programming with Unified Parallel C (UPC) and UPC++, by Kathy Yelick | 2/8/2018 | 2/13/2018
Cloud Computing and Big Data Processing, by Shivaram Venkataraman | 2/13/2018 | 2/13/2018
NERSC, Cori, Knights Landing and Other Matters, by Jack Deslippe | 2/15/2018 | 2/15/2018
Programming Homework 2 (Part 1) - Parallelizing a Particle Simulation | 2/9/2018 | 3/2/2018
An Introduction to CUDA/OpenCL and Graphics Processors (GPUs), by Forrest Iandola | 2/20/2018 | 2/20/2018
Dense Linear Algebra (Part 1) | 2/22/2018 | 2/22/2018
Dense Linear Algebra (Part 2): Comm Avoiding Algorithms | 2/27/2018 | 2/27/2018
Graph Partitioning | 3/1/2018 | 3/6/2018
Programming Homework 2 (Part 2) - Parallelizing a Particle Simulation (GPU) | 3/1/2018 | 3/9/2018
Automatic Performance Tuning and Sparse Matrix Vector Multiplication | 3/6/2018 | 3/8/2018
Automatic Performance Tuning and Sparse Matrix Vector Multiplication (continued) | 3/8/2018 | 3/8/2018
Programming Homework 3 - Parallelize Graph Algorithms | 3/13/2018 | 4/6/2018
Structured Grids | 3/13/2018 | 3/13/2018
Parallel Graph Algorithms, by Aydin Buluc | 3/15/2018 | 3/15/2018
Final Project Proposal | | 4/6/2018
Architecting Parallel Software with Patterns, by Kurt Keutzer | 3/20/2018 | 3/20/2018
Fast Fourier Transform | 3/22/2018 | 3/22/2018
Modeling and Predicting Climate Change, by Michael Wehner | 4/3/2018 | 4/3/2018
Scientific Software Ecosystems, by Mike Heroux | 4/5/2018 | 4/5/2018
Dynamic Load Balancing | 4/10/2018 | 4/10/2018
Accelerated Materials Design through High-throughput First Principles Calculations, by Kristin Persson | 4/12/2018 | 4/12/2018
Hierarchical Methods for the N-Body Problem | 4/17/2018 | 4/19/2018
Communication Lower Bounds and Optimal Algorithms | 4/19/2018 | 4/19/2018
Big Bang, Big Data, Big Iron: HPC and the Cosmic Microwave Background Data Analysis, by Julian Borrill | 4/24/2018 | 4/24/2018
Big Bang and Exascale: A Tale of Two Ecosystems, by Kathy Yelick | 4/26/2018 | 4/26/2018
Final Project Poster | | Local (set by each participating institution)
Final Project Report | | Local (set by each participating institution)

Past CS267 Projects: Past Projects