Lead Instructors

The lead instructors at Berkeley provided all of the instructional materials used in the course, including videos of all lectures, quizzes for each lecture, and several programming assignments.  The recorded videos allow each participating institution to work through the course on its own academic schedule.  Students can watch the lectures independently or view them together in the classroom; class time can then be used to discuss the lecture material or to augment it with related discussion.

The quizzes are provided online to gauge whether the remote students are keeping up with the class and to assess their comprehension of the lecture material.  The quiz grades can be used as part of the final grading system at each participating institution.

The computer exercises focus on several strategies for optimizing parallel code, using a range of programming options and algorithms.  An autograder was created for each exercise.  The autograders run the students' code and report a score based on the best possible optimization of each program.  Students can use that score to gauge the efficiency of their own code, and instructors can use it as one measure of mastery of the programming topics when assigning grades.  The teaching assistant at Berkeley and the project coordinator track course and coding questions from students and faculty through discussion forums on the course management system.  Instructors are also given access to the optimized code so that they can better advise their own students about programming strategies.

The lecture and assignment outline for the most recent offering of the course can be found here.  An earlier version of the lectures and quizzes is available through the XSEDE training website.  The first written assignment is to describe an existing parallel application (Homework 0).  All programming assignments are completed on XSEDE resources under a classroom allocation that serves all course participants.  The first computer exercise is to optimize a matrix multiplication on a single processor.  The second assignment is to optimize a particle simulation: part 1 uses multiple processors and part 2 uses GPUs.  The third assignment uses the UPC language to optimize a graph algorithm that solves a de novo genome assembly problem.  Students also complete an independent individual or group final project under the direction of their local instructors.  Examples of past projects are provided by Berkeley, and a sketch of the kind of optimization the first exercise targets appears below.
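
To give a flavor of the first programming assignment, the sketch below shows a cache-blocked (tiled) matrix multiply, the standard single-processor optimization that exercise targets.  This is a minimal illustration only: the function name blocked_dgemm, the BLOCK_SIZE value, and the row-major layout are assumptions made for this example, not the assignment's actual interface.

/*
 * Illustrative sketch: cache-blocked (tiled) matrix multiply.
 * blocked_dgemm, BLOCK_SIZE, and the row-major layout are assumptions
 * for this example, not the assignment's actual interface.
 */
#define BLOCK_SIZE 64   /* tile edge; in practice tuned to the cache size */

/* C = C + A * B, where A, B, C are n-by-n matrices stored row-major. */
void blocked_dgemm(int n, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += BLOCK_SIZE)
        for (int jj = 0; jj < n; jj += BLOCK_SIZE)
            for (int kk = 0; kk < n; kk += BLOCK_SIZE) {
                int i_max = (ii + BLOCK_SIZE < n) ? ii + BLOCK_SIZE : n;
                int j_max = (jj + BLOCK_SIZE < n) ? jj + BLOCK_SIZE : n;
                int k_max = (kk + BLOCK_SIZE < n) ? kk + BLOCK_SIZE : n;
                /* Multiply one tile of A by one tile of B; the tiles are
                 * small enough to stay resident in cache while reused. */
                for (int i = ii; i < i_max; ++i)
                    for (int k = kk; k < k_max; ++k) {
                        double a = A[i * n + k];
                        for (int j = jj; j < j_max; ++j)
                            C[i * n + j] += a * B[k * n + j];
                    }
            }
}

Blocking keeps small tiles of the matrices resident in cache so that each loaded element is reused many times before being evicted; students typically combine this with loop reordering, compiler flags, and vectorization to raise their autograder score.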

Course Outline and Schedule

 

Topic | Start Date | Completion Date
Introduction | 1/16/2018 | 1/16/2018
Single Processor Machines: Memory Hierarchies and Processor Features | 1/18/2018 | 1/23/2018
Homework 0 - Describe a Parallel Application | 1/18/2018 | 1/29/2018
Parallel Machines and Programming Models | 1/23/2018 | 1/25/2018
Sources of Parallelism and Locality in Simulation - Part 1 | 1/25/2018 | 1/30/2018
Sources of Parallelism and Locality in Simulation - Part 2 | 1/30/2018 | 2/1/2018
Shared Memory Programming: Threads and OpenMP, and Tricks with Trees | 2/1/2018 | 2/6/2018
Programming Homework 1 - Optimize Matrix Multiplication | 1/26/2018 | 2/9/2018
Distributed Memory Machines and Programming | 2/6/2018 | 2/8/2018
Partitioned Global Address Space Programming with Unified Parallel C (UPC) and UPC++, by Kathy Yelick | 2/8/2018 | 2/13/2018
Cloud Computing and Big Data Processing, by Shivaram Venkataraman | 2/13/2018 | 2/13/2018
NERSC, Cori, Knights Landing and Other Matters, by Jack Deslippe | 2/15/2018 | 2/15/2018
Programming Homework 2 (Part 1) - Parallelizing a Particle Simulation | 2/9/2018 | 3/2/2018
An Introduction to CUDA/OpenCL and Graphics Processors (GPUs), by Forrest Iandola | 2/20/2018 | 2/20/2018
Dense Linear Algebra (Part 1) | 2/22/2018 | 2/22/2018
Dense Linear Algebra (Part 2): Comm Avoiding Algorithms | 2/27/2018 | 2/27/2018
Graph Partitioning | 3/1/2018 | 3/6/2018
Programming Homework 2 (Part 2) - Parallelizing a Particle Simulation (GPU) | 3/1/2018 | 3/9/2018
Automatic Performance Tuning and Sparse Matrix Vector Multiplication | 3/6/2018 | 3/8/2018
Automatic Performance Tuning and Sparse Matrix Vector Multiplication (continued) | 3/8/2018 | 3/8/2018
Programming Homework 3 - Parallelize Graph Algorithms | 3/13/2018 | 4/6/2018
Structured Grids | 3/13/2018 | 3/13/2018
Parallel Graph Algorithms, by Aydin Buluc | 3/15/2018 | 3/15/2018
Final Project Proposal | | 4/6/2018
Architecting Parallel Software with Patterns, by Kurt Keutzer | 3/20/2018 | 3/20/2018
Fast Fourier Transform | 3/22/2018 | 3/22/2018
Modeling and Predicting Climate Change, by Michael Wehner | 4/3/2018 | 4/3/2018
Scientific Software Ecosystems, by Mike Heroux | 4/5/2018 | 4/5/2018
Dynamic Load Balancing | 4/10/2018 | 4/10/2018
Accelerated Materials Design through High-throughput First Principles Calculations, by Kristin Persson | 4/12/2018 | 4/12/2018
Hierarchical Methods for the N-Body Problem | 4/17/2018 | 4/19/2018
Communication Lower Bounds and Optimal Algorithms | 4/19/2018 | 4/19/2018
Big Bang, Big Data, Big Iron: HPC and the Cosmic Microwave Background Data Analysis, by Julian Borrill | 4/24/2018 | 4/24/2018
Big Bang and Exascale: A Tale of Two Ecosystems, by Kathy Yelick | 4/26/2018 | 4/26/2018
Final Project Poster | | Local (set by each participating institution)
Final Project Report | | Local (set by each participating institution)

Past CS267 Projects: Past Projects