CS 594
Understanding Parallel Architectures:
From Theory To Practice

Jack Dongarra, Professor, Spring 2002, 3 credits

This course aims to give students a deep knowledge of the techniques and tools needed to understand today's and tomorrow's high-performance computers and to program them efficiently. A mixture of theoretical and practical material will be covered.

Today's high-performance computers range from expensive, highly parallel distributed-memory platforms down to cheap local networks of standard workstations. But the problems of software development are the same on all architectures: the user needs to recast his or her algorithm or application in terms of parallel entities (tasks, processes, threads, and so on) that will execute concurrently. Parallelism is difficult to detect automatically because of data dependencies, and in many cases some form of algorithm restructuring is needed to expose it. Finally, realizing the restructured algorithm as software on a specific architecture may be quite complicated.
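
As a small illustrative sketch (not part of the course handouts), the following C fragment shows why data dependencies make automatic parallelization difficult: the iterations of the first loop are independent and could be distributed across tasks or threads, while the second loop carries a dependence from one iteration to the next and cannot run concurrently without restructuring.

    #include <stdio.h>

    #define N 8

    int main(void) {
        double a[N], b[N];
        for (int i = 0; i < N; i++) { a[i] = i; b[i] = 2.0 * i; }

        /* Independent iterations: each a[i] depends only on b[i],
           so the loop could execute in parallel. */
        for (int i = 0; i < N; i++)
            a[i] = a[i] + b[i];

        /* Loop-carried dependence: a[i] needs the value of a[i-1]
           computed in the previous iteration, so the iterations
           cannot simply run concurrently. */
        for (int i = 1; i < N; i++)
            a[i] = a[i] + a[i - 1];

        printf("a[%d] = %f\n", N - 1, a[N - 1]);
        return 0;
    }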

Fortunately, there are well-established techniques and tools to help: portable computation libraries such as ScaLAPACK, portable communication libraries like MPI, general-purpose task systems such as PVM, or even data-parallel languages like HPF. These are the tools that the class targets.

In this course we plan to cover and understand the nuts and bolts of developing parallel applications. For instance, our study of PVM goes with the foundations of task-graph scheduling, that of MPI with the complexity analysis of collective communication operations such as broadcasts, and that of HPF with data-dependence analysis and automatic parallelization techniques. In addition we will study performance evaluation and benchmarking on today's high-performance computers. Each lecture will make use of simple examples borrowed from numerical linear algebra: matrix-matrix multiplication and LU decomposition will provide the framework needed to illustrate the theory.
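
To give a flavor of these nuts and bolts, here is a minimal MPI sketch in C (an illustration for this description, not the course's own code): process 0 broadcasts a small matrix A and a vector x with the collective operation MPI_Bcast, each process computes its block of rows of y = A*x, and the pieces are gathered back with MPI_Gather. It assumes the matrix dimension is divisible by the number of processes.

    #include <mpi.h>
    #include <stdio.h>

    #define N 4   /* assumes N is divisible by the number of processes */

    int main(int argc, char **argv) {
        int rank, size;
        double A[N][N], x[N], y[N];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0)                        /* process 0 owns the data */
            for (int i = 0; i < N; i++) {
                x[i] = 1.0;
                for (int j = 0; j < N; j++) A[i][j] = i + j;
            }

        /* Collective communication: every process receives A and x. */
        MPI_Bcast(A, N * N, MPI_DOUBLE, 0, MPI_COMM_WORLD);
        MPI_Bcast(x, N, MPI_DOUBLE, 0, MPI_COMM_WORLD);

        /* Each process computes its own block of rows of y = A*x. */
        int rows = N / size;
        double local[N];
        for (int i = 0; i < rows; i++) {
            int gi = rank * rows + i;         /* global row index */
            local[i] = 0.0;
            for (int j = 0; j < N; j++) local[i] += A[gi][j] * x[j];
        }

        /* Gather the partial results back on process 0. */
        MPI_Gather(local, rows, MPI_DOUBLE, y, rows, MPI_DOUBLE,
                   0, MPI_COMM_WORLD);

        if (rank == 0)
            for (int i = 0; i < N; i++) printf("y[%d] = %f\n", i, y[i]);

        MPI_Finalize();
        return 0;
    }

Compiled with mpicc and run under mpirun, the same short program touches both a collective communication operation and the simple linear-algebra kernels the lectures build on.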

The target audience is mainly computer science students. Students from other disciplines are welcome, provided they have a computer science background in the following areas: machine architecture, algorithm design, elementary graph algorithms, and complexity analysis.

Grading will be based primarily on a written report. There will be no substantial parallel-programming project, unless a student asks to implement his or her particular application on a parallel machine; such a project would replace the report.

A tentative outline of the class follows:

  1. Overview of High Performance Computing
  2. PVM/MPI
  3. Memory Hierarchy, Cache
  4. HPF and OpenMP
  5. Blocked linear algebra
  6. Linear Algebra Algorithms
  7. Parallel Performance Analysis Tools
  8. Network Enabled Servers
  9. Metacomputing
  10. Performance Optimization Techniques
  11. Performance Optimization Techniques (continued)
  12. Class Projects
  13. Performance Modeling of Parallel Applications
  14. Scalability Analysis

There is no single book covering the scope of the class, but for each lecture a comprehensive document will be made available in PostScript. A short bibliography is given below.




Jack Dongarra
2002