MCS 572 NCSA Platinum and UIC Argo Clusters Individual Group Project Suggestions
Spring 2003
Professor F. B. HANSON
FINAL PROJECT REPORT DUE Friday 02 May 2003 in 718 SEO.
Groups can be either one or two students, but if two their contributions
should be balanced.
Students will make short presentations of group project results in class,
starting on Monday 28 April 2003, or earlier if any group is ready.
Recall that each student must give at least one presentation, but
shared presentations may be all right.
CAUTION: Projects should involve sufficient work to effectively
utilize the Platinum Cluster (pt.ncsa.uiuc.edu) and the Argo Cluster
(argo.cc.uic.edu) with MPI, but should not be so time consuming
as to severely affect the performance of other users. Write a group
report (group size 1 or 2, with good load balancing among the group
members) that is a short paper (8 to 15 or so pages plus
appendices) as if for publication, i.e., with
- abstract (short description of problem and results)
- executive summary (give an itemized brief summary of your paper)
- introduction (motivate your problem for the class, citing prior
work)
- problem or method
- results and discussion (should include theoretical explanations of
interesting results and graphs; explain results whether good or bad)
- conclusions (brief, emphasizing your main results)
- acknowledgements (give thanks to others who helped you and to the
National Center for Supercomputing Applications (NCSA) for use of the Platinum
IA32 Cluster and the UIC ACCC Argo Beowulf Cluster)
- references (list articles, books, guides, web pages and other
documents that you used as sources)
- appendices (code used, compiler options, job scripts or command
line execution format, sample output, and supporting timings).
You are welcome to make up your own projects (see the first
suggestion), but you should discuss this with Professor Hanson
beforehand for suggestions. Also let him know whatever project
you select for additional advice, because even the following ideas
are very broad.
WARNING: If you use test or sample floating point arrays in your
project, make sure they are genuinely random floating point values,
i.e., do not use trivial integers or numbers with patterns. Consult the
class local user's guide for how to run a scalar job to use as a
reference measurement.
Also, if your project is similar to the one you did
on the PSC TCS, then you may want to give an extensive comparison to
your TCS project. Also, any other small scale local cluster such as
the EVL or BIOE clusters can be substituted for Argo.
Platinum/Argo Project Suggestions
- Own Project: The Best Choice.
An NCSA Platinum - ACCC Argo MPI project of your own design,
such as optimization
of some method connected with your thesis research area, graphical
visualization, another course, or some interesting science-engineering area.
- Statistics Project. Generate suitable sets of random
numbers (make sure they are floating point), each with a different
sample size N. The function `ranf' is a very good random number
generator (RNG), but check it out yourself.
See the
Class TCS Local Guide
or Platinum/Argo man pages. Describe how you tested the
randomness of your data, e.g., test for a uniform random distribution.
For each set, compute basic statistics, like the mean, variance and
Chi-Square test, in as efficient a vector manner as possible (e.g.,
make use of the Fortran 90 intrinsic sum function `sum').
Plot the run time T versus N and T versus p. Estimate or compute
and plot the Amdahl vector fraction as a function of N. Compare
speedups and efficiencies relative to N. Is the Amdahl law operative
as the problem size N becomes large? Develop your own performance
model that is appropriate for the behavior of the timing data with
number of processors p, sample size N and Chi-Square bin size Nb.
Does your performance model account for deviations in Amdahl's law?
- Iteration Methods. Make a comparison of the performance
of Jacobi and Gauss-Seidel methods for elliptic partial differential
equations. Gauss-Seidel is better for serial computers, but what
about parallel and vector computers? (See Ortega, "Intro. Parallel
and Vector Solution of Linear Systems," 1988, or the newer Golub and Ortega
"Scientific Computing: An Introduction with Parallel Computing," 1993,
and related papers.) Revise the code for an arbitrary number of block/slice processes with a refined decomposition.
See
Class Sample Laplace-MPI C Code.
- Two-Dimensional Block Decompositions.
Implement the Jacobi 2D block decomposition sketched in class using the
MPI_Cart functions and other features with blocking or non-blocking
communication. Find other applications for the techniques.
- Numerical Parabolic Differential Equations.
Revise the Jacobi iteration block decomposition for marching in
time instead of approximate iterations for the elliptic Laplace
problem. Be sure that the parabolic mesh ratio is sufficiently
small, i.e., Diffusion*delta(t)/delta(x)^2 <= 0.5. If
the drift term is significant, upwinding with forward/backward
differences will be needed.
- Compiler Optimization Levels. Test whether higher or lower levels
of optimization give higher performance. For instance, does the command
`gcc -O[n] -[other_optimization_options]... cpgm.c' lead to faster
executables for some values of the option level `[n]' (e.g., -O, -O0,
-O1, -O2, -O3, ... on platinum)
for matrix multiplication or some other application?
Use `man gcc' on pt.ncsa.uiuc.edu. Or compare the performance
of other compilers like Intel's `icc' (use `icc -help' or `icpc -help')
or the Portland Group (PGI)'s C compilers `pgcc' or `pgCC' (use `pgcc -help').
This suggestion can
be combined with other suggestions for testing.
- Compare Performance of MPI Functions/Subroutines.
For instance, compare the Collective Communication routine MPI_Bcast
with the Blocking Point to Point Communication routine MPI_Send along with
MPI_Recv, and with the Nonblocking Point to Point Communication
routine MPI_Isend along with MPI_Irecv. Use MPI_Wtime to measure performance
times. (Note: shmem is the Cray native message passing library;
see `man shmem'.) Compare the Send and Recv functions with the
sequence of non-blocking functions MPI_Irecv, MPI_Isend and MPI_Wait
to see if computation and communication can be sufficiently
overlapped to improve performance.
- Advanced MPI Features and Libraries. Implement some of
the advanced MPI functions with applications suggested in
Chapters 5-7 in
Using MPI: Portable Parallel Programming with the Message-Passing Interface
by William Gropp, Ewing Lusk, and Anthony Skjellum,
or in Peter Pacheco's book or in Wilkinson and Allen's book.
- Extensive TCS and Platinum/Argo Performance Comparison.
Take some application and make a comparison between optimized performance
on the PSC TCS and the Platinum/Argo pair with MPI.
Project Resources:
Web Source: http://www.math.uic.edu/~hanson/mcs572/pt03project.html
Email Comments or Corrections or Questions to Professor Hanson.