MCS 572 HP-Convex Exemplar SPP1200 Group Project Suggestions
Fall 1996
Professor F. B. Hanson
DUE Monday 11 November 1996 in Class.
Two(2) copies of the report are due Wednesday 11 November 1996 in class
(one graded copy will be returned and the other will be used for the
the class record and future class proposals).
Students will make short presentations of group project results in class,
starting on Monday 11 November 1996.
CAUTION: Projects should have sufficient work to effective utilize
the SPP1200, by a comparison of the two platforms, but should
not be so time consuming as to severely affect the performance of other users.
The project can use either Fortran or C, providing the project makes sense in
the programming language.
Write a group (1 .leq. group .leq. 2) with good load balancing
among the group members) report that is a short paper (8 or 15 or so
pages) as if for publication (i.e., with
- abstract (short paragraph description of problem and results)
- executive summary (give an itemized brief summary of your paper)
- introduction (motivate your problem for the class, citing prior
work)
- problem or method
- results and discussion (should include theoretical explanations of
interesting results and graphs; explain results whether good or bad)
- conclusions (brief)
- acknowledgements (give thanks to others that helped you and to the
UIC Computer Center for use of the SPP1200)
- references (list papers and books that you used as sources)
- appendices: compiler informational code listing (`*.LST' file with
loop marking from "-LST" option of `fc' or "-or all" option of `cc'; see
Appendix C: Optimization report, of the "Convex Exemplar Programming Guide".),
supporting timings and if relevant assembly listings.
You are welcome to make up your own projects (see the first suggestion),
but you should discuss this with Professor Hanson
before hand for advice. Also let him know what ever project
you select, because even the following ideas are very broad.
WARNING: If you use test or sample floating point arrays in your
project, make sure they are genuine and random floating point, i.e.,
do not use trivial integers or numbers with patterns. Consult the
class local user's guide for how to run a scalar job to use as a
reference measurement and for how to use the random function `rand'.
The Projects
- Own Project.
A Convex project or your own design, such as optimization
of some method connected with your thesis research area, graphical
visualization, another course, or something interesting in science, engineering or mathematics.
- Statistics Project.
Generate suitable sets of random
numbers (make sure they are floating point),
each with a different sample size N. `rand' is a standard Unix random
number generator, but check it out yourself.
Describe how you tested the
randomness of your data, e.g., test for a uniform random distribution.
For each set, compute
basic statistics, like mean, variance and Chi-Square test in as
efficient parallel manner as possible (i.e., make use of the extended
Fortran90 intrinsic sum function `sum' on the Convex (needs an
`intrinsic sum' declaration)). Plot T versus N and T
versus p. Estimate or compute and plot
the Amdahl parallel fraction as a function of N. Compare speedups and
efficiencies relative to N. Is the Amdahl law
operative as the problem size N becomes large?
Develop your own performance model that is appropriate for the behavior
of the timing data with number of processors p, sample size N and
Chi-Square bin size Nb. This project assumes some knowledge of statistics.
Does your performance model account for deviations in Amdahl's law?
- Loop Unrolling and Vector (Memory) Touches.
Measure actual performance on the Convex and advantage of "unrolling" loops for
several depths of unrolling for matrix-vector and matrix-matrix
multiplication. Compare results to the theoretical results presented in class
from J.J. Dongarra and S.C. Eisenstat (1984), "Squeezing the Most out of
an Algorithm in CRAY Fortran," ACM Trans. Math. Software 10 (No. 3)
pp. 221-230 and J.J. Dongarra and A.R. Hinds (1979),
"Unrolling Loops in FORTRAN," Software-Practice and Experience 9,
pp. 219-229 and W.R. Cowell and C.P. Thompson, "Transforming Fortran
DO Loops to Improve Performance on Vector Architectures, "ACM Trans.
Math. Software 12 (No. 4, Dec. 1986), pp. 324-353. See also J. Levesque
and J.W. Williamson, "A Guidebook to Fortran on Supercomputers," 1989.
Find out how to get a pseudo-assembler listing of the principal
parts of your program for analysis (check the `man fc' or `man cc' or the local
guide for the `-S' option and explore simple examples).
Is there an optimal unrolling depth due to the fact that
there is a hardware limit on the Convex.
Present results in MegaFlops versus unrolling depth and determine a maximum,
if any. Measure the cost of scalar touches relative to vector (memory)
touches, and use the ratio to correct the touch count in your report discussion.
Try both horizontal (LHS) or vertical (RHS) unrolling for matrix
multiplication.
- Row versus Column Oriented Multiplication Loops.
Determine regions of array size where there are
efficiency advantages on the Convex using
column referencing as opposed to row referencing in reordering
numerical linear algebra multiple loops (matrix-vector and
matrix-matrix multiplication).
Is the simple Fortran column environment
argument valid, and if not why not? How strong is the dependence on
loop iteration size N? What about rectangular (non-square) matrices. Is there
a "shape effect"? What about stride effects?
Make sure your floating point arrays are genuine.
(See Dongarra, Gustavson and Karp, SIAM
Review, Vol. 26, 1984, pp. 91-122; for the CRAY-1).
- Row versus Column Oriented LU Decomposition Loops.
Determine regions of array size where there are
efficiency advantages on the Convex using
column referencing as opposed to row referencing in reordering
LU decomposition multiple loops. Is the simple Fortran column environment
argument valid, and if not why not? How strong is the dependence on
loop iteration size N? What about rectangular (non-square) matrices.
Make sure your floating point arrays are genuine.
(See Dongarra, Gustavson and Karp, SIAM
Review, Vol. 26, 1984, pp. 91-122; for the CRAY-1).
- Validity of Hanson's "Avoid These Things".
Investigate a number of Professor Hanson's Rules of Thumb about "Avoiding
Certain Optimization Hindering Constructs". Find out the validity
on loops (if loops were involved) with sufficient work (ie., bigger than
the toy class examples). Find regions of work size, if any, where each
rule works. For example: What is the quantitative difference in overhead
between common and subroutine argument passing? How much does inlining
subroutines and functions save?
statement.
- Iteration Methods.
Make a comparison of the performance of Jacobi
and Gauss-Seidel methods for Elliptic Partial Differential equations.
Gauss-Seidel is better for serial computers, but what about parallel
computers?
See James M. Ortega, Introduction to Parallel and Vector Solution of Linear
Systems, Plenum Press, 1988.
- Compare Strassen's matrix multiplication algorithm with the usual
form.
See D. Bailey, "Extra High Speed Multiplication on the Cray-2", SIAM J. Sci.
Comp. 9(3), May 1988, p. 603-607.
- Test performance of levels of parallel optimization.
For instance, does the command `fc -O[n] -LST [source].f' lead to faster,
or slower, executables when `[n]' is `2', `1' or `0' rather than `3'
for matrix multiplication or some other application?
Test speedup and efficiency using from 1 to 8 processors using the `mpa'
modify parallel attribute utility (see the local guide and `man mpa' command).
- Test performance different classes of memory addressing.
Test performance dependence on the 256 KB (32 KDW, KDW = kilo-double-words)
Data Cache attached to each processor (hint: may want to fix the number of
processors with the `mpa' command as a control).
Also can test `far_shared', `near_shared', `block_shared' and `thread_private'
memories using appropriate compiler directives, as another project suggestion.
Use an application like matrix-matrix multiplication for various size matrices
in your tests.
See Chapter 5 and 6 of the "Convex Exemplar Programming Guide".
Information Important to Project:
Please report to Professor Hanson any problems:
Web Source:http://www.math.uic.edu/~hanson/borgproject.html