MCS 572 HP-Convex Exemplar SPP1200 Group Project Suggestions   Fall 1996

Professor F. B. Hanson


DUE Monday 11 November 1996 in Class.


Two(2) copies of the report are due Wednesday 11 November 1996 in class (one graded copy will be returned and the other will be used for the the class record and future class proposals). Students will make short presentations of group project results in class, starting on Monday 11 November 1996.

CAUTION: Projects should have sufficient work to effective utilize the SPP1200, by a comparison of the two platforms, but should not be so time consuming as to severely affect the performance of other users. The project can use either Fortran or C, providing the project makes sense in the programming language. Write a group (1 .leq. group .leq. 2) with good load balancing among the group members) report that is a short paper (8 or 15 or so pages) as if for publication (i.e., with
  1. abstract (short paragraph description of problem and results)
  2. executive summary (give an itemized brief summary of your paper)
  3. introduction (motivate your problem for the class, citing prior work)
  4. problem or method
  5. results and discussion (should include theoretical explanations of interesting results and graphs; explain results whether good or bad)
  6. conclusions (brief)
  7. acknowledgements (give thanks to others that helped you and to the UIC Computer Center for use of the SPP1200)
  8. references (list papers and books that you used as sources)
  9. appendices: compiler informational code listing (`*.LST' file with loop marking from "-LST" option of `fc' or "-or all" option of `cc'; see Appendix C: Optimization report, of the "Convex Exemplar Programming Guide".), supporting timings and if relevant assembly listings.
You are welcome to make up your own projects (see the first suggestion), but you should discuss this with Professor Hanson before hand for advice. Also let him know what ever project you select, because even the following ideas are very broad.

WARNING: If you use test or sample floating point arrays in your project, make sure they are genuine and random floating point, i.e., do not use trivial integers or numbers with patterns. Consult the class local user's guide for how to run a scalar job to use as a reference measurement and for how to use the random function `rand'.

The Projects


  1. Own Project. A Convex project or your own design, such as optimization of some method connected with your thesis research area, graphical visualization, another course, or something interesting in science, engineering or mathematics.
  2. Statistics Project. Generate suitable sets of random numbers (make sure they are floating point), each with a different sample size N. `rand' is a standard Unix random number generator, but check it out yourself. Describe how you tested the randomness of your data, e.g., test for a uniform random distribution. For each set, compute basic statistics, like mean, variance and Chi-Square test in as efficient parallel manner as possible (i.e., make use of the extended Fortran90 intrinsic sum function `sum' on the Convex (needs an `intrinsic sum' declaration)). Plot T versus N and T versus p. Estimate or compute and plot the Amdahl parallel fraction as a function of N. Compare speedups and efficiencies relative to N. Is the Amdahl law operative as the problem size N becomes large? Develop your own performance model that is appropriate for the behavior of the timing data with number of processors p, sample size N and Chi-Square bin size Nb. This project assumes some knowledge of statistics. Does your performance model account for deviations in Amdahl's law?
  3. Loop Unrolling and Vector (Memory) Touches. Measure actual performance on the Convex and advantage of "unrolling" loops for several depths of unrolling for matrix-vector and matrix-matrix multiplication. Compare results to the theoretical results presented in class from J.J. Dongarra and S.C. Eisenstat (1984), "Squeezing the Most out of an Algorithm in CRAY Fortran," ACM Trans. Math. Software 10 (No. 3) pp. 221-230 and J.J. Dongarra and A.R. Hinds (1979), "Unrolling Loops in FORTRAN," Software-Practice and Experience 9, pp. 219-229 and W.R. Cowell and C.P. Thompson, "Transforming Fortran DO Loops to Improve Performance on Vector Architectures, "ACM Trans. Math. Software 12 (No. 4, Dec. 1986), pp. 324-353. See also J. Levesque and J.W. Williamson, "A Guidebook to Fortran on Supercomputers," 1989. Find out how to get a pseudo-assembler listing of the principal parts of your program for analysis (check the `man fc' or `man cc' or the local guide for the `-S' option and explore simple examples). Is there an optimal unrolling depth due to the fact that there is a hardware limit on the Convex. Present results in MegaFlops versus unrolling depth and determine a maximum, if any. Measure the cost of scalar touches relative to vector (memory) touches, and use the ratio to correct the touch count in your report discussion. Try both horizontal (LHS) or vertical (RHS) unrolling for matrix multiplication.
  4. Row versus Column Oriented Multiplication Loops. Determine regions of array size where there are efficiency advantages on the Convex using column referencing as opposed to row referencing in reordering numerical linear algebra multiple loops (matrix-vector and matrix-matrix multiplication). Is the simple Fortran column environment argument valid, and if not why not? How strong is the dependence on loop iteration size N? What about rectangular (non-square) matrices. Is there a "shape effect"? What about stride effects? Make sure your floating point arrays are genuine. (See Dongarra, Gustavson and Karp, SIAM Review, Vol. 26, 1984, pp. 91-122; for the CRAY-1).
  5. Row versus Column Oriented LU Decomposition Loops. Determine regions of array size where there are efficiency advantages on the Convex using column referencing as opposed to row referencing in reordering LU decomposition multiple loops. Is the simple Fortran column environment argument valid, and if not why not? How strong is the dependence on loop iteration size N? What about rectangular (non-square) matrices. Make sure your floating point arrays are genuine. (See Dongarra, Gustavson and Karp, SIAM Review, Vol. 26, 1984, pp. 91-122; for the CRAY-1).
  6. Validity of Hanson's "Avoid These Things". Investigate a number of Professor Hanson's Rules of Thumb about "Avoiding Certain Optimization Hindering Constructs". Find out the validity on loops (if loops were involved) with sufficient work (ie., bigger than the toy class examples). Find regions of work size, if any, where each rule works. For example: What is the quantitative difference in overhead between common and subroutine argument passing? How much does inlining subroutines and functions save? statement.
  7. Iteration Methods. Make a comparison of the performance of Jacobi and Gauss-Seidel methods for Elliptic Partial Differential equations. Gauss-Seidel is better for serial computers, but what about parallel computers? See James M. Ortega, Introduction to Parallel and Vector Solution of Linear Systems, Plenum Press, 1988.
  8. Compare Strassen's matrix multiplication algorithm with the usual form. See D. Bailey, "Extra High Speed Multiplication on the Cray-2", SIAM J. Sci. Comp. 9(3), May 1988, p. 603-607.
  9. Test performance of levels of parallel optimization. For instance, does the command `fc -O[n] -LST [source].f' lead to faster, or slower, executables when `[n]' is `2', `1' or `0' rather than `3' for matrix multiplication or some other application? Test speedup and efficiency using from 1 to 8 processors using the `mpa' modify parallel attribute utility (see the local guide and `man mpa' command).
  10. Test performance different classes of memory addressing. Test performance dependence on the 256 KB (32 KDW, KDW = kilo-double-words) Data Cache attached to each processor (hint: may want to fix the number of processors with the `mpa' command as a control). Also can test `far_shared', `near_shared', `block_shared' and `thread_private' memories using appropriate compiler directives, as another project suggestion. Use an application like matrix-matrix multiplication for various size matrices in your tests. See Chapter 5 and 6 of the "Convex Exemplar Programming Guide".

Information Important to Project:


Please report to Professor Hanson any problems:
Web Source:http://www.math.uic.edu/~hanson/borgproject.html