MCS572 UIC User's Local Guide to
PSC Terascale Computing System (TCS) Cluster

version 0.80
24 March 2003

F. B. Hanson

Mail address:

Department of Mathematics, Statistics, and Computer Science

University of Illinois at Chicago

851 S. Morgan; SEO, MC 249

Chicago, IL 60607-7045

Office address:

Room: 718 SEO

Hanson World Wide WEB Home Page:

http://www.math.uic.edu/~hanson/

UIC Fall 2003 Course:

MCS 572 Introduction to Supercomputing

MCS 572 Class World Wide WEB Home Page:

http://www.math.uic.edu/~hanson/mcs572/

Acknowledgement:

Project MCS 572 Introduction to Supercomputing is supported in part by the National Science Foundation through computing resources provided by the Pittsburgh Supercomputing Center under PSC Grant SEE030003P to Principal Investigator Floyd Hanson.

Preface.

TCS Overview.

TCS Processing Units.
TCS Benchmark Performance.
TCS Memory Units.
TCS Operating System.
TCS Login Shells.
TCS-UIC Login Access.
TCS-UIC File Transfer.
TCS Programming Languages.
TCS Batch Queueing Systems: PBS and NQS with MPI.
TCS Message Passing Interface (MPI) Sources.
More TCS Information.

Guide Notation.

Background References.

UNIX Command Dictionary. AVAILABLE As Separate File.

UNIX f90 Compile, Load and Execution Commands.
UNIX C Language Commands.
UNIX Performance Commands.
UNIX makefile Commands.
UNIX Mail Commands.
UNIX Network Queueing System (NQS) and AlphaServer Portable Batch System (PBS).

Interrupts Dictionaries Telnet and UNIX.

UNIX Interrupts Dictionary.

MPI Message Passing Programming on TCS. UNDER RECONSTRUCTION

TCS Fortran90 and other Extensions. UNDER RECONSTRUCTION

TCS Fortran90 (f90) Compiler Options.
TCS Fortran90 (f90) Miscellaneous Extensions.
Fortran90 Array Construction Functions.
Fortran90 Array Reduction Functions.
Fortran90 Array Manipulation Functions.
Fortran90 Array Location Functions.
Fortran90 Array Matrix Multiplication Functions.
TCS Fortran90 Array Functions TEST CODE.
TCS Fortran90 (f90) Library Functions.
CFT Fortran90 (f90) Compiler Scalar Optimization Directives.
CFT Fortran90 (f90) Compiler Loop Directives.
CFT Fortran90 (f90) Compiler Storage Directives.
CFT Fortran90 (f90) Compiler Diagnostic Directives.

f90 and cc Timing Utility Functions. UNDER RECONSTRUCTION

T90 Fortran90 (f90) Timing Utility Functions.
cc Timing Utility Function.
Table of Other Timers.

Preface

This User's Local Guide is intended to be a sufficient, hands-on introduction to the Pittsburgh Supercomputing Center TCS (Terascale Computing System) Parallel Cluster for our MCS 572 Introduction to Supercomputing class. The TCS runs a Compaq variation of the UNIX operating system called Tru64 UNIX.

The PSC Class Account for MCS572 Fall 2000 is `sc70jpp' for the PSC Grant SEE030003P.

TCS Overview.

The PSC TCS MC512 is a large scale parallel cluster with 64 Compaq (HP) AlphaServer compute nodes, each with four 667 MHz processors, making a total of 256 processors. The PSC TCS's internet address is

tcs.psc.edu

with the prompt of `%'. For TCS information from PSC, see

TCS Class Cluster

Remark: There are a lot of inaccuracies in this outdated page, some of which are corrected in this local guide.

The TCS is a prototype for a larger, final terascale system called Lemieux with 750 compute nodes and a total of 3000 processors. Lemieux's web page should be consulted for updated system information:

TCS Lemieux (Best) Cluster

The TCS and Lemieux AlphaServer System Reference card is found at

Alphaserver SC Programmer's Quick Reference Guide.

What does the PSC TCS look like? PSC TCS Picture

A simple view of the TCS architecture is given for the larger system:

TCS Lemieux Configuration

TCS Compute Nodes and Processors.

Each compute node is an HP AlphaServer SC ES40/EV67 node configured as a 4 processor Symmetric MultiProcessor or Shared Memory Processor (SMP) with 4 GigaBytes (GB) of memory (RAM). The TCS cluster nodes are connected with a proprietary Quadrics Interconnection (IC) network. For more information on the compute nodes, see the nice Compaq slide show of N. Srivastava:

AlphaServer SC System Overview and Development Environment. Access may require your AFS password (same as your original one if you have not changed it). Also, you need MS PowerPoint to display the slides, but you can get a free display-only version from Microsoft by checking a web search engine.

TCS Benchmark Performance.

The PSC TCS, installed at PSC in April 2001, ranks as the 246th top computer in the world (Top 500 Computer Reports, November 2002, Source: http://www.top500.org). It achieved a maximal measured speed of R_max = 264 GigaFlops (GF) on the LINPACK linear algebra benchmark, against a theoretical peak speed of R_peak = 342 GF. In terms of the Hockney Linear Model (see MCS572 class notes), R_peak plays the role of the asymptotic speed (also called R_infinity) and the half-performance problem size is N_1/2 = 20,000; the maximum order run was N_max = 106,000. These figures are given at the web link above, or see the class summary

MCS 572 Selections from Cluster Sublist From November 2002 Data.
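
The benchmark figures above can be cross-checked with a little arithmetic. The sketch below assumes that each 667 MHz EV67 processor completes at most 2 floating-point operations per clock cycle (one add and one multiply); that factor is not stated in this guide. It also writes the Hockney Linear Model in its usual textbook form, which may differ in notation from the MCS572 class notes.

```latex
\begin{align*}
R_{\mathrm{peak}} &= 256\ \text{processors} \times 667\times 10^{6}\ \frac{\text{cycles}}{\text{s}} \times 2\ \frac{\text{flops}}{\text{cycle}}
                   \approx 341.5\ \text{GF} \approx 342\ \text{GF}, \\
\frac{R_{\max}}{R_{\mathrm{peak}}} &= \frac{264\ \text{GF}}{342\ \text{GF}} \approx 0.77, \\
r(N) &= \frac{R_{\infty}}{1 + N_{1/2}/N}, \qquad r(N_{1/2}) = \tfrac{1}{2}\,R_{\infty}.
\end{align*}
```

Under these assumptions the LINPACK run reached roughly 77% of the theoretical peak, and N_1/2 is the problem order at which the model predicts half of the asymptotic speed.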

TCS Memory Units.

The random access memory (RAM) is 4 GB of globally shared memory within each 4 processor node, but is distributed memory totaling 256 GB across the cluster of 64 nodes, so the 256 processor system as a whole has a hybrid memory system. The processors or CPUs each have an 8 MB L2 cache memory (level 2 local memory).

TCS Operating System.

The operating system is the Compaq Tru64 UNIX V5.1A (Rev. 1885). However, since compilation and execution on the TCS is by remote batch scheduling, using a combination of the Compaq Portable Batch System (PBS) and the UNIX Network Queueing System (NQS), the user should refer to the subsections on those topics. See

PSC UNIX overview

TCS Login Shells.

The operating system environment is set by a UNIX shell and the default shell on the PSC TCS is the C-Shell. The shell can be changed, although that is not recommended, with the Change Shell command "chsh", which has the format:

chsh [pscuser] [shellpath]

where the Shell Path "[shellpath]" can be found with the system command "which" in the format:

which [shell]

where "[shell]" is the standard system shell "sh", the Bourne again shell "bash", the Korn shell "ksh", or another shell. However, all of the NQS QSUB job scripts given here assume the C-shell, which uses the resource configuration file ".cshrc". This file resides in the user's home directory and can be used to define commands and make aliases (format: "alias [aliasname] [aliasdefinition]"; quotation marks are needed around definitions containing special command characters). A sample of a ".cshrc" file for use on the TCS is

.cshrc.tcstemplate

TCS-UIC Login Access.

Users MUST access the PSC TCS directly using the Secure Shell (ssh), such as from UIC `icarus' or from department systems, with either of the forms

ssh tcs.psc.edu -l [PSC-login-name]

ssh [PSC-login-name]@tcs.psc.edu

If your computer system does not have ssh, you will have to find one that does, like the UIC student computer server icarus.uic.edu, since every student should have a UIC netid. If ssh has difficulty with the Unix file ".ssh/known_hosts" (the file will differ on other platforms), then edit the file to delete the entry for the node that is giving the problem, since the stored ssh key may be out of date, and try the ssh command again.

SSH works like the Unix remote login command `rlogin', but encrypts your password so that it is nearly impossible to steal. The commands `rlogin' or `telnet' do not work with the `tcs', resulting in the response "tcs.PSC.edu: Connection refused". See "man ssh" for help from the UNIX manual pages.

SSH is a UNIX command found on many UNIX systems, but you can get a free MS Windows version that comes in two main flavors:

PuTTY Download Page. PuTTY is a free Win32 Telnet/SSH Client that is fairly easy to install and can be combined with
- PSCP (an SCP client, i.e. command-line secure file copy)
- PSFTP (an SFTP client, i.e. general file transfer sessions much like FTP)
- And other commands.
TTSSH and Teraterm Software Downloads. The secure shell TTSSH works with the telnet terminal software Teraterm, so both are needed and the links are given on the cited page.

TCS-UIC File Transfer.

Users MUST do their file transfer between the PSC TCS and UIC using the Secure Shell (ssh) commands such as secure copy scp or secure FTP sftp. For example, from UIC

SCP Secure Copy:

scp [uic-file] [tcs-user]@tcs.psc.edu:[tcs-file]

or from PSC TCS

scp [tcs-file] [uic-User]@[node.dept].uic.edu:[uic-file]

This form of the command works well for a single file, which can also have a directory path, but the user password has to be given each time. For multiple files a wild card version can be used, e.g., for all C files, omitting the target file name, from PSC:

scp *.c [uic-User]@[node.dept].uic.edu:[uic-directory]

See "man scp" for help from the UNIX manual pages.

SFTP Secure File Transfer Protocol:

Also, you can use the secure File Transfer Protocol (FTP) called sftp that works like the usual FTP, except that you can not use any abbreviations of the FTP subcommands (e.g., type "put" in full rather than an abbreviation of it), but SFTP secures your session better. For example, from UIC,

sftp [tcs-user]@tcs.psc.edu

or from PSC TCS

sftp [uic-User]@[node.dept].uic.edu

Remark: If your username is the same at both UIC node and PSC node, then the "[username]@" is optional. See "man sftp" for help from the UNIX manual pages.

TCS File Systems.

HOME Directory:
Each PSC User has a home directory to keep files and subdirectories with the full path specified by "/usr/users/[n]/[username]" where [n] = 0:9. The home directory can be more simply referenced by the UNIX symbol ~ or the UNIX meta or environmental variable representation ${HOME}, as in "cd $HOME" to change directory back to home or "ls ${HOME}/mcs572" to list contents of a home subdirectory "mcs572". (Note that the curly brackets are optional in both examples, since "/" cannot be part of a variable name; they are required only when "HOME" is immediately followed by a letter, digit or underscore.) Home directory quotas are 100MB (Mb?), but may not be enforced.
SCRATCH Directory:
Each user has a scratch or work directory "/usr/scratch/[nx]/[username]" where [nx] = 0:9 or [nx] = 0x:9x, and these directories are linked to the disks /scratch1/ or /scratch2/. The user's scratch directory can simply be referenced by the meta representation ${SCRATCH}, where the curly brackets are optional if ${SCRATCH} is used as a sole argument. It is strongly recommended that the scratch directory be used, with all necessary input files in place, for scheduling batch jobs (essentially the only ones allowed) on the TCS cluster with the qsub queueing submit command. Caution: On the tcs.html webpage the no longer existing $STAGE is incorrectly listed for the work directory.
LOCAL Directory:
Each TCS cluster computing node has global node memory accessible to all four of its processors, and that memory is accessible to the user only while the user's code is executing, technically beginning with the shell identification line required in the qsub script, e.g., the "#!/bin/csh" escape to the C-shell. Hence, it should not be necessary to change the current directory to ${LOCAL}. However, the parallel run command prun needs the seemingly redundant "./" prefix, as in "./[executable]", on the executable file. On the tcs.html webpage the no longer existing $TMPDIR is incorrectly listed for the compute node directory.
Remark: The commands "qsub" and "prun" are described more below.
FAR File ARchiver System:
The FAR system runs on golem.psc.edu and is accessible from TCS and Lemieux for large file storage for long periods of time. To use it, you will need information on the Andrew File System (AFS) and the special FAR instructions. More information on FAR is at
http://www.psc.edu/general/filesys/far/.
However, for the class, you will likely not be needing FAR.
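
The use of curly brackets around variable names such as HOME, discussed in the HOME Directory item above, can be tried out directly. The following is a minimal sketch that should behave the same way in the C-shell and in sh-style shells; the suffix "_old" is only a hypothetical example, not a directory that exists on the TCS.

```shell
# With braces: the home path followed by /mcs572.
echo ${HOME}/mcs572

# Without braces: same result, because "/" cannot be part of a variable name.
echo $HOME/mcs572

# Braces are needed here: prints the home path followed by _old.
echo ${HOME}_old

# Without braces, $HOME_old would be read as a variable named HOME_old,
# which the C-shell rejects as undefined and sh-style shells expand to nothing.
```

When in doubt, keeping the curly brackets is always safe.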

TCS Programming Languages.

The TCS programs are compiled directly on the TCS, given here with some typical options, using the
Fortran90 Compiler:
f90 -O -lmpi -lelan -arch ev67 -lm -o [executable] [source].f
or the
C Compiler:
cc -O -lmpi -lelan -arch ev67 -lm -o [executable] [source].c
or the
C⁺⁺ HP Compiler:
cxx -O -lmpi -lelan -arch ev67 -lm -o [executable] [source].c
See "man cxx" for help from the UNIX manual pages or http://h30097.www3.hp.com/cplus/cxx_ref.htm
In the above compilation commands, the options are

"-O": a typical level of optimization for that compiler (see the Quick Reference Card for more information on other optimization options:
Alphaserver SC Programmer's Quick Reference Guide.),

"-lmpi" or "-l mpi": references the Message Passing Interface (MPI) Library that is called by the Fortran or C compilers, permitting the use of MPI parallel programming so that the code can execute in parallel. In addition to the MPI Library option, the code itself must include the MPI header directive in the code preface (code beginning): "include 'mpif.h'" for Fortran 90 and "#include <mpi.h>" for the C family of programming languages. Both the MPI Library and MPI Include statements are needed.
"-lelan": allows parallel communication between the compute nodes by linking to the ELAN Library.
"-arch ev67": allows the parallel execution to be tuned to the TCS AlphaServer chip in the ES40/EV67 series from Hewlett Packard (HP).
"-lm": allows the use of the UNIX Math Library in conjunction with the math.h header file (if you are not using math functions, then you need neither, but you need both library and header if you are).
"-o [executable]": names the output executable file "[executable]"; if this option is omitted, the executable is given the generic default name "a.out". Execution of the executable is by the massively parallel envelope run command prun.

PRUN Parallel Run Command:
prun -N [Number_Nodes] -n [Number_Processors] [executable] < [data]
where "< [data]" means the data file is directed into standard UNIX input. An executable can not run in parallel without "prun". Usually, the number of nodes "[Number_Nodes]" and the number of processors "[Number_Processors]" are specified by the local meta environmental variables ${RMS_NODES} and ${RMS_PROCS}, respectively. Both must be initially set by a PBS statement in the QSUB script or in the options of the "qsub" command, which automatically initializes the meta variables. See "man prun" for help from the UNIX manual pages.

TCS Batch Queueing Systems: PBS and NQS with MPI.

NQS Job Scripts:
Remote job scheduling on the TCS is accomplished by using the UNIX Network Queueing System (NQS) job scripts, but the script directives use the so-called Portable Batch System (PBS) Directives used on the HP Alphaservers, in place of the usual NQS Directives. A sample PSC TCS 4 processor target job script for C code is given in

cpgm4.job

and one for Fortran 90 code on 4 PSC TCS processors is given in

fpgm4.job

The new user should study these sample job scripts and others listed on the class homepage:

MPI Code and Job Script Examples
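
For orientation, the general shape of such a job script can be sketched as follows. This is NOT the class file cpgm4.job, only a hypothetical sketch: the resource request "rmsnodes=1:4", the 5 minute wall time, the file names "cpgm.c", "cpgm", "cpgm4.output" and "cpgm4.error", the variable name RMS_PROCS, and the choice to compile inside the script are all assumptions that should be checked against the class samples and the PSC documentation.

```csh
#!/bin/csh
# Hypothetical sketch only; NOT the class file cpgm4.job.
# Assumed PBS resource requests: 1 node with 4 processors, 5 minutes wall time.
#PBS -l rmsnodes=1:4
#PBS -l walltime=5:00
# Assumed names for the standard output and standard error files.
#PBS -o cpgm4.output
#PBS -e cpgm4.error

# Echo each command into the output file as it runs.
set echo

# Work in the scratch directory, where cpgm.c and cdata are assumed to be.
cd ${SCRATCH}

# Compile with the options described under TCS Programming Languages.
cc -O -lmpi -lelan -arch ev67 -lm -o cpgm cpgm.c

# Run on the nodes and processors granted by PBS, with cdata as standard input.
prun -N ${RMS_NODES} -n ${RMS_PROCS} ./cpgm < cdata
```

The "#PBS" lines look like comments to the C-shell but are read as directives by the batch system, which is why they are kept free of trailing remarks.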

Executable Job Scripts:
Before any job script can be used as an argument of the qsub command, the job script must be made executable for all, e.g., using the UNIX change mode command:

chmod 755 cpgm4.job
         or
chmod a+x cpgm4.job
     for C Languages
         Else
chmod 755 fpgm4.job
         or
chmod a+x fpgm4.job
       for Fortran 90
where in the second form, the files should already be readable (r).
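
The difference between the two chmod forms above can be seen on a throwaway file. This is a minimal sketch; the file name "demo.job" is hypothetical and the commands work in both the C-shell and sh-style shells.

```shell
# Create an empty file standing in for a job script.
touch demo.job

# Start from rw-r--r--: readable by all, executable by nobody.
chmod 644 demo.job
ls -l demo.job

# a+x only ADDS execute permission for all; read and write bits are kept,
# so the mode becomes rwxr-xr-x.
chmod a+x demo.job
ls -l demo.job

# 755 SETS the mode to rwxr-xr-x outright, whatever it was before.
chmod 755 demo.job
ls -l demo.job

# Clean up afterwards with: rm demo.job
```

Because "a+x" leaves the read bits as they were, it is sufficient only when the script is already readable, while "755" is safe in either case.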

NQS qsub Submit Command:
These job scripts are run with the NQS QSUB submit command from the user's `${SCRATCH}' scratch directory, for example,

qsub cpgm4.job
   for C Languages
       or
qsub fpgm4.job
   for Fortran 90
where `${SCRATCH}' denotes the meta-name of the user's scratch directory on the TCS cluster. See "man qsub" for help from the UNIX manual pages.

NQS qstat Status Command:
The job status can be checked by the NQS QSTAT status command:

qstat -u [tcs-username]
and when done, the user can view the output if any. Under the table heading called "S", for example, "Q" means that the job is queued waiting to run, "R" means running, and "E" means exiting. See "man qstat" for help from the UNIX manual pages.

NQS qdel Delete Command:
If for any reason you need to kill the job before the end, first note the job id number `[job_id]' at the beginning of your job line in the "qstat -u [tcs-username]" output, then enter the command:

qdel [job_id]
which should stop a running job, unless the system is busy. See "man qdel" for help from the UNIX manual pages.

Job Script Examples:
A user can try out the class sample NQS QSUB job scripts by downloading and copying one of the following sample codes

Pi calculation for any number of processors and MPI, but needs cdata:

pi_mpi.c, C version;
pi_mpi.f, F90 version;
cdata, sample data file for Pi code that also works as a dummy data file for the Trap and Laplace Code below;

Trapezoidal Rule calculation for any number of processors and MPI:

trap_mpi.c, C version;
trap_mpi.f, F90 version;

Laplace Equation Iteration for 4 Processor - 4 Subdomain MPI:

lap4.c, C version;
lap4.f, F90 version;

to your home directory and then recopying it, say "[ExampleCode].c" or "[ExampleCode].f" to the recyclable source file of the form `*pgm.*' as follows:
cp [ExampleCode].c    cpgm.c
         or
cp [ExampleCode].f    fpgm.f
for C or F90, respectively.
The user will also have to create a simple input data file called "cdata", or use the Pi Code example data file, since the qsub scripts are written to take a data file as standard input (e.g., use the editor "vi" to revise the set of integration points in cdata, terminated by zero). Then, in the home directory, enter the queue submit command for 4 processors on a single node:
qsub cpgm4.job
         or
qsub fpgm4.job

then check for a finished job with "qstat -u [psc-username]" until your queue record is no longer displayed, and finally look for the standard output and standard error files, for example with "ls -l *pgm4.output *pgm4.error". You can always modify the sample job scripts to suit your particular job requirements or your own file naming preferences, or if you prefer to open and close files in the code by hand.

TCS Message Passing Interface (MPI) Sources.

MCS 572 Class MPI webpages:

General MPI Information, e.g., the SDSC or ANL sites are very good, but there does not seem to be much at PSC.
MCS572 MPI Example Page.

PSC MPI Basics:

MPI Basics: Parallel Programming Techniques

The Cray native SHMEM communication library is also available, but, like ELAN, it is optimized only between nodes and not within a node:

SHMEM (Logically Shared, Distributed Memory (SHMEM) Routines).

OpenMP is supported in Tru64 UNIX for C and Fortran, but not C++:

OpenMP Resource Page.

More TCS Information.

For TCS information from PSC, see

Using the PSC TCS prototype
Using the PSC TCS lemieux
A Visualization Subsystem for the PSC TCS, J. Welling, C. Gribble and J Vasak, U. Utah.

Guide Notation.

This local-guide is meant to indicate ``what works'' primarily for access from UNIX systems to PSC TCS. The use of the Unix C-Shell on the TCS is assumed throughout most of this local guide.
UNIX is a trademark of AT&T.
Computer prompts or broadcasts will be enclosed in double quotes (``_''), background comments will be enclosed in curly braces ({_}), commands cited in the comments are highlighted by single quotes or double quotes depending on emphasis (`_') or ("_") {do not type the quotes when typing the commands}, and optional or user specified arguments are enclosed in square brackets ([_]) {However, do not enter the square brackets.}. The symbol (CR) will denote an immediate carriage return or enter. {Ignore the blanks that precede it as in `[command] (CR)', making it easier to read.} The symbol (Esc) will denote an immediate pressing of the Escape-key {Use no brackets please.} The symbol (SPACE) will denote an immediate pressing of the Space-bar {Warning: Do not type any of these notational symbols in an actual computer session.}
Return to TABLE OF CONTENTS?

Background References

For further information, please consult the sources (you can just click on the highlighted topics to access if you are surfing the World Wide Web):

Professor Hanson's MCS 572 Introduction to Supercomputing Home Page provides a large variety of links to useful supercomputing information.
Pittsburgh Supercomputing Center (PSC) Home Page on the World Wide Web permits the direct search of the PSC public web information directories.
Resources Available to PSC Users Page.

Pittsburgh Supercomputing Center Hardware.

TCS Prototype Cluster (MCS 572 class cluster).

Lemieux TCS Final Cluster.

Terascale Computing System (TCS) FAQ.

TCS Compaq AlphaServer SC System Overview, HP Workshop notes (you may need an AFS password for these nice PowerPoint Slides and will need a MS PowerPoint Viewer for MS Windows).

Kai Hwang's "Basic Network-Based Cluster Computing,".

Raj Buyya's Cluster Computing Information Center
Rajkumar Buyya (editor) and Hai Jin (slide author), High Performance Cluster Computing: Architectures and Systems -- Lecture Notes for selected chapters (MS PowerPoint Viewer needed, but available free on-line).

Raj Buyya's Trends in Cray Supercomputer versus Killer Micros.

Raj Buyya's "Cluster Computing Architecture,".

PSC UNIX overview.

Compaq DTKS C Overview for TCS.

HP Fortran User Manual for Tru64 UNIX for TCS.

Getting Started With MPI: A Message Passing Interface for Parallel Programming: An Introduction to MPI at SDSC.

MCS572 MPI General Information Page.

MPICH Reference Card

MCS572 TCS Cray MPI Example Page.
OpenMP Resource Page.
man [command] (CR), when invoked in a UNIX-like system such as UNICOS, produces an on-line listing of the manual pages on the command [command], or similar function.
Consultation concerning problems related to using the Crays can be obtained from Professor Hanson {718 SEO, X3-2142, hanson@uic.edu}. It is recommended that Professor Hanson contact TCS consultants for this class, if they are necessary.


MPI Message Passing Programming on TCS.

MPI or Message Passing Interface is a library of subroutines in Fortran (procedures in C) that facilitates the message passing form of parallel programming in a distributed computer or network environment. At NPACI, MPI is especially useful for writing parallel programs for the Cray T3E (T3E) massively parallel processors. Eventually, MPI will replace PVM, but currently there is more information about PVM than about MPI. MPI is more abstract and complicated than PVM, since a lot of the features of MPI are hidden behind its functions and its own compile and execution commands. For relevant information on MPI, consult the following pages, especially the example page:

General MPI Information: Introductory Selections for MCS572
MPI Web Man Pages.
MCS572 MPI_Reduce Help Page.
ANL Tutorial material on MPI available on the Web, by Bill Gropp et al at ANL.
Getting Started With MPI: A Message Passing Interface for Parallel Programming.
Using SHMEM on the CRAY T3E: Cray Native Shared Memory Message Passing Library.


UNDER RECONSTRUCTION
UNIX Command Dictionary.

UNIX Log In and Out Commands.
UNIX Information Commands.
UNIX C Language Commands.
UNIX makefile Commands.
UNIX Directory Commands.
UNIX File Commands.
UNIX Pipe and Redirection Commands.
UNIX Mail Commands.
UNIX Control-Key Commands.
UNIX Terminal Environment Commands.
UNIX Process Commands.
UNIX Editor Commands.

The ex Editor.
The vi Editor.


UNICOS T90 Fortran90 (f90) Compile, Load and Execution Commands

f90 -r3 -[other options] [source].f [other source files] (CR) : Compiles source file `[source].f' and `[other source files]' with the Cray level 3 report compiler option `-r3' both with the default full optimization (`noaggress bl noinline recurrence norecursion scalar vector ....... nozeroinc'), producing an object file `[source].o' and compiler annotated listing file `[source].l' with vectorization information:

Marking   Meaning
S         scalar loop optimization (major marker)
V         vector optimization (major marker)
P         Parallel optimization (major marker)
Vs        short vector optimization
W         unwound (major marker) {short inner-most loops with trip counts of not more than 5 are collapsed or transformed to single statements so that the next inner-most loop can be vectorized provided there are no dependencies}
b         bottom loading {pre-fetching is used for the next iteration of scalar loops, only and `-o nobl' kills it}
c         conditionally vectorized, {subject to run-time determination of recurrence vector length}
k         kernel scheduling
i         unconditionally vectorized with IVDEP
r         loop unrolling {a set of loop iterations is collapsed into one iteration that has been enabled by the `-e' enabling option with its `m' loop marking sub-option}
D         delete loop
Use `-emx' in place of `-em' if you want a cross reference listing also. Use the `-b [binfile]' option to name the object file with a name other than the default `[source].o' name. Use `-o aggress' to turn on a more aggressive form of optimization, but be careful of the results. Use `-o inline' or `-I [inline-source]' to get inlining of subprograms to avoid their overhead. Use the compiler directives `NORECURRENCE' or `IVDEP' and `RECURRENCE' to turn off and on the optimization of loop recurrences. Use `-o recursion' to enable subprograms to be recursive. Use `-o zeroinc' if zero increments of do loops indices or constant increment variables (CIV) are used, because the default assumes there are none. Use `segldr' command to load the execution module, which then can be used to execute the program. See below and the last section for more on the options. It is much better to use makefiles for such commands.

f90 -eS [source].f (CR) : Creates a Cray Assembly Language (CAL) file or calfile named `[source].s' for the Fortran program `[source].f' that can be used with the Cray Assembler or to determine how the Cray compiler has carried out the optimization, particularly how it has used the vector registers. The option `[name].s' can be used to name the calfile with something other than the default name. No object or binary file `[source].o' is produced, and a nasty message will be given instead.
f90 -g [source].f (CR) : Compiles the f90 and generates a symbol table for the debugger, like `cdbx' (use `man cdbx'). See also `-G debug_lvl', where `-G 0' is the same as `-g'.
segldr -o [executable-file] -l [library-list] [source].o (CR) : This segment loader links and loads the object module `[source].o' from the `f90' step into the execution module named `[executable-file]' by the `-o' option. Without the `-o' option, the executable is the standard `a.out' file. The library option may not be needed because many libraries are searched by default: Pascal (libp.a), I/O (libio.a), utility (libu.a), Fortran (libf.a), C (libc.a), Math (libm.a), and Science (libsci.a). Numerical Recipes in Fortran or C of Press et al. are not directly available in UNICOS.
f90 [-options] -o [executable] [source].f (CR) : The `f90 -o [executable]' command combines both `f90' compile and `segldr' load functions in one command; e.g.,
f90 -limsl [source].F (CR) : This Fortran90 parallel form is for using the IMSL mathematical and statistical library; if more than one processor is used, then `setenv NCPUS [nn]' must be executed first, where `[nn]' is the number of CPU's requested. For more information, click on:
IMSL Software at NPACI
To find out what other special software is at NPACI click on: NPACI Installed Software

[executable-file] < [input-file] > [output-file] & (CR) : Executes the executable module taking input from the file `[input-file]' and redirecting output to `[output-file]' as a background process.


UNICOS C Language Commands

cc -o run [file].c (CR) : Compiles source [file].c, using the standard C compiler `scc2.0' and producing an executable named run. In place of `cc', use `scc3.0' or `scc' for the latest version of standard C or `pcc' for portable C.
cc -c [file].c (CR) : Compiles source [file].c, using the standard C compiler `scc2.0' and producing an object file named [file].o.
cc -hnoopt -o run [file].c (CR) : Compiles source [file].c, using the standard C compiler `scc3.0' and producing an executable file named run without scalar optimization or vector optimization, while `-hopt' enables scalar and vector optimization. Some other optimization related options are `-hinline' for inlining while `-hnone' is the default no inlining, `-hnovector' for no vector (vector is the default), and `-h listing' for a pseudo-assembler (CAL) listing. Some standard C options are `-htask3' for automatic parallelization (autotasking in "crayese") and `-hvector3' for more powerful vector restructuring. Other `-h' suboptions are `ivdep' for ignore vector dependence and `-hreport=isvf', which generates messages about inlining (i), scalar optimization (s) and vector optimization (v) and writes the same messages to `[file].v'. A commonly used form will be
cc -o run -h report=isvf [file].c (CR)
See `man cc' or `docview' for more information.
#define fortran : Form of C header statement to permit the call to a fortran subroutine from a C program. For example:
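#include <stdio.h>
#include <fortran.h>
#define fortran
main()
{
    /* Declare the Fortran subroutine SUB, called by reference. */
    fortran void SUB();
    float x = 3.14, y;
    SUB(&x, &y);
    /* Print y first, then x, to match the order in the format string. */
    printf("SUB answer: y = %f for x = %f\n", y, x);
}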

#pragma _CRI [directive] : Form of C compiler directive placed within the C code, where some example directives are `ivdep' for ignoring vector dependence, `novector' for turning off the default vectorization, `vector' for turning it back on, `inline' for procedure inline optimization, `shortloop', `noreduction', `getcpus [p]', `relcpus', `parallel ........', and `end parallel'. See `vector directives' for instance in `docview' for more information and examples.
segldr -o [executable-file] -l [library list] [source].o (CR) : This segment loader links and loads the object module `[source].o' from the `f90' pure compile step into the execution module named `[executable-file]' by the `-o' option. Without the `-o' option, the executable is the standard `a.out' file. The library option may not be needed because many libraries are searched by default: Pascal (libp.a), I/O (libio.a), utility (libu.a), Fortran (libf.a), C (libc.a), Math (libm.a), and Science (libsci.a). Numerical Recipes in Fortran or C of Press et al. are not directly available in UNICOS.
[executable-file] < [input-file] > [output-file] & (CR) : Executes the executable module taking input from the file `[input-file]' and redirecting output to `[output-file]' as a background process.


UNICOS Performance Commands

Cray Prof Profiling Facility:

Cray Error Explaining Command:

explain [error-message-code] (CR) : Elaborates on the command error message '[error-message-code]' for many commands; use `man explain' for a complete list.

Cray Job Accounting (ja) Command:

ja (CR)
[executable] (CR)
ja -csf (CR) : This command sequence enables Job Accounting storing the information in a file of the form `.jacct[jobid]', with options `c' giving a command report, `f' giving a command flow report, `s' giving a multitasking breakdown summary report. Note that the NPACI service unit charges are approximately one cpu hour on the T90 and one element hour on the T3E, assuming average memory (about 16MW) usage. Caution: In general, parallel processing on the YMP series like the T90 is very expensive.

Cray Perftrace (perf) or Performance Trace Facilities:

f90 -ef [source].f (CR)
segldr -l perf [source].o (CR)
a.out > [source].perf (CR)
segldr - l perf perf[n] [source].o (CR)
a.out >> [source].perf (CR) : Compiles the FORTRAN 77 program `[source].f' for use for the Cray Perftrace or Performance Trace facilities. (Flowtrace results are similarly found in the output of the executable file executed after loader statement.) The library suboption here is `perf' for referencing the libperf.a library, which has several levels, where `[n]' is the level `', `1', `2' or `3'.

Cray Hardware Performance Monitor (hpm):

hpm -g[n] -d [executable] > [source].hpm[n] (CR) : Simulates the Hardware Performance Monitor with `[executable]' and level `[n]' = `0' (scalar activity), `1' (hold issue conditions), `2' (memory use), or `3' (instruction and vector operations). The option `-d' means that a dedicated machine is simulated.

Cray JumpTrace (jt) and JumpView (jumpview): JumpTrace and JumpView help gather performance statistics in the form of a report. Some use examples are:

Fortran Example:
f90 -ef [pgm].f
jt ./a.out
jumpview

C Example:
cc -ltrace -Gp [cpgm].c
jt ./a.out
jumpview -Luch >[cpgm].listing

JumpView Main Menu:
---------------------------------------------------------------
                          MAIN MENU
 1 Master Summary           |  7 List by Average Time/Call
 2 Routines: List by Time   |  8 Operating Environment
 3 List by Megaflops        |  9 Long Report by Routine Name
 4 List by In-Line Factor   | 10 Detail Report by Symbol
 5 List by Name             | 11 Detail Report by Block
 6 List by Calls            | 12 Options
---------------------------------------------------------------
 H HELP                       Q QUIT
 Enter Number/Letter of Action Desired
---------------------------------------------------------------

Cray Autotasking Expert Performance System (atexpert):

atexpert [options] (CR) : Autotasking expert performance system, needing X-windows display for full power. See also `atchop' and `atscope'.


UNICOS makefile Commands

make [-options] [step-name] (CR) : Makes the target `[step-name]' according to the template in the `makefile'. E.g., the file `makefile.unicos_2':
# Use ``make -f make.unicos_2 mrun>& pgm.l &; run<data>out''.
SOURCES = pgm.f
OBJECTS = pgm.o
FLAGS = -em
mrun : $(OBJECTS)
	segldr -o run $(OBJECTS)
.f.o :
	f90 $(FLAGS) $*.f
{CAUTION: The commands, like `segldr' or `f90', must be preceded by a `Tab-key' tab as a delimiter, but the tab will not be visible in the UNIX listing.}
fmgen -m [make-name] -c f90 -f [-flag] -o [executable] [source].f (CR) : Automatically generates a makefile for compiling under the `f90' compiler and loading up the executable file named `[executable]'. Invoke with `make -f [make-name] [executable](CR)' and then execute `[executable]'. Also produces steps for profiling, flow-traces, performance traces, and clean-up, in the heavily documented makefile. For example, `fmgen -c f90 -f -r3 -o run pgm.f (CR)' produces a makefile named `makefile', an executable named `run', an information listing named `[name in program statement].l' with loops marked by optimization type, etc.; the making is done with `make run (CR)'. Caution: the makefile uses the source name only when that coincides with the name used in the Fortran `program' statement, and only one type of `f90' flag can be used. These flaws can be corrected by editing the resulting makefile `[make-name]'.


UNICOS Mail Commands

mailx (CR) : Shows user's mail; caution: `mailx' is close to the usual Unix mail, whereas the UNICOS `mail' command is NOT; use the subcommand `t [N](CR)' to list message number `[N]', `s [N] mbox (CR)' to append message `[N]' to your mailbox `mbox' file or `s [N] [file](CR)' to append `[N]' to another file; `e [N] (CR)' to edit number [N] or look at a long file with `ex' {see Section on `EX' below}; `v [N] (CR)' to edit number [N] or look at a long file with `vi'; `d [N] (CR)' deletes {your own mail!} `[N]'; `m [user] (CR)' permits you to send mail to another account `[user]'; a `~m [N] (CR)' inside the message, after entering a subject, permits you to forward message `[N]' to `[user]'; `\d (CR)' ends the new message {see the send form below}; `x (CR)' quits `mailx' without deleting {use this when you run into problems}; and `q (CR)' quits.
mailx [user] (CR) : Sends mail to user `[user]'; the text is entered immediately in the current blank space; carriage return to enter each line; enter a file with a `~r[filename] (CR)'; route a copy to user `[userid]' by `~c[userid] (CR)'; enter the `ex' line editor with `~e (CR)' or `vi' visual editor with `~v (CR)' (see Sections on EX and on VI) to make changes on entered lines, exiting `ex' with a `wq (CR)' or `vi' with a `:wq (CR)'; exit `mailx' by entering `\d (CR)'. {A bug in the current version of Telnet does not allow you to send a copy using the `cc:' entry.}
mailx [name]@[machine].[dept].uic.edu < [filename] (CR) : Sends the UNICOS file `[filename]' to user `[name]' on some UNIX or other machine.


UNICOS Network Queueing System (NQS)

qsub [options] (CR) : Submits a batch job to the queue; see `man qsub (CR)' for more information. For example, the option `-lM [16Mw]' permits running jobs with up to 16 megawords of memory. The argument `[myjob].script' names the script file with the instructions for running the batch job. Note that NPACI users must specify, instead of an option of `qsub', the script line
#QSUB -lM [memory-amount]
which requests `[memory-amount]' of memory for the job, using `Mw' to denote megawords; also required is
#QSUB -lT [CPU-time-amount]
specifying the amount of CPU (user plus system) time in seconds. In addition, T3E users must also specify
#QSUB -l mpp_p=[t3e_procs],mpp_t=[t3e_time]
giving the number of T3E processors and time on the T3E; and also
#QSUB -q mpp
giving the T3E queue name `mpp'. (Caution: you must be in the `mpp' group to use this queue; you can check this with the command
grep [username] /etc/group (CR)
on the T90.) The default queue is `batch'.
For more information about batch processing with NQS, click on:

Using the CRAY T90
Using the CRAY T3E

qstat [options] (CR) : Displays the status of queued batch jobs; see `man qstat (CR)' for more information.
/mpp/bin/mppstat (CR) : Not an NQS command, but displays the current T3E configuration and the number of available processors (PEs).
/usr/local/adm/access/bin/qstatmpp (CR) : Not an NQS command, but displays the currently queued T3E jobs.


T90 Fortran90 (f90) and other Extensions

For optimization, it is recommended that your f90 program aid the f90 vector model, i.e., structure the code so that the compiler can automatically recognize it as vectorizable. Usually only the innermost loop is vectorizable. Avoid GOTOs and IFs in loops. Avoid CALLs within loops. Avoid READs and WRITEs in loops. Use vectorizable functions. Avoid data dependencies. Use compiler directives, such as `!DIR$ VECTOR' and `!DIR$ NOVECTOR'. Minimize vector strides. Tune code to the Fortran column-wise storage order in the physically linear memory. Don't even think about using tabs, except in makefiles.
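As a small illustration of the stride advice above, the sketch below runs the inner loop over the first subscript, so consecutive iterations touch adjacent memory locations. The program name, array names and sizes are illustrative assumptions only, and the layout (statements from column 7, `!' comments) is meant to be acceptable in either source form.

```fortran
      program colwise
      implicit none
      integer, parameter :: n = 4
      real :: a(n,n), b(n,n)
      integer :: i, j
      b = 1.0
!     The inner loop runs over the FIRST subscript, so memory is
!     accessed with stride 1 (Fortran stores arrays by column).
!     The loop body has no IF, GOTO, CALL or I/O statement.
      do j = 1, n
         do i = 1, n
            a(i,j) = 2.0*b(i,j) + real(i)
         end do
      end do
      print *, a(1,1), a(n,n)
      end program colwise
```

Swapping the two `do' lines would give the same results but a stride of `n' in the inner loop, which is the pattern the recommendation warns against.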

T90 Fortran90 (f90) Compiler Options

See also Section ``Execution of Cray T90 Fortran90 (f90)'' and Subsection ``T90 UNICOS f90 Compile, Load and Execution Commands''. Also see the appropriate sections, `docview' and `man cc' for items on Cray Standard C.

T90 Fortran90 (f90) Miscellaneous Extensions

``FORTRAN90 Array Notation'' {f90 allows Fortran90 extensions for arrays, permitting array statements like `AS =S', `C = A +B', `A(1:50) = B(1:100:2)' for appropriately dimensioned arrays AS, A, B and C, and scalar S (i.e., like AS(i,j) = S, for all i and j within subscript bounds); in general `A([start]:[end]:[step])' references the single subscript array section for i = [start] to [end] in steps of [step]. Other examples are `a(i,:)' for the i-th row of array `a', `a(:,j)' for the j-th column, `a(1::2)' for the odd vector elements, `a(n:1:-1)' for the `n' vector elements of `a' in reverse order, and `z(1:n) = -log(z(1:n))' or `z(1:n) = ranf()'.}
real [variables-list] {The f90 `real' declaration declares variables and array elements as 32-bit (4-byte) words with only 23 bits allotted to the fraction for IEEE precision. This is somewhat different from the old non-IEEE precision Cray, where real meant an 8-byte or 64-bit real. Thus in f90 code, use the built-in functions `abs', `sqrt', `exp', `amax1' and so forth. The IEEE precision f90 `double precision' declaration is 64-bit with a 52-bit fraction, and hence is entirely different from the old non-IEEE precision Cray `double precision'.}
POINTER (P,A) {The f90 `pointer' statement declares that the pointer variable P (usually integer) holds (points to, for C fans) the shifted initial (base) address of the declared array A.}
``Execution Time Allocation'' {f90 allows execution time storage of temporary arrays within subprograms, rather than at compile time; means that f90 will be less sensitive to array bounds over-runs.}
open ([unit],file=`[fn]',status='unknown') {Format of f90 OPEN statement assigning unit number [unit] to filename [fn]; place in program after declarations; [unit] = 5 defaults to UNIX `stdin' as does [unit] = * for read statements or reads from the terminal unless it is redirected by an `open' or a `<'; [unit] = 6 defaults to UNIX `stdout' as does [unit] = * for write statements or writes to the terminal unless it is redirected by an `open' or a `>'; [unit] = 0 defaults to UNIX `stderr' or writes diagnostics to the terminal unless it is redirected by an `open' or a `>&'; note that file names are placed in quotes in the OPEN statement; see also `man' for UNICOS `assign' and `env' statements. }
save [variable or array name list separated by commas] {The save statement is essential in f90 subroutines to save parameter variable values for later calls to a subroutine; the `-ev' option of f90 provides a better solution to this problem; if `save' is not used, logic errors can result, especially for users accustomed to F66 Fortran, in which variables are saved after the RETURN statement is executed, whereas they are lost in f90.}
recursive [function or subroutine]([subprogram arguments]) {The 'recursive' prefix is required on subprograms called recursively, but also the recursive suboption is needed in the compiler statement.}
[statement] ! [embedded comment] {The line embedded comment is now legal in Cray Fortran.}
intrinsic [f90-function1][,[f90-function2]] {An `intrinsic' statement is needed in `f90' to declare any Fortran90 intrinsics, such as ANY, DOT_PRODUCT, MAXVAL, RESHAPE, ALL, EOSHIFT, MINLOC, SPREAD, COUNT, FLOAT, MINVAL, SUM, CSHIFT, MATMUL, PACK, TRANSPOSE, MAXLOC, PRODUCT, UNPACK.}
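The array section notation described in this list can be tried with a short standard Fortran 90 sketch such as the following. The program and variable names are illustrative assumptions; whole arrays rather than sections are printed, and no `intrinsic' statement is needed here because only an implied-do constructor is used.

```fortran
      program sections
      implicit none
      integer :: a(6), b(12), r(6), i
!     b = 1, 2, ..., 12 by an implied-do array constructor
      b = (/ (i, i = 1, 12) /)
!     every second element of b: the six odd-numbered entries
      a = b(1:11:2)
!     the elements of a in reverse order
      r = a(6:1:-1)
      print *, a
      print *, r
      end program sections
```

Both sides of each assignment must have the same shape; here `b(1:11:2)' and `a(6:1:-1)' each select six elements.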


Fortran90 Array Construction Functions

PACK([array],[mask-array][,[vector]]) {Transforms (packs) the array `[array]' into a vector `[vector]' (an optional argument, which if not present, the output goes to the value of the function) according to the true values of the `[array]'-conformable, logical mask `[mask-array]'. }
UNPACK([vector],[mask-array],[field-array]) {Transforms (unpacks) the vector `[vector]' into the array `[field-array]' according to the true values of the `[field-array]'-conformable, logical mask `[mask-array]'. }
SPREAD([array],[dim],[ncopies]) {Transforms (spreads) the source array `[array]' into the output value of the function with `[ncopies]' copies along the dimension `[dim]' (horizontal copies if `[dim]'=1 and vertical if `[dim]'=2). }
RESHAPE([array],[shape][,[pad]][,[order]]) {Transforms (reshapes) the source array `[array]' into the output value of the function with shape `[shape]' and order `[order]', padding with the array `[pad]'. }
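The four construction functions can be exercised together in a minimal standard Fortran 90 sketch. The names `construct', `v', `u' and `s' are illustrative assumptions, the 2 by 3 array `b' has rows 1 3 5 and 2 4 6, and all arguments are positional.

```fortran
      program construct
      implicit none
      integer :: b(2,3), v(3), u(2,3), s(3,2)
      logical :: mask(2,3)
!     reshape fills b by columns: rows are 1 3 5 and 2 4 6
      b = reshape((/1, 2, 3, 4, 5, 6/), (/2, 3/))
      mask = b .gt. 3
!     pack gathers the masked elements (4, 5, 6) into a vector
      v = pack(b, mask)
!     unpack scatters v back under the mask; 0 fills the rest
      u = unpack(v, mask, 0)
!     spread makes 3 copies of (7,8) along dimension 1
      s = spread((/7, 8/), 1, 3)
      print *, v
      print *, u
      print *, s
      end program construct
```

Note that the result vector of `pack' must be declared with as many elements as there are true values in the mask, three in this case.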

Fortran90 Array Reduction Functions

The reduction functions reduce the input array to a scalar output or, when `[dim]' is given, to an array of rank one less.

SUM([array][,[dim][,[mask]]]) {The `SUM' function computes the sum of the elements of the array `[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2) according to the true values in the conditional mask `[mask]', if present. This function makes the Cray sum function the same as the Connection Machine version. }
PRODUCT([array][,[dim][,[mask]]]) {The `PRODUCT' function computes the product of the elements of the array `[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2) according to the true values in the conditional mask `[mask]', if present. }
MAXVAL([array][,[dim][,[mask]]]) {The `MAXVAL' function computes the maximum value of the elements of the array `[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2) according to the true values in the conditional mask `[mask]', if present. }
MINVAL([array][,[dim][,[mask]]]) {The `MINVAL' function computes the minimum value of the elements of the array `[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2) according to the true values in the conditional mask `[mask]', if present. }
COUNT([mask][,[dim]]) {The `COUNT' function computes the number of the true elements of the logical array `[mask]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2), if present. }
ANY([mask][,[dim]]) {The `ANY' function computes if there are any true elements in the logical array `[mask]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2), if present, and returns a logical true or false answer. }
ALL([mask][,[dim]]) {The `ALL' function computes if there are all true elements in the logical array `[mask]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2), if present, and returns a logical true or false answer. }
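A short sketch of the reduction functions, written in standard Fortran 90 with positional arguments, is given below. The program and variable names are illustrative assumptions; the array `b' has rows 1 3 5 and 2 4 6.

```fortran
      program reduce
      implicit none
      integer :: b(2,3), colsum(3), rowsum(2)
      logical :: big(2,3)
      b = reshape((/1, 2, 3, 4, 5, 6/), (/2, 3/))
      big = b .gt. 3
!     dim = 1 collapses the first subscript: column sums 3 7 11
      colsum = sum(b, 1)
!     dim = 2 collapses the second subscript: row sums 9 12
      rowsum = sum(b, 2)
      print *, sum(b), colsum, rowsum
      print *, maxval(b), minval(b), product(b, 2)
      print *, count(big), any(big), all(big)
      end program reduce
```

Without a `[dim]' argument each function returns a scalar; with it, the result has one subscript fewer than the input.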

Fortran90 Array Manipulation Functions

The manipulation functions rearrange the elements of the target matrix.

TRANSPOSE([array]) {The `TRANSPOSE' function transposes the 2-subscript array `[array]' with the result array of reversed dimensions. }
EOSHIFT([array],[shift][,[boundary][,[dim]]]) {The `EOSHIFT' function does an end-off shift on the array `[array]' along the dimension `[dim]' using the boundary value(s) `[boundary]' to fill in, if necessary. Caution: Connection Machine arguments have a different order. }
CSHIFT([array],[shift][,[dim]]) {The `CSHIFT' function does a circular shift on the array `[array]' along the dimension `[dim]'; elements shifted off one end re-enter at the other, so no boundary value is needed. Caution: Connection Machine arguments have a different order. }
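The difference between the two shifts can be seen in a minimal standard Fortran 90 sketch. The program name is an illustrative assumption; `b' has rows 1 3 5 and 2 4 6, and individual elements are printed rather than array sections.

```fortran
      program shifts
      implicit none
      integer :: b(2,3), a(2,3)
      b = reshape((/1, 2, 3, 4, 5, 6/), (/2, 3/))
!     circular shift by 1 along dimension 2: row 1 becomes 3 5 1
      a = cshift(b, 1, 2)
      print *, a(1,1), a(1,2), a(1,3)
!     end-off shift by 1 along dimension 2 with boundary value 0:
!     row 1 becomes 3 5 0, since the element shifted off is lost
      a = eoshift(b, 1, 0, 2)
      print *, a(1,1), a(1,2), a(1,3)
      end program shifts
```

In the positional form of `eoshift' the boundary value comes before the dimension.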

Fortran90 Array Location Functions

The location functions find the location of elements of the target matrix.

MAXLOC([array][,[mask]]) {The `MAXLOC' function returns the subscripts of the first element of target array `[array]' having the maximum value, relative to the conditional mask `[mask]', if present. }
MINLOC([array][,[mask]]) {The `MINLOC' function returns the subscripts of the first element of target array `[array]' having the minimum value, relative to the conditional mask `[mask]', if present. }
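A minimal standard Fortran 90 sketch of the location functions follows. The names `locate', `hi' and `lo' are illustrative assumptions; the result of each function is an integer vector with one entry per subscript of the array.

```fortran
      program locate
      implicit none
      integer :: b(2,3), hi(2), lo(2)
      b = reshape((/1, 2, 3, 4, 5, 6/), (/2, 3/))
!     subscripts of the largest element, 6, which is b(2,3)
      hi = maxloc(b)
!     subscripts of the smallest element among those exceeding 3,
!     i.e. of the element 4, which is b(2,2)
      lo = minloc(b, b .gt. 3)
      print *, hi
      print *, lo
      end program locate
```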

Fortran90 Array Matrix Multiply Functions

The matrix multiply functions compute the matrix products of the target matrices.

MATMUL([array1],[array2]) {The `MATMUL' function computes the matrix product of target arrays `[array1]' and `[array2]' commensurate for multiplication, with the result matrix of appropriate size. This function is also used for matrix-vector multiplication. }
DOT_PRODUCT([vector1],[vector2]) {The `DOT_PRODUCT' function computes the scalar, dot product of target vectors `[vector1]' and `[vector2]', with the scalar result. Caution: the Connection Machine function is `dotproduct'. }
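The sketch below, in standard Fortran 90, shows a matrix-matrix product, a matrix-vector product and a dot product. The names `products', `x' and `y' are illustrative assumptions, and array sections are passed directly as arguments, which the standard allows but a particular compiler version may not.

```fortran
      program products
      implicit none
      integer :: b(2,3), c(2,2), x(3), y(2)
      b = reshape((/1, 2, 3, 4, 5, 6/), (/2, 3/))
!     (2 x 2) times (2 x 2): rows of c are 15 23 and 22 34
      c = matmul(b(:,1:2), b(:,2:3))
!     (2 x 3) times a 3-vector of ones gives the row sums 9 12
      x = (/1, 1, 1/)
      y = matmul(b, x)
      print *, c
      print *, y
!     dot product of the two rows of b: 1*2 + 3*4 + 5*6 = 44
      print *, dot_product(b(1,:), b(2,:))
      end program products
```

Since `print *, c' lists the array by columns, the matrix appears transposed in the output unless it is transposed first.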

Fortran90 Array Functions TEST CODE

T90 Fortran90 (f90) Differences:

The following f90 code contains examples of use of many of the Fortran90 array intrinsic functions mentioned above. There are some rules:

Intrinsic statement is needed for all f90 intrinsics within f90 codes.
Constructors of the form b=(/1,2,3/) work with the f90 compiler.
Fortran90 array intrinsics used within f90 will take no auxiliary markers or keywords like "dim=" or "mask=".
array sections can not be used in print statements: NOT print*,b(1:3)
How do you sum an entire array only subject to a mask, but with no dimension restrictions?
If
b = 1 3 5
    2 4 6
and logical mask=b.gt.3, then s3=sum(b,1,mask) or s2=sum(b,2,mask) work when real s3(3),s2(2), but isum=sum(b,mask) or isum=sum(b,,mask) or isum=sum(b,:,mask) do NOT work. That is, how do I enter a scalar dim for the whole array?
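One possible way around the masked whole-array sum question, using only positional arguments, is sketched below; it has not been run on the T90 and the names `masksum', `isum1' and `isum2' are illustrative assumptions. In standard Fortran 90 the keyword form `sum(b, mask=mask)' expresses the same thing directly, but that is exactly the form the rules above report as not accepted.

```fortran
      program masksum
      implicit none
      integer :: b(2,3), isum1, isum2
      logical :: mask(2,3)
      b = reshape((/1, 2, 3, 4, 5, 6/), (/2, 3/))
      mask = b .gt. 3
!     gather the masked elements into a vector, then sum it
      isum1 = sum(pack(b, mask))
!     or replace the unmasked elements by 0 before summing
      isum2 = sum(merge(b, 0, mask))
!     both give 4 + 5 + 6 = 15
      print *, isum1, isum2
      end program masksum
```

The `merge' variant keeps the array shape and avoids a temporary vector of data-dependent length, which may suit a vector machine better than `pack'.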

Here is a sample T90 Fortran 90 code `pgm.f' = `t90f90test.f' with many examples, heavily commented and followed by the actual output run on t90.npaci.edu using the commands
f90 -O3 -r3 -o run pgm.f&
run>&pgm.out&
%%%%%%%%%%% pgm.f=t90f90test.f %%%%%%%%%
      program f90test
code98: compare ranf() and random_number pseudo random number generators
code97: update by removing old comments to cmfortran
code96: retest=f90test.f redone on borg = convex spp1200/xa-16
      integer, parameter :: m = 6
      integer, parameter :: n = 4
      integer :: i,j
      integer, dimension(2) :: s2, ctr1, ctr2, ctr3, b2
      integer, dimension(3) :: s3 ,at ,ar1 ,ar2 ,br1 ,br2
      integer, dimension(4) :: as(4)
      integer, dimension(2,2) :: c ,bi
      integer, dimension(2,3) :: b, a
      integer, dimension(3,2) :: ct
      integer, dimension(3,4) :: cs
      integer, dimension(4,3) :: cst
      logical, dimension(2,3) :: test
      logical, dimension(64,64) :: inmask
      real, parameter :: tol = 0.5e-5
      integer, parameter :: niter = 5000
      real :: diffav
      real, dimension(8,8) :: us
      real, dimension(64,64) :: u , du
      real :: ranf, xran
      real, dimension(m,n) :: uniranf, uniran
      real, dimension(n,m) :: truniranf, truniran
      intrinsic sum,maxval,minval,product
     & ,dot_product,matmul,transpose
     & ,cshift,eoshift,spread
      data b/1,2,3,4,5,6/ !replace constructors initialization
      data as/2,3,4,5/
      data at/2,3,4/
c --------------------Array Constructors:
      b(1,1:3) = (/1, 3, 5/) ! initialize first row, along dimension 2.
      b(2,1:3) = (/2, 4, 6/) ! initialize second row, along dimension 2.
      print*,'Note: constructors like "(/1,2/)" allowed in fc9.5'
      br1 = b(1,:)
      br2 = b(2,:)
      print60,br1,br2
60    format(' b(2,3)'/(3i3))
c --------------------Sum Function sum:
      isum = sum(b) ! => isum = 21; i.e., Front-End scalar.
      print61,' isum=sum(b)=',isum
61    format(1x,a36,i4)
      isum = sum(b(:,1:3:2)) ! => isum = 14; sole ':' means all values '1:2'.
      print61,' isum = sum("b(:,1:3:2)")=',isum
      bi=b(:,1:3:2)
      isum=sum(bi)
      print61,' isum = sum("b(:,1:3:2)")=',isum
      print*,'CAUTION: "dim=", etc., markers= NOT allowed in intrinsics'
      s2 = sum(b,2) ! redeclared with the correct array section shape.
      print62,' s2 = sum(b,2)=',s2 ! => s2 = (/9,12/), row sums
62    format(1x,a32,2i3)
      s3 = sum(b,1) ! => s3 = (/3,7,11/); column sums.
      print63,' s3 = sum(b,1)=',s3
63    format(1x,a32,3i3)
      print*,'CAUTION: "mask=" marker= STILL not allowed either.'
      s3 = sum(b,1,b.gt.3) ! => s3 = (/0,4,11/); i.e., conditional col sum
      print63,' s3 = sum(b,1,"b.gt.3") =',s3
      test=b.gt.3
      s3 = sum(b,1,test) ! => s3 = (/0,4,11/); i.e., conditional col sum
      print63,' s3 = sum(b,1,"b.gt.3") =',s3
      s2 = sum(b,2,test) ! => s2 = (/5,10/); i.e., conditional row sum
      print62,' s2 = sum(b,2,b.gt.3) =',s2
cf8er:isum = sum(b,0,test) ! => isum = 15; i.e., add only elements
cf8er:print61,' isum = sum(b,0,b.gt.3) =',isum ! that are greater than three.
      print*,' CAUTION: If "sum(array[dim[,mask]])", CANT use zero (0)'
     & ,' for [dim] for whole array when there is a mask.'
c --------------------Maximum Value Function maxval:
      imax = maxval(b) ! => imax = 6; array maximum value.
      print61,' imax = maxval(b)=',imax
      s3 = maxval(b,1) ! => s3 = (/2,4,6/); column maximums.
      print63,' s3 = maxval(b,1)=',s3
      s2 = maxval(b,2) ! => s2 = (/5,6/); row maximums.
      print62,' s2 = maxval(b,2)=',s2
c --------------------Minimum Value Function minval:
      imin = minval(b) ! => imin = 1; array minimum value.
      print61,' imin = minval(b)=',imin
c --------------------Product Function product:
      s2 = product(b,2) ! => s2 = (/15,48/); products of the elements in each row.
      print62,' s2 = product(b,2)=',s2
c --------------------Dot Product Function dot_product:
      idot = dot_product(br1,br2) ! => idot = 44; dot product of row
      print61,' idot = dot_product(b(1,:),b(2,:))=',idot ! vectors of b.
      print*,' CAUTION: Array syntax not allowed in actual arguments.'
c --------------------Matrix Multiplication Function matmul:
      ! assuming array b of the previous section.
      ![Ans] = matmul([Array_1],[Array_2]) ! computes matrix multiplication
      ! of two rank two matrices.
      c = matmul(b(:,1:2),b(:,2:3)) ! => c(1,:)=(/15,23/);c(2,:)=(/22,34/).
      c=transpose(c)
      print623,'c=matmul(b(:,1:2),b(:,2:3))=',c
623   format(1x,a36/(2i3))
      ![Ans] = transpose([Array]) ! transforms an array to its transpose.
      ct = transpose(b) ! => ct(1,:)=(/1,2/);ct(2,:)=(/3,4/);ct(3,:)=(/5,6/).
      ctr1 = ct(1,:)
      ctr2 = ct(2,:)
      ctr3 = ct(3,:)
      print623,'ct = transpose(b)=',ctr1,ctr2,ctr3
c --------------------Circular Shift Function cshift:
      ! assume b is again initialized as
      ! b = 1 3 5
      !     2 4 6
      a = cshift(b,1,2) ! => a = 3 5 1
      !                          4 6 2
cshift EG1:
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = cshift(a,1,2)=',ar1,ar2
633   format(1x,a36/(3i3))
      ! i.e., b(i,(j+shift) "mod" n) -> a(i,j) for j=1:2, etc.;
      ! nonstandard modulus fn: 0 "mod" n = n; 1 "mod" n = 1; ...; n "mod" n = n
      ! i.e., the result is computed from shifting subscript in specified
      ! dimension of the source array by the specified shift.
      a = cshift(b,-1,2) ! => a = 5 1 3
      !                           6 2 4
cshift EG2:
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = cshift(b,-1,2)=',ar1,ar2
      ! i.e., b(i,(j+shift) "mod" n) -> a(i,j) for j=2:3, etc.
cshift EG3:
      s2(1) = 1
      s2(2) = 2
      a = cshift(b,s2,2) ! a = 3 5 1
      !                        6 2 4
      ! i.e., an array-valued shift, or shift per row.
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = cshift(b,(/1,2/),2)=',ar1,ar2
cshift Laplace Example:
      ! Jacobi Iteration for a 5-star discretization of
      ! 2D Laplace's equation:
      u = 0
      u(1,:)=2
      u(64,:)=2
      u(:,1)=2
      u(:,64)=1
      inmask = .FALSE.
      inmask(2:63,2:63) = .TRUE.
      diffav = 1
      iter=0
      do while (diffav.gt.tol.and.iter.lt.niter)
        iter=iter+1
        du = 0
        where(inmask)
          du = 0.25*(cshift(u,1,1)+cshift(u,-1,1)+cshift(u,1,2)
     &       +cshift(u,-1,2)) - u
          u = u + du
        end where
        du = du*du
        diffav = sqrt(sum(du)/(62*62))
      end do
      ! which is the main program fragment of laplace.fcm.
      print*,'CAUTION: array sections not allowed in print'
      us = u(1:64:9,1:64:9)
      us=transpose(us)
      print66,'u = laplace-shift(u)= ; iter=',iter,'; av-diff ='
     & ,diffav,us
66    format(1x,a36,i5,a11,e10.3/(8f8.4))
c --------------------End Off Shift Function eoshift:
      a = eoshift(b,-1,0,1) ! a = 0 0 0 note default boundary value is 0.
      !                         1 3 5
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = eoshift(b,-1,0,1)=',ar1,ar2
      s2=(/-1,0/)
      b2=(/7,8/)
      a = eoshift(b,s2,b2,2) ! => a = 7 1 3
      !                               2 4 6
      ar1 = a(1,:)
      ar2 = a(2,:)
      print633,'a = eoshift(b,(/-1,0/),(/7,8/),2)=',ar1,ar2
      a = eoshift(b,2,0,2) ! => a = 5 0 0
      !                     =>      6 0 0
      ar1 = a(1,:)
      ar2 = a(2,:)
      print623,'a = eoshift(b,2,2)=',ar1,ar2
c --------------------Spread Function spread:
      cs = spread(as,1,3) ! contents of cs:
      ! 2 3 4 5
      ! 2 3 4 5
      ! 2 3 4 5
      cst = transpose(cs)
      print64,'as =',as
64    format(1x,a32,4i3)
      print643,'cs = spread(as,1,3)=',cst
643   format(1x,a36/(4i3))
c --------------------
      cs = spread(at,2,4) ! contents of cs:
      ! 2 2 2 2
      ! 3 3 3 3
      ! 4 4 4 4
      cst = transpose(cs)
      print63,'at =',at
      print643,'cs = spread(at,2,4)=',cst
c ---------------------------------------------------------------------------
      ! i.e., b=spread(a,d,c) =>
      ! a(n_1,n_2,...,n_(d-1),n_d,...,n_r) -> b(n_1,n_2,...,n_(d-1),c,n_d,...,n_r)
      ! where r is the rank of source array a and n_i is the size of dimension i;
      ! noting that a new dimension of size c is added before dimension d.
c ---------------------------------------------------------------------------
      ! Initialize scalar xran with a pseudo random number
      call random_number(harvest=xran)
      call random_number(uniran)
      ! xran and uniran contain uniformly distributed random numbers
      truniran = transpose(uniran)
      write(6,65) xran, truniran
65    format(' f90 uniform random_number(): xran =',f14.10/
     & ' and f90 subroutine random_number() uniform random array:'
     & /(4f14.10))
      ! standard UNICOS random number generator ranf:
      do i = 1, m
        do j = 1, n
          uniranf(i,j) = ranf()
        enddo
      enddo
      truniranf = transpose(uniranf)
      write(6,651) truniranf
651   format(' UNICOS function ranf() uniform random array:'/(4f14.10))
      stop
      end
%%%%%%%%%%% end pgm.f=t90f90test.f %%%%%%%%%

Click here to get `t90f90test.f', T90 Fortran 90 code
Here is the output t90f90test.output:
%%%%%%%%%%% begin pgm.output = t90f90test.output %%%%%%%%%
 Note: constructors like "(/1,2/)" allowed in fc9.5
 b(2,3)
  1  3  5
  2  4  6
 isum=sum(b)= 21
 isum = sum("b(:,1:3:2)")= 14
 isum = sum("b(:,1:3:2)")= 14
 CAUTION: "dim=", etc., markers= NOT allowed in intrinsics
 s2 = sum(b,2)= 9 12
 s3 = sum(b,1)= 3 7 11
 CAUTION: "mask=" marker= STILL not allowed either.
 s3 = sum(b,1,"b.gt.3") = 0 4 11
 s3 = sum(b,1,"b.gt.3") = 0 4 11
 s2 = sum(b,2,b.gt.3) = 5 10
 CAUTION: If "sum(array[dim[,mask]])", CANT use zero (0) for [dim] for whole array when there is a mask.
 imax = maxval(b)= 6
 s3 = maxval(b,1)= 2 4 6
 s2 = maxval(b,2)= 5 6
 imin = minval(b)= 1
 s2 = product(b,2)= 15 48
 idot = dot_product(b(1,:),b(2,:))= 44
 CAUTION: Array syntax not allowed in actual arguments.
 c=matmul(b(:,1:2),b(:,2:3))=
 15 23
 22 34
 ct = transpose(b)=
  1  2
  3  4
  5  6
 a = cshift(a,1,2)=
  3  5  1
  4  6  2
 a = cshift(b,-1,2)=
  5  1  3
  6  2  4
 a = cshift(b,(/1,2/),2)=
  3  5  1
  6  2  4
 CAUTION: array sections not allowed in print
 u = laplace-shift(u)= ; iter= 4730; av-diff = 0.499E-05
  2.0000  2.0000  2.0000  2.0000  2.0000  2.0000  2.0000  1.0000
  2.0000  1.9762  1.9479  1.9090  1.8491  1.7440  1.5208  1.0000
  2.0000  1.9573  1.9068  1.8387  1.7387  1.5836  1.3402  1.0000
  2.0000  1.9469  1.8844  1.8014  1.6836  1.5141  1.2817  1.0000
  2.0000  1.9469  1.8844  1.8014  1.6836  1.5141  1.2817  1.0000
  2.0000  1.9573  1.9068  1.8387  1.7387  1.5836  1.3402  1.0000
  2.0000  1.9762  1.9479  1.9090  1.8491  1.7440  1.5208  1.0000
  2.0000  2.0000  2.0000  2.0000  2.0000  2.0000  2.0000  1.0000
 a = eoshift(b,-1,0,1)=
  0  0  0
  1  3  5
 a = eoshift(b,(/-1,0/),(/7,8/),2)=
  7  1  3
  2  4  6
 a = eoshift(b,2,2)=
  5  0
  0  6
  0  0
 as = 2 3 4 5
 cs = spread(as,1,3)=
  2  3  4  5
  2  3  4  5
  2  3  4  5
 at = 2 3 4
 cs = spread(at,2,4)=
  2  2  2  2
  3  3  3  3
  4  4  4  4
 f90 uniform random_number(): xran = 0.5801136486
 and f90 subroutine random_number() uniform random array:
  0.9505127350  0.3056509439  0.0986253383  0.6938844384
  0.7863714253  0.6891007107  0.2765484551  0.9344770142
  0.2976202640  0.3826622387  0.6204460278  0.2120929553
  0.4536999003  0.1329027055  0.0835029668  0.1306527482
  0.0062619416  0.8318579032  0.9903771206  0.8625969805
  0.2757364264  0.5829797958  0.9793469434  0.8189092940
 UNICOS function ranf() uniform random array:
  0.5407187129  0.0187994091  0.3141160167  0.7651821004
  0.9415271082  0.2893071356  0.5849975196  0.9030257778
  0.8866798463  0.4966670053  0.3964840582  0.8718218141
  0.9311052262  0.5954839343  0.2096123584  0.8881281192
  0.4641396487  0.6280308383  0.4467249313  0.4578495774
  0.2349011311  0.7635970977  0.5911920675  0.4438340178
 STOP executed at line 222 in Fortran routine 'F90TEST'
 CPU: 1.827s, Wallclock: 0.533s, 24.5% of 14-CPU Machine
 Memory HWM: 308988, Stack HWM: 37805, Stack segment expansions: 0
%%%%%%%%%%% end pgm.output = t90f90test.output %%%%%%%%%

Click here to get `t90f90test.output', corresponding T90 Fortran 90 output

Cray T3E f90 Differences:

Here is a sample code with many examples, heavily commented and followed by the actual output run on t3e.npaci.edu using the commands
f90 -O3 -r3 -Xm -o fpgm fpgm.f & mpprun -n1 fpgm >& fpgm.output & %%%%%%%%%%% pgm.f=cf97test.f %%%%%%%%%

Click here to get `t3e97test.f', T3E Fortran 90 code
Click here to get `t3e97test.output', corresponding T3E Fortran 90 output

f90 Library Functions

[variable] = ssum (n,a(m),k) {The optimized scientific library SCILIB function `ssum' computes the sum of `n' elements of array `a' starting from element `m' in steps of `k'; the equivalent, but not optimal, do loop is {sum=0; do 1 i=m,m+(n-1)*k,k; 1 sum=sum+a(i)}; e.g. `sum=ssum(n,a,1)' returns the sum of the first `n' elements of the array `a'; if `a' is an m X n 2-subscript array, use `t = ssum(m*n,a(1,1),1)'; use `man ssum' for more information; the `-l libsci.a' option in the `segldr' should be optional. UPDATE: In cf77 version 6.0, `ssum' has been replaced by the Fortran90 `sum' function.}
[variable] = sdot (n,a,1,b,1) {The optimized SCILIB function `sdot' returns the calculated value of the dot product of `n' elements of the vectors `a' and `b' in steps of 1; the `-l libsci.a' option of the `segldr' should be optional. UPDATE: In cf77 version 6.0, `sdot' has been replaced by the Fortran90 `DOT_PRODUCT' function.}
call mxm (a,m,b,kmax,c,n) {The optimized SCILIB subroutine returns the calculated value of the full matrix by matrix product of the `m X kmax' array `a' and the `kmax X n' array `b' into the `m X n' output array `c'; use `mxma' for multiplication of sub-matrices when the matrices are not full; use `man mxm' (ignore UNICOS function of the same name) or 'man mxma' for more information; the `-l libsci.a' option of the `segldr' should be optional. UPDATE: In cf77 version 6.0, `mxm' has been replaced by the Fortran90 `MATMUL' function.}
call mxv (a,m,b,n,c) {The optimized SCILIB subroutine returns the calculated value of the full matrix by vector product of the `m X n' array `a' and the `n' vector `b' into the `m' output vector `c', by rolling up the `j' loop; use `man mxv'; the `-l libsci.a' option of the `segldr' should be optional. UPDATE: In the T90 cf77 version 6.0, `mxv' has been replaced by the Fortran90 `MATMUL' function.}
call random_number([HARVEST=][variable]) {F90 pseudo-random number generator on [0,1], as an intrinsic subroutine rather than an intrinsic function, that gets the first or next random number or array and stores it in the user output variable or array `[variable]'. For example:
real s, r(100,100)
call random_number(harvest=s)
call random_number(r)
See `man random_number' for more information, or `man rand_seed' for changing the random sequence. }
[random-variable] = ranf() {UNICOS pseudo-random number generator on [0,1] that gets the first or next random number, e.g.,
real s, r(100,100)
s = ranf()
do i = 1,100
  do j = 1,100
    r(i,j) = ranf()
  enddo
enddo
or use `r(1:n) = ranf()' in Fortran90 notation; use `x(1:n) = -log(r(1:n))' to convert to an exponential distribution; change the random generator seed using `call ranset([new-seed])', but it is not necessary to start with a seed; `ranf' is a great random number generator since it properly vectorizes in loops; use `man ranf' for more information, including use for C/C++ as `_ranf()' which requires the following include and declaration statements:
#include double _ranf(void);
}
call wheni[reln] ([nfind],[iarray],[inc],[itarget],[index],[nval]) (CR) {Finds all integer array (`[iarray]') elements in relation (`[reln]') to the integer target (`[itarget]'); `[reln]' = `lt', `le', `gt' or `ge'; `[nfind]' is the number of elements to be searched in increments of `[inc]'; `[index]' is the integer array of the indices of the output; and `[nval]' is the number of indices found.}
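The conversion from uniform to exponential variates mentioned under `ranf' can also be sketched with the standard `random_number' subroutine, as below. This is an illustration only, with assumed names `expvar', `r' and `x'; it uses `1-r' rather than `r' inside the logarithm as a guard against a zero argument, on the assumption that the generator can return exactly 0 but never 1.

```fortran
      program expvar
      implicit none
      integer, parameter :: n = 5
      real :: r(n), x(n)
!     fill r with uniform pseudo-random numbers
      call random_number(r)
!     standard random_number returns values in [0,1), so 1-r lies
!     in (0,1] and the logarithm is always defined
      x = -log(1.0 - r)
      print *, x
      end program expvar
```

Both assignments act on whole arrays, so no explicit loop is needed.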

T90 Fortran90 (f90) Compiler Vector Toggling Directives

These statements are placed in the Fortran source just before the loop or other entity they are to affect, and they stay in effect until the opposite directive is given: for every toggling compiler directive that turns some action on, there is another directive with a `NO' prefix that turns that action off. The leading `!' (or `C' in the older `CDIR$' form) must be in column 1 and a blank must be in column 6. For more information, see F90 Vol. 1: Fortran Reference Manual, Sect. 1.6 Compiler Directives.

!DIR$ VECTOR {Compiler directive causes all following inner DO loops to be vectorized unless the loop is known to have only one iteration and until superseded by another directive that alters vectorization.}
!DIR$ NOVECTOR {Directive turns off vectorization at next DO loop until turned back on.}
!DIR$ VSEARCH {Directive permits optimization of loops that can have a premature exit, as with convergence of an iteration. !DIR$ NOVSEARCH directive turns it off.}
!DIR$ INLINE {Directive turns on inlining, inline code generation, of subprograms if `-I' or `-o inline' f90 options are used; !DIR$ NOINLINE turns inlining off.}

T90 Fortran90 (f90) Compiler Scalar Optimization Directives

These directives affect scalar optimization at the point at which the directive appears and affect only the local program unit, such as the loop it appears in.

!DIR$ BL {Directive turns on bottom loading for loops, pre-fetching data for the next loop iteration; !DIR$ NOBL turns BL off.}
!DIR$ NOSIDEEFFECTS [subprogram-name] {Allows keeping data in registers across subprograms, if no global data (i.e., arguments or common blocks) are changed.}
!DIR$ SUPPRESS [variable-list] {Directive temporarily suppresses scalar optimization on variables in loops containing the directive.}

T90 Fortran90 (f90) Compiler Loop Directives

These compiler directives hold only for the loop immediately following the directive.

!DIR$ IVDEP {Directive causes compiler to Ignore Vector DEPendencies in only the next innermost DO loop. Disabled by the NOVECTOR directive.}
!DIR$ NEXTSCALAR {Directive causes only the very NEXT DO loop to be executed in SCALAR Mode with vectorization resuming if on. Disabled by the NOVECTOR directive.}
!DIR$ SHORTLOOP {Directive reduces vectorization overhead for the very next loop, which is presumed SHORT, i.e., to have fewer than 64 iterations. Disabled by the NOVECTOR directive.}
!DIR$ RECURRENCE {Directive turns on vectorization of reduction loops (e.g., sum loops); !DIR$ NORECURRENCE turns it off. Disabled by the NOVECTOR directive.}

T90 Fortran90 (f90) Compiler Storage Directives

These compiler directives alter the way memory is handled.

!DIR$ VFUNCTION [external-function-list] {Directive declares a Vector version of an external CAL FUNCTION, where CAL is the Cray Assembler Language, but the function cannot be declared in an External statement; works with a list of CAL functions separated by commas. See the f90 Vol. 1 for other restrictions.}
!DIR$ AUXILIARY [array-list] {Storage directive allows assignment to the secondary disk storage for the Cray Y-MP only.}

T90 Fortran90 (f90) Compiler Diagnostic Directives

These are used the same way as the vector directives.

!DIR$ BOUNDS [array-names] {Allows the checking of array subscript bounds, but inhibits vectorization; applies to all arrays unless particular ones are listed as arguments.}
!DIR$ NOBOUNDS {Prevents checking of subscript bounds.}
!DIR$ FLOW {Turns on Flow-trace and `!DIR$ NOFLOW' turns it off.}

For more information on compiler directives and other f90 statements, refer to the `Cray Fortran (CFT) REFERENCE MANUAL'. Additional information on SCILIB functions can be found in the Cray Library Reference Manual, a copy of which is found in the UIC Supercomputing Support Office along with many other manuals.

T90 Fortran (f90) Multitasking Options

The Cray supercomputers now have parallelization, or tasking, features in addition to vectorization features. However, the cost of running multitasked Cray Fortran is extremely large, because the user is charged for time on all processors utilized. In contrast, the user is not charged for each vector element with vectorization. HENCE THE USER SHOULD ONLY USE THE MULTITASKING FEATURES WHEN ABSOLUTELY NECESSARY. Macrotasking refers to large grain or subroutine level parallelization. Microtasking refers to parallel loop optimization through compiler directives. Autotasking refers to automatic microtasking by the Fortran Preprocessor `fpp', i.e., through automatic code generation for multitasking. Compiler f90, preprocessor fpp and mid-processor fmp are currently version 4.0 at NPACI. More information is found on the NPACI Cray T90 in the files or subdirectories of the directory `/usr/local/doc', such as the `cf77_50.rn' release notes or the `unicos.7.0' sections.

A typical job accounting execution sequence might be
ja (CR)
${TMP}/[fn] < [data] >& [output] & (CR)
with the job accounting information appearing in a file of the form `.jacct[jobid]'. Including the pass through option `-Wd"-l [fn].ml"' will also produce an fpp summary listing in `[fn].ml' (but no executable) with the markers `P' for autotasked, `V' for vectorized, `N' for not chosen or not optimized, and `D' for data dependent.
f90 -O full -M [fn].f > [fn].m & (CR) : The `-M' option results in the intermediate Fortran file `[fn].m' with microtasking directives automatically inserted into the `[fn].f' source using the dependence analysis of the `fpp' preprocessor; no object or executable file is produced; the user can insert additional compiler directives into `[fn].m' and compile it with the Cray Fortran multitasking processor `fmp', the translator of the directives, by `fmp [fn].m > [fn].j (CR)'; the intermediate expanded file `[fn].j' is further assembled, linked and loaded by `sld -o [exec] [fn].j (CR)'.


Cray T90 f90 and cc Timing Utility Functions.

T90 Fortran90 (f90) Timing Utility Functions

[time-variable] = second() : The standard UNIX Fortran seconds timer utility, whose output value is user cpu time in seconds, as opposed to system ``cpu time'' and wall clock time (the sum of user and system times); also exists in the format `call second([time-variable])'; for timing large loops, `second' overhead should be negligible; for most sizes of loops, the timing part of the code, with `!' marking comments on the statement line, might look like:

      real tv(100), cputim(100)
      character*24 tchar(100)
      kt = 1
      tv(kt) = second()          ! first 2 calls get the overhead
      kt = kt + 1
      tv(kt) = second()          ! initial time
!     code-continues ... more code ... code-continues
      kt = kt + 1
      tchar(kt) = 'loop [999]'
      tv(kt) = second()
      do [999] i = 1, [1000]
!        code-continues ... rest of do ... code-continues
 999  continue
      kt = kt + 1
      tv(kt) = second()          ! tv(kt) - tv(kt-1) = do-cputime
!     code-continues ... more do loops and more timing step pairs ...
      kt = kt + 1
      tv(kt) = second()          ! final time
      overhd = tv(2) - tv(1)     ! timer second overhead
      do [99999] ks = 3, kt - 2  ! cpu-time for each timed loop
         cputim(ks) = tv(ks+1) - tv(ks) - overhd
         write(6,[99998]) ks, cputim(ks), tchar(ks)
!        Comment: writes hinder vector optimization, so save writes until last
99999 continue
99998 format(1x,i3,' time =',f12.7,' for ',a)
      cputot = tv(kt) - tv(2) - (kt-2)*overhd
!     Caution: due to overhead variability, total can be off for small job
      write(6,*) 'total cpu-time =',cputot

For timing small loops, put the small loop inside another loop that just does a large number of repetitions of the small loop, say N, then divide the time difference by N; use `man second' for other information.
[flag] = gettimeofday(&tp,&tzp); : C/C++ microsecond wall clock timer and time zone utility. Requires the special header include statement `#include <sys/time.h>' and that the following structures be declared:

   struct timeval tp;   /* tp is a timeval structure (passed by address) having */
                        /* unsigned long tp.tv_sec giving seconds since Jan. 1, 1970 */
                        /* long tp.tv_usec giving microseconds */
   struct timezone tzp; /* needed only for time zone data */

See `man gettimeofday' for more information and Cray T90 C Starter example: `t90startcc.c'.
[time-variable] = tsecnd() : f90 task timer utility giving the cpu time for a task during multitasking.

cc Timing Utility Function

gettimeofday
Example C program using the gettimeofday function:

#include <stdio.h>
#include <stdlib.h>
#include <sys/time.h>
#define NTime 20
int main(void)
{
   /* Time variables */
   struct timeval tp;   /* tp is a timeval structure (passed by address) having */
                        /* unsigned long tp.tv_sec giving seconds since Jan. 1, 1970 */
                        /* long tp.tv_usec giving microseconds */
   struct timezone tzp; /* needed only for time zone data */
   int gtod, kt;
   long int tsecs[NTime], tmicrosecs[NTime];
   long int ttot[NTime], ttotmoh[NTime];
   float ts1, tt1, tu1, tu2, ts2, tt2, tu3, ts3, tt3;
   double ttotf;
   /* begin main code */
   if (gettimeofday(&tp,&tzp) == -1) { perror("gettimeofday failed"); exit(1); }
   kt = 1;
   /* gettimeofday = Microsecond Wall Timer C function; */
   /* WallTime = UserTime + SystemTime, Undecomposed; */
   /* gettimeofday returns gtod = 0 if successful; */
   /* tv_sec in secs since 1/1/70; */
   /* tv_usec in added microseconds; */
   /* tzp gives the timezone; */
   gtod = gettimeofday(&tp,&tzp);
   tsecs[0] = tp.tv_sec; tmicrosecs[0] = tp.tv_usec;
   ++kt;
   gtod = gettimeofday(&tp,&tzp);
   tsecs[1] = tp.tv_sec; tmicrosecs[1] = tp.tv_usec;
   /* ...... MUCH DELETED CODE ........ */
   /* ...... MUCH DELETED CODE ........ */
   /* Clock: Elapsed Total Time: */
   ++kt;
   gtod = gettimeofday(&tp,&tzp);
   tsecs[kt] = tp.tv_sec; tmicrosecs[kt] = tp.tv_usec;
   /* Total Elapsed Time Including Clock Overhead */
   ttot[kt] = (tsecs[kt]-tsecs[1])*1000000+(tmicrosecs[kt]-tmicrosecs[1]);
   /* Total Elapsed Time Minus Clock Overhead; the overhead includes the */
   /* seconds difference in case the second rolled over between calls */
   ttotmoh[kt] = ttot[kt]
      - ((tsecs[1]-tsecs[0])*1000000+(tmicrosecs[1]-tmicrosecs[0]));
   printf("\nIntermediate Raw Timing Output:");
   printf("\ntmicrosecs[(0,1,kt)]=(%12ld,%12ld,%12ld), in microseconds",
          tmicrosecs[0],tmicrosecs[1],tmicrosecs[kt]);
   printf("\ntsecs[(0,1,kt)]=(%12ld,%12ld,%12ld), in seconds",
          tsecs[0],tsecs[1],tsecs[kt]);
   printf("\n(ttot[kt],ttotmoh[kt])=(%12ld,%12ld), in microseconds",
          ttot[kt],ttotmoh[kt]);
   if (ttot[kt] < 0) { printf("\n Error:Negative Times:Bad Clock:Rerun Job\n"); }
   ttotf = ttotmoh[kt]/1.e6;
   printf("\n T90 Starter C Problem Output");
   printf("\n Timing Output:");
   printf("\n final total time=%12.4e, in seconds\n",ttotf);
   /* Change: Extra output statements: */
   return 0;
}

Table of T90/T3E Timers

T90 (perhaps MPP) Timer Summary ... MCS572 F95/FBH

Timer         TimeMeasured    Units         Comments
-----         ------------    -----         --------
clock         System&User     Microseconds
cpused        User            ClockTicks    RTC ticks
gettimeofday  WallTime        Microseconds  plus many other things from TOD; C fn
ja            ElapsedUserSys  Seconds       plus more; on T90; only mppexec for T3E
rtclock       User            ClockTicks    current RTC ticks
RTC           RealTimeClock   ClockTicks    float version
IRTC          RealTimeClock   ClockTicks    int version
second        User            Seconds       Coarse, not useful for small timings
secondr       ElapsedWall     Seconds       Coarse, not useful for small timings
sysclock      RealTimeClock   ClockTicks    plus #wraps (overflows)
timef         ElapsedWall     Milliseconds  Fn. gives elapsed time since 1st call
times         Process&Child   ClockTicks    needs include
timex         ElapsedUserSys  Seconds       depends on opts in timex [opts] [cmd]
tsecnd        ElapsedTask     Seconds       for current multithreaded task

Notes: There are several other timers, but they are not appropriate for scientific computing. For actual use, consult the man page of the timer. Ideally, a timer should give user time in intervals as small as microseconds. Hence, an ideal timer for the T3E would have to be designed from an RTC clock. Job accounting ja is done on the T90, but gives mppexec time (which must be T3E time). Using the C routine `gettimeofday' would be a rough approximation, as was suggested on the now extinct Thinking Machines Corp. CM-5.

T3E MPI Wall Timer

The Cray T3E at NPACI has a wall timer, MPI_Wtime, that reports seconds and works in MPI parallel programs written in either f90 or cc. See the following information on MPI_Wtime and related functions:

MPI_Wtime Man Page, Measurements in Seconds.
MPI_Wtick Man Page, Resolution or Finest Time Interval Measured.
Sample Fortran f90 Code Illustrating MPI_Wtime Usage, for Laplace-Jacobi Application.
Sample C Code Illustrating MPI_Wtime Usage, for Laplace-Jacobi Application (under revision).

The best way to learn these commands is to use and test them in an actual computer session on the TCS Cluster.

Good luck.

Please report to Professor Hanson any problems or inaccuracies:
hanson@uic.edu

Web Source: http://www.math.uic.edu/~hanson/tcs03guide.html
