This User's Local Guide is intended to be a sufficient, hands-on introduction to the UIC Computer Center (CC) HP-Convex SPP1200/XA-16 scalable parallel processor (SPP).
The SPP1200/XA-16 has 16 parallel processors, collected into two subunits of 8 processors called hypernodes, i.e., there are only hypernode 0 and hypernode 1 at UIC. The local machine name is the `borg' and its full internet address is `borg.cc.uic.edu'. The XA model denotes the multiple hypernode eXtended Architecture (XA) SPP system. Normally, program execution is restricted to at most the 8 processors (nodes) of one hypernode. Special permission is needed to use more than 8 processors, and using processors from both hypernodes requires message passing programming.
Physical Memory: The physical memory of each hypernode is called Globally Shared Memory (GSM), but it is distributed as local memory to each processor pair in 2 banks per processor pair. This memory is where the actual computations are performed; it is also called the physical or main memory and is implemented by fast RAM (Random Access Memory) chips. There is a total of 0.5GB of physical memory per hypernode, i.e., 64MB per memory bank and 1GB for the whole machine. A single hypernode is also called a Symmetric Multi-Processor (SMP), but could also be called a Shared Memory Processor. GSM is accessed via the cross-bar interconnect within a hypernode, while the CTI rings are used to access the GSM of other hypernodes.
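As a quick consistency check on the memory figures above, the following minimal Python sketch recomputes the totals from the per-bank size. Python is not part of the original guide and is used here only for illustration; all numbers are taken from the text, while the variable names are made up for this sketch.

```python
# Recompute the physical memory totals quoted in the text.
# The numbers come from the guide; the names are illustrative only.
PROCESSORS_PER_HYPERNODE = 8
HYPERNODES = 2
BANKS_PER_PROCESSOR_PAIR = 2
MB_PER_BANK = 64

pairs_per_hypernode = PROCESSORS_PER_HYPERNODE // 2
banks_per_hypernode = pairs_per_hypernode * BANKS_PER_PROCESSOR_PAIR
mb_per_hypernode = banks_per_hypernode * MB_PER_BANK
mb_total = mb_per_hypernode * HYPERNODES

print(f"{banks_per_hypernode} banks per hypernode")
print(f"{mb_per_hypernode} MB per hypernode ({mb_per_hypernode / 1024} GB)")
print(f"{mb_total} MB total ({mb_total / 1024} GB)")
```

The 64MB bank size, 0.5GB per hypernode and 1GB total are therefore mutually consistent with 4 processor pairs per hypernode.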
Virtual Memory: Up to 3GB in virtual memory is available through the exchange of memory units called pages, permitting the execution of much larger programs than can fit in physical memory. The compiler generates the addresses of the much larger virtual memory, but only those pages currently executing are translated into physical memory.
Cache: In addition, each processor has 256KB (KB = kilobytes) of fast data cache memory and 256KB of fast instruction cache memory, and each processor pair also has the inter-hypernode ring cache called CTIcache. Cache memory is generally smaller, faster and closer to the CPU than physical memory.
Hard Disk Memory: Hard disk memory consists of 20 4GB SCSI-2 disks, 10 on each of the hypernodes, and on these disks your user directories, operating system, other system software, and application software are stored. Hard disk memory is generally bigger, slower and farther away from the CPU than physical memory. Hence, the memory hierarchy is ordered from cache to physical to hard disk memory.
Caution: Message passing must be done using the batch Network Queueing System (NQS), and is not practically feasible, especially on student accounts.
(Caution: in PDF format, so Adobe Acrobat Reader 2.0 is needed.)
This is consistent with another benchmark using the NASA NAS BT Simulated Computational Fluid Dynamics Application, which again gives a scaled estimate of about 5.5 old Cray YMP/1 or 1.3 GigaFlops, obtained by taking a quarter of the performance of the SPP1200/XA-64 formerly at NCSA in Urbana and using old data. No SPP1200 appears on the current Gunter's benchmark site:
The SPP1200 would be classified as a mini-supercomputer, not quite a supercomputer which denotes the class of the most powerful machines. However, it offers local access and many of the parallel constructs found on massively parallel machines.
TELNET: Access to the `borg' is by either the universal TCP/IP Internet Protocol:
FTP: File transfer to the `borg' is best accomplished by the universal TCP/IP Internet Protocol command:
Caution: If CMS is your home computer system, then you are restricted to file transfers initiated from CMS to the `borg' while you are logged onto CMS and accessing the `borg'. This is because CMS accounts are single user accounts restricted to single access only. That is, FTP to CMS together with Telnet from CMS would count as a double access, so it is not permitted in CMS. This is very different from Unix, which is a multiuser system allowing multiple access, i.e., you can even log into the `borg' several times simultaneously, but you also degrade your own performance.
FORTRAN: A typical format for compiling and linking a Convex Fortran program named `[program].f' with parallel optimizations {`-O3'} in the background {last `&'} is
Fortran-Version Caution: The full path name of the Convex Fortran is `/usr/convex/bin/fc' (link to /usr/convex/fc/fc). Also, if you are using the system programmer's default Korn-Shell (ksh), then you must use the linked alias name `fcc', instead of `fc' which is reserved for the Korn-Shell. The `fcc' command is identical to the `fc' Convex Fortran Compiler. The standard Unix Fortran is `f77' stored as `/usr/bin/f77', but this compiler does not have any parallel code optimization features.
There is also the VAST Fortran 90 compiler `f90' of Pacific Sierra Research in `/usr/convex/vast90//f90', but it is not licensed on the `borg', except for Fortran 90 extensions used in `fc'. The primary work of `f90', when available, is done by the `vf90' precompiler translator that converts `f90' code to `f77' usable code. {CAUTION: If you try to compile with `f90', you get the message "Unable to obtain license for vf90: No such feature exists", since `f90' is unlicensed.}
In summary, the Fortran-Compilers are
Command    Version                 Path/Location
--------   ---------------------   ------------------
fc (fcc)   Convex Fortran          /usr/convex/bin/fc
f77        HP-UX Fortran 77        /usr/bin/f77
(f90)      (PSR VAST Fortran 90)   (/usr/convex/vast90//f90 (NO LICENSE!))
Another caution is that if you have trouble with the above `fc' compiler command line syntax and you are using the Korn-Shell, then see the note below the C-language description for using the user standard C-Shell.
C: Similar format is used to compile and link C code.
C-Version Caution: There are multiple versions of the C-compiler. The `cc' Convex C Compiler is the one in `/usr/convex/bin/cc' (linked to /usr/convex/cc/cc). However, the HP-UX Standard C Compiler is the one in `/bin/cc', so the parallel processing user must make sure that the `set path' line in the user's `.cshrc' has `/usr/convex/bin' before `/bin', so that the parallelizing `cc' version is found first (Korn-Shell users need to take similar precautions, but normally this path problem should be taken care of by the new user template). There is also the GNU (GNU's Not Unix) Project version `gcc' in `/usr/local/bin/gcc'. In summary, the C-Compilers are
Command   Version     Path/Location
-------   ---------   ------------------
cc        Convex C    /usr/convex/bin/cc
cc        HP-UX C     /bin/cc
gcc       GNU C       /usr/local/bin/gcc
In addition, there are the C++ compilers:
Command   Version      Path/Location
-------   ----------   ------------------
CC        Convex C++   /usr/convex/bin/CC
CC        HP-UX C++    /usr/bin/CC
g++       GNU C++      /usr/local/bin/g++
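Since two different compilers share the name `cc' (and two share `CC'), which one runs depends only on the order of the directories in the search path. The following minimal Python sketch imitates that lookup with empty stand-in files in a temporary directory. It is an illustration only, not part of the original guide; the helper names `first_on_path' and `make_fake_compiler' are hypothetical, and a POSIX-style system is assumed.

```python
import os
import shutil
import tempfile

def first_on_path(command, directories):
    """Return the first executable named `command`, scanning directories in order."""
    return shutil.which(command, path=os.pathsep.join(directories))

def make_fake_compiler(directory, name="cc"):
    """Create an executable stand-in file so the lookup has something to find."""
    os.makedirs(directory)
    path = os.path.join(directory, name)
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n")
    os.chmod(path, 0o755)
    return path

with tempfile.TemporaryDirectory() as root:
    # Stand-ins for /usr/convex/bin/cc and /bin/cc under a scratch root.
    convex_cc = make_fake_compiler(os.path.join(root, "usr", "convex", "bin"))
    hpux_cc = make_fake_compiler(os.path.join(root, "bin"))
    convex_dir = os.path.dirname(convex_cc)
    hpux_dir = os.path.dirname(hpux_cc)

    convex_first = first_on_path("cc", [convex_dir, hpux_dir]) == convex_cc
    hpux_first = first_on_path("cc", [hpux_dir, convex_dir]) == hpux_cc

print("Convex directory listed first finds the Convex cc:", convex_first)
print("HP-UX directory listed first finds the HP-UX cc:", hpux_first)
```

On the `borg' itself the same check is simply `which cc', which should display the compiler in `/usr/convex/bin'.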
Shell Caution: If you have trouble with the above compiler command syntax, it may be that you have your borg account shell set as the Korn shell (ksh) rather than the user standard C-shell (csh). You can change your shell for the current session by the command
Caution: the Unix Fortran 77 compiler `/usr/bin/f77' and the Unix C compiler `/usr/bin/cc' are non-optimizing, i.e., non-parallelizing. Also, the Fortran 90 command is not fully supported.
XTERM: The default terminal emulation is MIT's X-Windows system, which is typically used with access from Sun Unix or IBM AIX workstations, permitting the windowing interface of these workstations to pass through to the remote Exemplar.
TERM VT100: However, if you are not using X-Windows to access the `borg', then it is suggested that you use the DEC `vt100' terminal emulation by entering the Unix command
VI: If you are also going to be using the usual Unix visual editor `vi', see the following manual (man) pages:
THE: There is also The Hesseling Editor `the', which is like the CMS xedit command, so use the manual command:
PICO?: In addition, there is also the `pico' editor from the `pine' mail program, but `pine' itself is not on the `borg' since you are permitted only to send mail from the `borg' but not to it. There is a man (manual) page accessible by the command:
RUN: On the `borg' at UIC, a custom environment command called `run' greatly simplifies the execution of programs (the full path is `/usr/local/bin/run', so the directory `/usr/local/bin/' should be added to the `set path' line in your `.cshrc' C shell file, say). The Computer Center strongly advises use of `run'. A typical format for executing a module named `[executable-file]' in the background {last `&'} is
(For MCS 572 students, the Intended Usage is MCS 572 Introduction to Supercomputing, with Faculty Sponsor Prof. F. Hanson)
This mini-local-guide is meant to indicate ``what works'' and what is ``useful'' for beginning users. The guide also gives alternate methods for access from UNIX systems.
HP, Convex, Exemplar, and UX are trademarks of Hewlett-Packard Company and its CONVEX Computer Corporation division. UNIX is a trademark of Unix System Laboratories, Inc.
This guide is intended to be self-contained, but users who want further information can consult the following sources (you can just click on the highlighted topics to access them if you are surfing the World Wide Web):
Some Examples:
(PDF file, Postscript version broken).
(Caution: Convert all `+O' `f77' options to `-O' `fc'/`fcc' options, since this html is for a different (old or new) format, but the Fortran/C options are basically the same for the `borg'.)
For example, search on `gwennap' to get an HTML reprint of the paper:
The login procedure depends on your local method of accessing the Convex from UIC, but the best access is from a Unix type system since the UIC HP-Convex Exemplar SPP1200 operating system is SPP-UX, which is substantially Unix and it is to the user's advantage to use Unix to Unix communication. If you do not now have a Unix account you should try to get one from your department's Unix system or from the UIC Computer Center graduate student Unix (IBM RS6000 AIX) server called `icarus'. Unix workstations are available in many science and engineering departments. Communication using the IBM mainframe suffers from horrible terminal emulation problems, so that you should avoid it if you can. If that does not work out or is not practical see Professor Hanson about other alternatives.
UNIX Access: The preferred remote login command:
rlogin borg.uic.edu -l [borg_user_name]
is best since it passes your local terminal emulation from the local session through to the remote UIC one and you do not have to enter your login name as with `telnet' and can proceed directly to the `password' step below.
PCLAB Access: Access by `telnet' TCP/IP command also works for PCLab PCs, or CMS, as well as in Unix, with the format:
telnet borg.uic.edu
{Caution: From a UNIX operating system, it is essential to use lower case. The full Internet name for the UIC Convex SPP1200 is borg.cc.uic.edu, but the shortened name borg.uic.edu also works. The corresponding Internet Number of the `borg' is `128.248.100.55' and is more basic, since the Internet Name is derived from the number and the number may work when the UIC computer domain name server (DNS) is down. The `borg' should respond with:}
``login:'' [borg_user_name] (CR) {Enter your actual `borg' user name for `[borg_user_name]' on the SPP1200, the name supplied by the computer center when you apply for the MCS572 class or for research with your faculty advisor; whether `telnet' or rlogin; Next}
``password:'' [password] (CR) {Enter your actual user password in place of `[password]' after the password prompt, again in lower case at the cursor. If you make a mistake or are telnetting from an account with a different user name, you will probably have to try both login-id and password again.}
``...system Message Of The Day (MOTD) and Copyright information...''
``borg[n]: '' {The `borg[n]: ' is the default SPP1200 `borg' prompt and `[n]' is some command line counter which we will ignore in the sequel. You are now in SPP-UX on the UIC Convex Exemplar SPP1200. The UIC Computer Center, as are almost all computer centers, is very serious about computer security, so the first thing you should do is to change the UIC temporary password originally given to you by entering the password change command}
passwd (CR)
``Old password:'' {Enter your original password again.}
[old-password] (CR)
``New password:'' {Enter your new 8 character password, which should contain at least two alphabetic characters and at least one numeric or special character.}
[new-password] (CR)
``Re-enter new password:'' {Retype new password to confirm original typed change.}
[new-password] (CR)
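The password rule quoted above can be checked before typing a candidate into `passwd'. The following is a minimal Python sketch, given only as an illustration: it assumes that "8 character" means exactly eight characters and that any non-alphabetic character counts as "numeric or special". The real `passwd' program may apply different or additional checks, and the function name `looks_acceptable' is hypothetical.

```python
def looks_acceptable(password):
    """Check the rule quoted in the text; the real passwd program may differ."""
    letters = sum(1 for ch in password if ch.isalpha())
    others = sum(1 for ch in password if not ch.isalpha())
    # Assumption: exactly 8 characters, at least 2 letters, at least 1 non-letter.
    return len(password) == 8 and letters >= 2 and others >= 1

for candidate in ("borg4you", "password", "1234567a", "ab1"):
    print(candidate, looks_acceptable(candidate))
```

Only the first candidate passes: the second has no numeric or special character, the third has a single letter, and the fourth is too short. These sample strings are for demonstration and should of course not be used as real passwords.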
``borg: '' {Congratulations, you made it to the UIC Convex Exemplar SPP1200; have a nice session.
Logging Out:
You can end this session at any time you have a `borg: ' prompt
by entering:
logout (CR)
or pressing the `Ctrl' control key and
'd' key simultaneously (i.e., `ctrl-d').
File Commands: You can check what the name of your `borg' home directory (file system) is by the Unix ``print working directory'' command:}
``borg: '' pwd (CR) {Your disk directory should be something like `/home/home7/[borg_user_name]' where `[borg_user_name]' is your Convex Login-id on the Convex SPP1200. You can list the current files on your account by the Unix ``list sets'' command:}
``borg: '' ls (CR) {If this is a new account you probably will not have any regular files listed, but the following form of `ls' command has options that reveal hidden `dot' files and give the long form of file information:}
``borg: '' ls -al (CR) {You may continue with a Convex SPP-UX session by getting help with the usual UNIX `man' manual command:}
``borg: '' man ls (CR) {The default `man' output is paged, so press the `Spacebar' or enter `d' for another page and enter `q' for quit. Try `man man' for more information.}
VI SETUP: If you are going to be using the Unix `vi' visual editor rather than the `emacs' editor or an X-Windows editor via `xterm', then you might want to set your terminal environment to the standard `vt100' terminal emulation by the command:
``borg: '' setenv TERM vt100 (CR)
Along with this change to `vt100', you should first save your old Unix edit (`EX') resource configuration file `.exrc' using the Unix copy `cp' command:
``borg: '' cp .exrc .exrc_old (CR)
and then replace it with a `vt100' friendly one (if permissions permit, from Professor Hanson's account) by
``borg: '' cp ~hanson/.exrc .exrc (CR)
or by using the web link with a web browser at your accessing computer to get a copy:
although you might have to use the File Transfer Protocol `ftp' to get it to the `borg', for example by Anonymous FTP:
``borg: '' ftp www.math.uic.edu (CR)
``Name (www.math.uic.edu:[user]): '' anonymous (CR)
``Password: '' [send_email_identity_as_password] (CR)
``ftp> '' cd pub/Hanson/MCS572 (CR)
``ftp> '' get .exrc_borg .exrc (CR)
``ftp> '' quit (CR)
FTP will be described below in the section on FTP.
FORTRAN SESSION: For a sample session for compiling and executing a Fortran Program, you can get a copy of the MCS572 `borg' starter problem via the web and transfer it to the borg:
or by `cp' copy command:
``borg: '' cp ~hanson/borgstart.f start.f (CR)
or by Anonymous FTP:
``borg: '' ftp www.math.uic.edu (CR)
``Name (www.math.uic.edu:[user]): '' anonymous (CR)
``Password: '' [send_email_identity_as_password] (CR)
``ftp> '' cd pub/Hanson/MCS572 (CR)
``ftp> '' get borgstart.f start.f (CR)
``ftp> '' quit (CR)
{Note: if this Anonymous FTP method is used, then getting `.exrc' above and `start.f' here can be combined in one FTP session.}
For an example of compiling and linking the MCS572 Fortran based `borg' starter problem (assuming `start.f' has been transferred to your `borg' account), enter:
``borg: '' fc -O3 -LST -o start start.f >& start.LIST& (CR) {Here, `fc' is the optimizing SPP-UX Fortran compiler, `-O3' is the parallel optimization option, `-LST' is the optimizing report listing option with output along with error messages going to the file `start.LIST' according to the redirection option `>&', `start.f' is the Fortran source file, `-o start' means that the execution file is named just `start', and the last `&' means compilation is done in the background permitting use of the terminal while waiting for the job to finish. The status of the job can be determined from the Unix `jobs' command:}
``borg: '' jobs (CR) {When you get the message:
``Done fc -O3 -LST -o start start.f >& start.LIST''
instead of ``+Running ...'', then the listing file can be examined by the Unix `more' paging command:}
``borg: '' more start.LIST (CR) {Else, the listing file can be viewed by the Unix visual editor (or other favorite editor):}
``borg: '' vi start.LIST (CR) {If compilation and listing are satisfactory, then the module `start' may be executed using the UIC custom environment `run' command preceding the `start' executable rather than directly using `start':}
``borg: '' run start >& start.output & (CR) {If the proper directory `/usr/local/bin' for `run' is not in your defined directory path, then you can add it to your C-Shell Resource Configuration file `.cshrc' or you can use the full name of run:}
``borg: '' /usr/local/bin/run start >& start.output& (CR) {Again, the job status may be checked while waiting by entering:}
``borg: '' jobs (CR) {When the ``Done'' message is displayed rather than ``+Running ...'', then the output can be viewed in pages:}
``borg: '' more start.output (CR) {Caution: the executable `start', as with the default executable `a.out', is a binary rather than a text file, so it is not readable. Next you can transfer your files back to your printer-connected computer (the Computer Center would prefer that you not print from the `borg') by FTP:}
``borg: '' ftp [home_machine].[department].uic.edu (CR)
``login: '' [user_name] (CR)
``password: '' [user_password] (CR)
``ftp> '' cd [target_directory] (CR)
``ftp> '' put start.f (CR)
``ftp> '' put start.LIST (CR)
``ftp> '' put start.output (CR)
``ftp> '' quit (CR) {Finally for the job, it is good file management to remove your executable file since it is cheaper to regenerate than to store, so as not to be a "storage hog" using the Unix remove `rm' files command:}
``borg: '' rm start (CR) {At this point you can `logout' of the `borg' by entering:}
``borg: '' logout (CR)
``% '' {Return to your local UIC UNIX session.}
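The two shell features used throughout the session above, `>&' (send normal output and error messages to the same file) and the trailing `&' (do not wait for the job), can be imitated in a few lines. The following minimal Python sketch is only an illustration of the idea, not a replacement for the shell commands; the helper name `run_in_background' is hypothetical, and a tiny Python child process stands in for the compiler.

```python
import os
import subprocess
import sys
import tempfile

def run_in_background(argv, listing_path):
    """Start argv without waiting; send stdout and stderr to one file (like `>& file &`)."""
    listing = open(listing_path, "w")
    job = subprocess.Popen(argv, stdout=listing, stderr=subprocess.STDOUT)
    listing.close()  # the child process keeps its own copy of the open file
    return job

# Stand-in for a compiler: writes one line to stdout and one to stderr.
child_code = "import sys; print('to stdout'); print('to stderr', file=sys.stderr)"

with tempfile.TemporaryDirectory() as scratch:
    listing_path = os.path.join(scratch, "start.LIST")
    job = run_in_background([sys.executable, "-c", child_code], listing_path)
    job.wait()  # a shell user would instead check on the job with `jobs`
    with open(listing_path) as handle:
        captured = handle.read()

print(captured)
```

Both lines end up in the one listing file, which is why compiler error messages appear in `start.LIST' alongside the optimization report. The relative order of the two lines in the file is not guaranteed.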
C SESSION:
Caution: Borg has had some license problems, but they seem to be solved. If you try to run Convex `cc' in `/usr/convex/bin/cc' you may get the message:
Unable to obtain license for cc: Cannot connect to license server, (-15,12:239), Connection refused
You can run a sample Convex C program by transferring to the `borg' a starter C code version:
or by `cp' copy command:
``borg: '' cp ~hanson/borgstartcc.c start.c (CR)
or by Anonymous FTP:
``borg: '' ftp www.math.uic.edu (CR)
``Name (www.math.uic.edu:[user]): '' anonymous (CR)
``Password: '' [send_email_identity_as_password] (CR)
``ftp> '' cd pub/Hanson/MCS572 (CR)
``ftp> '' get borgstartcc.c start.c (CR)
``ftp> '' quit (CR)
{In order to compile and link this C code, enter:}
``borg: '' cc -O3 -or all -o start start.c >& start.LIST & (CR) {Here, `cc' is the optimizing SPP-UX C compiler (`/usr/convex/bin/cc'), `-O3' is the parallel optimization option (Caution: the optimizing compiler must be the one in `/usr/convex/bin', so to make sure, execute the command `which cc (CR)' and `/usr/convex/bin/cc' should be displayed.), `-or all' is the optimizing report listing option with output along with error messages going to the file `start.LIST' according to the redirection option `>&', `start.c' is the C source file, `-o start' means that the execution file is named just `start', and the last `&' means compilation is done in the background permitting use of the terminal while waiting for the job to finish. The status of the job can be determined from the Unix `jobs' command:}
``borg: '' jobs (CR) {When you get the message:
``Done cc -O3 -or all -o start start.c >& start.LIST''
instead of ``+Running ...'', then the listing file can be examined by the Unix `more' paging command:}
``borg: '' more start.LIST (CR) {If compilation and listing are satisfactory, then the module `start' may be executed using the UIC custom environment `run' command preceding the `start' executable rather than directly using `start':}
``borg: '' run start >& start.output & (CR) {If the proper directory `/usr/local/bin' for `run' is not in your defined directory path, then you can add it to your C-Shell Resource Configuration file `.cshrc' or you can use the full name of run:}
``borg: '' /usr/local/bin/run start >& start.output & (CR) {Again, the job status may be checked while waiting by entering:}
``borg: '' jobs (CR) {When the ``Done'' message is displayed rather than ``+Running ...'', then the output can be viewed in pages:}
``borg: '' more start.output (CR) {Next you can transfer your files back to your printer-connected computer (the Computer Center would prefer that you not print from the `borg') by FTP as above with the Fortran `start' sample. Remember to remove your large executable:}
``borg: '' rm start (CR) {At this point you can `logout' of the `borg' by entering:}
``borg: '' logout (CR)
``% '' {Return to your local UIC UNIX session.}
FTP, the file transfer protocol, is the fastest method of file transfer between other UIC systems and the UIC Convex SPP-UX, because it uses a fast internet communication link.
{Caution: FTP involving UIC CMS should be initiated from CMS, because you cannot have multiple write links to the same CMS disk (multiple read links should be OK). There are no similar problems using a UNIX system, because UNIX is a multi-user system.}
At the UIC Convex you can transfer files between the Convex and UICVM or UNIX, even the PSC C90, PSC T3D or NCSA Power Challenge Array if you have an account there. The `ftp' command on UNIX is very much like the `ftp' command in SPP-UX.
In order to transfer a file between SPP-UX and UIC, enter the commands:
``borg: '' ftp [machine].[dept].uic.edu (CR) {where you must enter the actual machine-department "[machine].[dept]" name; {Caution: some machines will have more than one part to the name}}
``Name ([machine].[dept].uic.edu:[UIC-name]): '' [UIC-name] (CR)
``Password: '' [UIC-password] (CR) {If you make a mistake at this point and you get an `ftp> ' prompt then you can either enter `quit' and start again or enter `user [UIC-name]' continuing:}
``ftp> '' cd [UIC-directory] (CR) {change directory to the target UIC directory.}
``ftp> '' lcd [SPP-UX-directory] (CR) {locally change directory to the target SPP-UX directory, but note that the meta variables `$HOME' or `$TMP' will not work here and you will need the full path name since FTP is an entirely different session! Hence it is best to initiate FTP from the SPP-UX directory from or to which the transfer will be made. For the home directory the abbreviations `~' or `~[name]' will still work even if `$HOME' does not.}
``ftp> '' ls (CR) {Lists files in the UIC directory; wild-cards can be used.}
``ftp> '' !ls (CR) {Lists files in the local SPP-UX directory using the `!' escape to the local SPP-UX session; wild-cards can be used.}
``ftp> '' get [UIC-fn.ext] [[SPP-UX-fn.ext]] (CR) {Gets a file from the remote UIC directory and saves it in the local SPP-UX directory with optional new name `[SPP-UX-fn.ext]'; wild-cards cannot be used, but the ftp multiple get command `mget' does permit wild-cards.}
``ftp> '' put [SPP-UX-fn.ext] [[UNIX-fn.ext]] (CR) {Puts a file from the local SPP-UX directory and saves it in the remote UIC directory with optional new name `[UNIX-fn.ext]'; wild-cards cannot be used, but the ftp multiple put command `mput' does permit wild-cards.}
``ftp >'' quit (CR) {Quit when done transferring files.}
``borg: ''
``borg: '' ftp uicvm.cc.uic.edu (CR)
or
``borg: '' ftp 128.248.2.50 (CR)
{This command allows you to enter the FTP communication system that uses the same lines and protocol as Telnet, but essentially only allows file transfer. The Internet numbers are more reliable. In the SPP-UX to UICVM FTP connection, you will be prompted for your CMS user-id and your CMS password:}
``Connected to uicvm.uic.edu'' {If you do not get connected but end up in FTP you can try 'open 128.248.2.50' without restarting in SPP-UX again.}
``Name(uicvm.cc.uic.edu:u[default-id]):'' [CMS-user-id like `[userid]'] (CR)
``Password(uicvm.cc.uic.edu:[CMS-user-id]):'' [CMS password] (CR) {If successful, then:} {If you make a mistake with either your password or username, you can enter `user (CR)' after the ``ftp>'' prompt to restart. At the SPP-UX FTP ``ftp>'' prompt (it differs from the IBM FTP prompt), you can issue FTP commands:}
``ftp> '' help [FTP-command] (CR) {This `help' command gives brief information on or a definition of the command `[FTP-command]'; `help', alone, will display a list of FTP commands; `?' is a brief alias for `help'.}
``ftp> '' ls (CR) {Either `ls' or `dir' lists the current contents of the remote directory if you need more information. `pwd' displays the remote (UICVM here) working directory.}
``ftp>'' ls *.fortran (CR) {This example causes the listing of Fortran files on your CMS disk, with the wild-card `*' standing for any filename. Similarly, use `ls *.f (CR)' in UNIX.}
``ftp>'' cd u12688.192.r (CR) {This UICVM example shows how to change CMS disks for reading, where `u12688.192' denotes Hanson's public CMS disk and the `.r' suffix denotes read only access, which is all anyone else would have. Note that you cannot use the FTP `put' command to place an SPP-UX file on a CMS disk you currently have CMS links to, because if you try to `put' to such a disk you will get a message like:}
``550 Write access unavailable for ... due to other links''
``550 Writing is not allowed: .... is Read Only'' {However, if you have a disk that you are not currently linked to, such as a friend's disk and you are FTPing onto the friend's account, you may use the command:}
``ftp>'' cd [CMS-user-id].[cuu-number].w (CR) {This version of the change directory command permits writing on a disk that is not an `A' or `191' disk, where the `.w' suffix requests write access and the `[cuu-number]' is the disk number (a regular `A' disk has a CUU number of `191'). If this or the prior access requests are successful then you can transfer a file from SPP-UX to UIC using:}
``ftp >'' put [SPP-UX-fn.ext] [UICVM-fn.ft.fm] (CR)
{The put command stores the local (SPP-UX) file on the remote (UICVM or UNIX) system in an FTP session started from SPP-UX. `send' is an alias for `put', while `mput [SPP-UX-files] (CR)' is used to send multiple files to UIC with the same names.}
or
``ftp> '' get [UICVM-fn.ft.fm] [SPP-UX-fn.ext] (CR) {The GET command transfers a file from UIC to SPP-UX. If successful, you should get messages like this:} {`recv' is an alias of `get', and `mget [UIC-files] (CR)' is used to receive multiple files quickly with wild-cards, but the file names will be the same as they are on the remote machine. Caution: `mget' stores the CMS file into SPP-UX with an upper case name, unlike `get' which stores into SPP-UX properly with a lower case name. You can transfer some more files if you want, changing the directory if needed, or you can quit FTP by using the command:}
``ftp >'' quit (CR) {You can also use `bye' to exit FTP, except in IBM FTP. Either will get you back to the SPP-UX (UNIX+) shell with prompt ``u* *%''.}
``borg: ''
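Because `mget' from CMS leaves files with upper case names, as cautioned above, the received files usually need renaming afterwards. The following minimal Python sketch shows one way to do that for a directory holding only freshly received files. It is an illustration only and not part of the original guide; the function names `lowercase_plan' and `rename_to_lowercase' are hypothetical.

```python
import os

def lowercase_plan(names):
    """Map each name containing upper case letters to its lower case form."""
    return {name: name.lower() for name in names if name != name.lower()}

def rename_to_lowercase(directory):
    """Rename files in `directory` to lower case, never overwriting an existing file."""
    renamed = []
    for old, new in sorted(lowercase_plan(os.listdir(directory)).items()):
        target = os.path.join(directory, new)
        if os.path.exists(target):
            continue  # keep both files rather than clobber one
        os.rename(os.path.join(directory, old), target)
        renamed.append(new)
    return renamed

print(lowercase_plan(["PGM.FORTRAN", "notes.txt"]))
```

Note that this lowers every name in the directory, including deliberately mixed case names such as `start.LIST', so it is best pointed at a separate directory used only for receiving `mget' transfers.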
File transfer by `ftp' from a UNIX session is similar to file transfer from a UIC SPP-UX session, because both have UNIX or extended UNIX operating systems, as discussed in the last section.
File transfer protocol (ftp) on a PC Lab PC may not be practical for most users, due to lack of permanent storage. Transfer between CMS or UNIX and the Convex may be more practical when you are accessing them using `telnet' from the PCs. The nearest Xerox PostScript printer to 2249f is SEL2263, while others are SEL2265, SEL2058, SEO308 and elsewhere. However, if the PC is your favorite medium, then use it as in the above Convex or Unix subsections.
At UIC the IBMNET version of FTP is used on CMS and it uses the following commands:
GETDISK TELNET (CR) {This is the first step. It accesses the IBM TELNET disk that contains both FTP and IBM TELNET commands, so if you already have accessed the IBM TELNET disk for TELNET in your UICVM session, you do not have to get it again for FTP, and vice versa. For more information on FTP, type `HELP FTP' or `HELP FTP MENU'.}
FTP borg.cc.uic.edu (CR)
{This is the usual 2nd step, unless you have already accessed the Telnet disk in your session. This brings up the file transfer protocol system so that you can transfer files and connects to the UIC Convex. If you do not get the Convex Internet name or number correct and do not get connected, you can still open a connection to the Convex using the command:
OPEN borg.cc.uic.edu (CR) You still have to log into the Convex. You can get brief help in FTP by typing `?' after the `Command:' FTP prompt. However, next you get the FTP banner and login requests.}
``VM TCP/IP FTP R1.1''
``Connecting to BORG.UIC.EDU 128.248.100.55, port ...''
``220 FTP server ( ... ) ready.''
``USER (identify yourself to the host):'' {The user must enter his Convex user-id of the form `u[?]' or `[userid]' assigned on the UIC ``Sign-on'' sheets.}
[Convex-user-id] (CR)
``>>>USER u[?]''
``331 Password required for u[?]''
``Password:'' {FTP and Convex request the account password.}
[password] (CR)
``>>>PASS ********''
``230 User u[?] logged in.''
``Command:'' {This is the IBM Telnet FTP prompt, so at this point you can enter such commands as help, get, put, cd, ls, open, binary or quit.}
``Command:'' help [FTP-command] (CR) {The HELP command without any arguments displays a list of FTP commands; otherwise, with an argument, gives the syntax for `[FTP-command]'. `?' also gives short useful descriptions of FTP subcommands.}
``Command:'' get [SPP-UX-fn.ext] [UICVM-fn.ft.fm] [(REPLACE] (CR) {The GET command gets the Convex file [SPP-UX-fn.ext] and stores it in the CMS file [UICVM-fn.ft.fm]. E.g., `get pgm.l SPP1200_pgm.listing' transfers the SPP-UX compiler program listing to your CMS disk and renames it `SPP1200_pgm.listing'. Note that inside FTP you must use `.' to separate UICVM CMS name, type and mode, unlike the format in CMS. Convex UNIX files usually have `.' separating name and additional extensions. If the UICVM CMS file already exists, the `(REPLACE' option must be used to over-write it. There is a form `mget [Unix-files] [(REP] (CR)' for getting multiple files in a single command. Caution: the `get' command assumes the working or root directory, otherwise you must change the UNIX directory using the `cd' command. Several carriage control characters do not transfer very well from computer system to computer system, especially tabs. {Warning: DO NOT use tabs for files, except for `makefiles', because tabs will only cause you problems upon file transfer.} In CMS XEDIT, you can remove troublesome carriage control characters by typing `set hex on (CR)' on the bottom command line, and then changing tabs to a set of blanks {4 blanks is a good number} by typing `c/x'05'/ /* * (CR)', or deleting underscores by `c/x'6d16'/* * (CR)', or deleting just backspace characters by `c/x'16'/* * (CR)'. Warning: type only the quotes surrounding the Hex numbers.}
``Command:'' cd [SPP-UX-directory-name] (CR) {The FTP command `cd' changes the working directory. E.g., `cd /tmp/user/[n]/[username] (CR)' changes to the user's temporary directory provided the proper disk number and id-number are used.}
``Command:'' put [UICVM-fn.ft.fm] [SPP-UX-fn.ext] (CR) {The FTP `put' command (usually `send' for most FTP programs) stores the CMS file [UICVM-fn.ft.fm] into the UNIX file [SPP-UX-fn.ext]. E.g., `put craytest.fortran.e pgm.f' transfers the test program from the class CMS disk to SPP-UX. `mput' is the multiple version.}
``Command:'' ls (CR) {The FTP commands `ls' and `dir' produce a working directory listing, with `ls' listing only file names and `dir' giving a full listing like `ls -lg'. Wild cards such as in `ls pgm.* (CR)' are permitted.}
``Command:'' quit (CR) {This command causes an exit from FTP and returns you to your CMS session. Caution: `bye' is not an alias for `quit' in the CMS FTP, as it is in UNIX FTP.}
``>>>QUIT ********''
``230 Goodbye.''
``MORE...'' Clear-key
``Ready; T=............'' {This Ready prompt indicates that you are back in CMS on the VM machine.}
An alternate method of sending files is to use the CMS NOTE command, reading in the CMS file with the CMS GET command. This BITNET file transfer method can produce variable results: CMS SENDFILE does not work for this purpose, BITNET expects a blank first line in the message, and the transfer depends on all the computer links between here and there.
fc -c -O3 -LST [source].f >& [source].LIST& (CR)
{Here, `-O3' enables parallel optimization and `-LST' is the option that enables the helpful information marking of loops for optimization in the compiler information listing file that is automatically stored in the file `[source].LIST'. No additional option is needed for the automatic optimization, unless altered.}
fc -O3 -LST -o myrun [source].f >& [source].LIST & (CR)
{Here, the missing `-c' option means that both compiling and linking will be performed, producing an executable module rather than an object module, and the additional option `-o [executable]' renames the executable, to `myrun' for example, rather than using the default `a.out' executable name.}
{For Convex Standard C source file named `[source].c':}
cc -c -O3 -or all [source].c (CR) {This is the Convex Standard C compile command, which produces the object file `[source].o'. Here, the option `-c' denotes compilation only, `-O3' enables parallel optimization and `-or all' enables the full optimization report for the compilation. Also}
cc -or all -o myrun [source].c >& [source].LIST & (CR)
{compiles and links the C program while producing messages on scalar and parallel optimizations.}
ld -o myrun [source].o (CR) {This step is the link or load step using the Convex SPP-UX segment loader with a typical format. Here the executable name is taken to be the generic name `myrun', but you can choose whatever name you like; omitting the `-o' option produces the default UNIX name `a.out' for the executable. However, it is wise to stick to a single executable name, because the file size of a typical executable is huge and a single name makes it easier to delete. It is usually cheaper to regenerate the executable than to take up valuable storage space.}
{If you use UNIX redirection for input and output, then the format of the execute command is like:}
run myrun < [data] >& [output] & (CR) {Here, `run' is the UIC enveloping package command (the Computer Center strongly urges you to use it, or else), `[data]' is the input file and `[output]' is the output file that also receives diagnostic messages in the background. Input files for Fortran `read' statements and output files for Fortran write statements can also be allocated to the terminal or SPP-UX files using the Fortran `open' statement to reallocate units 5 and 6 for FC as in the example programs used below.}
As practice, you can run any source program that you have transported to SPP-UX. {If available, the simple code `convert.f'
      program convert
code: convert from debug fortran cogs, slightly modified.
change: input & output is to & from terminal, input at prompt.
Caution: compile, load, and execute in SPP-UX using the two commands:
command: fc -o convert convert.f
command: convert
      real a(999)
      write(*,*) 'input any integer less than 1000:'
      read(*,*) i
      a(i) = float(i)
      write(*,6000) a(i)
 6000 format(' floating point representation: ',e13.5)
      write(*,*) 'What happens when you exceed array bound of 999?'
      stop
      end
Can also be obtained on UICVM CMS using}
GETDISK HANSON (CR)
and
filel convert f *
{This simple-minded `convert.f' program uses the terminal as undeclared input and output units, corresponding respectively to the `read(*,*)' and `write(*,[n])' statements, without specifying an FC `open' statement. Be sure to do this in your temporary directory, which you can change to by using `cd $TMP'. The source `convert.f' is compiled and executed with the 2 commands:}
fc -o convert convert.f (CR)
convert (CR) {Upon execution with `convert', you are asked to supply an integer like 6:}
``Input an integer less than 1000:''
6 (CR) {SPP-UX responds with the output:}
floating point representation: 0.60000E+01
What happens when you exceed array bound of 999?
STOP executed at line 15 in Fortran routine 'CONVERT'
CP: 0.002s, Wallclock: 2.311s
HWM mem: 163926, HWM stack: 2048, Stack overflows: 0
{To rerun the same code without recompiling, merely enter `convert' again:}
convert (CR) {Your response should be to enter another number as above. Do not spend too much time with `convert.f', because it can only read and write.}
The second example uses data files for both input unit 5 and output unit 6, as well as the UNIX Fortran CPU timer `etime'.
CODE: Borg version (caution)
code_old: craytest fortran from cogs disk = ncsa ctss users guide eg#1
      program tempt
calculation: c(i)=exp(-0.5)*pi/i
change: etime cpu timer replaces 'second'.
caution: note the use of a 2 subscript timer array is a fortran trick.
change: input from file 'tempt.data' and output to file 'tempt.output'.
change: particular input is the vector length "nx".
change: second cpu(user) time 'second' added.
caution: In Convex SPP1200/XA-16 SPP-UX compile, load and execute with:
command: fc -o tempt tempt.f
command: tempt
      parameter (ndim=5120)
      real a(ndim),b(ndim),c(ndim)
      real etime,tv(3,3)
caution: fc "real" implies IEEE precision which means 23 or 24 bits fraction
caution: cft "real" implies 48bits = 6bytes for the fraction,
continued: unlike ibm "real" which implies 24bits = 3bytes.
continued: Otherwise all variables not starting with (i-n) are
continued: implicitly real, unless otherwise declared. The fraction
continued: for "double precision" is 96bits=12bytes, hence no "real*8"
      open(6,file='tempt.output')
      open(5,file='tempt.data',status='old')
      read(5,*) nx
      tv(3,1)=etime(tv(1,1))
      tv(3,2)=etime(tv(1,2))
      call init(ndim,nx,a,b)
      call calc(ndim,nx,a,b,c)
      tv(3,3)=etime(tv(1,3))
      tuoh=tv(1,2)-tv(1,1) ! user cpu timer overhead
      tsoh=tv(2,2)-tv(2,1) ! system cpu timer overhead
      ttoh=tv(3,2)-tv(3,1) ! total cpu timer overhead
      tusr=tv(1,3)-tv(1,2)-tuoh
      tsys=tv(2,3)-tv(2,2)-tsoh
      ttot=tv(3,3)-tv(3,2)-ttoh
      write(6,66) nx,nx,c(nx),tuoh,tusr,tsoh,tsys,ttoh,ttot
   66 format(1x,' nx=',i5,'; c(',i5,')=',f10.7
     & /' ohead=',e12.4,' seconds; user time=',e12.4,' seconds',
     & /' ohead=',e12.4,' seconds; system time=',e12.4,' seconds',
     & /' ohead=',e12.4,' seconds; total time=',e12.4,' seconds')
      stop
      end
      subroutine init(ndim,nx,a,b)
comment: removed cdir$ novector
      real a(ndim),b(ndim)
      pi=acos(-1.0)
      do 10 ix=1,nx
         a(ix)=pi
         b(ix)=float(ix)
   10 continue
      return
      end
      subroutine calc(ndim,nx,a,b,c)
comment: removed cdir$ novector
      real a(ndim),b(ndim),c(ndim)
      do 20 ix=1,nx
         c(ix)=exp(-0.5)*a(ix)/b(ix)
   20 continue
      return
      end
{The source is a modified version of the old USER'S GUIDE craytest code. A copy can be obtained via a web browser:
or a copy can be obtained by Anonymous FTP:
or
An old Cray copy also resides in Hanson's public CMS disk. Use your own copy or get Hanson's CMS copy by using SPP-UX `ftp' to your own UICVM CMS account and, when in FTP, enter}
``ftp>'' cd u12688.192.r (CR)
``ftp>'' get tempt.f (CR)
``... {return code messages}..''
``ftp>'' bye (CR) {`bye' is often an FTP alias for `quit', but not in IBM FTP. `tempt.f' performs a Convex `real', 24 bit precision, floating point vector calculation and times the calculation using `etime'. It requires that a single vector dimension (not more than 5120) be read in free (list-directed) format from a file called `tempt.data' using a `read(5,*)' statement, while the output is written to a file called `tempt.output' using a `write(6,[n]) [list]' statement. These files, with names usually enclosed in quotes, are allocated units with the FC `open' statements `open(5,file='tempt.data')' and `open(6,file='tempt.output')'. Before executing `tempt.f', a file called `tempt.data' is required. You can use the SPP-UX line editor `ex' to create one as follows:}
``borg:'' ex tempt.data (CR) {Caution: you should not use `ex' or `vi' for extensive editing on the SPP1200, which you can do at your local host, UICVM or UNIX; however, this short editing session will only take a short amount of time. A selected list of `ex' commands is given below. The `ex' prompt is `:'. To start, add lines with the `ex' subcommand `0a', which appends after line 0:}
``:'' 0a (CR)
500 (CR) {Note that using the IBM Telnet interface, you enter the data at the bottom of the screen and only when you return does it appear as above. End the insert mode by entering a solitary period:}
. (CR) {Finally save and exit `ex' with the combined `wq' = `w | q' subcommand:}
``:'' wq (CR) {If successful (you can check by `cat tempt.data (CR)') you should now have files called `tempt.data' from `ex' and `tempt.f' from `ftp' or other source, so that you can now compile, load, and execute `tempt.f' by entering the two command lines:}
``borg:'' fc -o tempt tempt.f (CR)
``borg:'' run tempt (CR)
{Your output will be in `tempt.output' and you can list it by the command:}
``borg:'' cat tempt.output (CR) {Your output should look something like:}
nx= 500; c( 500)= .0038109
ohead= .0000E+00 seconds; user time= .0000E+00 seconds
ohead= .0000E+00 seconds; system time= .1000E-01 seconds
ohead= .0000E+00 seconds; total time= .1000E-01 seconds
{If you wish to run the program again with a different number, then enter}
``borg: '' ex tempt.data (CR) {again, or}
``borg: '' !ex (CR) {and enter within EX the subcommand}
``:'' 1c (CR)
5000 (CR)
. {to change `500' to `5000' and to end the change subcommand; then end `ex' by entering after the ``:'' prompt:}
``:'' wq (CR) {and then enter}
``borg: '' run tempt (CR) {and}
``borg: '' cat tempt.output (CR) {again. When you are done with `tempt.f', remove all the files from SPP-UX that you do not need, using:}
``borg: '' rm tempt (CR) {for example, but especially the big executables like `tempt'.}
For information on C language programs use the SPP-UX commands:
man cc (CR) : for the manual help pages on the C compile and load command; also the `-q' option of man will produce quick, compact manual pages.
See `man fc' for more information.
fc -LST -[options] [source].f [other source files] >& [source].LIST& (CR) : Compiles and links source file `[source].f' and `[other source files]' with the Convex compiler listing option `-LST' redirecting listing and error message output to `[source].LIST' in the background. The actual compiler is called `fskel', but the compiler controller `fc' automatically calls it. The command
fc -c -LST [source].f >& [source].LIST& (CR) : Only compiles {`-c'} the source file `[source].f' and yields a compiler optimization report, with default optimization level `O2'.
Use other options
`-O0' for basic block scalar optimizations,
`-O1' for program unit scalar optimizations plus `O0',
`-O2' for global instruction scheduling, software pipelining, and data localizations plus `O1' {`-O2' is the default optimization level},
`-O3' for full parallel optimizations plus `O2',
`-o [executable-file]' to name the executable module instead of the `a.out' default,
`-c' for compilation only {no link or executable}, producing the `[source].o' binary object file {the `ld' command is then needed to load the execution module, which then can be used to execute the program.}
fc -LST -[options] -o [executable] [source].f >& [source].LIST & (CR) : The `fc' command with option `-o [executable]' produces a named executable file rather than the default `a.out' executable file.
Note: It is much better to use makefiles for such commands.
The current optimizing SPP-UX C compiler is `/usr/convex/cc6.5/cc'.
cc -[options] -o myrun [file].c (CR) : Compiles source [file].c, using the standard C compiler `cc6.5' and producing an executable named `myrun'. Use options {similar to Fortran `fc' compiler list}
`-O0' for basic block scalar optimizations,
`-O1' for program unit scalar optimizations plus `O0',
`-O2' for global instruction scheduling, software pipelining, and data localizations plus `O1' {`-O2' is the default optimization level},
`-o [executable-file]' to name the executable module instead of the `a.out' default,
`-c' for compilation only {no link or executable}, producing the `[source].o' binary object file {the `ld' command is then needed to load the execution module, which then can be used to execute the program.}
Use `man cc' for more about options.
cc -c [file].c (CR) : Compiles source [file].c, using the standard C compiler `cc6.5' and producing an object file named [file].o.
cc -o [executable_file] -or all [file].c >& [file].LIST & (CR) : Produces an optimization report including all tables, which may be restricted by replacing `all' by `loop', `private' or `array'; the report itself is redirected to the output file `[file].LIST' in the background.
See `man cc' for more information.
#pragma _CNX [directive] : Form of C compiler directive placed within the C code, where some example directives are `LOOP_PARALLEL' for running the next loop in parallel, `PARALLEL([list])' and `END_PARALLEL' for marking a code segment for parallel execution, and `PARALLEL_PRIVATE([list])' for designating a list of variables in a parallel code segment as private or local. Refer to `man cc' or the `C User's Guide' for more information.
See `man ld' for more information. Note that the linker or loader `ld' works for both Fortran and C code object files.
ld -o [executable-file] -l [library list] [source].o (CR) : This segment loader links and loads the object module `[source].o' from the `fc' step into the execution module named `[executable-file]' by the `-o' option. Without the `-o' option, the executable is the standard `a.out' file. The library option may not be needed because many libraries are searched by default: Fortran90 (/usr/convex/fc9.5/lib/libF90.a), C (/lib/libc.a), I/O (/usr/lib/libio.a) and Math (/lib/libm.a).
run [executable-file] < [input-file] > [output-file] & (CR) : Executes the executable module taking input from the file `[input-file]' and redirecting output to `[output-file]' as a background process.
The HP Convex Exemplar SPP1200 has several Parallel Information Functions that permit finding the number of processors with threads (parallel execution streams running on parallel processors), the number of threads, the number of hypernodes and similar information. The type declarations and usage of these functions are best illustrated by the following Convex Fortran `fc' code fragments:
Fortran Example:
c............ deleted nonrelevant code
Check: Parallel Information Functions
Caution: Avoid "()" args in declaration statements as in Programmer's Guide.
Check: Output (write) statements for function name meanings.
Code: Typical Parallel Information Function Declaration Statements:
      integer num_procs
      integer num_threads
      integer num_nodes
      integer num_node_threads
      integer my_thread
      integer my_node
      integer level_of_parallelism
Code: Typical Parallel Information Output Variable Declaration Statements:
      integer nproc,nth,nnode,nnodeth,myth,mynode,levpar,vlevpar(n)
c............ deleted nonrelevant code
Code: Typical Parallel Information Function Output Statements:
      write(6,*) 'Parallel Information Function Output:'
  606 format(1x,a,' = ',i3)
Caution: Deadlock could result if num_procs, etc., are used in I/O operations,
continued: i.e., do not use Parallel Information Functions directly in writes.
      nproc=num_procs()
      write(6,606) 'Number Processors with Threads',nproc
      nth=num_threads()
      write(6,606) 'Number Threads',nth
      nnode=num_nodes()
      write(6,606) 'Number HyperNodes',nnode
      nnodeth=num_node_threads()
      write(6,606) 'Number Threads on HyperNodes',nnodeth
      myth=my_thread()
      write(6,606) 'My Thread ID',myth
      mynode=my_node()
      write(6,606) 'My Hypernode ID',mynode
      levpar=level_of_parallelism()
      write(6,606) 'Level of Parallelism',levpar
Code: Output Table Explanation of Allowed Levels of Parallelism 0 to 9:
      write(6,*) 'where level 0 means Not Parallel'
      write(6,*) 'where level 1 means Asymmetric Thread Parallelism'
      write(6,*) 'where level 2 means Node Parallelism'
      write(6,*) 'where level 3 means Node Parallelism plus level 1'
      write(6,*) 'where level 4 means Thread Parallelism'
      write(6,*) 'where level 5 means Thread Parallelism plus level 1'
      write(6,*) 'where level 6 means Thread and Node Parallelism'
      write(6,*) 'where level 7 means Thread Parallelism plus levels',
     & '1 & 2'
      write(6,*) 'where level 8 means Single Dim. Thread Parallelism'
      write(6,*) 'where level 9 means Single Dim. Thread Parallelism',
     & 'plus level 1'
c........ deleted nonrelevant code:
Code: Example of Use of "level_of_parallelism()" in a loop:
      do 5 i=1,n
.........
CodeTechnique: vector form is used to be less likely to hinder loop parallelism:
         vlevpar(i)=level_of_parallelism()
.........
    5 continue
      write(6,606) 'Level of Parallelism: do 5',vlevpar(n)
c.............
.......
C Example (untested):
/* Include Parallel Information Required Header File: */
#include
Some command options cannot be directly entered in compiler, load or execution commands, but the `mpa' modifying attribute command permits specifying such things as the number of processors or threads for an executable module file.
mpa -[options] [executable_file] (CR) : Modifies the attributes of an executable `[executable_file]' according to the options `-[options]'. For example,
mpa -min 2 -max 4 -over -m myrun & (CR) : Permits the use of between a minimum of 2 and a maximum of 4 processors or threads when executing `myrun', while allowing oversubscription {`-over'} of threads with the modification {`-m'} of the given file `myrun'.
mpa -min 8 -max 8 -over -m myrun & (CR) : Permits exactly 8 processors or threads to be used. {Caution: the Computer Center does not allow `borg' to be used with more than 8 processors without special permission; also, timing of each execution is necessary, since more processors do not necessarily mean faster execution due to overhead considerations. Also, since the `borg' is configured into processor sets called subcomplexes, typically as SC=sys with four processors, SC=c4 with four processors and SC=c8 with 8 processors, it would be wise to have jobs using more than a few threads run in a single subcomplex with a sufficient number of physical processors. The current subcomplex configuration is determined by the command:}
scm -c (CR) : The subcomplex manager command lists the current subcomplexes, their node IDs, subcomplex names, subcomplex IDs and processor IDs within each subcomplex. For example,
mpa -sc c8 -min 6 -max 8 -over -m myrun & (CR) : Permits between 6 and 8 processors to be used and to be assigned from the subcomplex `c8' with 8 processors.
Use `man mpa' for more information.
fc -cxdb -ctifiles -o myexec [source].f (CR)
cxdb -ctifiles -o myexec (CR) : Compiles the Fortran program [source].f for using the Convex CXDB debugging facility with debugging information placed in a Compiler Tools Interface (CTI) directory `.CTI'.
See also the Convex `prof' profiling command using `man prof'.
qsub [options] (CR) : Submit a batch job to the queue; see `man qsub (CR)' for more information. For example, the option `-lM [16Mw]' permits running jobs with up to 16 megawords of memory. The option `[myjob].script' provides the script instructions for running a background job. Note that UIC users must specify a script line
See also the UIC Exemplar Home Page section
for more information on the NQS queues.
qstat [options] (CR) : Display status of queued batch jobs; see `man qstat (CR)' for more information.
For more information about batch processing with NQS, check the man pages for
qdel
qps
qstat
Convex Fortran (fc) contains Fortran90 extensions beyond `f77' Fortran. Fortran90 is best used from within the Convex Fortran optimizing compiler `fc'. (Although you cannot put the `-f90' option on the `fc' command, the optimization report for `fc -O3 -LST' shows that `f90' is indeed the implicit default even if not requested; it may be turned off with the `-nof90' option.)
The `f90' compiler is part of the Pacific-Sierra Research Corporation VAST optimizing compiler that is used by most optimizing compilers and is used along with the `vf90' translator, both being found in the directory `/usr/convex/vast90/' {CAUTION: There is no `vf90' license on the `borg' now and `f90' needs `vf90'}.
For optimization, it is recommended that your fc program aid the fc parallel optimization model, i.e.,
See also Section
and Subsection
Also see the appropriate sections, `man cc' for items on Convex Standard C.
Fortran 90 Array Notation and Array Sections: Convex Fortran fc allows most Fortran90 extensions for arrays, making array statements like
C = A + B ! arrays A, B, C must have the same ``shape'' (dimensions); i.e., same as `C(i)=A(i)+B(i)' within a do loop for all `i' of subscripts.
A = 3*B + S ! for scalar S and array A; exception to same ``shape'' rule. i.e., same as `A(i)=3*B(i)+S' within a do loop for `i=1,N', N=[dimension].
A(1:50) = B(1:100:2) ! array sections must have same ``shape'', i.e., same as `A(i)=B(2*i-1)' for `i=1,50' in a do loop.
A([start]:[end]:[step])=[expression] ! in general, references the single subscript array section for i = [start] to [end] in steps of [step].
a(i,:)=[expression] ! for the i-th row of array `a'.
a(:,j)=[expression] ! for the j-th column in `a'.
a(1::2)=[expression] ! for the odd vector elements.
a(n:1:-1)=[expression] ! for the `n' vector elements of `a' in reverse order.
z(1:n) = -log(z(1:n)) ! example for Fortran built-in function.
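The equivalence between an array section and its do loop can be checked directly. The following is a minimal sketch in standard Fortran 90, not tested on `fc' 9.5; the program and variable names (`sectn', `a2', `diff') are made up for this illustration. It is written so that it should be acceptable as either fixed or free source form.

```fortran
      program sectn
! Sketch only: program and variable names here are illustrative.
      integer n, i
      parameter (n=50)
      real a(n), a2(n), b(2*n), diff
      intrinsic maxval, abs
      do 10 i = 1, 2*n
         b(i) = float(i)
 10   continue
! Array-section form: odd-numbered elements of b go into a.
      a(1:n) = b(1:2*n:2)
! Equivalent do-loop form of the same assignment.
      do 20 i = 1, n
         a2(i) = b(2*i-1)
 20   continue
      diff = maxval(abs(a - a2))
      write(*,*) 'section vs. loop, max difference =', diff
! Reversed section: a2(i) = a(n+1-i).
      a2 = a(n:1:-1)
      write(*,*) 'a2(1) =', a2(1), '  a(n) =', a(n)
      end
```

If the two forms agree, the reported maximum difference is zero, and after the reversed assignment `a2(1)' equals `a(n)'.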
Array Constructors: Fortran 90 array constructors permit initialization of vectors and arrays by enclosing data separated by commas with all data enclosed between `(/' and `/)' delimiters. For example (assuming proper dimensioning):
a=(/1,2,3,4,5,6/) ! for `integer a(6)'; `!' delimits inline comments
a=(/(i,i=1,6)/) ! do loop form, enclosed by `(' and `)'
b(1,:)=(/1,3,5/) ! for first row when `integer b(2,3)'
b(2,:)=(/(2*i,i=1,3)/) ! do loop form of constructor for 2nd row of `b'
Caution: Constructors cannot currently be used in print statement arguments.
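As a small check of the constructor forms above, the following sketch (standard Fortran 90, not tested on `fc' 9.5; the names `constr' and `c' are hypothetical) builds the 2 by 3 array `b' by rows with constructors and compares it with an array filled in column order by a `data' statement. Constructors are kept out of the `write' argument lists, in line with the caution above.

```fortran
      program constr
! Sketch only: names are illustrative, not from the guide.
      integer a(6), b(2,3), c(2,3), i
      logical same
      intrinsic all
! A data statement fills c in column order: 1,2 then 3,4 then 5,6.
      data c /1,2,3,4,5,6/
      a = (/(i,i=1,6)/)
      b(1,:) = (/1,3,5/)
      b(2,:) = (/(2*i,i=1,3)/)
      same = all(b .eq. c)
      write(*,*) 'a =', a
      write(*,*) 'row 1 of b =', b(1,:)
      write(*,*) 'row 2 of b =', b(2,:)
      write(*,*) 'b matches data-initialized c:', same
      end
```

Because Fortran stores arrays by columns, the rows 1,3,5 and 2,4,6 correspond to the data list 1,2,3,4,5,6, so the final comparison should be true.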
real [variables-list] {The fc `real' declaration declares variables and array elements as 32-bit (4-byte) words with only 23 bits (about 3 bytes) allotted to the fraction for precision, plus one additional implicit (hidden) bit. This is IEEE precision, and is half of the precision on the Cray C90 YMP. The fc `double precision' or `real*8' declaration is 64 bits with a 52-bit stored fraction (53 bits of precision counting the implicit bit), and hence is entirely different from Cray 128-bit `double precision'.}
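The precision figures can be queried from a program rather than remembered. This is a minimal sketch using the standard Fortran 90 inquiry functions `digits' and `epsilon'; whether `fc' 9.5 supports these two functions is an assumption that has not been checked, and the program name `prec' is made up.

```fortran
      program prec
! Sketch only: reports the precision of real and double precision.
      real x
      double precision d
      x = 1.0
      d = 1.0d0
! digits counts the fraction bits including the implicit bit.
      write(*,*) 'real   fraction bits =', digits(x)
      write(*,*) 'double fraction bits =', digits(d)
      write(*,*) 'real   epsilon =', epsilon(x)
      write(*,*) 'double epsilon =', epsilon(d)
      end
```

On a machine with IEEE arithmetic the two bit counts should be 24 and 53, i.e. the stored 23-bit and 52-bit fractions plus the implicit bit.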
POINTER (P,A) {The fc new Fortran `pointer' statement declares that the declared integer (usually) variable holds (points to, for C-fans) the shifted initial (base) address of the declared array A. Not tested.}
open ([unit],file=`[fn]',status='unknown') {Format of fc OPEN statement assigning unit number [unit] to filename [fn]; place in program after declarations;
[unit] = 5 defaults to UNIX `stdin' as does
[unit] = * for read statements or reads from the terminal unless it is redirected by an `open' or a `<';
[unit] = 6 defaults to UNIX `stdout' as does
[unit] = * for write statements or writes to the terminal unless it is redirected by an `open' or a `>';
[unit] = 0 defaults to UNIX `stderr' or writes diagnostics to the terminal unless it is redirected by an `open' or a `>&'; note that file names are placed in quotes in the OPEN statement; see also `man' for SPP-UX `assign' and `env' statements.}
save [variable or array name list separated by commas] {The save statement is essential in f77 subroutines to save local variable values for later calls to a subroutine; the `-ev' option of fc provides a better solution to this problem; if `save' is not used, logic errors can result, especially for users accustomed to F66 Fortran in which variables are saved after the RETURN statement is executed, but lost in f77.}
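A call counter is perhaps the simplest case where `save' matters. The sketch below uses only standard Fortran; the names `savdem', `tally' and `kount' are hypothetical and do not come from the guide.

```fortran
      program savdem
! Sketch only: names are illustrative, not from the guide.
      integer k, ncalls
      do 10 k = 1, 3
         call tally(ncalls)
 10   continue
      write(*,*) 'calls counted =', ncalls
      end

      subroutine tally(n)
      integer n, kount
! Without this save, f77 leaves kount undefined after each return.
      save kount
      data kount /0/
      kount = kount + 1
      n = kount
      return
      end
```

With the `save' statement the program reports 3 calls; without it the f77 rules leave the result undefined, even though many compilers happen to keep the value.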
recursive [function or subroutine]([subprogram arguments]) {The 'recursive' prefix is required on subprograms called recursively, but also the recursive suboption is needed in the compiler statement.}
[statement] ! [embedded comment] {The line embedded comment is now legal in Convex Fortran.}
intrinsic [f90-function1][,[f90-function2]] {An Intrinsic statement is needed in `fc' to declare any of the Fortran90 extended intrinsics, such as ANY, DOT_PRODUCT, MAXVAL, ALL, EOSHIFT, MINLOC, SPREAD, COUNT, FLOAT, MINVAL, SUM, CSHIFT, MATMUL, PACK, TRANSPOSE, MAXLOC, PRODUCT, UNPACK.}
Many of the Fortran90 Array Functions that have been available on the Connection Machine are now available for the `fc' compiler version 9.5. However, a few functions take their arguments in a different order on the Convex than on the Connection Machine.
intrinsic pack, unpack, spread {Reminder that F90 extended intrinsic functions need an `intrinsic' statement right after the type declarations. }
PACK([array],[mask-array][,[vector]]) {Transforms (packs) the array `[array]' into a vector `[vector]' (an optional argument, which if not present, the output goes to the value of the function) according to the true values of the `[array]'-conformable, logical mask `[mask-array]'. }
UNPACK([vector],[mask-array],[field-array]) {Transforms (unpacks) the vector `[vector]' into the array `[field-array]' according to the true values of the `[field-array]'-conformable, logical mask `[mask-array]'. }
SPREAD([array],[dim],[ncopies]) {Transforms (spreads) the source array `[array]' into the output value of the function with `[ncopies]' copies along the dimension `[dim]' (horizontal copies if `[dim]'=1 and vertical if `[dim]'=2). }
RESHAPE([array],[shape][,[pad]][,[order]])
{Transforms (reshapes) the source array `[array]' into the output value of the
function with shape `[shape]' and order `[order]', padding with the array `[pad]'.
Caution: `reshape' has not yet been implemented in `fc9.5'.
}
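The transformation functions can be tried on a small case. The following is a sketch under the assumption of standard Fortran 90 argument order and semantics, in which `pack' returns its result as the function value; it has not been tested on `fc' 9.5, and the names `pkdemo', `src', `u', `v', `s' and `nbig' are made up. `reshape' is left out because of the caution above.

```fortran
      program pkdemo
! Sketch only: standard Fortran 90 argument order is assumed.
      integer b(2,3), u(2,3), v(6), s(3,2), src(3), nbig
      logical mask(2,3)
      intrinsic pack, unpack, spread, count
      data b /1,2,3,4,5,6/
      data src /7,8,9/
      mask = b .gt. 3
      nbig = count(mask)
      v = 0
! pack gathers the elements where mask is true, in column order.
      v(1:nbig) = pack(b, mask)
! unpack scatters them back; the field value 0 fills the rest.
      u = unpack(v(1:nbig), mask, 0)
! spread makes 2 copies of src along a new second dimension.
      s = spread(src, 2, 2)
      write(*,*) 'packed    =', v(1:nbig)
      write(*,*) 'u row 1   =', u(1,:)
      write(*,*) 'u row 2   =', u(2,:)
      write(*,*) 's col 1,2 =', s(:,1), s(:,2)
      end
```

With `b' holding rows 1,3,5 and 2,4,6, the packed vector is 4,5,6, the unpacked array has rows 0,0,5 and 0,4,6, and both columns of `s' are 7,8,9.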
[uniform]=rand([iseed]) {Generates uniformly distributed pseudo random numbers, with the value returned in `[uniform]'; `rand' is part of the standard Convex Fortran library rather than f90. The value of the argument seed `[iseed]' must be nonzero to start a new random sequence and further calls for the same sequence must be made with `[iseed]' = 0. }
call random_number([uniform_variable])
{Generates pseudo-random numbers uniformly distributed with the
value in the `uniform_variable' argument, which is called the
`harvest' tag.
Caution: Does not work with `fc' even with an `intrinsic'
statement: use the standard `rand' (see `man rand' for information
on `rand' and similar pseudo-random number generators). }
call random_seed([SIZE=[seed_size]][PUT=[seed](1:[seed_size])]
[GET=[seed](1:[seed_size])])
{For `random_number', either sets random seed vector size, or put
(sets) the seed, or gets the seed.
Caution: see above.
}
The reduction functions reduce the input array to a scalar output or, when a dimension `[dim]' is given, to an array of one lower rank.
intrinsic sum, product, spread, maxval, minval, count, any, all {Reminder that F90 extended intrinsic functions need an `intrinsic' statement. }
SUM([array][,[dim][,[mask]]]) {The `SUM' function computes the sum of the elements of the array `[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2 or `[dim]'=0 for the whole array), according to the true values in the conditional mask `[mask]', if present. This makes the Convex sum function the same as the Connection Machine version, except that the `dim=[dim]' and `mask=[mask]' keyword labels are illegal in `fc9.5', as they are for the rest of the functions in this section. }
PRODUCT([array][,[dim][,[mask]]]) {The `PRODUCT' function computes the product of the elements of the array `[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2 or `[dim]'=0 for the whole array), according to the true values in the conditional mask `[mask]', if present. }
MAXVAL([array][,[dim][,[mask]]]) {The `MAXVAL' function computes the maximum value of the elements of the array `[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2 or `[dim]'=0 for the whole array), according to the true values in the conditional mask `[mask]', if present. }
MINVAL([array][,[dim][,[mask]]]) {The `MINVAL' function computes the minimum value of the elements of the array `[array]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2 or `[dim]'=0 for the whole array), according to the true values in the conditional mask `[mask]', if present. }
COUNT([mask][,[dim]]) {The `COUNT' function computes the number of the true elements of the logical array `[mask]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2 or `[dim]'=0 for the whole array), if present. }
ANY([mask][,[dim]]) {The `ANY' function computes if there are any true elements in the logical array `[mask]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2 or `[dim]'=0 for the whole array), if present, and returns a logical true or false answer. }
ALL([mask][,[dim]]) {The `ALL' function computes if there are all true elements in the logical array `[mask]' along the dimension `[dim]' (by columns if `[dim]'=1 or by rows if `[dim]'=2 or `[dim]'=0 for the whole array), if present, and returns a logical true or false answer. }
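A short sketch may help fix the meaning of the dimension argument. It uses positional arguments only, since keyword labels are reported above to be illegal in `fc9.5'; it assumes standard Fortran 90 semantics, has not been tested on `fc', and the names `reduce', `m2', `n3', `lany' and `lall' are hypothetical.

```fortran
      program reduce
! Sketch only: b has rows 1,3,5 and 2,4,6 (column-order data).
      integer b(2,3), s3(3), s2(2), m2(2), n3(3)
      logical mask(2,3), lany, lall
      intrinsic sum, maxval, count, any, all
      data b /1,2,3,4,5,6/
      mask = b .gt. 3
! dim=1 collapses the rows: one result per column.
      s3 = sum(b, 1)
      n3 = count(mask, 1)
! dim=2 collapses the columns: one result per row.
      s2 = sum(b, 2)
      m2 = maxval(b, 2)
      lany = any(mask)
      lall = all(mask)
      write(*,*) 'sum(b,1)      =', s3
      write(*,*) 'sum(b,2)      =', s2
      write(*,*) 'maxval(b,2)   =', m2
      write(*,*) 'count(mask,1) =', n3
      write(*,*) 'any, all      =', lany, lall
      end
```

For this `b' the column sums are 3,7,11, the row sums are 9,12, the row maxima are 5,6, the per-column counts of elements above 3 are 0,1,2, and `any' is true while `all' is false.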
The manipulation functions rearrange the elements of the target matrix. See `cm-guide.tex' on `CMS getdisk hanson' for examples. However, the arguments may be in a different order on the Connection Machine.
intrinsic transpose, eoshift, cshift {Reminder that F90 extended intrinsic functions need an `intrinsic' statement. }
TRANSPOSE([array]) {The `TRANSPOSE' function transposes the 2-subscript array `[array]' with the result array of reversed dimensions. `transpose' is useful for converting arrays prior to printing: because Fortran stores arrays column-wise, an array written with array notation in a print statement is printed by columns, so printing the transposed array gives the original array by rows. Note: `man transpose' leads to the `LIBF90' manual page, but `man libf90' does not. }
EOSHIFT([array],[shift][,[boundary][,[dim]]]) {The `EOSHIFT' function does an end-off shift on the array `[array]' along the dimension `[dim]' using the boundary value(s) `[boundary]' to fill in, if necessary. Caution: `man eoshift' does not lead to documentation, indicating it is not officially supported, but tests below show it works. Connection Machine arguments have a different order. }
CSHIFT([array],[shift][,[dim]]) {The `CSHIFT' function does a circular shift on the array `[array]' along the dimension `[dim]'; elements shifted off one end re-enter at the other, so that, unlike `EOSHIFT', no boundary value is needed to fill in. Caution: `man eoshift' does not lead to documentation, indicating it is not officially supported, but tests below show it works. Connection Machine arguments have a different order. }
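The difference between the circular and end-off shifts shows up on a five element vector. This is a minimal sketch assuming the standard Fortran 90 forms `cshift(array,shift)' and `eoshift(array,shift,boundary)'; it has not been run on `fc' 9.5, and the names `shifts', `w' and `bt' are made up. Results are assigned to variables before being written.

```fortran
      program shifts
! Sketch only: standard Fortran 90 argument order is assumed.
      integer v(5), w(5), b(2,3), bt(3,2)
      intrinsic transpose, cshift, eoshift
      data v /1,2,3,4,5/
      data b /1,2,3,4,5,6/
! Circular shift left by 1: the first element wraps to the end.
      w = cshift(v, 1)
      write(*,*) 'cshift(v,1)     =', w
! End-off shift left by 1: the vacated end is filled with zero.
      w = eoshift(v, 1)
      write(*,*) 'eoshift(v,1)    =', w
! End-off shift right by 2 with boundary value 9.
      w = eoshift(v, -2, 9)
      write(*,*) 'eoshift(v,-2,9) =', w
      bt = transpose(b)
      write(*,*) 'column 1 of bt  =', bt(:,1)
      end
```

The three shifted vectors should be 2,3,4,5,1 then 2,3,4,5,0 then 9,9,1,2,3, and the first column of the transpose is the first row of `b', namely 1,3,5.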
The location functions find the location of elements of the target matrix. See `cm-guide.tex' on `CMS getdisk hanson' for examples. However, the arguments may be in a different order on the Connection Machine and the Connection Machine also has the location functions `firstloc', `lastloc' and `project'.
intrinsic maxloc, minloc {Reminder that F90 extended intrinsic functions need an `intrinsic' statement. }
MAXLOC([array][,[mask]]) {The `MAXLOC' function finds the first element of target array `[array]' having the maximum value, relative to the conditional mask `[mask]', if present. }
MINLOC([array][,[mask]]) {The `MINLOC' function finds the first element of target array `[array]' having the minimum value, relative to the conditional mask `[mask]', if present. }
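Note that for a 2-subscript array the location functions return a vector of two subscripts rather than a single number. The sketch below assumes standard Fortran 90 behavior and is untested on `fc' 9.5; the names `locs', `imax', `imin' and `imsk' are hypothetical.

```fortran
      program locs
! Sketch only: b has rows 1,3,5 and 2,4,6 (column-order data).
      integer b(2,3), imax(2), imin(2), imsk(2)
      logical mask(2,3)
      intrinsic maxloc, minloc
      data b /1,2,3,4,5,6/
      mask = b .lt. 5
      imax = maxloc(b)
      imin = minloc(b)
! With the mask, only elements below 5 are candidates.
      imsk = maxloc(b, mask)
      write(*,*) 'maxloc(b)      =', imax
      write(*,*) 'minloc(b)      =', imin
      write(*,*) 'maxloc(b,mask) =', imsk
      end
```

Here the maximum 6 sits at (2,3), the minimum 1 at (1,1), and the largest element below 5 is the 4 at (2,2).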
The matrix multiply functions compute the matrix products of the target matrices. See `cm-guide.tex' on `CMS getdisk hanson' for examples.
intrinsic matmul, dot_product {Reminder that F90 extended intrinsic functions need an `intrinsic' statement. }
MATMUL([array1],[array2]) {The `MATMUL' function computes the matrix product of target arrays `[array1]' and `[array2]', which must be conformable for multiplication, with the result matrix of appropriate size. This function is also used for matrix-vector multiplication. }
DOT_PRODUCT([vector1],[vector2]) {The `DOT_PRODUCT' function computes the dot product of target vectors `[vector1]' and `[vector2]', with a scalar result. Caution: the Connection Machine function is `dotproduct'. }
The following fc code contains examples of the use of many of the Fortran90 array intrinsic functions mentioned above. Having array intrinsics more or less like those in CMFortran is helpful for portability. There are some subtle differences:
If b = 1 3 5   with logical mask = b.gt.3
       2 4 6
then s3=sum(b,1,mask) or s2=sum(b,2,mask) work when
real s3(3),s2(2)
but isum=sum(b,mask) or isum=sum(b,,mask) or isum=sum(b,:,mask) do NOT work.
That is, how do you enter a scalar dim for the whole array?
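As a reference for what the three masked sums are meant to compute, here is a minimal sketch in C; it does not reproduce how the `fc' intrinsic is implemented. The function names and the `AT' macro are hypothetical, and the arrays are stored column-wise, as Fortran stores them, with 0-based subscripts.

```c
#include <assert.h>
#include <stdbool.h>

/* Element (i,j) of an nrow-row array stored column-wise (0-based). */
#define AT(a, i, j, nrow) ((a)[(i) + (j) * (nrow)])

/* Whole-array masked sum: a single scalar over all selected elements. */
int masked_sum_all(const int *b, const bool *mask, int nrow, int ncol)
{
    assert(nrow > 0 && ncol > 0);
    int total = 0;
    for (int k = 0; k < nrow * ncol; k++)
        if (mask[k])
            total += b[k];
    return total;
}

/* Analogue of sum(b,1,mask): reduce along dimension 1, one sum per column. */
void masked_sum_dim1(const int *b, const bool *mask, int nrow, int ncol, int *s)
{
    assert(nrow > 0 && ncol > 0);
    for (int j = 0; j < ncol; j++) {
        s[j] = 0;
        for (int i = 0; i < nrow; i++)
            if (AT(mask, i, j, nrow))
                s[j] += AT(b, i, j, nrow);
    }
}

/* Analogue of sum(b,2,mask): reduce along dimension 2, one sum per row. */
void masked_sum_dim2(const int *b, const bool *mask, int nrow, int ncol, int *s)
{
    assert(nrow > 0 && ncol > 0);
    for (int i = 0; i < nrow; i++) {
        s[i] = 0;
        for (int j = 0; j < ncol; j++)
            if (AT(mask, i, j, nrow))
                s[i] += AT(b, i, j, nrow);
    }
}
```

For the array b above with mask b.gt.3, the column sums are (0,4,11), the row sums are (5,10), and the whole-array sum is 4+5+6 = 15.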
borg: fc -LST -o f90test f90test.f >& f90test.LIST&
borg: f90test >& f90test.output&
%%%%%%%%%%% f90test.f %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
program f90test
code97: update by removing old comments to cmfortran
code96: retest=f90test.f redone on borg = convex spp1200/xa-16
code95: retest=cf6testre.f redone 2y later, 01oct95, changes over 93 test
code93: test if cm Fort90 array functions work for cf77 v6.0
cf90er: => convex fc9.5 flagged for error.
cf6er: => cray cf77v6.0 flagged for error and still flagged on convex fc9.5
integer b(2,3),s2(2),s3(3)
integer c(2,2),ct(3,2),cs(3,4),as(4),at(3),ctr1(2),ctr2(2),ctr3(2)
integer a(2,3),cu(2,3),ar1(3),ar2(3),cst(4,3)
integer bi(2,2),br1(3),br2(3),b2(2),cr1(2),cr2(2)
logical test(2,3),inmask(64,64)
real u(64,64),du(64,64),us(8,8),diffav
intrinsic sum,maxval,minval,product,dot_product,matmul,transpose
& ,cshift,eoshift,spread
data b/1,2,3,4,5,6/ !replace constructors initialization
data as/2,3,4,5/
data at/2,3,4/
c --------------------
b(1,1:3) = (/1, 3, 5/) ! initialize first row, along dimension 2.
b(2,1:3) = (/2, 4, 6/) ! initialize second row, along dimension 2.
print*,'Note: constructors like "(/1,2/)" allowed in fc9.5'
br1 = b(1,:)
br2 = b(2,:)
print60,br1,br2
60 format(' b(2,3)'/(3i3))
c --------------------
isum = sum(b) ! => isum = 21; i.e., Front-End scalar.
print61,' isum=sum(b)=',isum
61 format(1x,a36,i4)
isum = sum(b(:,1:3:2)) ! => isum = 14; sole ':' means all values '1:2'.
print61,' isum = sum("b(:,1:3:2)")=',isum
bi=b(:,1:3:2)
isum=sum(bi)
print61,' isum = sum("b(:,1:3:2)")=',isum
cf6er:s2 = sum(b,dim=2) ! declared with the correct array section shape.
print*,'CAUTION: "dim=", etc., markers= NOT allowed in intrinsics'
s2 = sum(b,2) ! redeclared with the correct array section shape.
print62,' s2 = sum(b,2)=',s2 ! => s2 = (/9,12/), row sums
62 format(1x,a32,2i3)
s3 = sum(b,1) ! => s3 = (/3,7,11/); column sums.
print63,' s3 = sum(b,1)=',s3
63 format(1x,a32,3i3)
cf6er:isum = sum(b,mask=b.gt.3) ! =>isum = 15; i.e., add only elements
print*,'CAUTION: "mask=" marker= STILL not allowed either.'
s3 = sum(b,1,b.gt.3) ! => s3 = (/0,4,11/); i.e., conditional col sum
print63,' s3 = sum(b,1,"b.gt.3") =',s3
test=b.gt.3
s3 = sum(b,1,test) ! => s3 = (/0,4,11/); i.e., conditional col sum
print63,' s3 = sum(b,1,"b.gt.3") =',s3
s2 = sum(b,2,test) ! => s2 = (/5,10/); i.e., conditional row sum
print62,' s2 = sum(b,2,b.gt.3) =',s2
cf6er:isum = sum(b,,test) ! => isum = 15; i.e., add only elements
cf6er:print61,' isum = sum(b,,b.gt.3) =',isum ! that are greater than three.
isum = sum(b,0,test) ! => isum = 15; i.e., add only elements
print61,' isum = sum(b,0,b.gt.3) =',isum ! that are greater than three.
print*,' CAUTION: If "sum(array[dim[,mask]])", use zero (0)'
& ,' for [dim] for whole array when there is a mask.'
c --------------------
imax = maxval(b) ! => imax = 6; array maximum value.
print61,' imax = maxval(b)=',imax
s3 = maxval(b,1) ! => s3 = (/2,4,6/); column maximums.
print63,' s3 = maxval(b,1)=',s3
s2 = maxval(b,2) ! => s2 = (/5,6/); row maximums.
print62,' s2 = maxval(b,2)=',s2
c --------------------
imin = minval(b) ! => imin = 1; array minimum value.
print61,' imin = minval(b)=',imin
c --------------------
s2 = product(b,2) ! => s2 = (/15,48/); products of row elements.
print62,' s2 = product(b,2)=',s2
c --------------------
idot = dot_product(br1,br2) ! => idot = 44; dot product of row
print61,' idot = dot_product(b(1,:),b(2,:))=',idot ! vectors of b.
print*,' CAUTION: Array syntax not allowed in actual arguments.'
c --------------------
! assuming array b of the previous section.
![Ans] = matmul([Array_1],[Array_2]) ! computes matrix multiplication
! of two rank two matrices.
c = matmul(b(:,1:2),b(:,2:3)) ! => c(1,:)=(/15,23/);c(2,:)=(/22,34/).
c=transpose(c)
print623,'c=matmul(b(:,1:2),b(:,2:3))=',c
623 format(1x,a36/(2i3))
![Ans] = transpose([Array]) ! transforms an array to its transpose.
ct = transpose(b) ! => ct(1,:)=(/1,2/);ct(2,:)=(/3,4/);ct(3,:)=(/5,6/).
ctr1 = ct(1,:)
ctr2 = ct(2,:)
ctr3 = ct(3,:)
print623,'ct = transpose(b)=',ctr1,ctr2,ctr3
c --------------------
! assume b is again initialized as
! b = 1 3 5
! 2 4 6
a = cshift(b,1,2) ! => a = 3 5 1
! 4 6 2
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = cshift(b,1,2)=',ar1,ar2
633 format(1x,a36/(3i3))
! i.e., b(i,j+shift) -> a(i,j) for j=1:2, etc.;
! i.e., the result is computed from shifting subscript in specified
! dimension of the source array by the specified shift.
a = cshift(b,-1,2) ! => a = 5 1 3
! 6 2 4
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = cshift(b,-1,2)=',ar1,ar2
! i.e., b(i,j+shift) -> a(i,j) for j=2:3, etc.
s2(1) = 1
s2(2) = 2
a = cshift(b,s2,2) ! a = 3 5 1
! 6 2 4
! i.e., an array-valued shift, or shift per row.
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = cshift(b,(/1,2/),2)=',ar1,ar2
c --------------------
! Jacobi Iteration for a 5-star discretization of
! 2D Laplace's equation:
u = 0
u(1,:)=2
u(64,:)=2
u(:,1)=2
u(:,64)=1
inmask = .FALSE.
inmask(2:63,2:63) = .TRUE.
diffav = 1
iter=0
do while (diffav.gt.5.e-3.and.iter.lt.100)
iter=iter+1
du = 0
where(inmask)
du = 0.25*(cshift(u,1,1)+cshift(u,-1,1)+cshift(u,1,2)
& +cshift(u,-1,2)) - u
u = u + du
end where
du = du*du
diffav = sqrt(sum(du)/(62*62))
end do
! which is the main program fragment of laplace.fcm.
cf90er:print66,'u = laplace-shift(u)=',u(1:64:16,1:64:16)
cf90er: & ,' array section like "u(1:64:16,1:64:16)".'
print*,'CAUTION: array sections not allowed in print'
us = u(1:64:9,1:64:9)
us=transpose(us)
print66,'u = laplace-shift(u)= ; iter=',iter,'; av-diff ='
& ,diffav,us
66 format(1x,a36,i3,a7,e10.3/(8f7.3))
c --------------------
a = eoshift(b,-1,0,1) ! a = 0 0 0 note default boundary value is 0.
! 1 3 5
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = eoshift(b,-1,0,1)=',ar1,ar2
s2=(/-1,0/)
b2=(/7,8/)
a = eoshift(b,s2,b2,2) ! => a = 7 1 3
! 2 4 6
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = eoshift(b,(/-1,0/),(/7,8/),2)=',ar1,ar2
a = eoshift(b,2,0,2) ! => a = 5 0 0
! => 6 0 0
ar1 = a(1,:)
ar2 = a(2,:)
print633,'a = eoshift(b,2,0,2)=',ar1,ar2
c --------------------
cs = spread(as,1,3)
! contents of cs:
! 2 3 4 5
! 2 3 4 5
! 2 3 4 5
cst = transpose(cs)
print64,'as =',as
64 format(1x,a32,4i3)
print643,'cs = spread(as,1,3)=',cst
643 format(1x,a36/(4i3))
c --------------------
cs = spread(at,2,4)
! contents of cs:
! 2 2 2 2
! 3 3 3 3
! 4 4 4 4
cst = transpose(cs)
print63,'at =',at
print643,'cs = spread(at,2,4)=',cst
c --------------------
print*,'F90 random_number does not work, even with intrinsic stmt'
C call random_number(uniform)
uniform=rand(1)
print660,'seed=',1,', uniform random variate =',uniform
C call random_number(uniform)
uniform=rand(0)
print660,'seed=',0,', uniform random variate =',uniform
C call random_number(uniform)
uniform=rand(0)
print660,'seed=',0,', uniform random variate =',uniform
660 format(1x,a,i2,1x,a,f10.6)
c ---------------------------------------------------------------------------
! i.e., b=spread(a,d,c) =>
! a(n_1,n_2,...,n_(d-1),n_d,...,n_r) -> b(n_1,n_2,...,n_(d-1),c,n_d,...,n_r)
! where r is the rank of source array a and n_i is the size of dimension i;
! noting that a new dimension of size c is added before dimension d.
c ---------------------------------------------------------------
stop
end
%%%%%%%%%%% end f90test.f %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
Compiler directives can be used to force or suppress optimization for both the `fc' and `cc' compilers. These statements are placed in the Fortran or C source just before the loop or other entity they are to affect. Some are single statements, while others come in pairs, with one beginning directive and one ending directive. The directive format differs between the compilers, i.e., for `fc' they have the form:
C$DIR [directive]
while for `cc' they are called pragmas and have the form:
#pragma _CNX [directive]
If several compatible directives are used on the same code fragment like a loop, then they may be combined, for example in the case of two directives, as
C$DIR [directive1], [directive2]
#pragma _CNX [directive1], [directive2]
Caution: for `fc', the leading `C' of `C$DIR' must be in column 1 and a blank must be in column 6. The `#pragma' statement keyword must be in lower case for `cc', but the directives themselves as well as `C$DIR' can be in upper or lower case. Also, it is not wise to force optimization where inappropriate and risk synchronization errors.
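A minimal C sketch of the placement and combination rules follows. The function `scale' is hypothetical, and pairing `no_parallel' with `no_unroll' is only an assumption that these two directives are compatible; check `man cc' before relying on it. A C compiler that does not recognize `_CNX' pragmas ignores them (possibly with a warning), so the routine behaves the same elsewhere.

```c
#include <assert.h>
#include <stddef.h>

/* Scale x in place.  The pragma sits on its own line immediately before
   the loop it applies to, with `#pragma' in lower case. */
void scale(double *x, size_t n, double factor)
{
    assert(x != NULL || n == 0);
    /* Two directives combined on one line, separated by a comma
       (assumed compatible; see the lead-in). */
#pragma _CNX no_parallel, no_unroll
    for (size_t i = 0; i < n; i++)
        x[i] *= factor;
}
```

The directive applies only to the loop that follows it; a second loop in the same routine would need its own pragma line.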
These directives enable or suppress scalar optimization at the point at which the directive appears and affect only the local program unit, such as the loop in which they appear.
C$DIR scalar
#pragma _CNX scalar {Directive prevents parallelization or data localization of the following loop. Can be used on inner loop, while larger outer loop is parallelized. }
C$DIR no_side_effects [function_or_subroutines_list]
#pragma _CNX no_side_effects [procedure_list] {Allows inlining, transformations of arguments or common or input/output or other calls for named procedures, functions or subroutines. }
C$DIR critical_section [([gate_variable])]
.......[critical_section]..........
C$DIR end_critical_section [([gate_variable])]
#pragma _CNX critical_section [([gate_variable])]
.......[critical_section]..........
#pragma _CNX end_critical_section [([gate_variable])] {Allows only a single thread at a time (i.e., prevents parallelization) in critical (i.e., necessarily serial) sections. The optional `([gate_variable])' argument is used to label a parallel task, but see the `fc' directive `C$DIR gate([gatename_list])' using `man fc' and the `typedef' in `cc'. The two directives must be used in pairs to surround the critical section. }
C$DIR no_block_loop
#pragma _CNX no_block_loop {Disables loop blocking or strip mining for the next loop only. }
C$DIR no_fuse
#pragma _CNX no_fuse {Disables loop fusion for the next loop only. }
C$DIR no_parallel
#pragma _CNX no_parallel {Disables loop parallelization for the next loop only. }
C$DIR no_peel
#pragma _CNX no_peel {Disables loop peeling, i.e., disables removal of test for first and last iterations, for the next loop only. }
C$DIR no_unroll
#pragma _CNX no_unroll {Disables loop unrolling for the next loop only. }
C$DIR no_unroll_and_jam
#pragma _CNX no_unroll_and_jam {Disables loop unrolling and jamming (vertical unrolling or loop collapsing) for the next loop only. }
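To make two of the transformations named above concrete, here is a sketch in C of what unrolling and strip mining do to a simple summation loop, written out by hand. It only illustrates the loop shapes; it is not the code the Convex compilers generate, and the function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Reference loop: one element per iteration. */
double sum_plain(const double *x, size_t n)
{
    double s = 0.0;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Unrolled by 4: four elements per trip, then a cleanup loop for the
   n % 4 leftover elements. */
double sum_unrolled4(const double *x, size_t n)
{
    double s = 0.0;
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        s += x[i] + x[i + 1] + x[i + 2] + x[i + 3];
    for (; i < n; i++)
        s += x[i];
    return s;
}

/* Strip mined (blocked): the outer loop steps from strip to strip and
   the inner loop walks the elements of one strip. */
double sum_strip_mined(const double *x, size_t n, size_t strip)
{
    assert(strip > 0);
    double s = 0.0;
    for (size_t lo = 0; lo < n; lo += strip) {
        size_t hi = (lo + strip < n) ? lo + strip : n;
        for (size_t i = lo; i < hi; i++)
            s += x[i];
    }
    return s;
}
```

All three routines visit the same elements; only the loop structure differs. The unrolled version adds the elements in a different order, so with general floating-point data its result can differ from the plain loop in the last bits.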
These compiler directives hold only for the loop immediately following the directive.
C$DIR no_loop_dependence([array_list])
#pragma _CNX no_loop_dependence([array_list]) {Informs compiler that arrays named in `[array_list]' do not have any dependencies for the next loop only. To inform the compiler about scalars that have no dependencies, see `C$DIR loop_private' and `C$DIR thread_private' in `man fc' (or `#pragma _CNX loop_private' and `#pragma _CNX thread_private' in `man cc'). }
C$DIR loop_parallel[([attribute_list])]
#pragma _CNX loop_parallel([arraylist]) {Enables parallelization of the next loop according to the properties in the optional `[attribute_list]' argument. Some possible attributes, with compatible attributes separated by commas, are `DIST' to distribute iterations over parallel threads (virtual processors to be scheduled on a single physical processor upon execution); `max_threads=[number]'; and `chunk_size=[number]' for the number of iterations (chunks) to be distributed to each parallel processor. Caution: the directive may not work if the number of loop iterations is not known to the compiler (e.g., a computed number of iterations known only at execution) or if F90 array sections are used in implied loops. The hypernode parallel attribute `nodes' is not too useful on the 2-hypernode Exemplar at UIC (thread parallelism with the `threads' attribute is the default). See `man fc' or `man cc' or the Exemplar Programming Guide for more information and examples. }
C$DIR prefer_fuse
#pragma _CNX prefer_fuse {If fusing has been disabled by compiler option `-nfl', then enables loop fusion for neighboring loops, each preceded by this directive. }
C$DIR prefer_parallel
#pragma _CNX prefer_parallel {Informs compiler to parallelize the following loop only if it is safe to do so by checking for loop carried dependencies. }
C$DIR peel
#pragma _CNX peel {Enables peeling of the following loop by removing first and last iteration tests and replicating the code. }
C$DIR block_loop[(block_factor=[number])]
#pragma _CNX block_loop[(block_factor=[number])] {Enables blocking or strip mining of the following loop by an optional `block_factor=[number]' factor. Caution: directive will work only for scalar, innermost loops without conditional branching. }
C$DIR unroll[(unroll_factor=[number])]
#pragma _CNX unroll[(unroll_factor=[number])] {Enables unrolling of the following loop by an optional `unroll_factor=[number]' number of times. Caution: directive will work only for scalar, innermost loops without conditional branching. }
C$DIR unroll_and_jam[(unroll_factor=[number])]
#pragma _CNX unroll_and_jam[(unroll_factor=[number])]
{Enables (horizontal) unrolling and jamming (vertical unrolling) of the
following non-innermost loops to an optional `unroll_factor=[number]'
unroll depth.
Caution: Directive will work only on loops that are non-innermost
after automatic loop interchanges.
}
C$DIR begin_tasks[([attribute_list])]
.......[parallel_task_first]..........
C$DIR next_task
.......[more_parallel_tasks]..........
C$DIR next_task
.......[parallel_task_last]..........
C$DIR end_tasks
#pragma _CNX begin_tasks[([attribute_list])]
.......[parallel_task_first]..........
#pragma _CNX next_task
.......[more_parallel_tasks]..........
#pragma _CNX next_task
.......[parallel_task_last]..........
#pragma _CNX end_tasks {Enables parallelization of different sections of code (not just loops) that are delimited by the `begin_tasks', perhaps several `next_task', and `end_tasks' compiler directives, as indicated above between the first task `[parallel_task_first]' and the last task `[parallel_task_last]'. Convex calls this Task Parallelism in contrast to Loop Parallelism. A typical attribute is `max_threads=[number]' giving the number of threads (processors) to be used, where `threads' is the default attribute and `nodes' makes little sense at UIC. Caution: there are no dependency checks, but see information on `C$DIR task_private([variable_list])' in `man fc' or `#pragma _CNX task_private([variable_list])' in `man cc' to avoid unintended dependences between tasks (note: the `begin_tasks' and `task_private' directives can be combined, separated by commas). }
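A minimal C sketch of task parallelism with two tasks follows. The routine `min_and_max' is hypothetical, and the `next_task' spelling follows the Fortran form and the description above; confirm the exact pragma names with `man cc'. Since the compiler makes no dependency checks, the two tasks are written so that each one only reads the shared input and writes its own variable.

```c
#include <assert.h>
#include <stddef.h>

/* Find the smallest and largest element of x as two independent tasks.
   A compiler that does not recognize `_CNX' pragmas ignores them and
   simply runs the two loops one after the other. */
void min_and_max(const double *x, size_t n, double *lo, double *hi)
{
    assert(n > 0);
    double mn = x[0], mx = x[0];

#pragma _CNX begin_tasks
    for (size_t i = 1; i < n; i++)   /* task 1: writes only mn */
        if (x[i] < mn)
            mn = x[i];
#pragma _CNX next_task
    for (size_t i = 1; i < n; i++)   /* task 2: writes only mx */
        if (x[i] > mx)
            mx = x[i];
#pragma _CNX end_tasks

    *lo = mn;
    *hi = mx;
}
```

Each loop declares its own index, so the only shared data are the read-only array `x' and the size `n'; if the tasks needed a common scratch variable, it would have to be named in a `task_private' directive.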
The Convex performance analyzer CXpa permits analyzing the performance of both Fortran and C programs. It can be used in both X-Windows and line edit mode, but only the line edit example for Fortran code will be given here; otherwise consult `man cxpa'. The `cxpa' analyzer also serves as a coarse CPU timer.
fc -cxpa -O3 -LST -o [exec] [source].f > & [source].LIST & : The `-cxpa' option causes compilation of the program for analysis of all four regions: loops, routines, parallel regions and basic blocks. However, the `-cxpar' option can be used just for regions and `-cxpab' for basic blocks, although special options can be selected within the `cxpa' line edit command, as in the second step:
cxpa -nw [exec] : Here, the line edit mode is invoked by the `-nw' (``no-windows'') option {not used in the X-Windows version}. The third step is to select which regions the user wants analyzed:
``(CXpa)'' select all : Here, all regions are selected corresponding to `-cxpa' option, but we could have also used `select loop all' for all loops or `select routine all' for all routines. However, you must select something to be analyzed in spite of the CXpa compile option. The CXpa `help select' command displays many more select options, e.g., targeting a particular loop. The fourth step (optional) is about what data to collect:
``(CXpa)'' collect cpu wall_clock : Here, cpu time, along with iteration and execution counts, and wall clock time (caution: coarse timings) will be collected. Other options are `collect cpu' (confusing since wall time is set to zero), `collect call-graph', and many others displayed with the `help collect' command. The fifth step is to execute the executable under CXpa:
``(CXpa)'' run : The sixth step is to request performance analysis:
``(CXpa)'' analyze > [exec].report : Here the analysis report is directed into the file `[exec].report' for later study or just for the record. The CXpa `analyze' command has many other options that can be displayed by the `help analyze' command. More information about the form of the report format can be obtained by using `help Reports'. The final step is to quit:
``(CXpa)'' quit : When finished, the user should remove the large, binary executable `[exec]' and the large, binary performance data file `[exec].pdf' to conserve directory space.
See also,
top : Displays information on the ``top'' SPP1200 processes and indicates what the global system load from other users currently is. The system load due to other users may affect both your response time and performance, since a large number of current users may cause your job to be frequently swapped in and o