MCS572 UIC User's Local Guide to
UIC ACCC Argo Linux Cluster

version 0.60
14 April 2003


F. B. Hanson

Mail address:

Office address:

Hanson World Wide WEB Home Page:

UIC Fall 2003 Course Web Page:

Acknowledgement:


 

Table of Contents


Preface

This User's Local Guide is intended to be a sufficient, hands-on introduction to the UIC ACCC Argo Beowulf Linux Cluster for our MCS 572 Introduction to Supercomputing class. The Argo cluster runs Linux, a variant of the UNIX operating system.


Argo Overview.

The UIC ACCC Argo is a small-scale parallel cluster with 16 compute nodes, each with one 1.4-GHz AMD Athlon XP 1600+ processor. In addition, there is one Master access node using the same processor and one file server node using a 550-MHz Pentium III processor, making a total of 18 processors. The Master access and compute nodes run Red Hat Linux 7.1 and Scali's cluster management and interconnection software. The UIC ACCC Argo's Master node is the only user access node to the compute nodes, using the internet address

with the prompt

where "[User]" is the Argo user's netid.

For Argo information, see

What does the Argo look like? UIC ACCC Argo Pictures


Argo Compute Nodes.

Each compute node is a single 1.4-GHz AMD Athlon XP 1600+ with a 128-KB Level 1 cache and a 256-KB Level 2 cache. The peak performance is not specified. The compute node network interconnect is Dolphin Interconnect hardware, using Scali 3.01 cluster management software with MPI. The compute nodes are organized into 4 zones or groups of 4 processors or nodes each, using the naming convention "argo[zone]-[node]", where "[zone]=1:4" and "[node]=1:4". The physical organization of the zones is by a ring of the Scalable Coherent Interface (SCI, an IEEE cache-coherent shared-memory protocol, although the coherency is not supported on Argo's commodity PC nodes; the Dolphin interconnect is SCI-based).

For more information on the compute nodes, see


Argo Benchmark Performance.

The Argo Beowulf Cluster, installed at UIC in 2002, is too small a system to have a ranked performance on the Top 500 List of the most powerful computers. It would be classified as an Intel NOW (Network of Workstations or PCs) cluster. See the MCS 572 selected list at


Argo Memory Units.

The random access memory (RAM) is 768 MB of DRAM on each compute node and on the master node. The memory organization is distributed memory, as opposed to shared memory.


Argo Operating System.

The operating system is Red Hat Linux 7.1, a version of the UNIX operating system. Compilation is done on the Master access node, but execution is by remote batch scheduling, primarily using the Scali "scasub" envelope or wrapper command, described later.


Argo Login Shells.

The operating system environment is set by a UNIX shell, and the default shell on the Argo is the Bash-Shell. The shell can be changed (though that is not recommended) with the Change Shell command "chsh", which has the format:
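a plausible sketch (assuming the Linux "chsh" with its "-s" shell option) being

    chsh -s [shellpath] (CR)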

where the Shell Path "[shellpath]" can be found with the system command "which" in the format:
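that is,

    which [shell] (CR)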

where "[shell]" is the standard system shell "sh", user C-shell "csh", Bourne again shell "bash", the Korn shell "korn" and others. However, many command given here assume the C-shell, which uses the resource configuration file ".cshrc" which resides in the user's home directory and can be used to define commands and make aliases (format:

where, in cases of special command characters, quotation marks are needed.). A sample ".cshrc" file for use on the Argo with local modifications is


Argo-UIC Login Access.

Users MUST access the UIC Argo directly using the Secure Shell (ssh), such as from UIC `icarus' or from department systems,
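for example (a sketch, with "[ArgoAddress]" standing for the Argo Master node internet address given above):

    ssh [User]@[ArgoAddress] (CR)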

OR
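    ssh -l [User] [ArgoAddress] (CR)    {equivalent login form}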

If your computer system does not have this secure form, you will have to find one that does, like the UIC student computer server icarus.uic.edu, since every student should have a UIC netid. If ssh has difficulty with the Unix ".ssh/known_hosts" file (it will differ on other platforms), then edit the file by deleting the entry for the node that is giving the problem, since the ssh key may have expired, and try the ssh command again.

SSH works like the Unix remote login command "rlogin", but encrypts your password so that it is nearly impossible to steal. See "man ssh" for help from the UNIX manual pages.

SSH is a UNIX command found on many UNIX systems, but you can get a free MS Windows version that comes in two main flavors:


Argo-UIC File Transfer.

Standard FTP (File Transfer Protocol) can be used to transfer files between different UIC computer systems. However, secure copy "scp" is more robust, since secure FTP "sftp" can be more difficult to connect with, though the user may still want to use "sftp" for security. For example, from UIC

SCP Secure Copy:
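A sketch (with "[ArgoAddress]" and "[UICAddress]" standing for the Argo and UIC host internet addresses, respectively):

    scp [file].c [username]@[ArgoAddress]:[file].c (CR)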

or from Argo
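    scp [username]@[UICAddress]:[file].c [file].c (CR)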

This form of the command works well for a single file, which can also have a directory path, but the user password has to be given each time. For multiple files, a wild-card version can be used, e.g., for all C files, omitting the target file name, from Argo:
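    scp *.c [username]@[UICAddress]: (CR)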

See "man scp" for help from the UNIX manual pages.

SFTP Secure File Transfer Protocol:

Also, you can use the secure File Transfer Protocol (FTP) called "sftp", which works like the usual FTP, except that you cannot use any abbreviations of the FTP subcommands (e.g., use "put" and not "pu"); SFTP secures your session better. For example, from UIC,
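a sketch (again with "[ArgoAddress]" and "[UICAddress]" as the host addresses) being

    sftp [username]@[ArgoAddress] (CR)

followed by subcommands such as "put [file].c", "get [file].c", and "quit",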

or from Argo
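    sftp [username]@[UICAddress] (CR)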

Remark: If your username is the same at both the UIC node and the Argo node, then the "[username]@" is optional. See "man sftp" for help from the UNIX manual pages.


Argo File Systems.

HOME Directory:

The cluster user home directory area has a total of 600 GB, organized as an NFS (Network File System) file system. Each Argo user has a home directory on the Argo interactive access Master node to keep files and sub-directories, with the full path specified by "/home/homes5[n]/[username]", where "[n]=0:5". The home directory can be more simply referenced by the UNIX symbol "~" or the UNIX meta or environmental variable representation "${HOME}", as in "cd $HOME" to change directory back to home or "ls ${HOME}/mcs572" (same as "ls ~/mcs572") to list the contents of a home sub-directory "mcs572" (note that the curly brackets are optional in the first example but required in the second example, where "HOME" is followed by non-blank characters). Home directory quotas are not set at this time, according to the "quota" command.

SCRATCH Directory:

There are 3 parallel scratch file systems "pvfs-scratch[n]", for "[n]=1:3", with 96 GB each for temporary storage, accessible by TCP (Transmission Control Protocol). There are other file systems, most not accessible from the compute nodes. For instance, the "/scratch" directory is for temporary storage and its size is 132 GB, but it is not accessible to the compute nodes. Similarly, the "/tmp" Master node temporary directory has 5 GB of storage.


Argo Programming Languages.

Prior to compilation in the user C-Shell, the Argo SCALI MPI environment needs the following environmental meta variables to be defined for each session (the environment should already be set by default for the default Bash-Shell):

These can also be set in your ".cshrc" C-Shell resource configuration file, as long as the full explicit path is set for the latter two, since they are recursive; this full path can be determined, for example, using "echo $PATH" or "echo $LD_LIBRARY_PATH".
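As a sketch in C-Shell form (assuming Scali's customary installation directory "/opt/scali"; the variable names and paths should be checked against the Argo documentation):

    setenv MPI_HOME /opt/scali
    setenv PATH ${PATH}:${MPI_HOME}/bin
    setenv LD_LIBRARY_PATH ${LD_LIBRARY_PATH}:${MPI_HOME}/lib

The latter two are recursive in that they reference their own current values, which is why ".cshrc" needs the full explicit expansion instead.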

Argo programs are compiled and linked directly on the Argo, given here with some typical options for interfacing with MPI, using the following compilers with the link step included:

GNU C Compiler gcc:

or the

GNU C++ Compiler g++:

or the

GNU F77 Compiler g77:

See "man gcc" or "man g++" or "man g77" for help from the UNIX manual pages for gcc (with g++ too), g++ (alone) and g77, respectively.

Argo supports the Portland Group C compiler "pgcc", the Portland Group C++ compiler "pgCC", and the Portland Group Fortran 90 compiler "pgf90". However, these compilers must be referenced with their full paths, such as "/usr/common/pgi/linux86/bin/pgcc" for "pgcc". Also, for command line help, for example, try "man pgcc" for the Portland Group (PGI) C compiler.

In the above compilation commands, the options are

 

SCASUB MPIRUN Parallel and Batch Run Commands:
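A plausible form, assuming the requested processor count is passed with the standard "-np" option of "mpirun" (check the Argo Getting Started page for the exact syntax), is

    scasub mpirun -np [#processors] [executable] (CR)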

where "scasub" is the Scali version of the batch submit command (replacing NQS "qsub" but with entirely different syntax, "mpirun" is the Scali modified version of the standard MPI run command (syntax again is different), "[#processors]" is the requested number of processors for 1 to 16 compute nodes, and "[executable]" is the copy of the executable in the user's home directory "$HOME". Since the Scali command are quite different than the usual command and they do not take Unix redirection of input in and output out. The Unix standard output (stdout) it written by default into the file "${HOME}/mpirun.o[Job#]" and standard error (stderr) is written by default into the file "${HOME}/mpirun.e[Job#]", where "[Job#]" is the batch job number that comes with the first output should be the Job Number "[Job#]" in the format

but "[Job#]" can also be found through the standard

batch status command.

However, data transfer can be accomplished by the options of the "scasub" command, for instance,

Also, data input can be done through the code using assignments, defines, or data declaration initializations, or through "fopen" file functions, since Unix redirected input does not work with the SCALI-modified "mpirun" or the SCALI native "mpimon" commands.

 

SCASUB MPIMON Parallel, Batch and Monitoring Run Commands:

The MPIMON procedure allows monitoring, selection of specific processors when available, and selection of a specific number of virtual processes (jobs) per physical process (processor) or Argo node (here illustrated for the 4 processors of the 4th zone sub-cluster with just one virtual process per node). The selection of processes requires the double dash "--". The general form for the node/processes pair is
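a sketch of which, assuming the standard ScaMPI convention of "[node] [#processes]" pairs after the double dash (check the Scali documentation for the exact syntax), is

    scasub mpimon [executable] -- argo4-1 1 argo4-2 1 argo4-3 1 argo4-4 1 (CR)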

Actually, on Argo, "mpirun" is a wrapper around "mpimon", unlike the standard "mpirun".

When doing comparative performance measurements, Argo users must be careful in determining whether the processes counted are virtual processes (i.e., multiple jobs on one PC node) on a single physical processor, just multiple physical processes/processors, or combinations of virtual and physical processes.

 

MPI C Code Test Examples:

A user can try out the class sample codes by downloading and copying them (these use no data input):

to your home directory and then renaming the copy, say, to "[Example-Code].c".

For more information, see the Argo Getting Started page,


Argo Batch Queueing Systems: PBS and NQS with MPI.

NQS Job Scripts:
Although remote batch job scheduling on the Argo is available using UNIX Network Queueing System (NQS) job scripts, the use of Scali command line batch scheduling is preferred, since the system is tuned to the Scali optimization with MPI. However, the NQS/PBS commands are still listed here, since some can be used with SCALI.

 

NQS qsub Submit Command:

These job scripts are run with the NQS QSUB submit command from the user's "${HOME}" home directory or "${SCRATCH}" scratch directory, for example,
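a minimal sketch of which (with "[jobscript]" the name of the user's NQS job script file) is

    cd ${HOME}
    qsub [jobscript] (CR)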

where "${HOME}" and "${SCRATCH}" denote the meta-names of the user's home and scratch directories, respectively, on the Argo cluster. See "man qsub" for help from the UNIX manual pages or the Argo webpage

 

NQS qstat Status Command:

The job status can be checked by the NQS QSTAT status command:
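in its simplest form,

    qstat (CR)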

and when done, the user can view the output, if any. Under the table heading called "S", e.g., "Q" means that the job is queued waiting to run, "R" means running, and "E" means exiting. See "man qstat" for help from the UNIX manual pages.

 

NQS qdel Delete Command:

If for any reason you need to kill the job before the end, first note the job id number "[Job#]" at the beginning of your job line in the "qstat" output, then enter the command:
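in its simplest form,

    qdel [Job#] (CR)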

which should stop a running job, unless the system is busy. See "man qdel" for help from the UNIX manual pages. For more information, see


Argo Message Passing Interface (MPI) Sources.

Argo general information and MPI web-pages:

NCSA MPI Basics:


Argo Timers, Profiling and Debugging.

For MPI programs, timing is usually accomplished with the MPI wall-clock timer function "MPI_Wtime()", which can be used in an unsynchronized way by itself or synchronized with the MPI barrier function "MPI_Barrier([Communications_group])". As an example, in C, consider the fragment:
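A minimal sketch of such a fragment, synchronized with barriers and using the default communicator "MPI_COMM_WORLD":

    double t0, t1;                   /* wall clock readings in seconds */
    MPI_Barrier(MPI_COMM_WORLD);     /* synchronize all processes */
    t0 = MPI_Wtime();                /* start time */
    /* ... code segment to be timed ... */
    MPI_Barrier(MPI_COMM_WORLD);     /* resynchronize before the stop time */
    t1 = MPI_Wtime();                /* stop time */
    printf("elapsed wall time = %e seconds\n", t1 - t0);

Omitting the calls to "MPI_Barrier" gives the unsynchronized, per-process timing mentioned above.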

For information on more Unix, Linux and other timers, as well as performance profilers and debuggers, see the quite general NCSA page:


Argo Editors.

For more on editors and other software, see


More Information.

For more information, see


Guide Notation.

This local guide is meant to indicate ``what works'', primarily for access from UNIX systems to the Argo. The use of the Unix C-Shell on the Argo is assumed throughout most of this local guide.

UNIX is a trademark of AT&T.

Computer prompts or broadcasts will be enclosed in double quotes (``_''), background comments will be enclosed in curly braces ({_}), commands cited in the comments are highlighted by single quotes or double quotes depending on emphasis (`_') or ("_") {do not type the quotes when typing the commands}, and optional or user-specified arguments are enclosed in square brackets ([_]) {however, do not enter the square brackets}. The symbol (CR) will denote an immediate carriage return or enter. {Ignore the blanks that precede it, as in `[command] (CR)', making it easier to read.} The symbol (Esc) will denote an immediate pressing of the Escape-key. {Use no brackets please.} The symbol (SPACE) will denote an immediate pressing of the Space-bar. {Warning: Do not type any of these notational symbols in an actual computer session.}


Return to TABLE OF CONTENTS?

REST OF GUIDE UNDER CONSTRUCTION!

See PSC TCS Local User's Guide in the interim.


The best way to learn these commands is to use and test them in an actual computer session on the Argo Linux Cluster.

Good luck.

Return to TABLE OF CONTENTS?

Please report to Professor Hanson any problems or inaccuracies:


Web Source: http://www.math.uic.edu/~hanson/argo03guide.html