Wind-US User's Guide - Parallel Processing

Parallel Processing

Terminology
Parallel-Capable Executables
Remote Shell Commands
Directory Structure for Executables
Running Parallel Jobs
Multi-Processing Control File
- host - Specify worker systems
- #master - Master mode
- i/o - Worker-to-master file access
- communication - Worker-to-worker communication
- packmode - Data packing mode
- checkpoint - Checkpoint controls
- assignment mode - Processor assignment mode
- task mode - Task assignment mode
- route - Master/worker message routing
- #LOADLIMIT - Load limit specification

The time required to compute a solution can be reduced dramatically by using the parallel processing capability of Wind-US, with multiple zones being solved simultaneously. Jobs may be run in parallel mode on a multi-processor system (i.e., a single system with multiple CPUs), on a workstation cluster (i.e., a collection of networked nodes designed for parallel computation, with NFS-mounted home directories), or on a collection of separate, possibly heterogeneous, distributed networked systems, with or without NFS-mounted home directories. Either the Parallel Virtual Machine (PVM) or Message Passing Interface (MPI) libraries may be used to handle the parallel communication.

Running parallel jobs with Wind-US is remarkably simple. The only user requirement, beyond some initial system and account setup needed for communication, is the creation of a file listing the systems and/or number of processes to be used. The Wind-US scripts automatically take care of copying the necessary files and executables between the systems being used, starting and stopping the message passing software, and cleaning up at the end of the run.

Terminology

When operating in parallel processing mode, the system or node on which the job originates is called the master. [Strictly speaking, the master is the system controlling the job, not necessarily the system used to submit the job. When "master mode" is used, a system different from the originating system may be specified as the master. See the description of the .mpc file #master directive for details.] The nodes used to perform the actual solution are called workers. The master distributes grid zones to participating workers. Each zone is solved in parallel with other zones on other workers. Boundary information is exchanged at the end of each cycle to propagate information throughout the computational domain. There may be fewer workers than zones to be computed, in which case, a worker will be assigned another zone when it finishes its current assignment. The user specifies the names of the participating workers via the multi-processing control (.mpc) file.

Parallel-Capable Executables

At the time a Wind-US executable is built, it is linked with the appropriate message passing libraries (i.e., PVM, MPI, or both) that are required for parallel operation. The Wind-US software distributed by the NPARC Alliance includes PVM, but does not include MPI. If MPI is to be used, the MPI libraries and executable must be pre-installed on the multi-processor system being used.

The makefiles used to build the Wind-US executable include switches that specify the message passing software to be used. The default build option is PVM only. If MPI is to be used, the code must be compiled with the appropriate makefile switches set. See the Wind-US Installation Guide for details.

If Wind-US will be run in parallel mode on a collection of heterogeneous networked systems, executables must be available for each of the system and CPU types that will be used, and stored on the originating system. The appropriate Wind-US and PVM executables will automatically be copied to the workers at the start of the run (and to the master when master mode is used), and removed when the run finishes. Details on the directory structure required for the executables on the originating system are presented in the section Directory Structure for Executables.

The run scripts used with Wind-US are designed to use the PVM executables that are part of the Wind-US distribution. If PVM happens to be pre-installed on the system(s) being used, some PVM-related environment variables may already be defined, such as PVM_ROOT or PVM_HOME. (The environment variables may be listed by issuing the command setenv, assuming the C shell or one of its variants is being used as the interactive shell.) In some cases, these have been found to conflict with the execution of Wind-US with its own version of PVM. In this case, the PVM-related environment variables should be unset before starting Wind-US. If the pre-installed version of PVM isn't needed for running other applications, a simple way to do this is to add a line like

   unsetenv PVM_ROOT

in your shell startup file (e.g., .cshrc for the C shell).

Remote Shell Commands

To run in parallel mode on a cluster or on distributed networked systems, the user must of course have a valid account on each system. The user name on all the systems must be the same.

In addition, the master must be allowed to communicate with each worker, and vice versa, using remote shell commands, and without entering a password. Here "remote shell commands" means either rsh and rcp, or ssh and scp. [Note that rsh and rcp are considered insecure, and many organizations, if not most, now require use of ssh and scp.] If TCPD access control is installed, which remote shell commands are allowed is normally controlled at the system level by information in the files /etc/hosts.allow and /etc/hosts.deny.

The following two sections describe how to set up password-less communication between the master and workers for rsh/rcp and for ssh/scp. If master mode is used, the same procedure must be followed to set up password-less communication between the originating system and the master, as well as with each worker. Note that this only has to be done once for a given cluster or collection of distributed systems.

Important: Wind-US uses the Unix hostname command to determine a system's name. Thus, in the procedures described below, whenever a system name is specified the name to be used must be the same as the name returned by the hostname command on that system. For example, for a system with the fully-qualified name "workerbee.bigcompany.com", if hostname returns just the machine name "workerbee", then workerbee should be used for the system name. If hostname returns the fully-qualified name "workerbee.bigcompany.com", then workerbee.bigcompany.com should be used for the system name.

rsh/rcp

To allow rsh and rcp to be used without entering a password, the host name of the master system must be in the file .rhosts in the user's home directory on the worker system, or in the system file /etc/hosts.equiv on the worker system, and vice versa. Note that this is required even if the master and worker are the same system. I.e., if the master is also being used as a worker, that system's name must be listed in the .rhosts or /etc/hosts.equiv file.

The .rhosts file is a text file containing a list of system names, and the userids on each of those systems, that are allowed to access the current host via rsh and rcp. The file should have its permissions set to rw-------, so issue the following command after creating the file.

   chmod 600 .rhosts
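
As an illustration of the file format and permissions described above, the following Python sketch writes a .rhosts file with one "host userid" entry per line and restricts it to rw-------. The function name write_rhosts, the host names, and the user ID are hypothetical; only the one-entry-per-line layout and the 600 permissions come from the text above.

```python
import os
import stat


def write_rhosts(path, entries):
    """Write (host, userid) pairs to path, one per line, with mode rw-------."""
    # Create the file with mode 600 from the start so it is never world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        for host, userid in entries:
            f.write(f"{host} {userid}\n")
    # Re-apply the mode in case the file already existed with looser permissions.
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

For example, write_rhosts(os.path.expanduser("~/.rhosts"), [("master", "jdoe"), ("workerbee", "jdoe")]) would list a hypothetical master and worker for user jdoe. Remember that each host name must match what the hostname command returns on that system.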

Once the .rhosts file has been created, it may be tested by issuing the following command from the system where the job will be submitted.

   rsh worker-name ls -l

Things are functioning properly if the directory listing appears.

When rsh/rcp remote shell commands are used, the maximum number of hosts that may be specified in the .mpc file is limited to 25, due to timeout issues that can occur with rsh/rcp.

More information about .rhosts files may be found by entering man rhosts on most Unix systems.

ssh/scp

Setting up password-less communication using ssh and scp is a bit more complicated, but as noted earlier it only needs to be done once for a given cluster or collection of distributed systems.

While performing the following steps, the first time you use ssh to connect to a system (i.e., one that you've never connected to using ssh, or one whose host key has been changed), you may get a message like

   The authenticity of host 'system_name (133.11.217.42)' can't be established.
   RSA key fingerprint is ec:73:17:40:8d:c0:b5:96:76:27:6b:ce:f4:f9:96:73.
   Are you sure you want to continue connecting (yes/no)?

If so, respond with "yes" (without the quotes). This will add the host key for system_name to your .ssh/known_hosts file.

First, on the master, create private and public authentication keys by doing
```
   ssh-keygen -t rsa
```
The option -t rsa means use RSA authentication. If your site uses DSA authentication, you should use the option -t dsa. Use the defaults (i.e., just hit Enter) when prompted for a file name and passphrase (i.e., use no passphrase, in order to allow password-less ssh connections).
This creates, in your ~/.ssh directory, the files id_rsa, containing your private authentication key, and id_rsa.pub, containing your public authentication key. Make sure the id_rsa file is only readable by you. I.e., in the .ssh directory, doing "ls -l id_rsa" should give something like:
```
   -rw-------  1 userid userid  883 Jun 23 09:12 id_rsa
```
If it doesn't, do "chmod 600 id_rsa".
Still on the master, add the public authentication key to the authorized_keys file in your ~/.ssh directory by doing:
```
   cd .ssh
   cat id_rsa.pub >> authorized_keys
```
If you're using a cluster or distributed systems with NFS-mounted home directories (i.e., your home directory physically resides on one node, and is NFS-mounted, or "shared", with the other nodes), do the following.
- For each node, including the master, do
```
   ssh node_name ls
```
  where node_name is the node name. If you get the "authenticity of host ... can't be established." message described earlier, respond with "yes". This will ensure that each node is listed in your .ssh/known_hosts file (or is already in the system-wide ssh_known_hosts file).
- In your .ssh directory, edit the authorized_keys file. There should already be a long line there for the master, created when you appended id_rsa.pub to authorized_keys above, that looks something like
```
   ssh-rsa public_key= userid@master
```
  where public_key is a long string of characters containing your public authentication key, userid is your user ID, and master is the name of the master node. For each additional node, copy this line, and change the name master at the end of the line to the name of the node.
If you're using distributed systems with separate home directories on each system, do the following.
- From the master, add the master's public authentication key to the file .ssh/authorized_keys on each worker. I.e., from the .ssh directory on the master, for each worker system do the following. (Here, and in the following instructions, replace "worker" with the name of the worker system.)
```
   cat id_rsa.pub | ssh worker 'cat >> .ssh/authorized_keys'
```
- Log in to each worker system, and create private and public authentication keys on that system. I.e., from the master you could do the following.
```
   ssh worker
   ssh-keygen -t rsa
   cd .ssh
   cat id_rsa.pub >> authorized_keys
```
- On each worker, add that worker's public authentication key to the file .ssh/authorized_keys on the master. I.e., from the .ssh directory on each worker, do the following (where "master" is replaced with the name of the master system).
```
   cat id_rsa.pub | ssh master 'cat >> .ssh/authorized_keys'
```

You should now be able to use ssh (and scp) from the master to a worker, and vice versa, without entering a password. To test this, on the master do

   ssh worker ls -l

The contents of your home directory on the worker should be displayed. Similarly, on the worker do

   ssh master ls -l

The contents of your home directory on the master should be displayed.

Note that if the master is also being used as a worker, you must also be able to use ssh locally. To test this, on the master do

   ssh master ls -l

When running Wind-US, use the -usessh option to the wind script to specify that the executable and I/O files should be copied between the master and workers using ssh. I.e.,

   wind -usessh

Directory Structure for Executables

The run scripts expect to find the Wind-US executable(s) (Wind-US4.exe, for Wind-US 4.0) and the PVM executables (pvm, pvmd3, and pvmgs) for particular systems and CPUs in specific locations below the CFDROOT directory on the originating system, corresponding to the values of the SYSTEM and SYSTEM_CPU environment variables for those systems. [The CFDROOT, SYSTEM, and SYSTEM_CPU environment variables are set at login time, by running the cfd.login script in the user's .login file. For details see the instructions for installing the application or build distribution in the Wind-US Installation Guide.]

As an example, suppose the systems being used are a mix of 32-bit Linux systems with XEON processors and glibc version 2.3 (i.e., SYSTEM = LINUX32-GLIBC2.3 and SYSTEM_CPU = XEON) and 64-bit Linux systems with OPTERON processors and glibc version 2.3. The directory structure on the originating system below CFDROOT would be:

   $(CFDROOT)/
      LINUX32-GLIBC2.3/
         XEON/
            bin/
               Wind-US executable(s)
      LINUX64-GLIBC2.3/
         OPTERON/
            bin/
               Wind-US executable(s)
      bin/
         Run scripts
      pvm/
         lib/
            LINUX32-GLIBC2.3/
               XEON/
                  PVM executables
            LINUX64-GLIBC2.3/
               OPTERON/
                  PVM executables

When Wind-US is installed following the instructions in the Wind-US Installation Guide, the directory structure shown above is automatically created. Note that symbolic links may be used where appropriate to share executables between similar systems. E.g., if only Opteron executables are available for LINUX64-GLIBC2.3 systems, the directory $(CFDROOT)/LINUX64-GLIBC2.3/XEON may be a symbolic link to $(CFDROOT)/LINUX64-GLIBC2.3/OPTERON.
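
The layout above can be summarized as a small path-building sketch. This is only an illustration of the directory tree shown in this section, not part of the Wind-US scripts; the function name executable_dirs and the example CFDROOT value are hypothetical.

```python
import posixpath


def executable_dirs(cfdroot, system, system_cpu):
    """Return the directories where the run scripts look for executables."""
    return {
        # Wind-US executable(s) for this SYSTEM / SYSTEM_CPU combination
        "wind": posixpath.join(cfdroot, system, system_cpu, "bin"),
        # PVM executables (pvm, pvmd3, pvmgs) for the same combination
        "pvm": posixpath.join(cfdroot, "pvm", "lib", system, system_cpu),
        # Run scripts, shared by all system types
        "scripts": posixpath.join(cfdroot, "bin"),
    }


if __name__ == "__main__":
    # "/opt/cfd" is a made-up CFDROOT used only for this example.
    for kind, path in executable_dirs("/opt/cfd", "LINUX64-GLIBC2.3", "OPTERON").items():
        print(kind, path)
```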

Running Parallel Jobs

As noted earlier, running parallel jobs with Wind-US is remarkably simple. Assuming a parallel-capable executable is available for the system(s) being used, and the user's system and account have been set up as described above, the basic steps are:

1. Split the grid into multiple zones. Ideally, there should be one zone for each processor, and each zone should be the same size, assuming the processors have equal computing power. See the section Zone Size Considerations for additional details.
2. Create a multi-processing control (.mpc) file, listing the host names of the systems to be used.
3. Issue the wind command in one of the following forms. [The commands shown here are the simplest forms. Additional wind script options may be used as needed.]
- When running on a cluster or collection of distributed systems using rsh:
```
   wind -nousessh
```
- When running on a cluster or collection of distributed systems using ssh:
```
   wind -usessh
```
- When running on a multi-processor system using PVM:
```
   wind -mp -mpmode PVM
```
- When running on a multi-processor system using MPI:
```
   wind -mp -mpmode MPI
```
For each of these, respond with "y" when prompted for whether or not you want to run in multi-processor mode. [The wording of this prompt is unfortunate. It really means "parallel mode", not necessarily on a single multi-processor system as defined earlier.]

More detail on various topics related to running parallel jobs with Wind-US is given in the following sections.

Command Line Options

The following wind script options are directly applicable to running Wind-US in parallel mode.

	`-(no)parallel`		Specifying `-parallel` indicates that the job is to be run in parallel mode, while `-noparallel` indicates serial mode. Parallel mode requires a multi-processing control (.mpc) file. If `-parallel` is specified, but an .mpc file doesn't exist, the user will be asked if serial mode should be used. If neither `-parallel` nor `-noparallel` is specified, and an .mpc file exists, the user will be asked if parallel mode should be used.
	`-mpmode` mode		Message passing mode to be used when running in parallel: either PVM or MPI. To use MPI message passing, MPI must be pre-installed on your system(s) (unlike PVM, MPI is not distributed with Wind-US), and you'll need to compile an executable that includes links to the MPI library. See Installing the Build Distribution in the Wind-US Installation Guide for instructions on creating the executable. The default message passing mode is MPI (if available), otherwise PVM will be used.
	`-(no)usessh`		When `-usessh` is specified, ssh/scp remote shell/copy commands will be used to copy files between systems when running in parallel mode on clusters or distributed systems. When `-nousessh` is specified, rsh/rcp will be used instead. The default is to use ssh/scp. For more details see the discussion of remote shell commands for parallel processing.
	`-(no)mp`		Specifying `-mp` indicates that a multi-processor machine (i.e., a single machine with multiple processors) is being used, with either PVM or MPI message passing. The default is `-nomp`.
	`-(no)cl`		Specifying `-cl` indicates that a cluster machine (i.e., a network of multiple machines) is being used, with either PVM or MPI message passing. The default is `-nocl`.
	`-nzones` number		Number of zones, used in MPI message passing mode. If not specified, the Wind-US utility mpigetnzone will be used to get the number of zones from the common grid (.cgd) file. If mpigetnzone is not installed, the user will be prompted for the number of zones.

Zone Size Considerations

Because synchronization takes place at the end of each cycle, total throughput is established by the processor that takes the longest to complete its assigned work. The optimum situation is to have all zones of equal size and have one processor for each zone. This gives maximum throughput and processor utilization, but is generally not achievable. If all zones cannot be close to the same size, a mixture of sizes is preferable. The case to avoid is a configuration with one zone of comparable size to the sum of the remaining zones. In this case, one can achieve at most a factor-of-two performance improvement regardless of the number of processors used. In general, if n is the number of points in the largest zone and N is the total number of points, the maximum possible speed up is N/n (assuming identical processors and similar algorithm specification).
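
The N/n bound is easy to check for a proposed zoning before submitting a job. The sketch below is a minimal illustration, assuming identical processors as stated above; the function name max_speedup and the zone sizes are made up for the example.

```python
def max_speedup(zone_sizes):
    """Upper bound N/n on parallel speedup, assuming identical processors."""
    total_points = sum(zone_sizes)   # N: total number of grid points
    largest_zone = max(zone_sizes)   # n: points in the largest zone
    return total_points / largest_zone


if __name__ == "__main__":
    # One zone as large as the other five combined: the bound is 2.
    print(max_speedup([500_000] + [100_000] * 5))
    # Four equal zones: the bound equals the number of zones.
    print(max_speedup([250_000] * 4))
```

The first case is the configuration to avoid: no matter how many processors are added, the run cannot be more than twice as fast as the serial run.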

Given a number of processors P with relative speeds p_i (larger p implies faster), and a number of zones N of sizes n_j, the assignment of work is done as follows.

1. Assign the largest zone j to processor 1 and compute T_1 = n_j / p_1.
2. Repeat step 1 for the remaining P − 1 processors, assigning the largest remaining zone j to processor i and computing T_i = n_j / p_i.
3. If any zones remain to be assigned, locate the processor i for which T_i is a minimum. Assign the largest remaining zone j to processor i, and update T_i = T_i + n_j / p_i.
4. Repeat step 3 for the remaining unassigned zones.

Consider adding processors if T for any processor is significantly larger than the others, and that processor has more than one zone assigned.
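
The assignment procedure above can be sketched in a few lines of Python. This is an illustration of the steps as described, not the code Wind-US itself uses. The function name assign_zones and the sample data are hypothetical, and the tie-breaking rule (the earliest-listed processor wins when two have the same T) is an assumption, since the procedure does not specify one.

```python
def assign_zones(zone_sizes, speeds):
    """Greedy zone assignment.

    zone_sizes: dict mapping zone number -> number of grid points (n_j)
    speeds:     list of relative processor speeds (p_i), in host-list order
    Returns (zones assigned to each processor, estimated time T_i for each).
    """
    # Consider zones from largest to smallest.
    order = sorted(zone_sizes, key=lambda z: zone_sizes[z], reverse=True)
    assigned = [[] for _ in speeds]
    times = [0.0] * len(speeds)
    for k, zone in enumerate(order):
        if k < len(speeds):
            # Steps 1-2: each processor first receives one zone, in order.
            i = k
        else:
            # Steps 3-4: give the next zone to the least-loaded processor.
            # min() returns the first minimum, so ties go to the
            # earliest-listed processor (an assumption of this sketch).
            i = min(range(len(speeds)), key=lambda p: times[p])
        assigned[i].append(zone)
        times[i] += zone_sizes[zone] / speeds[i]
    return assigned, times


if __name__ == "__main__":
    zones = {1: 100, 2: 80, 3: 60, 4: 40, 5: 20}
    assigned, times = assign_zones(zones, speeds=[2.0, 1.0])
    for i, (zs, t) in enumerate(zip(assigned, times), start=1):
        print(f"processor {i}: zones {zs}, T = {t}")
```

In this made-up case the faster processor receives zones 1, 3, and 4 and the slower one zones 2 and 5, and both end with T = 100, so adding processors would help only by reducing the work per processor, not by fixing an imbalance.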

The list output (.lis) file will indicate what zones are assigned to what processor, and will have a report showing the utilization of each processor.

Checkpointing and Fault Tolerance

The flow (.cfl) file contains the computed flow field results for all the zones, and is stored on the master. Since in parallel mode the zones are solved on separate processors, it's necessary to periodically update the .cfl file on the master. By default, the frequency for doing this is once an hour (wall clock time), but this may be changed using the checkpoint directive in the .mpc file.

If a worker system fails due to either a system or network failure during the course of a run, the job will restart from the last checkpoint without the failed system. The automatic restart ability will be invoked as many times as necessary during a job until no more systems are available.

Intermediate Processing

At each checkpoint, the existing .cfl file is overwritten with the updated flow field. During long-running parallel jobs (or serial jobs, for that matter), it's sometimes desirable to do some intermediate processing, to examine how the solution is evolving, or to save snapshots of the results in an unsteady calculation.

The easiest way to do this is by using the SPAWN keyword, which allows user-specified processes to be run at user-specified intervals. Wind-US will temporarily stop while a spawned process is running, and continue when it finishes. One typical use of this capability is to run a user-written shell script that creates intermediate results from information in the .cfl file. The description of the SPAWN keyword includes an example showing how to save intermediate .cfl files for later post-processing.

By default, the .cfl file is automatically updated before starting the spawned process. This is in addition to the update of the .cfl file that's done at each checkpoint. Thus, if the SPAWN keyword is used, you may want to eliminate the normal checkpointing by specifying checkpoint none in the .mpc file. To monitor the convergence of fluxes or forces for particular surfaces, the LOADS keyword in Wind-US is far more efficient than spawning external processes.

Multiple Parallel Jobs

When running in PVM parallel mode on a cluster or collection of distributed systems, the master system and all worker systems being used by a given user cannot be used by any other parallel job from the same user as long as the first job is active. A different user, however, can have a parallel job running simultaneously on the same systems, assuming that the memory, disk space, etc., are sufficient to support multiple jobs. If the queuing system on your cluster assigns dedicated resources such that no other job will have access to them, then a single user can run multiple parallel jobs at the same time.

Note that in master mode the same originating system may be used to launch multiple parallel jobs, as long as the specified master and workers for each job don't overlap.

There are no restrictions on the number of parallel jobs for a given user on a multi-processor system (i.e., using the -mp option to the wind script), again assuming that the computer resources are available to support multiple jobs.

Stopping a Job

When a parallel Wind-US job finishes, the results files are updated on the master, various temporary files are removed on both the master and workers, and the run ends. If PVM message passing was used, PVM is stopped on the master and on all workers.

The methods for specifying when a parallel Wind-US run should stop are the same as for serial runs.

- The job will automatically end when the number of cycles specified by the user has been completed, or the solution converges.
- For non-interactive jobs, either a stop time or run time (depending on the queueing system being used) may be specified when the job is started using the wind script.
- An NDSTOP file may be created in the Wind-US run directory to stop the job.
- A WINDCTRL file may be created in the Wind-US run directory to modify or stop the job.

Because parallel jobs are often run during off-shift hours, using systems that are needed for other work during normal hours, scripts are supplied with Wind-US that may be executed by the Unix cron process to ensure that jobs aren't inadvertently run beyond a certain time. In the Wind-US distribution, these scripts are in the directory wind/bin/pvmkill. Four files are located there:

	cronkill		This file tells the continuous running job scheduler when to terminate processes. The first two digits on each line are the minute, the third digit is the hour, and following the `*`'s are the days when each of the commands will be executed (Monday = 1). The first command is the "nicest" way to kill the job, with the following two successively harsher. Note that this file must be edited so that output goes to your directory and the paths for the scripts are correct.
	pvmclean		A script which terminates jobs in a relatively nice fashion.
	naskill		A script which terminates jobs in a bit harsher fashion.
	naspvmkill		A script which terminates jobs in the meanest fashion.

To invoke these processes, copy the above scripts to each master you're using, edit cronkill appropriately, and insert these processes into the crontab on each master by entering

   crontab cronkill

[Depending on how your system is configured, use of crontab may require root access.] To check if this worked, enter

   crontab -l

which will give a list of all your cron entries.

Multi-Processors vs Clusters and Distributed Systems

Experience has shown that the differences in the procedures for running on a multi-processor system (i.e., a single system with multiple CPUs), and on a cluster or collection of distributed systems, can be confusing. The following table is an attempt to summarize the differences.

	Multi-Processor	Cluster/Distributed Systems
Definition	Single system with multiple CPUs	Networked systems (with or without NFS-mounted home directories)
Message Passing	MPI or PVM	MPI or PVM
wind Option for Machine Type	`-mp`	`-cl`
wind Option for MPI	`-mpmode MPI`	`-mpmode MPI`
wind Option for PVM	`-mpmode PVM`	`-mpmode PVM`
Host List in .mpc File	One `host` line, with `nproc` > 1	Multiple `host` lines, typically one per machine, with `nproc` > 1 for each multiprocessor system
Multiple Jobs OK?	Yes	Yes, but each PVM job requires dedicated master/worker resources

Multi-Processing Control File

The multi-processing control file specifies the hosts that will be available as well as some miscellaneous options. If the Wind-US input data file name is input.dat, the name of the multi-processing control file must be input.mpc. When this file is present, the wind script will ask the user if they really want to use multi-processing mode. [As noted earlier, the wording of this prompt, and the terminology "multi-processing control file", is unfortunate. It really applies to all parallel jobs, not just those on a multi-processor system as defined earlier.]

Comments may be included in the file with the normal Wind-US comment indicator "/", or additionally "#". Blank lines are ignored. Trailing comments are not allowed. The formats of the directives follow.

host {localhost | name} [nproc n]

host directives specify the names of the worker systems (given by the name parameter) that will be used to process zones. In general, there should be one host directive for each worker system. If a particular system appears more than once, each occurrence is treated as a unique system and will process assigned zones simultaneously. This is not advisable unless the system has multiple processors and sufficient memory.

As noted earlier, Wind-US uses the Unix hostname command to determine a system's name. Thus, in the host directive the specified name must be the same as the name returned by the hostname command on that system. For example, for a system with the fully-qualified name "workerbee.bigcompany.com", if hostname returns just the machine name "workerbee", then workerbee should be used for name in the .mpc file. If hostname returns the fully-qualified name "workerbee.bigcompany.com", then workerbee.bigcompany.com should be used for name.

The optional parameter nproc n may be used to specify the number of processes to allow to run in parallel on the specified host. It is equivalent to repeating the host directive n times.

If no host entries appear in the multi-processing control file, the originating system will automatically be selected as the only host. When used on a system with sufficient memory and the assignment mode dedicated directive, the normal I/O associated with a single processor solution will be eliminated (except for checkpoints).

The special parameter localhost is used when running on a multi-processor system and the system name is unknown at the time of job submission, such as for batch systems (like NQE) that can spawn to multiple systems or clusters of servers. Using localhost is preferred over not putting in any host directives because it ensures that the scripts set up Wind-US consistently.

host entries should appear in the file in decreasing order of computational power. The most computationally intensive zones will be assigned to the highest entries in the list.

The system that originates the job is not automatically included in the host list. If it is desired to also assign solution tasks to the originating system, it should have a host entry like any other system. For estimating purposes, the master process typically consumes less than one percent of the CPU time on the master host.

When rsh/rcp remote shell commands are used, the maximum number of hosts that may be specified is limited to 25, due to timeout issues that can occur with rsh/rcp. If more than 25 hosts are to be used, ssh/scp must be used for communication between the master and workers.
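
To make the host directive rules concrete, the sketch below parses the host lines of a small .mpc file and checks the rsh/rcp limit. It is an illustration only, not the parser used by Wind-US. The names parse_hosts, exceeds_rsh_limit, and EXAMPLE_MPC, and the host name bigbox, are hypothetical. Two assumptions are made: keywords are matched without regard to case, and the limit of 25 is counted per host directive rather than per process.

```python
EXAMPLE_MPC = """\
/ Hypothetical example: hosts listed in decreasing order of computing power
host bigbox nproc 4

# a single-processor worker
host workerbee
"""


def parse_hosts(text):
    """Return a list of (name, nproc) pairs from the host directives in text."""
    hosts = []
    for raw in text.splitlines():
        line = raw.strip()
        # Blank lines and "/" comments are ignored. Lines starting with "#"
        # are comments or #master / #LOADLIMIT directives; both are skipped here.
        if not line or line[0] in "/#":
            continue
        tokens = line.split()
        if tokens[0].lower() != "host":
            continue  # some other directive (i/o, checkpoint, ...)
        if len(tokens) < 2:
            raise ValueError(f"host directive without a name: {raw!r}")
        nproc = 1
        if len(tokens) >= 4 and tokens[2].lower() == "nproc":
            nproc = int(tokens[3])
        hosts.append((tokens[1], nproc))
    return hosts


def exceeds_rsh_limit(hosts, limit=25):
    """True if there are more host directives than rsh/rcp can handle."""
    return len(hosts) > limit


if __name__ == "__main__":
    hosts = parse_hosts(EXAMPLE_MPC)
    print(hosts, "total processes:", sum(n for _, n in hosts))
    print("too many hosts for rsh/rcp:", exceeds_rsh_limit(hosts))
```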

#master master_host [run_dir]

This directive specifies the use of master mode, which allows a system different from the originating system to be used as the master. [Note that the #master directive is an exception to the use of # as a comment indicator.] The input parameter master_host specifies the name of the master system, and run_dir specifies the run directory to be used on the master. Like the system names in the host directive, the specified master_host must be the same as the name returned by the hostname command on that system.

If run_dir is not specified, the job will be run in the subdirectory logname, where logname is the user's login name, under a parent directory chosen from the following, in the order listed.

1. PVM_TEMP, if the environment variable PVM_TEMP is defined
2. /lscratch, if it exists
3. /scratch*, if it exists, where * matches any string of 0 or more characters
4. /data/local, if it exists
5. /tmp, if it exists
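
The search order above can be expressed as a short Python sketch. It is not taken from the Wind-US scripts. The function name default_run_dir is hypothetical, and choosing the alphabetically first match when several /scratch* directories exist is an assumption, since the list does not say which one is used. The isdir and globber parameters exist only so the logic can be exercised without touching the real file system.

```python
import glob
import os
import posixpath


def default_run_dir(logname, environ, isdir=os.path.isdir, globber=glob.glob):
    """Return the default master-mode run directory, or None if none applies."""
    # 1. PVM_TEMP, if defined
    pvm_temp = environ.get("PVM_TEMP")
    if pvm_temp:
        return posixpath.join(pvm_temp, logname)
    # 2. /lscratch, if it exists
    if isdir("/lscratch"):
        return posixpath.join("/lscratch", logname)
    # 3. /scratch*, if it exists (first match in sorted order: an assumption)
    for path in sorted(globber("/scratch*")):
        if isdir(path):
            return posixpath.join(path, logname)
    # 4-5. /data/local, then /tmp
    for path in ("/data/local", "/tmp"):
        if isdir(path):
            return posixpath.join(path, logname)
    return None


if __name__ == "__main__":
    # Report the directory that would be chosen on the current system.
    print(default_run_dir(os.environ.get("LOGNAME", "user"), os.environ))
```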

When the job finishes, the output files (i.e., the .cfl file, .lis file, etc.) are copied back to the originating system, and everything is deleted from the run directory on the master, and the workers.

When master mode is used, the -runinplace wind script option is automatically set. Master mode may not be used in debug mode (i.e., with the -debug wind script option).

i/o {direct | indirect}

This directive specifies the type of I/O access that worker systems have to files on the master. The default is indirect, which means that workers do not have access to the files on the master, and that file I/O must therefore be done using message passing to/from the master process.

On multi-processor systems, however, i/o direct may be used to indicate that the worker processes may access the files directly, bypassing communication through the master process. This significantly reduces communication overhead and can increase performance by 10-40%.

There are a couple of things to be aware of when using the i/o direct option. First, it should only be used when running on a multi-processor system, not with clusters or distributed systems. And second, the maximum number of open files per process that is allowed (an operating system limit) must be large enough.

communication {direct | indirect}

The communication directive specifies how messages and boundary condition data are sent between worker systems. The default is direct, meaning that workers are able to communicate directly, without going through the master. Specifying indirect means that communication between workers must go through the master.

communication direct may not be used with assignment mode transient.

If the Wind-US executable was built using the -DF90 Fortran compiler option (necessary with compilers that don't support allocatable components in derived types), specifying communication direct has no effect; communication indirect is automatically used.

packmode {memory | memoryxdr | pointer | pointerxdr}

This directive specifies the packing mode used when transferring data between the master and workers.

	`memory`		The data to be transferred is sent just as it is stored in memory on the local machine, and not XDR (External Data Representation) encoded. Thus, all the systems must use the same internal data format.
	`memoryxdr`		This mode only applies to PVM message passing, and specifies that the data being transferred is to be XDR encoded, allowing the systems to have different internal data formats.
	`pointer`		The data being transferred is copied directly from memory, instead of being first copied into a send buffer. During the packing process, the amount of data to be sent is determined, and pointers are used to identify the data itself. This is similar to the `memory` option, in that the data is not XDR encoded before being sent, but should be faster. [This mode currently doesn't work on Linux systems with Intel compilers, due to a problem with character pointers and array temporaries.]
	`pointerxdr`		This mode is currently the same as `memoryxdr`.

The default packing mode is memory for PVM message passing, and pointer for MPI message passing.

Note that when memory or pointer is used, since the data is not XDR encoded, the master and all workers must have the same internal data format. If a parallel job is being run on a collection of distributed systems with different internal data formats, the directive packmode memoryxdr must be specified in the .mpc file.

checkpoint {[every] {time minutes | count cycles} | none}

This directive specifies how often the worker systems transfer their flow field information to the flow file on the master system. In the event of a failure, the solution is automatically restarted from the last checkpoint. Specifying too small a number can result in very high network overhead and low throughput. A large number improves performance but can cause wasted time if a lot of network failures occur. If checkpoint none is specified, the flow field information is updated only at the end of the job. The default value is

   checkpoint every 60 minutes

Note that if the SPAWN keyword is used in the input data (.dat) file, the flow field information is also updated before each spawned process, unless the NOCHECKPOINT option is specified.

assignment mode {dedicated | shared | transient | combined}

assignment mode controls how tasks are assigned to processors. There may be multiple appearances of this directive. Each one affects subsequent host entries up to the next assignment mode directive. A description of each mode follows.

	`dedicated`		Each task (zone) gets a unique Unix process on the target system. If a system must process more than one zone, each will have a separate process, but only one will be allowed to run at a time unless multiple `host` entries are present for the system. This is the default mode and should not be changed unless there is insufficient memory and swap space for the processes assigned to the host.
	`shared`		Unless a system must process more than one zone, this mode is the same as `dedicated`. If more than one zone must be processed, only one Unix process is allocated and data for individual zones is swapped to and from local disk on the target system. This mode should be used only if the target system does not have sufficient memory and swap space to contain the zones it needs to process.
	`transient`		This is similar to `shared` mode, in that when a system must process more than one zone, only one Unix process is allocated. However, data for individual zones is written back to the master processor instead of the local disk.
	`combined`		Like `shared` and `transient` modes, when a system must process more than one zone, only one Unix process is allocated. However, instead of writing data for individual zones to the local disk or to the master, all zonal data is kept in memory.

task mode [dynamic | static]

When there are more tasks (i.e., zones) than processors, this directive may be used to pre-determine which tasks are assigned to which processors.

The default procedure (i.e., when task mode is not used, or task mode dynamic is specified) is to start by ordering the tasks by the estimated amount of computational work each will require. Then, for n processors, the task requiring the most work is assigned to the processor specified by the first host directive in the .mpc file, the task requiring the second-most work is assigned to the second processor, etc., until the first n tasks have been assigned to the n processors. When a processor finishes its task, the next task in the queue (i.e., task n + 1) is assigned to that processor. This continues until all tasks have been assigned.

When task mode static is specified, the tasks are again ordered by the estimated work required. The first n tasks are assigned to the first n processors, just as above. But then, instead of waiting for a processor to become idle before assigning the next task, assignment continues immediately, now starting with processor n and moving back up the list of processors. That is, task n + 1 is assigned to processor n, task n + 2 is assigned to processor n − 1, etc. This continues back and forth until all tasks have been assigned.

Note that if there are the same number of processors (or more) as there are tasks, this directive has no effect.
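
The two task assignment procedures can be compared with a small Python sketch. It is not the Wind-US implementation. The function names are hypothetical, processors and tasks are numbered from 0, and the dynamic version is a simulation that assumes each task runs for exactly its estimated work. The description of static mode says only that assignment continues "back and forth", so the turnaround at the first processor is also an assumption.

```python
import heapq


def static_assignment(work, n_procs):
    """Map task index -> processor index using the back-and-forth order."""
    # Order tasks from most to least estimated work.
    order = sorted(range(len(work)), key=lambda t: -work[t])
    assignment = {}
    for rank, task in enumerate(order):
        sweep, pos = divmod(rank, n_procs)
        # Even sweeps run down the host list, odd sweeps run back up it.
        proc = pos if sweep % 2 == 0 else n_procs - 1 - pos
        assignment[task] = proc
    return assignment


def dynamic_assignment(work, n_procs):
    """Map task index -> processor index, giving the next task in the
    queue to whichever processor becomes idle first (simulated)."""
    order = sorted(range(len(work)), key=lambda t: -work[t])
    # Heap entries are (time at which the processor is idle, processor index).
    idle = [(0.0, p) for p in range(n_procs)]
    heapq.heapify(idle)
    assignment = {}
    for task in order:
        free_at, proc = heapq.heappop(idle)
        assignment[task] = proc
        heapq.heappush(idle, (free_at + work[task], proc))
    return assignment
```

With work estimates `[5, 9, 1, 7, 3, 2, 4]` on three processors, the two procedures agree on every task except the smallest one, which the static sweep sends to the first processor and the simulated dynamic queue sends to the third.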

route {indirect | direct}

This directive controls how data is routed between the master and worker tasks when PVM message passing is used on a cluster or collection of distributed systems. It does not apply to MPI message passing, or to multi-processor systems.

	`indirect`		All data goes from the task on the local machine to the local PVM daemon, over the network to the remote PVM daemon, which forwards it to the remote task. This is the standard data transfer procedure in PVM, uses UDP (User Datagram Protocol), and is scalable.
	`direct`		All data goes directly from the local task to the remote task, bypassing the PVM daemons. This is implemented by setting the PVM option `PvmRoute` to `PvmRouteDirect`, and uses TCP (Transmission Control Protocol) for transferring data. This takes more time to initially set up the TCP links, but is faster for subsequent data transfers. It should be noted that this procedure is not scalable, and may fail if the number of zones is large. (Each TCP link requires a file descriptor, and the total number of file descriptors that is allowed is limited by the operating system.) However, if a direct link cannot be established, the indirect procedure through the PVM daemons will automatically be used.

#LOADLIMIT limit

The Wind-US initialization script will automatically eliminate workers that are deemed "too busy." A system is defined to be "too busy" when its 15-minute load factor, as reported by the Unix uptime command (the last number on the line), is greater than a certain limit (0.60 by default). [Note that the load factor is checked only at initialization time and not during the course of a run.]

The load factor for each worker will be displayed in the list output file at the top with the other messages that occur during the preparation of the workers. The load factor will be displayed as a percentage (0.60 corresponds to 60%). Note that load factors in excess of 100% are possible. A message will also be displayed if the load factor exceeds the allowed threshold.
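
The check described above can be sketched in Python. This is an illustration rather than the code in the Wind-US initialization script. The function names are hypothetical, and the sketch assumes the usual `uptime` output format in which the three load averages end the line, separated by commas. The comparison is made in the fractional units reported by `uptime`.

```python
DEFAULT_LOAD_LIMIT = 0.60


def load_factor_15min(uptime_line):
    """Return the 15-minute load factor, i.e. the last number on the line."""
    last_field = uptime_line.split()[-1]
    return float(last_field.strip(","))


def is_too_busy(uptime_line, limit=DEFAULT_LOAD_LIMIT):
    """Return True if the 15-minute load factor is greater than the limit."""
    return load_factor_15min(uptime_line) > limit


def as_percent(load_factor):
    """Format a load factor as a percentage (0.60 corresponds to 60%)."""
    return f"{load_factor * 100:.0f}%"
```

Because load factors above 1.0 are possible, `as_percent` can legitimately return values greater than 100%.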

Occasionally, there is a problem with the uptime command and it reports a high load factor when there is no load on the system. To avoid this problem, the #LOADLIMIT directive may be used to override the default value of 0.60. [Note that this directive is an exception to the use of # as a comment indicator.] The parameter limit specifies the load limit for all hosts up to the next #LOADLIMIT directive. A #LOADLIMIT directive with no parameters restores the default load limit. This directive should only be used when you know that including an overloaded host will not affect your job.

The following example illustrates the use of the #LOADLIMIT directive in the multi-processing control file.

   / Next statement considers hosts ws1463 and ws1464 loaded
   /    only if their load factor exceeds 100%
   #loadlimit 100
   host ws1463
   host ws1464
   / The next statement restores the default load limit
   #loadlimit
   host ws1465
   / Use a really high limit for ws1466 - disables the limit check
   #loadlimit 9999
   host ws1466
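
The scoping rule, in which each #LOADLIMIT applies to the host entries that follow it until the next #LOADLIMIT, can be sketched in Python. This is not the Wind-US parser. The function name is hypothetical, `None` is used here to stand for "default load limit", and only the first word after `host` is read. The sketch matches the directive without regard to case, since it appears above in both upper and lower case.

```python
def hosts_with_limits(mpc_text):
    """Return a list of (host name, load limit) pairs from .mpc file text.

    A limit of None means the default load limit applies to that host.
    """
    current = None  # no #LOADLIMIT seen yet, so the default is in effect
    result = []
    for raw in mpc_text.splitlines():
        tokens = raw.split()
        if not tokens:
            continue
        keyword = tokens[0].lower()
        if keyword == "#loadlimit":
            # With no parameter, the directive restores the default limit.
            current = float(tokens[1]) if len(tokens) > 1 else None
        elif keyword == "host" and len(tokens) > 1:
            result.append((tokens[1], current))
        # Any other line (comments, other directives) is ignored here.
    return result
```

Applied to the first eight lines of the example above, this gives a limit of 100 for ws1463 and ws1464, and the default for ws1465.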

Another way to modify the default load limit is to set the PVM_LOAD_LIMIT environment variable before you submit your job. For example, csh/tcsh users could do:

   setenv PVM_LOAD_LIMIT 75

Last updated 30 Sep 2016