Parallel Processing

The time required to compute a solution can be reduced by using the distributed parallel processing capability of WIND. WIND can simultaneously use multiple systems connected via a network as though they were a single computer. The systems are typically workstation class machines and need not all come from the same vendor.

When operating in distributed parallel processing mode, the system on which the job originates is called the master system. All other systems are called worker systems. WIND distributes complete zones to participating worker systems. Each zone is solved in parallel with other zones on other systems. The systems exchange boundary information during each cycle in order to propagate information throughout the computational domain. There may be fewer systems than zones to be computed, in which case, a system will be assigned another zone when it finishes its current assignment. The user specifies the names of the participating worker systems via the multi-processing control file.

If a worker system fails due to either a system or network failure during the course of a run, the job will restart from the last checkpoint (see the checkpoint directive) without the failed system. The automatic restart ability will be invoked as many times as necessary during a job until no more systems are available.

Multi-Processing Control File

The multi-processing control file specifies the hosts that will be available as well as some miscellaneous options. The presence of a valid multi-processing control file is all that is required to utilize the distributed parallel processing capability of WIND. If the main WIND input data file name is input.dat, the name of the multi-processing control file must be input.mpc. When this file is present, the wind script will ask the user if they really want to use multi-processing mode.
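The naming rule above can be sketched in a few lines of Python. This is only an illustration: the helper name mpc_path_for is hypothetical and not part of WIND, and it assumes the rule shown for input.dat (same base name, .mpc extension) carries over to other input file names.

```python
from pathlib import Path


def mpc_path_for(input_file: str) -> Path:
    """Return the control file name that pairs with a WIND input file.

    Assumption: the base name is kept and the extension becomes ".mpc",
    as in the input.dat -> input.mpc example in the text.
    """
    return Path(input_file).with_suffix(".mpc")


print(mpc_path_for("input.dat"))  # input.mpc
```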

Comments may be included in the file with the normal WIND comment indicator "/", or additionally "#". Blank lines are ignored. Trailing comments are not allowed. The formats of the directives follow.

host {localhost | hostname} [nproc n]

host directives specify the names of the worker systems (given by the hostname parameter) that will be used to process zones. In general, there should be one host directive for each worker system. If a particular system appears more than once, each occurrence is treated as a unique system and will process assigned zones simultaneously. This is not advisable unless the system has multiple processors and sufficient memory.

The optional parameter nproc n may be used to specify the number of processes to allow to run in parallel on the specified host. It is equivalent to repeating the host directive n times.
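The equivalence between nproc n and n repeated host directives can be illustrated with a small Python sketch. It is not WIND's own parser: the function name expand_hosts is hypothetical, every line starting with "/" or "#" is treated as a comment (so #LOADLIMIT lines are skipped here), and all directives other than host are ignored.

```python
def expand_hosts(lines):
    """Expand host directives into one list entry per allowed process."""
    hosts = []
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and lines starting with a comment indicator.
        if not line or line[0] in "/#":
            continue
        words = line.split()
        if words[0] != "host" or len(words) < 2:
            continue  # other directives are outside this sketch
        count = 1
        if len(words) >= 4 and words[2] == "nproc":
            count = int(words[3])
        # "host name nproc n" behaves like n separate "host name" lines.
        hosts.extend([words[1]] * count)
    return hosts


sample = [
    "# two processes on ws1463, one on ws1464",
    "host ws1463 nproc 2",
    "",
    "host ws1464",
]
print(expand_hosts(sample))  # ['ws1463', 'ws1463', 'ws1464']
```

The order of the returned list follows the order of the host directives in the file.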

If no host entries appear in the multi-processing control file, the originating system will automatically be selected as the only host. When used on a system with sufficient memory and the assignment mode dedicated directive, the normal I/O associated with a single processor solution will be eliminated (except for checkpoints).

The special parameter localhost is used when running on a single multi-processing system and the system name is unknown at the time of job submittal, such as for batch systems (like NQE) that can spawn to multiple systems or clusters of servers. Using localhost is preferred over not putting in any host directives because it assures that the scripts set up WIND consistently.

host entries should appear in the file in decreasing order of computational power. WIND assigns the most computationally intensive zones to the highest entries in the list.

The system that originates the job is not automatically included in the host list. If it is desired to also assign solution tasks to the originating system, it should have a host entry like any other system. For estimating purposes, the master process typically consumes less than one percent of the CPU time on the master host.

i/o {direct | indirect}

This directive specifies the type of access that worker systems have to files on the master. The default is indirect, which means that workers do not have access to the files on the master, and that file I/O must therefore be done using message passing to/from the master process.

On multi-processor systems, however, i/o direct may be used to indicate that the worker processes may access the files directly, bypassing communication through the master process. This significantly reduces communication overhead and can increase performance by 10-40%.

There are a couple of things to be aware of when using the i/o direct option. First, it should only be used when running on a single multi-processor system, not with workstation clusters. And second, the number of open files per process must be set large enough.

checkpoint {[every] {time minutes | count cycles} | none}

This directive specifies how often the worker systems transfer their flow field information to the flow file on the master system. In the event of a failure, the solution is automatically restarted from the last checkpoint. Specifying too small a number can result in very high network overhead and low throughput. A large number improves performance but can cause wasted time if a lot of network failures occur. If checkpoint none is specified, the flow field information is updated only at the end of the job. The default value is:

   checkpoint every 60 minutes

assignment mode {dedicated | shared | transient}

assignment mode controls how tasks are assigned to target systems. There may be multiple appearances of this directive. They affect subsequent host entries up to the next assignment mode directive. A description of each mode follows:

dedicated
Each task (zone) gets a unique Unix process on the target system. If a system must process more than one zone, each will have a separate process, but only one will be allowed to run at a time unless multiple host entries are present for the system. This is the default mode and should not be changed unless there is insufficient swap space for the processes assigned to the host.

shared
Unless a system must process more than one zone, this mode is the same as dedicated. If more than one zone must be processed, only one Unix process is allocated and individual zones are swapped to and from local disk on the target system. This mode should be used only if the target system does not have sufficient swap space to contain the zones it needs to process.

transient
When a task completes, it writes all of its flow information back to the master processor and terminates. A new Unix process is then started for the next cycle. This mode is for future use and should not be specified.

route [indirect | direct]

Controls how data is routed between the master and worker tasks.

indirect
All messages go from the task on the local machine to the local PVM daemon, over the network to the remote PVM daemon, which forwards it to the remote task.

direct
All messages go directly from task to task, bypassing the PVM daemons. This mechanism should be more efficient, but cannot be used when the number of zones is large.

#LOADLIMIT limit

The WIND/PVM initialization script will automatically eliminate workers that are deemed "too busy." A system is defined to be "too busy" when its 15 minute load factor, as reported by the Unix "uptime" command (the last number on the line) is greater than a certain limit (0.60 by default). The purpose of this check is to eliminate hosts that are occupied by hung processes. [Note that the load factor is checked only at initialization time. If the system becomes busy during a run ... well ... too bad.]

The load factor for each worker will be displayed in the list output file at the top with the other messages that occur during the preparation of the workers. The load factor will be displayed as a percent (0.60 corresponds to 60%). Note that load factors in excess of 100% are possible. A message will also be displayed if the load factor exceeds the allowed threshold.
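The "too busy" test can be sketched in Python as follows. This is not the WIND/PVM initialization script itself: the function names are hypothetical, the sample uptime line is made up for illustration, and the sketch assumes the 15 minute load factor is the last number on the line and is written with a decimal point.

```python
def load_factor(uptime_line: str) -> float:
    """Return the last number on an uptime line (the 15 minute load factor)."""
    # Commas separate the three load averages, so treat them as whitespace.
    return float(uptime_line.replace(",", " ").split()[-1])


def is_too_busy(uptime_line: str, limit: float = 0.60) -> bool:
    """A host is "too busy" when its load factor is greater than the limit."""
    return load_factor(uptime_line) > limit


# Illustrative line only; the exact uptime format varies between systems.
sample = " 10:15:02 up 12 days,  3:04,  2 users,  load average: 0.08, 0.21, 0.72"

print(format(load_factor(sample), ".0%"))  # 72%
print(is_too_busy(sample))                 # True
print(is_too_busy(sample, limit=1.0))      # False
```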

Occasionally, there is a problem with the "uptime" command and it reports a high load factor when there is no load on the system. To avoid this problem, the #LOADLIMIT directive may be used to override the default value of 0.60. [Note that this directive is an exception to the use of # as a comment indicator.] The parameter limit specifies the load limit for all hosts up to the next #LOADLIMIT directive. A #LOADLIMIT directive with no parameters restores the default load limit. This command should only be used when you know that including an overloaded host will not affect your job.

The following example illustrates the use of the #LOADLIMIT directive in the multi-processing control file.

   # Next statement considers hosts ws1463 and ws1464 loaded
   #    only if their load factor exceeds 100%
   #loadlimit 100
   host ws1463
   host ws1464
   # The next statement restores the default load limit
   #loadlimit
   host ws1465
   # Use a really high limit for ws1466 - disables the limit check
   #loadlimit 9999
   host ws1466

Another way to modify the default load limit is to set the PVM_LOAD_LIMIT environment variable before you submit your job. For example, using the Bourne shell:

   $ PVM_LOAD_LIMIT=75
   $ export PVM_LOAD_LIMIT
   $ wind

Multi-Processing Script Considerations

Unix Security Considerations

The network communication software uses remote shell commands to start the worker tasks. This implies the user must have a valid account on the worker system. In addition, the user name on the master system and all worker systems must be the same. [By "remote shell commands" we mean rsh (remsh on HP and Cray systems) and rcp; or, if the ssh (secure shell) software is used, ssh and scp. Note that if ssh is to be used, that must be specified during the creation of the PVM and WIND executables. See the WIND Installation Guide for more information.]

To allow remote shell commands to access a worker system, the host name of the master system must be in the file .rhosts in the user's home directory on the worker system, or in the system file hosts.equiv on the worker system. [If ssh is being used, the corresponding file names are .shosts and shosts.equiv. These files are used in exactly the same way as .rhosts and hosts.equiv on a non-ssh system.] Note that this is required even if the master and worker are the same system.

The .rhosts file is a text file containing a list of system names, and the userids on each of those systems, that are allowed to access the current host via a remote shell command. The file should have its permissions set to rw-------, so issue the following command after creating the file:

   chmod 600 .rhosts

Once the .rhosts file has been created, it may be tested by issuing the following command from the system where the job will be submitted. [Use the command remsh or ssh instead of rsh, if that's appropriate for your system.]

   rsh worker-name ls -la

Things are functioning properly if the directory listing appears. In many situations the user's home directory is located on a file server, so only one .rhosts file needs to be maintained.

More information about .rhosts files may be found by entering man rhosts on most Unix systems.

Ensuring PVM Jobs Stop at a Specific Time

To avoid the possibility of a PVM job continuing outside an allotted off-shift window, a series of scripts can be executed by the Unix cron process. In the WIND distribution, these scripts are in the directory wind/cfd/bin/pvmkill. Four files are located there:

  1. cronkill - This file tells the continuously running job scheduler (cron) when to terminate processes. The first two digits on each line are the minute, the third digit is the hour, and following the *'s are the days when each of the commands will be executed (Monday = 1). The first command is the "nicest" way to kill the job, with the following two successively harsher. Note that this file must be edited so that output goes to your directory and the paths for the scripts are correct.
  2. pvmclean - A script which terminates jobs in a relatively nice fashion.
  3. naskill - A script which terminates jobs in a bit harsher fashion.
  4. naspvmkill - A script which terminates jobs in the meanest fashion.

To invoke these processes, copy the above scripts to each master you're using, edit cronkill appropriately, and insert these processes into the crontab on each master by entering

   crontab cronkill

To check if this worked, enter

   crontab -l

which will give a list of all your cron entries.

Multiple Parallel Jobs

When running in parallel mode on a cluster of workstations, the master system and all worker systems being used by a given user cannot be used by any other parallel job from the same user as long as the first job is active. A different user, however, can have a parallel job running simultaneously on the same systems, assuming that the memory, disk space, etc., are sufficient to support multiple jobs.

There are no restrictions on the number of parallel jobs for a given user on a multi-processor system (i.e., using the -mp option to the wind script), again assuming that the computer resources are available to support multiple jobs.

Hints

Because synchronization takes place at the end of each cycle, total throughput is established by the processor that takes the longest to complete its assigned work. The optimum situation is to have all zones of equal size and have one processor for each zone. This gives maximum throughput and processor utilization, but is generally not achievable. If all zones cannot be close to the same size, a mixture of sizes is preferable. The case to avoid is a configuration with one zone of comparable size to the sum of the remaining zones. In this case, one can achieve at most a factor-of-two performance improvement regardless of the number of processors used. In general, if n is the number of points in the largest zone and N is the total number of points, the maximum possible speedup is N/n (assuming identical processors and similar algorithm specification).
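The N/n bound is easy to evaluate for a given grid. The following Python sketch does so; the function name max_speedup and the zone sizes are made up for illustration, and identical processors are assumed, as in the text.

```python
def max_speedup(zone_sizes):
    """Upper bound N/n: total points divided by points in the largest zone."""
    return sum(zone_sizes) / max(zone_sizes)


# Four equal zones: the bound equals the number of zones.
print(max_speedup([100_000, 100_000, 100_000, 100_000]))  # 4.0

# One zone as large as all the others combined: the bound drops to two.
print(max_speedup([300_000, 100_000, 100_000, 100_000]))  # 2.0
```

The second case is the configuration to avoid: no number of processors can push the speedup past a factor of two.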

Given a number of processors P with relative speeds p_i (larger p_i implies faster), and a number of zones N of sizes n_j, the assignment of work is done as follows:

  1. Assign the largest zone j to processor 1 and compute T_1 = n_j / p_1.
  2. Repeat step 1 for the remaining P - 1 processors, assigning the largest remaining zone j to processor i and compute T_i = n_j / p_i.
  3. If any zones remain to be assigned, locate processor i such that T_i is a minimum. Assign the largest remaining zone j to processor i, computing T_i = T_i + n_j / p_i.
  4. Repeat step 3 for remaining unassigned zones.

Consider adding processors if T for any processor is significantly larger than the others, and that processor has more than one zone assigned.
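The assignment steps listed above can be sketched in Python to estimate the load on each processor before a run. This is an illustration, not WIND's code: the function name assign_zones and the sample numbers are hypothetical, and ties in step 3 are broken by choosing the earliest processor in the list, which the text does not specify.

```python
def assign_zones(zone_sizes, speeds):
    """Return (assignment, times) for the steps described in the text.

    assignment[i] lists the zone indices given to processor i, and
    times[i] is the accumulated estimate T for that processor.
    """
    # Visit zones from largest to smallest.
    order = sorted(range(len(zone_sizes)),
                   key=lambda j: zone_sizes[j], reverse=True)
    assignment = [[] for _ in speeds]
    times = [0.0] * len(speeds)
    for rank, j in enumerate(order):
        if rank < len(speeds):
            i = rank  # steps 1 and 2: one zone per processor, in list order
        else:
            # Step 3: the processor whose current T is smallest.
            i = min(range(len(speeds)), key=lambda k: times[k])
        assignment[i].append(j)
        times[i] += zone_sizes[j] / speeds[i]
    return assignment, times


# Two processors, the first twice as fast as the second; four zones.
assignment, times = assign_zones([60, 40, 30, 10], [2.0, 1.0])
print(assignment)  # [[0, 2], [1, 3]]
print(times)       # [45.0, 50.0]
```

In this example both estimates are close, so adding a processor would gain little. A large gap between the entries of times, on a processor holding more than one zone, is the situation in which adding processors is worth considering.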

The output file from a run will indicate what zones are assigned to what processor, and will have a report containing the utilization of each processor.