The time required to compute a solution can be reduced by using the distributed parallel processing capability of WIND. WIND can simultaneously use multiple systems connected via a network as though they were a single computer. The systems are typically workstation class machines and need not be all of the same vendor.
When operating in distributed parallel processing mode, the system on which the job originates is called the master system. All other systems are called worker systems. WIND distributes complete zones to participating worker systems. Each zone is solved in parallel with other zones on other systems. The systems exchange boundary information during each cycle in order to propagate information throughout the computational domain. There may be fewer systems than zones to be computed, in which case, a system will be assigned another zone when it finishes its current assignment. The user specifies the names of the participating worker systems via the multi-processing control file.
If a worker system fails due to either a system or network failure during the course of a run, the job will restart from the last checkpoint (see the checkpoint directive) without the failed system. The automatic restart ability will be invoked as many times as necessary during a job until no more systems are available.
The multi-processing control file specifies the hosts that will be available as well as some miscellaneous options. The presence of a valid multi-processing control file is all that is required to utilize the distributed parallel processing capability of WIND. If the main WIND input data file name is input.dat, the name of the multi-processing control file must be input.mpc. When this file is present, the wind script will ask the user if they really want to use multi-processing mode.
Comments may be included in the file with the normal WIND comment
indicator "/", or additionally "#".
Blank lines are ignored.
Trailing comments are not allowed.
The formats of the directives follow.
| host {localhost | hostname} [nproc n] |
host directives specify the names of the worker systems (given by the hostname parameter) that will be used to process zones. In general, there should be one host directive for each worker system. If a particular system appears more than once, each occurrence is treated as a unique system and will process assigned zones simultaneously. This is not advisable unless the system has multiple processors and sufficient memory.
The optional parameter nproc n may be used to specify the number of processes to allow to run in parallel on the specified host. It is equivalent to repeating the host directive n times.
If no host entries appear in the multi-processing control file, the originating system will automatically be selected as the only host. When used on a system with sufficient memory and the assignment mode dedicated directive, the normal I/O associated with a single processor solution will be eliminated (except for checkpoints).
The special parameter localhost is used when running on a single multi-processing system and the system name is unknown at the time of job submittal, such as for batch systems (like NQE) that can spawn to multiple systems or clusters of servers. Using localhost is preferred over not putting in any host directives because it assures that the scripts set up WIND consistently.
host entries should appear in the file in decreasing order of computational power. WIND assigns the most computationally intensive zones to the highest entries in the list.
The system that originates the job is not automatically included in
the host list.
If it is desired to also assign solution tasks to the
originating system, it should have a host entry like any other system.
For estimating purposes, the master process typically consumes less than
one percent of the CPU time on the master host.
| i/o {direct | indirect} |
This directive specifies the type of access that worker systems have to files on the master. The default is indirect, which means that workers do not have access to the files on the master, and that file I/O must therefore be done using message passing to/from the master process.
On multi-processor systems, however, i/o direct may be used to indicate that the worker processes may access the files directly, bypassing communication through the master process. This significantly reduces communication overhead and increases performance by as much as 10-40%.
There are a couple of things to be aware of when using the i/o direct
option.
First, it should only be used when running on a single multi-processor
system, not with workstation clusters.
And second, the number of open files per process must be set large enough.
| checkpoint {[every] {time minutes | count cycles} | none} |
This directive specifies how often the worker systems transfer their flow field information to the flow file on the master system. In the event of a failure, the solution is automatically restarted from the last checkpoint. Specifying too small a number can result in very high network overhead and low throughput. A large number improves performance but can cause wasted time if a lot of network failures occur. If checkpoint none is specified, the flow field information is updated only at the end of the job. The default value is:
checkpoint every 60 minutes
| assignment mode {dedicated | shared | transient} |
assignment mode controls how tasks are assigned to target systems. There may be multiple appearances of this directive. They affect subsequent host entries up to the next assignment mode directive. A description of each mode follows:
| route [indirect | direct] |
Controls how data is routed between the master and worker tasks.
| #LOADLIMIT limit |
The WIND/PVM initialization script will automatically eliminate workers that are deemed "too busy." A system is defined to be "too busy" when its 15 minute load factor, as reported by the Unix "uptime" command (the last number on the line) is greater than a certain limit (0.60 by default). The purpose of this check is to eliminate hosts that are occupied by hung processes. [Note that the load factor is checked only at initialization time. If the system becomes busy during a run ... well ... too bad.]
The load factor for each worker will be displayed in the list output file at the top with the other messages that occur during the preparation of the workers. The load factor will be displayed as a percent (0.60 corresponds to 60%). Note that the load factors in excess of 100% are possible. A message will also be displayed if the load factor exceeds the allowed threshold.
Occasionally, there is a problem with the "uptime" command and it reports a high load factor when there is no load on the system. To avoid this problem, the #LOADLIMIT directive may be used to override the default value of 0.60. [Note that this directive is an exception to the use of # as a comment indicator.] The parameter limit specifies the load limit for all hosts up to the next #LOADLIMIT directive. A #LOADLIMIT directive with no parameters restores the default load limit. This command should only be used when you know that including an overloaded host will not affect your job.
The following example illustrates the use of the #LOADLIMIT directive in the multi-processing control file.
# Next statement considers hosts ws1463 and ws1464 loaded # only if their load factor exceeds 100% #loadlimit 100 host ws1463 host ws1464 # The next statement restores the default load limit #loadlimit host ws1465 # Use a really high limit for ws1466 - disables the limit check #loadlimit 9999 host ws1456
Another way to modify the default load limit is to set the PVM_LOAD_LIMIT environment variable before you submit your job. For example, using the Bourne shell:
$ PVM_LOAD_LIMIT=75 $ export PVM_LOAD_LIMIT $ wind
The network communication software uses remote shell commands to start the worker tasks. This implies the user must have a valid account on the worker system. In addition, the user name on the master system and all worker systems must be the same. [By "remote shell commands" we mean rsh (remsh on HP and Cray systems) and rcp; or, if the ssh (secure shell) software is used, ssh and scp. Note that if ssh is to be used, that must be specified during the creation of the PVM and WIND executables. See the WIND Installation Guide for more information.]
To allow remote shell commands to access a worker system, the host name of the master system must be in the file .rhosts in the user's home directory on the worker system, or in the system file hosts.equiv on the worker system. [If ssh is being used, the corresponding file names are .shosts and shosts.equiv. These files are used in exactly the same way as .rhosts and hosts.equiv and on a non-ssh system.] Note that this is required even if the master and worker are the same system.
The .rhosts file is a text file containing a list of system names, and the userids on each of those systems, that are allowed to access the current host via a remote shell command. The file should have its permissions set to rw-------, so issue the following command after creating the file:
chmod 600 .rhosts
Once the .rhosts file has been created, it may be tested by issuing the following command from the system where the job will be submitted. [Use the command remsh or ssh instead of rsh, if that's appropriate for your system.]
rsh worker-name ls -la
Things are functioning properly if the directory listing appears. In many situations the users home directory is located on a file server, so only one .rhosts file needs to be maintained.
More information about .rhosts files may be found by entering man rhosts on most Unix systems.
To avoid the possibility of a PVM job continuing outside an allotted off-shift window, a series of scripts can be executed by the Unix cron process. In the WIND distribution, these scripts are in the directory wind/cfd/bin/pvmkill. Four files are located there:
To invoke these processes, copy the above scripts to each master you're using, edit cronkill appropriately, and insert these processes into the crontab on each master by entering
crontab cronkillTo check if this worked, enter
crontab -lwhich will give a list of all your cron entries.
When running in parallel mode on a cluster of workstations, the master system and all worker systems being used by a given user cannot be used by any other parallel job from the same user as long as the first job is active. A different user, however, can have a parallel job running simultaneously on the same systems, assuming that the memory, disk space, etc., are sufficient to support multiple jobs.
There are no restrictions on the number of parallel jobs for a given user on a multi-processor system (i.e., using the -mp option to the wind script), again assuming that the computer resources are available to support multiple jobs.
Because synchronization takes place at the end of each cycle, total throughput is established by the processor that takes the longest to complete its assigned work. The optimum situation is to have all zones of equal size and have one processor for each zone. This gives maximum throughput and processor utilization, but is generally not achievable. If all zones cannot be close to the same size, a mixture of sizes is preferable. The case to avoid is a configuration with one zone of comparable size to the sum of the remaining zones. In this case, one can achieve at most a factor-of-two performance improvement regardless of the number of processors used. In general, if n is the number of points in the largest zone and N is the total number of points, the maximum possible speed up is N/n (assuming identical processors and similar algorithm specification).
Given a number of processors P with relative speeds pi (larger p implies faster), and a number of zones N of sizes nj, the assignment of work is done as follows:
Consider adding processors if T for any processor is significantly larger than the others, and that processor has more than one zone assigned.
The output file from a run will indicate what zones are assigned to what processor, and will have a report containing the utilization of each processor.