Parallelization and Networking Issues
The treatment of complex geometries has led us to adopt a multi-block grid made of several structures with overlapping or patched domains. The method of distributing domains over processors is important because it gives rise to typical load balancing problems and to the need to synchronize waiting time. Issues that need to be addressed are: flexibility in load balancing, reading and generating block data, reading and generating interface data, and updating block and interface data during communication. The advantage of the multi-block approach is that one can use more than one block solver (for example, Euler, Navier-Stokes, etc.) in different blocks, depending on the complexity of the flow field in a given block, thereby improving the overall efficiency of the algorithm.
The most important parallelization issue for CFD applications is the way in which the computational domain is partitioned among a cluster of processors. Even a highly efficient parallel algorithm can give poor results with a poorly implemented domain decomposition. The domain decomposition technique has been successfully implemented on both SIMD and distributed memory MIMD computers. Algorithms like the "Masked Multi-block Algorithm" allow the domain to be partitioned dynamically depending on the distribution of load among processors. This rules out simply assigning one domain to each processor, since in most cases the domains have irregular sizes. Other algorithms distribute separate planes of a 3-D computational domain among processors to synchronize waiting time. The efficiency of a parallel algorithm is measured by the ratio of the computation time to the communication time for a particular application. High performance message passing can be achieved by "overlapping communication" and by performing assembly-coded gather-scatter operations.
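The load balancing problem caused by irregular block sizes can be illustrated with a minimal sketch. It is not the Masked Multi-block Algorithm; it is a generic greedy heuristic (largest block first, always onto the least loaded processor), and the block sizes, the function names and the imbalance measure are assumptions chosen for illustration only.

```python
import heapq


def assign_blocks(block_sizes, n_procs):
    """Greedy assignment: largest block first, onto the least loaded processor."""
    # Min-heap of (current load, processor id); all processors start empty.
    heap = [(0, p) for p in range(n_procs)]
    heapq.heapify(heap)
    assignment = {p: [] for p in range(n_procs)}
    # Visit blocks in order of decreasing size (hypothetical grid-point counts).
    for block_id, size in sorted(enumerate(block_sizes), key=lambda kv: -kv[1]):
        load, proc = heapq.heappop(heap)
        assignment[proc].append(block_id)
        heapq.heappush(heap, (load + size, proc))
    return assignment


def processor_loads(block_sizes, assignment):
    """Total work (sum of block sizes) placed on each processor."""
    return {p: sum(block_sizes[b] for b in blocks)
            for p, blocks in assignment.items()}


def imbalance(loads):
    """Ratio of the heaviest load to the mean load; 1.0 is perfect balance."""
    values = list(loads.values())
    return max(values) / (sum(values) / len(values))


# Six blocks of irregular size (assumed values) spread over three processors.
sizes = [900, 400, 350, 300, 250, 200]
plan = assign_blocks(sizes, 3)
loads = processor_loads(sizes, plan)
print(loads)             # {0: 900, 1: 850, 2: 650}
print(imbalance(loads))  # 1.125
```

Even in this small case the largest block fixes the heaviest load at 900, so no assignment of whole blocks can reach perfect balance. This is the situation in which partitioning within blocks, rather than placing whole blocks, becomes attractive.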
A machine like the CM-2 is a SIMD-type machine in which most of the parallelization is carried out by the compiler, which is responsible for data layout. The Cray's parallel processing capabilities can be exploited through "auto-tasking", wherein the user marks points of potential parallelism in the implementation with directives that instruct a pre-processor to restructure the source program so that maximum speedup can be obtained.
An alternative to parallel hardware architecture is the poor man's machine, PVM (Parallel Virtual Machine), which can simulate a parallel machine across a collection of serial machines. The programming model supported by PVM is distributed memory multi-processing with low-level message passing. A PVM application essentially uses PVM routines to do message passing, process control and automatic data conversion. A special process runs on each node (each machine) of the virtual machine and provides communication support and process control. However, since message passing is carried out over Ethernet, it is considerably slower than the Intel interprocessor network. There is also an overhead on account of differences in speed between machines, which create load balancing problems.
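The load balancing problem created by machines of different speeds can be sketched as follows. The example divides the planes of a 3-D domain among machines in proportion to their relative speeds and compares the result with an equal split. It does not use PVM itself; the speed values, the function names and the cost model (time per step proportional to planes divided by speed, communication ignored) are assumptions made only for illustration.

```python
def split_planes(n_planes, speeds):
    """Share n_planes among machines in proportion to their relative speeds."""
    total = sum(speeds)
    ideal = [n_planes * s / total for s in speeds]
    counts = [int(x) for x in ideal]  # round each share down first
    leftover = n_planes - sum(counts)
    # Hand the remaining planes to the largest fractional remainders.
    order = sorted(range(len(speeds)),
                   key=lambda i: ideal[i] - counts[i], reverse=True)
    for i in order[:leftover]:
        counts[i] += 1
    return counts


def step_time(counts, speeds):
    """Time per step is set by the slowest machine to finish its planes."""
    return max(c / s for c, s in zip(counts, speeds))


# Three machines with assumed relative speeds 1 : 2 : 3 sharing 60 planes.
speeds = [1.0, 2.0, 3.0]
equal = [20, 20, 20]
weighted = split_planes(60, speeds)
print(weighted)                     # [10, 20, 30]
print(step_time(equal, speeds))     # 20.0
print(step_time(weighted, speeds))  # 10.0
```

Under these assumptions the equal split leaves the two faster machines idle while the slowest one finishes, which is the waiting time overhead described above. A realistic estimate would also have to include the cost of message passing over the network.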