The steady, time-domain and harmonic balance solvers of COSA are all parallelized using a distributed-memory paradigm implemented by means of the Message Passing Interface (MPI) libraries. In this framework, each MPI process handles a subset of the blocks of the given grid; the size of this subset can vary between the total number of blocks of the grid (in which case the simulation is serial) and 1 (in which case each MPI process looks after one block of the multi-block grid). Special care has been taken to achieve an extremely high parallel efficiency of both the computational part of COSA and its parallel input/output file management.

The figure below reports the outcome of a recent strong scalability test performed on ARCHER, the Cray cluster of the UK national supercomputing service. The test case is a three-dimensional oscillating wing solved with the harmonic balance solver retaining 4 complex harmonics in the truncated Fourier series used to reconstruct the sought periodic flow (HB4). The multi-block grid has 37,748,736 cells and consists of 16,384 blocks. The x-axis reports the number of MPI processes or cores used in the simulations, and the speed-up on the y-axis is defined as the ratio between the wall-clock time of the simulation using n cores and that of the simulation using 2n cores. The black dashed curve is the ideal speed-up, a straight line, whereas the red solid line with circles is the measured speed-up of COSA compiled with IFORT on ARCHER. An excellent parallel efficiency is observed.
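The block-to-process mapping and the speed-up metric described above can be sketched as follows. This is a hypothetical illustration, not COSA source code: the function names and the round-robin assignment policy are assumptions made for clarity, and the actual distribution strategy used by COSA may differ.

```python
def assign_blocks(n_blocks: int, n_ranks: int) -> list[list[int]]:
    """Illustrative round-robin assignment of grid-block indices to MPI ranks.
    With n_ranks == 1 the run is effectively serial (one process owns all
    blocks); with n_ranks == n_blocks each rank owns exactly one block."""
    owned = [[] for _ in range(n_ranks)]
    for b in range(n_blocks):
        owned[b % n_ranks].append(b)
    return owned


def speedup(t_n: float, t_2n: float) -> float:
    """Speed-up when doubling the core count: wall-clock time on n cores
    divided by wall-clock time on 2n cores (ideal value: 2)."""
    return t_n / t_2n


# Example with the grid size quoted above: 16,384 blocks over 4,096 ranks
# gives 4 blocks per rank.
owned = assign_blocks(16384, 4096)
print(len(owned[0]))            # 4 blocks owned by rank 0
print(speedup(100.0, 50.0))     # 2.0, i.e. ideal scaling
```

Within this model, halving the wall-clock time every time the core count doubles traces out the ideal (straight-line) speed-up curve shown in the figure.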

One of the rarer features of COSA, which makes it very user-friendly even when preparing, running and post-processing very large simulations, is its highly efficient parallel management of all input and output files. A single mesh file is read by all MPI processes, and a single restart file is read or written by all MPI processes. The parallel efficiency of these data-handling operations is comparable with that of the computational part of the code.

COSA is often run on large national and international clusters, such as the N8 HPC SGI cluster POLARIS and ARCHER. The code is under continuous development and is currently being used for diverse rotary machine applications, including the aerodynamic analysis and design of horizontal- and vertical-axis wind turbines. The time-domain solver is presently being run intensively with more than 8,000 cores to investigate the complex unsteady flow past oscillating wings for tidal power generation.