MP CBM-Z V1.0: design for a new Carbon Bond Mechanism Z (CBM-Z) gas-phase chemical mechanism architecture for next-generation processors
Precise and rapid air quality simulation and forecasting are limited by the computational performance of the air quality model used, and the gas-phase chemistry module is the most time-consuming component of such a model. In this study, we designed a new framework for the widely used Carbon Bond Mechanism Z (CBM-Z) gas-phase chemical kinetics kernel that adapts it to the single-instruction, multiple-data (SIMD) technology in next-generation processors and thereby improves its computational performance. The optimization implements fine-grained parallelization of CBM-Z by improving its vectorization ability. By constructing loops and integrating the main branches, e.g., the diverse chemistry sub-schemes, multiple spatial points in the model can be processed simultaneously on vector processing units (VPUs). Two generations of CPUs, the Intel Xeon E5-2680 V4 (Broadwell) and the Intel Xeon Gold 6132 (Skylake), and the Intel Xeon Phi 7250 Knights Landing (KNL) are used as the benchmark processors. Validation of the CBM-Z module outputs indicates that the relative bias reaches a maximum of 0.025 % after 10 h of integration with the -fp-model fast=1 compile flag. The module tests show that the Multiple-Points CBM-Z (MP CBM-Z) achieves speedups of 5.16× and 8.97× on a single core of the Intel Xeon E5-2680 V4 and Intel Xeon Gold 6132 CPUs, respectively, and KNL achieves a speedup of 3.69×, compared with the performance of CBM-Z on the Intel Xeon E5-2680 V4 platform. For the single-node tests, the speedup on the two CPU generations reaches 104.63× and 198.50× using the Message Passing Interface (MPI) and 101.02× and 194.60× using OpenMP, and the speedup on the KNL node reaches 175.23× using MPI and 167.45× using OpenMP. The speedup of the optimized CBM-Z is approximately 40 % higher on a one-socket KNL platform than on a two-socket Broadwell platform and about 13 %–16 % lower than on a two-socket Skylake platform.
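The restructuring described above, looping over spatial points and folding the sub-scheme branches into the loop body, can be illustrated with a minimal C sketch. It is not the MP CBM-Z code: the function names, the three regimes, and the loss-rate constants are hypothetical, and a single toy species with a first-order loss stands in for the full mechanism. It only shows how a per-point routine with regime branches might be turned into a branch-free loop that a compiler can map onto SIMD lanes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical regimes standing in for the chemistry sub-schemes.
   The rate constants below are illustrative, not CBM-Z values. */
enum { REGIME_BASE = 0, REGIME_URBAN = 1, REGIME_BIOGENIC = 2 };

/* Single-point style: one grid point per call, with regime-dependent
   branches that make vectorization across points difficult. */
double step_single(double conc, int regime, double dt)
{
    double loss = 0.10;                          /* reactions common to all regimes */
    if (regime >= REGIME_URBAN)    loss += 0.05; /* extra urban reactions */
    if (regime >= REGIME_BIOGENIC) loss += 0.02; /* extra biogenic reactions */
    return conc / (1.0 + loss * dt);             /* implicit Euler for dc/dt = -loss * c */
}

/* Multi-point style: the loop over grid points is explicit, and the
   branches are replaced by 0/1 masks so that every point executes the
   same instruction stream. */
void step_multi(double *conc, const int *regime, size_t n_points, double dt)
{
    assert(conc != NULL && regime != NULL);
    for (size_t i = 0; i < n_points; ++i) {
        double urban = (regime[i] >= REGIME_URBAN)    ? 1.0 : 0.0;
        double bio   = (regime[i] >= REGIME_BIOGENIC) ? 1.0 : 0.0;
        double loss  = 0.10 + 0.05 * urban + 0.02 * bio;
        conc[i] = conc[i] / (1.0 + loss * dt);
    }
}
```

Calling `step_multi` on arrays of concentrations and regime flags should give the same values as calling `step_single` point by point. The masked form performs some redundant arithmetic for points in the simpler regimes, which is the usual price for keeping all SIMD lanes on one code path.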
We also tested a three-dimensional chemistry transport model (CTM), the Nested Air Quality Prediction Model System (NAQPMS), equipped with the MP CBM-Z. The tests show a clear performance improvement for the CTM after adopting the MP CBM-Z. On the Intel Xeon E5-2680 V4 platform, the MP CBM-Z yields speedups of 3.32× and 1.96× for the gas-phase chemistry module and the whole CTM, respectively. On the newer Intel Xeon Gold 6132 platform, the MP CBM-Z yields speedups of 4.90× and 2.22× for the gas-phase chemistry module and the whole CTM, respectively. On KNL, the MP CBM-Z enables a 3.52× speedup for the gas-phase chemistry module, but the whole model loses 24.10 % of its performance compared to the CPU platform because of the poor performance of other modules. In addition, because this optimization seeks to improve the utilization of the VPU, the model is better suited to new-generation processors that adopt more advanced SIMD technology. Our tests already show that the benefit of updating the CPU improves by about 47 % with the MP CBM-Z, since the optimized code adapts better to the new hardware. This work improves the performance of the CBM-Z chemical kinetics kernel as well as the computational efficiency of the air quality model, which directly increases the practical value of the air quality model in scientific simulations and routine forecasting.
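As a worked check of the percentage comparisons quoted above, the single-node figures are reproduced when each difference is normalized by the KNL speedup. That choice of denominator is an inference from the reported numbers, not something stated in the text. The last line gives one possible reading of the "about 47 %" figure, as the ratio of the gas-phase module speedups in the CTM on the two CPU generations; this too is an assumption.

```latex
\begin{align*}
\text{KNL vs. E5-2680 V4, MPI:}    \quad & \frac{175.23 - 104.63}{175.23} \approx 0.403 \\
\text{KNL vs. E5-2680 V4, OpenMP:} \quad & \frac{167.45 - 101.02}{167.45} \approx 0.397 \\
\text{Gold 6132 vs. KNL, MPI:}     \quad & \frac{198.50 - 175.23}{175.23} \approx 0.133 \\
\text{Gold 6132 vs. KNL, OpenMP:}  \quad & \frac{194.60 - 167.45}{167.45} \approx 0.162 \\
\text{Gold 6132 vs. E5-2680 V4, module in CTM:} \quad & \frac{4.90}{3.32} \approx 1.476
\end{align*}
```

The first two lines correspond to the "approximately 40 %" statement and the next two to the "13 %–16 %" range.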