Project Background – EXA2CT

One of the many EU collaborative projects that NAG has been involved with is EXA2CT (EXascale Algorithms and Advanced Computational Techniques). The principal aim of the project’s research and development work was to help numerical codes scale and perform better on large numbers of cores. The project included the development of “proto-apps” to act as proxies for parts of larger codes. This element of the work was carried out collaboratively by NAG and Intel.

 

The project began by putting a set of questions to the owners of large codes, in order to identify sections of code that did not scale well as the core count increased and so limited the scaling of the entire code. Ideally, the code owners provided access to their code so that it could be profiled. Using the findings, the team created a proto-app: a new standalone code that reproduced or extracted the bottleneck from the original code. The proto-app was then used to study and improve the performance of the bottleneck without needing the original code. In some cases, the bottleneck was also present in a wide range of other codes, so improvements made in the proto-app could be just as relevant to other applications. After this improvement work, the changes were re-integrated into the original code and the proto-app was made available to the wider user community for their own development work.

For the first proto-app, NAG did not have access to the original code and had to develop the proto-app from a flowchart of the method provided by the code owners. The proto-app implemented three alternative methods: one without load-balancing, a second that used the Yales2 load-balancer, and a third that implemented a work-farming method as a better alternative. The proto-app showed that this new work-farming method scaled much better than the original.
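The difference between a fixed up-front split of the work and work farming can be illustrated with a small model. This is a minimal sketch, not the proto-app's implementation: the function names and task costs are hypothetical, and the farm is modelled simply as a shared queue from which the earliest idle worker takes the next task.

```python
import heapq


def static_makespan(costs, n_workers):
    """Finish time when tasks are split into contiguous equal-sized blocks up front."""
    block = -(-len(costs) // n_workers)  # ceiling division
    loads = [sum(costs[i:i + block]) for i in range(0, len(costs), block)]
    return max(loads)


def farmed_makespan(costs, n_workers):
    """Finish time when each idle worker pulls the next task from a shared queue."""
    free_at = [0.0] * n_workers  # min-heap of the times at which workers become idle
    heapq.heapify(free_at)
    for cost in costs:
        start = heapq.heappop(free_at)  # the earliest idle worker takes the task
        heapq.heappush(free_at, start + cost)
    return max(free_at)


# Hypothetical workload: a few expensive tasks clustered at the start.
costs = [8.0] * 4 + [1.0] * 12
print(static_makespan(costs, 4))  # 32.0
print(farmed_makespan(costs, 4))  # 11.0
```

With these assumed costs the fixed split leaves one worker holding all four expensive tasks, while the farmed schedule spreads them out and reaches the ideal finish time of total work divided by worker count.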

The second proto-app, exaIO, was developed to improve the initialization in the climate coupling code OASIS3-MCT, which acts as a communication layer between separate climate modelling codes.

The problem was that one process read in all the data from file and then broadcast it to all the other processes, which becomes ever more expensive with increasing core counts. The proto-app implemented this original approach, along with some improved alternatives in which data is sent only to the process that requires it. An improved method developed in the proto-app has now been implemented in the main OASIS3-MCT code.
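A rough count of the data delivered shows why broadcasting everything becomes expensive as the core count grows. The sketch below is a simple model, not code from exaIO or OASIS3-MCT: the function names and the round-robin ownership pattern are assumptions made for illustration, and it counts only the values delivered, ignoring latency and the details of how a real broadcast is implemented.

```python
def broadcast_volume(n_values, n_procs):
    """Total values delivered when every non-root process receives the full data set."""
    return n_values * (n_procs - 1)


def targeted_volume(owner_of, root=0):
    """Total values delivered when each value goes only to the process that needs it."""
    return sum(1 for owner in owner_of if owner != root)


n_values = 100_000
for n_procs in (16, 256, 4096):
    # Hypothetical ownership: value i is needed by process i % n_procs.
    owner_of = [i % n_procs for i in range(n_values)]
    print(n_procs, broadcast_volume(n_values, n_procs), targeted_volume(owner_of))
```

In this model the broadcast volume grows linearly with the number of processes, whereas the targeted volume never exceeds the size of the data set.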

The third proto-app, exaMD, implements an alternative method for calculating the interactions between particles in molecular dynamics calculations, known as “the midpoint method”. The proto-app scales quite well and the developer is working on integrating the proto-app into the main code.
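The core idea of the midpoint method is that the interaction between two particles is computed by the process whose region contains the midpoint between them. The following is a minimal one-dimensional sketch under stated assumptions, not the exaMD implementation: the domain is non-periodic and split into equal boxes, one per process, and all names and particle positions are hypothetical.

```python
from itertools import combinations


def owner_of_point(x, box_width, n_boxes):
    """Index of the box (process) that contains coordinate x."""
    return min(int(x // box_width), n_boxes - 1)


def assign_pairs_by_midpoint(positions, cutoff, box_width, n_boxes):
    """Map each interacting pair to the process whose box contains the pair's midpoint."""
    work = {p: [] for p in range(n_boxes)}
    for (i, xi), (j, xj) in combinations(enumerate(positions), 2):
        if abs(xi - xj) <= cutoff:  # only pairs within the cutoff interact
            midpoint = 0.5 * (xi + xj)
            work[owner_of_point(midpoint, box_width, n_boxes)].append((i, j))
    return work


# Hypothetical particle positions on the interval [0, 8), split into 4 boxes.
positions = [0.5, 1.9, 2.3, 3.8, 4.1, 7.5]
work = assign_pairs_by_midpoint(positions, cutoff=1.0, box_width=2.0, n_boxes=4)
print(work)  # {0: [], 1: [(1, 2), (3, 4)], 2: [], 3: []}
```

Each pair is assigned to exactly one process, so no interaction is computed twice, and a process only needs particles lying within half the cutoff distance of its own box.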

Overall, the project showed that creating such proto-apps can on occasion be a useful way of focusing on potentially low-performance sections of large codes, and could be a way to test and compare solutions for some real-world codes. The work involved various members of the NAG team and will have spin-off benefits both within and outside of NAG.