Let’s Talk Performance: Automatic Differentiation by Dynamic Code Generation Meets CFD

Published 05/04/2023 By Johannes Lotz

A collaboration between RWTH Aachen, the German Aerospace Center, and the Numerical Algorithms Group: CODA and dco/c++.

This blog is authored by J. Lotz (NAG), U. Naumann (RWTH Aachen), F. Kasielke (DLR) and A. Stück (DLR).

The European aircraft industry faces the challenge of maintaining global leadership while serving society’s needs, such as lowering CO₂ emissions and perceived aircraft noise. Meeting these goals places demands on future product performance and requires major changes in aircraft technology and design principles. These changes in turn rely on improved simulation capabilities, computational efficiency, and scalability. As shown in this article, computational efficiency can be boosted by Automatic Differentiation (AD), a computer science technique for computing first and higher derivatives of implemented functions efficiently and accurately.

CODA is the computational fluid dynamics (CFD) software being developed as part of a collaboration between the French Aerospace Lab ONERA, the German Aerospace Center (DLR), Airbus, and their European research partners. CODA is jointly owned by ONERA, DLR and Airbus.

To enable the efficient analysis and optimization of aircraft on state-of-the-art HPC systems, the Navier-Stokes equations are solved for high Reynolds-number flow on unstructured grids with second-order finite-volume and higher-order Discontinuous-Galerkin discretizations. Scalable Automatic Differentiation (AD) is a key ingredient to support multidisciplinary design optimization scenarios of aircraft on a large scale.

This blog shows how the latest features of dco/c++, NAG’s Automatic Differentiation (AD) solution, can improve the computational performance of adjoint CODA computations. A case study was conducted as part of a Master’s thesis (by A. Kuyumcu) at the DLR Institute for Software Methods for Product Virtualization (DLR-SP) in collaboration with RWTH Aachen University (Software and Tools for Computational Engineering). To evaluate the dco/c++ capabilities in conjunction with CODA, we focused on the reverse-mode AD of the discretized CFD residual with respect to the flow state, which is deemed a representative part of the overall CFD method within the analysis chain. We observed a 10 times speedup from dco/c++’s dynamic code generation capabilities compared to the standard tape approach.

CODA is written in modern C++ and utilizes features such as templates and lambda expressions. These features facilitate the application of AD libraries with operator overloading, which are usually built on a “tape-based” approach. This means that the implementation is evaluated with a special data type that replaces the arithmetic type (originally, e.g., ‘double’) and overloads all intrinsic operations and functions of the programming language. Doing so writes a tape to memory, which represents the evaluated program as a graph. In a second step, the so-called interpretation, this graph is traversed to run the adjoint computation.
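To make the record-then-interpret idea concrete, here is a minimal sketch of a tape-based, operator-overloading AD type. It is a toy written for this post, not the dco/c++ API: the names `Tape`, `Active`, `make_input`, `f`, and `value_and_gradient` are all hypothetical, and a production tool handles many more operations and is far more memory-efficient.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Toy tape (hypothetical, not dco/c++): each node stores two argument
// indices and the local partial derivatives with respect to them.
struct Tape {
    struct Node { std::size_t arg[2]; double partial[2]; };
    std::vector<Node> nodes;

    std::size_t push(std::size_t a0, double p0, std::size_t a1, double p1) {
        nodes.push_back(Node{{a0, a1}, {p0, p1}});
        return nodes.size() - 1;
    }
    // Inputs have no arguments: point to themselves with zero partials.
    std::size_t push_input() {
        const std::size_t i = nodes.size();
        return push(i, 0.0, i, 0.0);
    }
    // "Interpretation": walk the tape backwards and propagate adjoints.
    std::vector<double> interpret(std::size_t output) const {
        assert(output < nodes.size());
        std::vector<double> adj(nodes.size(), 0.0);
        adj[output] = 1.0;  // seed the output adjoint
        for (std::size_t i = nodes.size(); i-- > 0;) {
            const Node& n = nodes[i];
            adj[n.arg[0]] += n.partial[0] * adj[i];
            adj[n.arg[1]] += n.partial[1] * adj[i];
        }
        return adj;
    }
};

// Active type that replaces 'double' and records every operation.
struct Active { double value; std::size_t index; Tape* tape; };

Active make_input(Tape& t, double v) { return {v, t.push_input(), &t}; }

Active operator+(const Active& a, const Active& b) {
    return {a.value + b.value,
            a.tape->push(a.index, 1.0, b.index, 1.0), a.tape};
}
Active operator*(const Active& a, const Active& b) {
    return {a.value * b.value,
            a.tape->push(a.index, b.value, b.index, a.value), a.tape};
}
Active sin(const Active& a) {
    return {std::sin(a.value),
            a.tape->push(a.index, std::cos(a.value), a.index, 0.0), a.tape};
}

// Templated primal code: runs unchanged with double or Active.
template <typename T>
T f(const T& x, const T& y) {
    using std::sin;  // double uses std::sin, Active is found via ADL
    return x * y + sin(x);
}

struct Gradient { double value, dx, dy; };

// Record f on a fresh tape, then interpret it to obtain both partials.
Gradient value_and_gradient(double x0, double y0) {
    Tape tape;
    Active x = make_input(tape, x0);
    Active y = make_input(tape, y0);
    Active z = f(x, y);
    const std::vector<double> adj = tape.interpret(z.index);
    return {z.value, adj[x.index], adj[y.index]};
}
```

Calling `value_and_gradient(2.0, 3.0)` records five tape nodes and returns the partial derivatives y + cos(x) and x from a single backward sweep. Note that the tape grows with every recorded operation, which is why taping a complete residual function can become memory-intensive.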

CODA comes with an AD abstraction layer and AD entry points at different granularity levels, which allowed us to investigate three different strategies for the reverse-mode AD of the discrete residual, two of which are tape-based, while the third is tape-free:

  • Tape-based global adjoint mode:
    For the global adjoint mode, we use dco/c++ to record and interpret the complete residual function.
  • Tape-based local adjoint mode:
    As opposed to the global mode described above, the local approach takes advantage of the fact that the computational graph of the residual evaluation consists of repeated face and cell operations (such as flux evaluations) that are mutually independent. We can therefore tape and independently interpret such fine-grained face and cell operations using much smaller tapes. More importantly, the tape size is independent of the number of degrees of freedom in the system. Run time performance is similar to the tape-based global adjoint mode.
  • Tape-free local adjoint mode:
    This approach uses the new dynamic code generation (referred to as codegen) capabilities of dco/c++ in conjunction with the local differentiation option supported by CODA. A tape is still recorded once as a preprocessing step, but it is used only a single time to generate C++ code, which locally computes the adjoint of the residual. During the evaluation of the derivatives, we do not record or use any tape, but call the compiled version of the previously generated adjoint code. This approach suits the local adjoint mode well, with its repeated face and cell operations.
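The local, tape-free strategy can be sketched on a toy 1D finite-volume residual. Everything in the block below is an assumption made for illustration: the flux formula, the constant `alpha`, and the function names are hypothetical, and `flux_adjoint` is written by hand as a stand-in for the kind of adjoint code a generator would emit. It is not actual dco/c++ output and has nothing to do with CODA's real discretization.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr double alpha = 1.0;  // hypothetical dissipation coefficient

// Toy face flux between a left and a right cell state.
double flux(double uL, double uR) {
    return 0.25 * (uL * uL + uR * uR) - 0.5 * alpha * (uR - uL);
}

// Stand-in for generated adjoint code of flux(): plain arithmetic, no tape.
// Accumulates F_bar * dF/duL and F_bar * dF/duR into the state adjoints.
void flux_adjoint(double uL, double uR, double F_bar,
                  double& uL_bar, double& uR_bar) {
    uL_bar += (0.5 * uL + 0.5 * alpha) * F_bar;
    uR_bar += (0.5 * uR - 0.5 * alpha) * F_bar;
}

// Residual assembly: each interior face adds its flux to the left cell
// and subtracts it from the right cell.
std::vector<double> residual(const std::vector<double>& u) {
    std::vector<double> r(u.size(), 0.0);
    for (std::size_t face = 0; face + 1 < u.size(); ++face) {
        const double F = flux(u[face], u[face + 1]);
        r[face] += F;
        r[face + 1] -= F;
    }
    return r;
}

// Local adjoint of the residual: u_bar = (dR/du)^T * r_bar, assembled
// face by face. The working set per face is a handful of scalars, so
// memory use does not grow with the number of degrees of freedom.
std::vector<double> residual_adjoint(const std::vector<double>& u,
                                     const std::vector<double>& r_bar) {
    assert(u.size() == r_bar.size());
    std::vector<double> u_bar(u.size(), 0.0);
    for (std::size_t face = 0; face + 1 < u.size(); ++face) {
        // Adjoint of "r[face] += F; r[face + 1] -= F".
        const double F_bar = r_bar[face] - r_bar[face + 1];
        flux_adjoint(u[face], u[face + 1], F_bar,
                     u_bar[face], u_bar[face + 1]);
    }
    return u_bar;
}
```

For example, `residual_adjoint({1.0, 2.0, 3.0}, {1.0, 0.0, 0.0})` returns the first row of the residual Jacobian. In the tape-based local mode, the call to `flux_adjoint` would instead record and interpret a small tape for each face; replacing that step with compiled straight-line code is the idea behind the tape-free variant.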

Figure 1: Run time improvements through dco/c++ code generation. The x-axis shows the relative time, which is the run time required for the adjoint computation divided by the run time required for computing the residual function values only.

In this case study, we used a 3D free-field cuboid with different mesh resolutions as well as a NACA0012 wing profile as prototype test cases to quantify the performance potential of the three AD strategies described above. Figure 1 (above) demonstrates that using the generated adjoint at the level of the flux computation (tape-free local adjoint mode) reduced the computational effort of the adjoint residual evaluation by one order of magnitude compared to the dco/c++ tape-based local approach for all of the selected test cases. When interpreting these observations, keep in mind that the tape-based dco/c++ approach was already significantly faster than other approaches and tools.

With improvements like these, we boost computational efficiency and help make the aircraft industry fit for current and future challenges. AD and dco/c++ bring many benefits: they offer a lot of flexibility while keeping maintenance cost very low and delivering good efficiency and high accuracy. With NAG, you get access to the most efficient tools as well as targeted support.

For more information about dco/c++, click here.