Automatic Differentiation

AD Myths Debunked: It Won’t Work on My Code

Published 06/04/2022 by Johannes Lotz

This post is part of the AD Myths Debunked series.

You’re curious about automatic differentiation (AD) and its benefits, but you’ve heard that it would be impossible to use AD on your code, because of numerous technical and mathematical difficulties… so, you don’t pursue the idea much further.

We have heard this story many times over the years. After all, what does one do about external libraries, non-differentiability, parallelization, or heterogeneous architectures, not to mention polymorphism and modern C++ language features? Those are showstoppers, right? Or is that just a myth?

At NAG, we have NEVER encountered a code to which we could not apply our tool, dco/c++. This is thanks to the flexibility and the features of dco/c++ (which form the basis for resolving all challenges), and the extensive experience NAG's AD specialists have with large-scale mathematical simulation codes.

Below we have listed the various concerns we have previously encountered about whether AD will work on a specific code. We felt it was time to explain away these common misconceptions, or myths!

NAG's AD toolset has been developed over the last 12 years, and it builds upon a further 10 years of AD R&D experience. We have accumulated a wealth of experience in applying AD and resolving a multitude of issues.

Myths are narratives that might sound like truths, and by talking through these in some detail and sharing our experiences, we hope to help businesses navigate these issues. Results matter; myths should not.

Take a look at our other posts on Myths in AD, including memory use and rewriting libraries!

  • “The code that I need to differentiate does not have inputs and outputs all collected neatly together in one top-level routine such as I’ve seen in all examples. They’re scattered all over the place.”
    Though the examples usually use such a top-level routine, it is not a requirement at all. Inputs and outputs can be spread across various routines, compilation units, or libraries. The dco/c++ API does not rely on collecting all these at one location in your code.
  • “I’m using modern C++ features like polymorphism, lambdas, patterns (factory, visitor, CRTP, adaptor, …) as well as Standard Library containers and algorithms.”
    This is just the kind of code base that dco/c++ was designed to handle. We’ve spent a lot of time ensuring that dco/c++ supports modern C++, whereas many other AD tools will run into problems at some point.
  • “I am using external numerical libraries, which I cannot call with any data types other than ‘doubles’.”
    First, check whether this specific external library actually needs to be differentiated: does it sit between the inputs and the outputs you need derivatives of? If it does, there is indeed no miraculous way of calling the external library with dco/c++ data types. There are viable alternatives, though. The obvious one is to use a numerical library that does support dco/c++ types. If you need functionality not supported by the NAG AD Library, you can couple the overall AD solution with local finite-difference approximations or hand-written derivatives.
  • “My code is parallel.”
    That's great! dco/c++ is thread-safe and has simple APIs for dealing with parallelism. Many other AD tools have neither.
  • “I am using multiple programming languages in my code.”
    AD is a technique that is conceptually not constrained by the language you use; wherever language interoperability is required, AD can work there as well. dco/c++ has production-ready front ends for C++ and Fortran, and alpha releases for C# and Python. Custom solutions exist for other languages, including source-transformation approaches for scripting languages and hand-coding for fairly static codes.
  • “The problem I’m solving is nondifferentiable.”
    AD requires the implementation to be differentiable, not the mathematical problem. In many circumstances, this fact circumvents the non-differentiability right away. In addition, at a non-differentiable point AD by default computes a subgradient (e.g., the left or right derivative). If the non-differentiability turns out to be an issue, there are many ways of approaching it. Potential solutions include a local manual implementation (e.g., symbolic differentiation), local application of finite differences, or local smoothing. All of these approaches can be integrated into an overall AD solution with dco/c++.
  • “I’m doing heterogeneous programming.”
    There are various approaches to handle distributed and shared-memory parallelism with dco/c++. The tool is thread-safe and applying it correctly preserves the parallelism. For GPU programming, we offer the general-purpose tool dco/map, but have also implemented custom solutions.