The costly part in both NUTS and basic HMC is computing the (unnormalized) log density (typically a Bayesian posterior) and its gradient for each of the 2^j leapfrog steps. Using the adjoint method (reverse-mode automatic differentiation), which computes the gradient by dynamic programming over the expression graph, we can bound the time needed for the gradient by a constant multiple of the number of expressions evaluated in the log density function itself. For the models we’ve looked at, the gradient takes a bit longer than the log density itself to evaluate.
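That cheap-gradient bound comes from recording the expression graph during the forward evaluation and then sweeping it once in reverse, accumulating one adjoint per node. A toy sketch of that adjoint sweep (the `Var` class, the operations, and the example model are illustrative, not Stan’s actual API):

```python
class Var:
    """Node in the expression graph: a value plus edges to parents,
    each labeled with the local partial derivative."""
    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents  # tuple of (Var, d(self)/d(parent)) pairs
        self.grad = 0.0

def add(a, b):
    return Var(a.value + b.value, ((a, 1.0), (b, 1.0)))

def mul(a, b):
    return Var(a.value * b.value, ((a, b.value), (b, a.value)))

def square(a):
    return Var(a.value * a.value, ((a, 2.0 * a.value),))

def backward(out):
    """Adjoint sweep: one pass over the recorded graph in reverse
    topological order, so cost is proportional to the graph size."""
    order, seen = [], set()
    def visit(v):
        if id(v) not in seen:
            seen.add(id(v))
            for parent, _ in v.parents:
                visit(parent)
            order.append(v)
    visit(out)
    out.grad = 1.0
    for v in reversed(order):
        for parent, local in v.parents:
            parent.grad += v.grad * local

# Unnormalized standard-normal log density: -0.5 * (x0^2 + x1^2).
x = [Var(1.0), Var(-2.0)]
lp = mul(Var(-0.5), add(square(x[0]), square(x[1])))
backward(lp)
print([xi.grad for xi in x])  # gradient is -x_i, here [-1.0, 2.0]
```

The forward pass builds the graph; the single reverse pass then yields all partials at once, which is why the gradient costs only a small constant factor more than the density evaluation.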

The use of dual averaging isn’t critical; it just worked a bit better than Robbins-Monro stochastic approximation for adapting the step size.
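For concreteness, here is a sketch of a dual-averaging step-size adaptation of the kind used in NUTS. The deterministic `accept_prob` stand-in and the tuning constants (`gamma`, `t0`, `kappa`) are illustrative assumptions, not the exact values any particular implementation uses:

```python
import math

def adapt_step_size(accept_prob, n_iter=5000, eps0=1.0, delta=0.5,
                    gamma=0.05, t0=10.0, kappa=0.75):
    """Dual averaging of the log step size toward a target
    acceptance rate delta (tuning constants are illustrative)."""
    mu = math.log(10.0 * eps0)       # shrinkage target for log eps
    log_eps = math.log(eps0)
    log_eps_bar, h_bar = 0.0, 0.0
    for m in range(1, n_iter + 1):
        # Acceptance statistic observed when simulating with current eps.
        alpha = accept_prob(math.exp(log_eps))
        eta = 1.0 / (m + t0)
        h_bar = (1.0 - eta) * h_bar + eta * (delta - alpha)
        log_eps = mu - math.sqrt(m) / gamma * h_bar
        w = m ** -kappa
        log_eps_bar = w * log_eps + (1.0 - w) * log_eps_bar
    return math.exp(log_eps_bar)

# Synthetic, deterministic acceptance model alpha(eps) = exp(-eps);
# a target rate of 0.5 should drive eps toward log 2 (about 0.693).
eps = adapt_step_size(lambda e: math.exp(-e))
```

A Robbins-Monro scheme would instead nudge the log step size directly by a decaying multiple of (delta - alpha); dual averaging averages the whole history of that signal, which is what seemed to adapt a bit more smoothly in practice.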

Andrew had the same intuition about using the intermediate leapfrog steps somehow in basic HMC.

NUTS is also available in (relatively, though not universally) portable C++ as part of our soon-to-be-released package Stan (named after Stanislaw Ulam). Stan also includes a compiler for directed graphical models that converts BUGS-like model specifications into C++ code for computing unnormalized densities and their gradients, the latter via algorithmic differentiation, which implements the adjoint method on top of a templated density function. We’re testing with the BUGS example models and some others now, and we plan to release a stable version as soon as we finish this testing, do some more performance tuning, write an R2jags-like R interface, and write user manuals.
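As a loose analogue of the templated-density idea: the log density is written once over a generic numeric type, then reused both for plain evaluation and for differentiation. The sketch below uses Python duck typing and forward-mode dual numbers purely for brevity; Stan itself does this with C++ templates and reverse-mode (adjoint) differentiation, and the names here are illustrative:

```python
class Dual:
    """Dual number: a value plus a derivative, propagated by
    operator overloading (forward-mode differentiation)."""
    def __init__(self, val, dot=0.0):
        self.val, self.dot = val, dot
    def __add__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val + o.val, self.dot + o.dot)
    __radd__ = __add__
    def __sub__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val - o.val, self.dot - o.dot)
    def __mul__(self, o):
        o = o if isinstance(o, Dual) else Dual(o)
        return Dual(self.val * o.val,
                    self.dot * o.val + self.val * o.dot)
    __rmul__ = __mul__

def log_normal_density(x, mu, sigma):
    """Unnormalized normal log density, written once over any
    numeric-like type (float or Dual)."""
    z = (x - mu) * (1.0 / sigma)
    return -0.5 * z * z

# Plain double-precision evaluation:
lp = log_normal_density(1.5, 0.0, 1.0)             # -1.125
# Same code, seeded with a dual number, also yields d(lp)/dx:
lp_d = log_normal_density(Dual(1.5, 1.0), 0.0, 1.0)
```

The point is the genericity: one definition of the density serves both the sampler’s plain evaluations and the differentiation machinery, with no hand-written gradient code.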
