Numerical methods challenge: Day 8 – Newton's method for optimization

Nicolás Guarín-Zapata

2017-10-09 16:15

Also available in:

During October (2017) I will write a program per day for some well-known numerical methods in both Python and Julia. It is intended to be an exercise then don't expect the code to be good enough for real use. Also, I should mention that I have almost no experience with Julia, so it probably won't be idiomatic Julia but more Python-like Julia.

Newton's method for optimization

Today we have Newton's method for optimization. The main difference of this method with gradient descent is the use of higher derivatives, namely, the Hessian matrix of the objective function. The use of higher derivatives provides information of the curvature besides the slope information used in gradient descent. The following image compare the path for gradient descent (green) and Newton's method (red).

Comparison of the path followed by gradient descent and Newton's method.

Mathematically, the update is written as

\begin{equation*} \mathbf{x}_k = \mathbf{x}_{k-1} - H(\mathbf{x}_k)^{-1} \nabla f(\mathbf{x_k}) \end{equation*}

where \(H(\mathbf{x}_k)\) is the Hessian matrix at step k.

We will test the method with the Rosenbrock's function

\begin{equation*} f(x_1, x_2) = (1-x_1)^2 + 100(x_2-{x_1}^2)^2 \end{equation*}

Following are the codes.

Python

from __future__ import division, print_function
from numpy import array
from numpy.linalg import norm, solve


def newton_opt(fun, grad, hess, x, niter=50, gtol=1e-8, verbose=False):
    msg = "Maximum number of iterations reached."
    g = grad(x)
    for cont in range(niter):
        if verbose:
            print("n: {}, x: {}, g: {}".format(cont, x, g))
        x = x - solve(hess(x), g)
        g = grad(x)
        if norm(g) < gtol:
            msg = "Extremum found with desired accuracy."
            break
    return x, fun(x), msg


def rosen(x):
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2


def rosen_grad(x):
    return array([
        -2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
        200*(x[1] - x[0]**2)])

def rosen_hess(x):
    return array([[-400*(x[1]-x[0]**2) + 800*x[0]**2 + 2, -400*x[0]],
                 [-400*x[0], 200]])


print(newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0], verbose=True))

with result

n: 0, x: [2.0, 1.0], g: [ 2402.  -600.]
n: 1, x: [ 1.99833611  3.99334443], g: [  1.99888520e+00  -5.53708323e-04]
n: 2, x: [ 1.00055248  0.0055331 ], g: [ 398.44998412 -199.11443262]
n: 3, x: [ 1.00054972  1.00109974], g: [  1.09944359e-03  -1.52451385e-09]
n: 4, x: [ 1.         0.9999997], g: [  1.20876952e-04  -6.04384753e-05]
(array([ 1.,  1.]), 0.0, 'Extremum found with desired accuracy.'

Julia

function newton_opt(fun, grad, hess, x; niter=50, gtol=1e-8, verbose=false)
    msg = "Maximum number of iterations reached."
    g = grad(x)
    for cont = 1:niter
        if verbose
            println("n: $(cont), x: $(x), g: $(g)")
        end
        x = x - hess(x)\g
        g = grad(x)
        if norm(g) < gtol
            msg = "Extremum found with desired accuracy."
            break
        end
    end
    return x, fun(x), msg
end


function rosen(x)
    return (1 - x[1])^2 + 100*(x[2] - x[1]^2)^2
end


function rosen_grad(x)
    return [-2*(1 - x[1]) - 400*x[1]*(x[2] - x[1]^2);
            200*(x[2] - x[1]^2)]
end


function rosen_hess(x)
    return [-400*(x[2] - x[1]^2) + 800*x[1]^2 + 2 -400*x[1];
            -400*x[1] 200]
end



println(newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0], verbose=true))

with result

n: 1, x: [2.0, 1.0], g: [2402.0, -600.0]
n: 2, x: [1.99834, 3.99334], g: [1.99889, -0.000553708]
n: 3, x: [1.00055, 0.0055331], g: [398.45, -199.114]
n: 4, x: [1.00055, 1.0011], g: [0.00109944, -1.52451e-9]
n: 5, x: [1.0, 1.0], g: [0.000120877, -6.04385e-5]
([1.0, 1.0], 0.0, "Extremum found with desired accuracy.")

Comparison Python/Julia

Regarding number of lines we have: 34 in Python and 37 in Julia. The comparison in execution time is done with %timeit magic command in IPython and @benchmark in Julia.

For Python:

%timeit newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0])

with result

1000 loops, best of 3: 247 µs per loop

For Julia:

@benchmark newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0])

with result

BenchmarkTools.Trial:
  memory estimate:  5.48 KiB
  allocs estimate:  120
  --------------
  minimum time:     5.784 μs (0.00% GC)
  median time:      6.030 μs (0.00% GC)
  mean time:        6.953 μs (10.00% GC)
  maximum time:     572.279 μs (95.96% GC)
  --------------
  samples:          10000
  evals/sample:     6

In this case, we can say that the Python code is roughly 40 times slower than the Julia one.

Comparison with gradient descent

We see an improvement in the number of iterations compared with gradient descent, that is, we moved from 40 iterations to 4 iterations, even if we demand the method to have higher accuracy, \(10^{-12}\), for example.

The appearance of this faster convergence does not come for free, of course. When using Newton's method we have two major drawbacks:

We need to compute the Hessian of the function, that might prove really difficult even if we have the analytical expression for our function.
We need to solve a system of equation in each iteration.

Nicolás' blog

Newton's method for optimization

Python

Julia

Comparison Python/Julia

Comparison with gradient descent

Comments