Numerical methods challenge: Day 8
During October (2017) I will write a program per day for some well-known numerical methods in both Python and Julia. It is intended to be an exercise, so don't expect the code to be good enough for real use. Also, I should mention that I have almost no experience with Julia, so it probably won't be idiomatic Julia but more Python-like Julia.
Newton's method for optimization
Today we have Newton's method for optimization. The main difference between this method and gradient descent is the use of higher derivatives, namely the Hessian matrix of the objective function. The use of higher derivatives provides information about the curvature, besides the slope information used in gradient descent. The following image compares the paths for gradient descent (green) and Newton's method (red).
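As a reminder, the gradient descent update from the earlier post in this series uses only the gradient, scaled by a step size \(\gamma\):

\[
\mathbf{x}_{k+1} = \mathbf{x}_k - \gamma\, \nabla f(\mathbf{x}_k)\, .
\]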
Mathematically, the update is written as

\[
\mathbf{x}_{k+1} = \mathbf{x}_k - H(\mathbf{x}_k)^{-1}\, \nabla f(\mathbf{x}_k)\, ,
\]

where \(H(\mathbf{x}_k)\) is the Hessian matrix at step \(k\).
We will test the method with the Rosenbrock function

\[
f(x, y) = (1 - x)^2 + 100\,(y - x^2)^2\, ,
\]

which has its minimum at \((1, 1)\), where \(f(1, 1) = 0\).
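For reference, the gradient and Hessian implemented in the code below follow directly from this expression:

\[
\nabla f(x, y) =
\begin{bmatrix}
-2(1 - x) - 400\,x\,(y - x^2)\\
200\,(y - x^2)
\end{bmatrix},
\qquad
H(x, y) =
\begin{bmatrix}
2 - 400\,(y - x^2) + 800\,x^2 & -400\,x\\
-400\,x & 200
\end{bmatrix}.
\]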
The code for both languages follows.
Python
from __future__ import division, print_function
from numpy import array
from numpy.linalg import norm, solve


def newton_opt(fun, grad, hess, x, niter=50, gtol=1e-8, verbose=False):
    msg = "Maximum number of iterations reached."
    g = grad(x)
    for cont in range(niter):
        if verbose:
            print("n: {}, x: {}, g: {}".format(cont, x, g))
        # Newton step: solve H(x) dx = g instead of forming the inverse
        x = x - solve(hess(x), g)
        g = grad(x)
        if norm(g) < gtol:
            msg = "Extremum found with desired accuracy."
            break
    return x, fun(x), msg


def rosen(x):
    return (1 - x[0])**2 + 100*(x[1] - x[0]**2)**2


def rosen_grad(x):
    return array([-2*(1 - x[0]) - 400*x[0]*(x[1] - x[0]**2),
                  200*(x[1] - x[0]**2)])


def rosen_hess(x):
    return array([[-400*(x[1] - x[0]**2) + 800*x[0]**2 + 2, -400*x[0]],
                  [-400*x[0], 200]])


print(newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0], verbose=True))
with result
n: 0, x: [2.0, 1.0], g: [ 2402.  -600.]
n: 1, x: [ 1.99833611  3.99334443], g: [  1.99888520e+00  -5.53708323e-04]
n: 2, x: [ 1.00055248  0.0055331 ], g: [ 398.44998412 -199.11443262]
n: 3, x: [ 1.00054972  1.00109974], g: [  1.09944359e-03  -1.52451385e-09]
n: 4, x: [ 1.         0.9999997], g: [  1.20876952e-04  -6.04384753e-05]
(array([ 1.,  1.]), 0.0, 'Extremum found with desired accuracy.')
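As a quick sanity check (not part of the original post), SciPy already ships the Rosenbrock function and its derivatives, so the result can be compared against a Newton-type solver from scipy.optimize. A minimal sketch, assuming SciPy is available:

# Hypothetical cross-check with SciPy's built-in Rosenbrock helpers;
# they are aliased to avoid clashing with the functions defined above.
from scipy.optimize import minimize, rosen_der
from scipy.optimize import rosen as sp_rosen, rosen_hess as sp_rosen_hess

res = minimize(sp_rosen, [2.0, 1.0], method="Newton-CG",
               jac=rosen_der, hess=sp_rosen_hess)
print(res.x, res.fun)  # should be close to [1, 1] and 0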
Julia
function newton_opt(fun, grad, hess, x; niter=50, gtol=1e-8, verbose=false)
    msg = "Maximum number of iterations reached."
    g = grad(x)
    for cont = 1:niter
        if verbose
            println("n: $(cont), x: $(x), g: $(g)")
        end
        # Newton step: hess(x)\g solves the linear system H(x) dx = g
        x = x - hess(x)\g
        g = grad(x)
        if norm(g) < gtol
            msg = "Extremum found with desired accuracy."
            break
        end
    end
    return x, fun(x), msg
end


function rosen(x)
    return (1 - x[1])^2 + 100*(x[2] - x[1]^2)^2
end


function rosen_grad(x)
    return [-2*(1 - x[1]) - 400*x[1]*(x[2] - x[1]^2);
            200*(x[2] - x[1]^2)]
end


function rosen_hess(x)
    return [-400*(x[2] - x[1]^2) + 800*x[1]^2 + 2 -400*x[1];
            -400*x[1] 200]
end


println(newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0], verbose=true))
with result
n: 1, x: [2.0, 1.0], g: [2402.0, -600.0]
n: 2, x: [1.99834, 3.99334], g: [1.99889, -0.000553708]
n: 3, x: [1.00055, 0.0055331], g: [398.45, -199.114]
n: 4, x: [1.00055, 1.0011], g: [0.00109944, -1.52451e-9]
n: 5, x: [1.0, 1.0], g: [0.000120877, -6.04385e-5]
([1.0, 1.0], 0.0, "Extremum found with desired accuracy.")
Comparison Python/Julia
Regarding the number of lines, we have 34 in Python and 37 in Julia. The comparison in execution time is done with the %timeit magic command in IPython and @benchmark in Julia.
For Python:
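The exact call is not shown here; presumably it was along these lines, using the functions defined above:

%timeit newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0])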
with result
For Julia:
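Again, the exact call is not shown; a likely form, assuming BenchmarkTools is loaded:

using BenchmarkTools
@benchmark newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0])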
with result
BenchmarkTools.Trial:
  memory estimate:  5.48 KiB
  allocs estimate:  120
  --------------
  minimum time:     5.784 μs (0.00% GC)
  median time:      6.030 μs (0.00% GC)
  mean time:        6.953 μs (10.00% GC)
  maximum time:     572.279 μs (95.96% GC)
  --------------
  samples:          10000
  evals/sample:     6
In this case, we can say that the Python code is roughly 40 times slower than the Julia one.
Comparison with gradient descent
We see an improvement in the number of iterations compared with gradient descent: we moved from 40 iterations to 4, even if we demand higher accuracy from the method, \(10^{-12}\) for example.
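With the Python function above, demanding that accuracy is just a matter of tightening gtol (the call below is illustrative; its output is not reproduced here):

# Rerun the same problem with a much stricter tolerance on the gradient norm
x_opt, f_opt, msg = newton_opt(rosen, rosen_grad, rosen_hess, [2.0, 1.0],
                               gtol=1e-12)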
This faster convergence does not come for free, of course. When using Newton's method we have two major drawbacks:
We need to compute the Hessian of the function, which might prove really difficult even if we have the analytical expression for it (a quick way to check a hand-coded Hessian is sketched after this list).
We need to solve a system of equations at each iteration.
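When the Hessian is tedious to derive by hand, a cheap finite-difference approximation can at least be used to check the analytical version. Below is a minimal sketch using the Python functions defined earlier; the fd_hessian helper is hypothetical and not part of the original post:

import numpy as np

def fd_hessian(grad, x, h=1e-6):
    """Approximate the Hessian by central differences of the gradient."""
    x = np.asarray(x, dtype=float)
    n = x.size
    H = np.zeros((n, n))
    for i in range(n):
        e = np.zeros(n)
        e[i] = h
        # Column i of the Hessian is the derivative of the gradient along e_i
        H[:, i] = (grad(x + e) - grad(x - e)) / (2*h)
    return H

# Compare with the analytical Hessian at the starting point
print(fd_hessian(rosen_grad, [2.0, 1.0]))
print(rosen_hess([2.0, 1.0]))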