NYCJUG/2010-11-09/Levenberg-MarquardtAlgorithm
Levenberg, Marquardt, non-linear least-squares, optimization, solver
This is a J implementation, originally provided by Bob O'Boyle, of the Levenberg-Marquardt algorithm for non-linear least-squares optimization.
However, this page intends to provide a basis for improving the implementation by making it more J-like. To accomplish this, we first need to ensure that it runs to completion and provides the expected results. As we'll see in the next section, this is not necessarily simple. The two parts shown below are available as discrete files File:LevenbergMarquardt.ijs and File:LMAUsageEG.ijs.
Current Status and Likely Directions
We are now able to read the code and run it to completion, though the results differ slightly but significantly from those expected in both examples. We need to concentrate on NLLS, winnowing it down to its core routine by separating out the statistics-generation code now embedded in it.
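As a first step in that direction, here is a minimal sketch of the separation (using the hypothetical name "fitStats"; this is not yet part of the script): given the counts n and p and the residual vector e as used in NLLS, it computes RSS, MSR, and the AIC analog with exactly the formulas NLLS now applies inline.

fitStats=: 4 : 0
   NB. x is n,p (observation and parameter counts); y is the residual vector.
   'n p'=. x
   rss=. y +/ . * y        NB. residual sum of squares
   msr=. rss % n - p       NB. mean squared residual
   aic=. (^.rss) + 2*p%n   NB. AIC analog for least-squares (Hongzhi 1989)
   rss, msr, aic
)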
Notes on Changes and Problems
(20100120 20:49) The utility function "qprintf" is not defined. It appears to be used to output information, so I assigned it to the standard J output function:
qprintf=: smoutput
This allows the example to run to completion. Since the output produced by this function turns out to be fairly useless - simply the constant string 'npar ' - I eliminated the invocation entirely. I also made the code more readable by indenting within function bodies and by putting control constructs like "if. ... do." on a single line instead of leaving a line empty except for the "do.", as illustrated below.
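For example, using the penalty branch from the code below to illustrate the control-structure change:

NB. Before: a line empty except for the "do."
if. 'ON' -: cnstrnt
do.
   dpnlty=. pnlty DIFF_PNLTY par
end.

NB. After: the "if. ... do." reads as a single unit
if. 'ON' -: cnstrnt do.
   dpnlty=. pnlty DIFF_PNLTY par
end.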
(20100120 22:37) Re-loading the updated version brings to light a bug: the global "initial", which supplies the initial parameter estimates, has three values in the main routine but only two in the example use, and the code expects only two. This is why we don't pass data around in global variables with common names. In fact, it's not clear why we want to assign a name at all when the value is used only once, at the invocation.
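For instance, the example script assigns initial=: 750 1200 only so the call site can read it back; supplying the values directly avoids the shared global entirely:

NB. Fragile: two scripts both assign the common global "initial".
initial=: 750 1200
'LS_PROB' NLLS initial

NB. Safer: pass the values where they are used.
'LS_PROB' NLLS 750 1200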
This also points up the unconventional use of all-caps for function names but not for global variables; usually we want to emphasize the danger of globals by spelling them in all-caps, and to emphasize the meaningfulness of functions by using mixed case as appropriate.
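Applied to names from the example script, the convention would look like this (a sketch; pnltyFn is a hypothetical renaming of the constrained branch of PNLTY_FN):

UCNSTRNT=: 1000 2000   NB. all-caps flags a global and its dangers
LCNSTRNT=: 500 750
pnltyFn=: 4 : 0        NB. mixed case emphasizes a meaningful function name
   +/ (x, x) % (y - LCNSTRNT), UCNSTRNT - y
)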
Running the example gives results which differ from the expected result of 813.4583 960.9063 as mentioned in the comments to the example code:
   'LS_PROB' NLLS 750 1200
813.90901 961.01103
(20100120 23:07) The discrepant result can be resolved if we apply NLLS to its own result until it converges. Looking at doing this points up a further problem: the idiosyncratic spacing of the assignments in this code, which appears to aim at aligning the assignment symbols on consecutive lines. It's hard to see how this is at all helpful, and easy to see how unhelpful it is when we attempt to search for assignments to a name as opposed to uses of the same name. If we always put the assignment symbol immediately after the name being assigned, we have an unambiguous target for this kind of search.
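To illustrate with two assignments from NLLS:

NB. Aligned: a search for "lambda=." finds nothing.
lambda    =. 0.01
con       =. 10

NB. Assignment symbol immediately after the name: "lambda=." is an unambiguous target.
lambda=. 0.01
con=. 10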
(20100120 23:15) Applying the optimization iteratively to a fixed point:
   'LS_PROB'&NLLS^:_ ] 750 1200
813.85949 960.999
This disagrees with the expected result mentioned above but does appear to be a converged solution. It also points up that NLLS should really be an adverb: the argument 'LS_PROB' is the name of the objective function, which is instantiated within NLLS.
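The difference is easy to see in miniature (a hypothetical toy, not NLLS itself): as a dyadic verb, the objective arrives as a name that must be looked up and evoked; as an adverb, it is supplied directly as the operand u.

stepV=: 4 : 0
   f=. x~        NB. evoke the verb named by the string x
   y - f y
)
stepA=: 1 : 'y - u y'   NB. the objective u is passed directly

   sq=: *:
   'sq' stepV 0.5
0.25
   sq stepA 0.5
0.25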
(20100120 23:28) NLLS as an adverb works fine; I also simplified the example objective function LS_PROB by removing unnecessary parentheses and a name that was used only once, immediately following its assignment.
(20100120 23:34) Attempting the VON_B example in the main routine:
   VON_B NLLS^:_ ] 100 0.05 1
57.501351 0.31545876 0.0002625186
gives results differing from those indicated in the comment: NB. Solution with Excel's SOLVER is linf= 55.9778, k= 0.385595, t0= 0.171371. We need to understand better how to evaluate the results - perhaps globals like stats will help us here.
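Since NLLS assigns its diagnostics globally, a run can be followed by inspecting those names directly (an illustrative session; output omitted):

   VON_B NLLS 100 0.05 1   NB. run the fit
   stats                   NB. 7 2 boxed table: loops, lambda, n, p, RSS, MSR, AIC
   parm ,. par_se          NB. estimates stitched to their standard errors
   corr                    NB. parameter correlation matrix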
Current J Implementation of the Levenberg-Marquardt Algorithm (LMA)
Here is the currently modified code in-line, followed by the in-line code of an example usage. One problem we face here is that the code does not quite seem to work. However, as we get it working and improve it, we can keep the following instance updated.
Further explorations
There are a couple of inter-related issues behind the program's not seeming to work. It actually does work when cnstrnt=: 'OFF'. When cnstrnt=: 'ON', the solution is sensitive to the values of ucnstrnt and lcnstrnt, so it is important to ensure that the upper and lower constraints bound the initial parameter estimates. When these are set to 110 0.75 1.1 and 40 0 0, which do bound the initial values, the solution is closer to the unconstrained one.
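A quick check that the bounds actually bracket the starting point, using the globals from the script below:

   ucnstrnt=: 110 0.75 1.1
   lcnstrnt=: 40 0 0
   *./ (lcnstrnt < initial) , initial < ucnstrnt   NB. 1 when initial lies strictly inside
1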
Also, as originally configured, the input function (OBJ_FN) was the left argument of NLLS; that original implementation of NLLS, which takes the objective function's name as a noun argument, is attached to this page. Converting it to an adverb allows iteration using the power conjunction, but this may be unnecessary since iteration is already done within NLLS. One could use VON_B NLLS^:n ] 100 0.05 1, where n is 1, 2, 3, etc., to see this effect, as sketched below. It is not clear what the advantages of the adverb use are, other than confirmation of convergence. It might be better to conduct a constrained run and use that solution in an unconstrained run to ensure stable convergence, and also to try different initial values to see whether the convergence is stable.
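With the adverb form, the power conjunction also accepts a list of counts, so the step-by-step behavior can be seen in one expression (a session sketch; output omitted):

   VON_B NLLS^:(1 2 3) ] 100 0.05 1   NB. one row of parameter estimates per count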
So one should use either version of NLLS with these points in mind.
NB.* LevenbergMarquardt.ijs: Von Bertalanffy fit using Marquardt nonlinear
NB. least-squares.
NB. Need to type 'objective function' NLLS initial parameter list to execute
NB. outputs stats matrix & parm vector
NB.
NB. Original written by R. O'Boyle Dec 2008

NB. Inputs
NB. Data from Haddon (2001) page 92
years=: 1.0 2.0 3.3 4.3 5.3 6.3 7.3 8.3 9.3 10.3 11.3
length=: 15.4 26.93 42.23 44.59 47.63 49.67 50.87 52.3 54.77 56.43 55.88
initial=: 100 0.05 1.0

NB. Constraints
ucnstrnt=: 60 1 2
lcnstrnt=: 30 0 0
cnstrnt=: 'ON'
alpha=: 0.001 * initial   NB. influence of penalty function

NB. solution with Excel's SOLVER is linf= 55.9778, k= 0.385595, t0= 0.171371
VON_B=: 3 : 0
   'a b c'=. y
   l_pred=. a * (1 - ^ - (b * years - c))
   length - l_pred
)

NLLS=: 1 : 0
   NB. Marquardt nonlinear least-squares
   NB. Need to type 'objective function' NLLS initial parameter list to execute
   NB. Need to supply upper & lower parameter constraints
   NB. outputs stats matrix & parm vector
   identity=. =/~i.$y
   limit=. 200
   p=. $par=. npar=. ,y
   rss=. e +/ . * e=. u par
   n=. $e
   pnlty=. alpha PNLTY_FN par   NB. Penalty for constraints
   nphi=. phi=. rss + pnlty
   lambda=. 0.01
   con=. 10
   j=. 0
   whilst. 1=<./(10>:i),(limit>:j),(0.0001<con=.(((n-p)*q +/ . * -v)%p*rss)^0.5),(0.00001<|(nphi-phi)%phi),(0.00001 +./ . < |(npar-par)%0.00000000000000000001 + |par) do.   NB. Start of main loop
      j=. j + 1
      par=. npar
      phi=. nphi
      de=. ((0.00001*|npar) + (0.00000001*npar = 0)) u D: 1 npar
      q=. 2 * de +/ . * e                   NB. Gradient
      hess=. 2 * (] +/ . *"2 1 ]) de        NB. Hessian
      if. 'ON' -: cnstrnt do.               NB. Start of difference for penalty
         dpnlty=. pnlty DIFF_PNLTY par
         q=. q + {. dpnlty
         hess=. hess + identity * {: dpnlty NB. Adds penalty to diag of Hessian
      end.                                  NB. End of difference for penalty
      lambda=. 0.000001 >. lambda * 0.01
      hess=. hess + hess * identity * lambda=. lambda * 10
      norm=. %:(+/"2 hess^2)                NB. Column norms
      hess=. hess %"1 1 norm                NB. Scale Hessian to norm
      npar=. par + v=. (-q %. hess) % norm  NB. Step direction; step size= 1
      rss=. e +/ . * e=. u npar
      pnlty=. alpha PNLTY_FN par
      i=. 1
      if. (phi < nphi=. rss + pnlty) do.
         lambda=. lambda * 100
         while. (phi < nphi=. rss + pnlty) *. (i <: 10) do.
            i=. i + 1
            npar=. par + v=. v * 0.1^i
            rss=. e +/ . * e=. u npar
            pnlty=. alpha PNLTY_FN par
         end.
      end.
   end.                                     NB. End of main loop
   e=. u npar
   msr=: {.(rss=. e +/ . * e) % n - p
   aic=: (^.rss) + 2*p%n   NB. AIC analog for least-squares:
   NB. (Hongzhi, A. 1989. Acta Mathematica Applicatae Sinica. 5: 60-67)
   NB. Invert Hessian & re-normalize
   hess_inv=: (%.hess)%"1 0 norm
   NB. Covariance matrix (msr divided by Hessian)
   cov=: 2 * msr * hess_inv
   NB. Standard error (diagonal of covariance matrix)
   par_se=: %:(+/"2 identity * cov)
   NB. Correlation matrix
   corr=: cov%(*/~par_se)
   cv=: par_se % par
   stats=: 7 2 $ 'Loops'; j; 'Lambda'; lambda; 'No. Observations'; n; 'No. Parameters'; p; 'RSS'; rss; 'MSR'; msr; 'AIC'; aic
   parm=: npar
)

PNLTY_FN=: 4 : 0
   NB. Penalty function for specified constraints
   NB. r= resultant penalty
   NB. y= parameters
   NB. x= vector of constants for the constraints
   if. 'ON' -: cnstrnt do.
      r=. +/(x, x)%(y-lcnstrnt),ucnstrnt-y   NB. Generic lower and upper constraints
   else.
      r=. 0   NB. Default is no penalty
   end.
)

DIFF_PNLTY=: 4 : 0
   NB. Penalty differences at y, one column per parameter; NLLS adds the
   NB. first row to the gradient q and the second to the Hessian diagonal.
   r=. 2 0 $ 0
   identity=. =/~i.$y
   delta=. (0.01*y) + (0.001*y = 0)
   tpar_f=. y +"1 1 (identity * delta)
   tpar_b=. y -"1 1 (identity * delta)
   for_a. i. $y do.
      r1=. (x - fpnlty=. alpha PNLTY_FN a { tpar_f) % a { delta
      bpnlty=. alpha PNLTY_FN a { tpar_b
      r=. r,.r1,(fpnlty+bpnlty-2*x)% a { delta
   end.
)
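Since VON_B returns the residuals (length - l_pred), a finished fit can be sanity-checked directly against the RSS reported in stats (a session sketch; output omitted):

   VON_B NLLS initial   NB. run the fit; the global parm then holds the estimates
   +/ *: VON_B parm     NB. residual sum of squares; should match 'RSS' in stats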
Example Use of LMA Code
NB.* LMAUsageEG.ijs: example of using Levenberg-Marquardt algorithm to solve a
NB. known problem: least-squares problem from Bard (1974) page 124.
NB. Constraints and boundaries per Bard

NB. Inputs
time=: 0.1 0.2 0.3 0.4 0.5 0.05 0.1 0.15 0.2 0.25 0.02 0.04 0.06 0.08 0.1
temp=: 100 100 100 100 100 200 200 200 200 200 300 300 300 300 300
fract=: 0.98 0.983 0.955 0.979 0.993 0.626 0.544 0.455 0.225 0.167 0.566 0.317 0.034 0.016 0.066
initial=: 750 1200

NB. Constraints
ucnstrnt=: 1000 2000
lcnstrnt=: 500 750
cnstrnt=: 'OFF'
alpha=: 0.001 * initial   NB. influence of penalty function

NB. Unconstrained solution by Bard (1974) is a= 813.4583 and b= 960.9063
LS_PROB=: 3 : 0
   'a b'=. y
   fract - ^-a*time * ^-b%temp
)

   LS_PROB NLLS 750 1200
813.909 961.011