Non-linear least squares

Non-linear least squares is the form of least squares analysis used to fit a set of m observations with a model that is non-linear in n unknown parameters (m ≥ n). It is used in some forms of nonlinear regression. The basis of the method is to approximate the model by a linear one and to refine the parameters by successive iterations. There are many similarities to linear least squares, but also some significant differences. In economic theory, the non-linear least squares method is applied in (i) the probit regression, (ii) threshold regression, (iii) smooth regression, (iv) logistic link regression, (v) Box–Cox transformed regressors ().

Theory

Consider a set of data points, and a curve (model function) that in addition to the variable also depends on parameters, with It is desired to find the vector of parameters such that the curve fits best the given data in the least squares sense, that is, the sum of squares is minimized, where the residuals (in-sample prediction errors) ri are given by for

The minimum value of S occurs when the gradient is zero. Since the model contains n parameters there are n gradient equations:

In a nonlinear system, the derivatives are functions of both the independent variable and the parameters, so in general these gradient equations do not have a closed solution. Instead, initial values must be chosen for the parameters. Then, the parameters are refined iteratively, that is, the values are obtained by successive approximation,

Here, k is an iteration number and the vector of increments, is known as the shift vector. At each iteration the model is linearized by approximation to a first-order Taylor polynomial expansion about The Jacobian matrix, J, is a function of constants, the independent variable and the parameters, so it changes from one iteration to the next. Thus, in terms of the linearized model, and the residuals are given by

Substituting these expressions into the gradient equations, they become which, on rearrangement, become n simultaneous linear equations, the normal equations

The normal equations are written in matrix notation as

These equations form the basis for the Gauss–Newton algorithm for a non-linear least squares problem.

Note the sign convention in the definition of the Jacobian matrix in terms of the derivatives. Formulas linear in may appear with factor of in other articles or the literature.

Extension by weights

When the observations are not equally reliable, a weighted sum of squares may be minimized,

Each element of the diagonal weight matrix W should, ideally, be equal to the reciprocal of the error variance of the measurement.[1] The normal equations are then, more generally,

Geometrical interpretation

In linear least squares the objective function, S, is a quadratic function of the parameters. When there is only one parameter the graph of S with respect to that parameter will be a parabola. With two or more parameters the contours of S with respect to any pair of parameters will be concentric ellipses (assuming that the normal equations matrix is positive definite). The minimum parameter values are to be found at the centre of the ellipses. The geometry of the general objective function can be described as paraboloid elliptical. In NLLSQ the objective function is quadratic with respect to the parameters only in a region close to its minimum value, where the truncated Taylor series is a good approximation to the model. The more the parameter values differ from their optimal values, the more the contours deviate from elliptical shape. A consequence of this is that initial parameter estimates should be as close as practicable to their (unknown!) optimal values. It also explains how divergence can come about as the Gauss–Newton algorithm is convergent only when the objective function is approximately quadratic in the parameters.

Computation

Initial parameter estimates

Some problems of ill-conditioning and divergence can be corrected by finding initial parameter estimates that are near to the optimal values. A good way to do this is by computer simulation. Both the observed and calculated data are displayed on a screen. The parameters of the model are adjusted by hand until the agreement between observed and calculated data is reasonably good. Although this will be a subjective judgment, it is sufficient to find a good starting point for the non-linear refinement. Initial parameter estimates can be created using transformations or linearizations. Better still evolutionary algorithms such as the Stochastic Funnel Algorithm can lead to the convex basin of attraction that surrounds the optimal parameter estimates.[citation needed] Hybrid algorithms that use randomization and elitism, followed by Newton methods have been shown to be useful and computationally efficient[citation needed].

Solution

Any method among the ones described below can be applied to find a solution.

Convergence criteria

The common sense criterion for convergence is that the sum of squares does not increase from one iteration to the next. However this criterion is often difficult to implement in practice, for various reasons. A useful convergence criterion is The value 0.0001 is somewhat arbitrary and may need to be changed. In particular it may need to be increased when experimental errors are large. An alternative criterion is

Again, the numerical value is somewhat arbitrary; 0.001 is equivalent to specifying that each parameter should be refined to 0.1% precision. This is reasonable when it is less than the largest relative standard deviation on the parameters.

Calculation of the Jacobian by numerical approximation

There are models for which it is either very difficult or even impossible to derive analytical expressions for the elements of the Jacobian. Then, the numerical approximation is obtained by calculation of for and . The increment,, size should be chosen so the numerical derivative is not subject to approximation error by being too large, or round-off error by being too small.

Parameter errors, confidence limits, residuals etc.

Some information is given in the corresponding section on the Weighted least squares page.

Multiple minima

Multiple minima can occur in a variety of circumstances some of which are:

  • A parameter is raised to a power of two or more. For example, when fitting data to a Lorentzian curve where is the height, is the position and is the half-width at half height, there are two solutions for the half-width, and which give the same optimal value for the objective function.
  • Two parameters can be interchanged without changing the value of the model. A simple example is when the model contains the product of two parameters, since will give the same value as .
  • A parameter is in a trigonometric function, such as , which has identical values at . See Levenberg–Marquardt algorithm for an example.

Not all multiple minima have equal values of the objective function. False minima, also known as local minima, occur when the objective function value is greater than its value at the so-called global minimum. To be certain that the minimum found is the global minimum, the refinement should be started with widely differing initial values of the parameters. When the same minimum is found regardless of starting point, it is likely to be the global minimum.

When multiple minima exist there is an important consequence: the objective function will have a maximum value somewhere between two minima. The normal equations matrix is not positive definite at a maximum in the objective function, as the gradient is zero and no unique direction of descent exists. Refinement from a point (a set of parameter values) close to a maximum will be ill-conditioned and should be avoided as a starting point. For example, when fitting a Lorentzian the normal equations matrix is not positive definite when the half-width of the band is zero.[2]

Transformation to a linear model

A non-linear model can sometimes be transformed into a linear one. Such an approximation is, for instance, often applicable in the vicinity of the best estimator, and it is one of the basic assumption in most iterative minimization algorithms. When a linear approximation is valid, the model can directly be used for inference with a generalized least squares, where the equations of the Linear Template Fit[3] apply.

Another example of a linear approximation would be when the model is a simple exponential function, which can be transformed into a linear model by taking logarithms. Graphically this corresponds to working on a semi-log plot. The sum of squares becomes This procedure should be avoided unless the errors are multiplicative and log-normally distributed because it can give misleading results. This comes from the fact that whatever the experimental errors on y might be, the errors on log y are different. Therefore, when the transformed sum of squares is minimized, different results will be obtained both for the parameter values and their calculated standard deviations. However, with multiplicative errors that are log-normally distributed, this procedure gives unbiased and consistent parameter estimates.

Another example is furnished by Michaelis–Menten kinetics, used to determine two parameters and : The Lineweaver–Burk plot of against is linear in the parameters and but very sensitive to data error and strongly biased toward fitting the data in a particular range of the independent variable .

Algorithms

Gauss–Newton method

The normal equations may be solved for by Cholesky decomposition, as described in linear least squares. The parameters are updated iteratively where k is an iteration number. While this method may be adequate for simple models, it will fail if divergence occurs. Therefore, protection against divergence is essential.

Shift-cutting

If divergence occurs, a simple expedient is to reduce the length of the shift vector, , by a fraction, f For example, the length of the shift vector may be successively halved until the new value of the objective function is less than its value at the last iteration. The fraction, f could be optimized by a line search.[4] As each trial value of f requires the objective function to be re-calculated it is not worth optimizing its value too stringently.

When using shift-cutting, the direction of the shift vector remains unchanged. This limits the applicability of the method to situations where the direction of the shift vector is not very different from what it would be if the objective function were approximately quadratic in the parameters,

Marquardt parameter

If divergence occurs and the direction of the shift vector is so far from its "ideal" direction that shift-cutting is not very effective, that is, the fraction, f required to avoid divergence is very small, the direction must be changed. This can be achieved by using the Marquardt parameter.[5] In this method the normal equations are modified where is the Marquardt parameter and I is an identity matrix. Increasing the value of has the effect of changing both the direction and the length of the shift vector. The shift vector is rotated towards the direction of steepest descent when is the steepest descent vector. So, when becomes very large, the shift vector becomes a small fraction of the steepest descent vector.

Various strategies have been proposed for the determination of the Marquardt parameter. As with shift-cutting, it is wasteful to optimize this parameter too stringently. Rather, once a value has been found that brings about a reduction in the value of the objective function, that value of the parameter is carried to the next iteration, reduced if possible, or increased if need be. When reducing the value of the Marquardt parameter, there is a cut-off value below which it is safe to set it to zero, that is, to continue with the unmodified Gauss–Newton method. The cut-off value may be set equal to the smallest singular value of the Jacobian.[6] A bound for this value is given by where tr is the trace function.[7]

QR decomposition

The minimum in the sum of squares can be found by a method that does not involve forming the normal equations. The residuals with the linearized model can be written as The Jacobian is subjected to an orthogonal decomposition; the QR decomposition will serve to illustrate the process. where Q is an orthogonal matrix and R is an matrix which is partitioned into an block, , and a zero block. is upper triangular.

The residual vector is left-multiplied by .

This has no effect on the sum of squares since because Q is orthogonal. The minimum value of S is attained when the upper block is zero. Therefore, the shift vector is found by solving

These equations are easily solved as R is upper triangular.

Singular value decomposition

A variant of the method of orthogonal decomposition involves singular value decomposition, in which R is diagonalized by further orthogonal transformations.

where is orthogonal, is a diagonal matrix of singular values and is the orthogonal matrix of the eigenvectors of or equivalently the right singular vectors of . In this case the shift vector is given by

The relative simplicity of this expression is very useful in theoretical analysis of non-linear least squares. The application of singular value decomposition is discussed in detail in Lawson and Hanson.[6]

Gradient methods

There are many examples in the scientific literature where different methods have been used for non-linear data-fitting problems.

  • Inclusion of second derivatives in The Taylor series expansion of the model function. This is Newton's method in optimization. The matrix H is known as the Hessian matrix. Although this model has better convergence properties near to the minimum, it is much worse when the parameters are far from their optimal values. Calculation of the Hessian adds to the complexity of the algorithm. This method is not in general use.
  • Davidon–Fletcher–Powell method. This method, a form of pseudo-Newton method, is similar to the one above but calculates the Hessian by successive approximation, to avoid having to use analytical expressions for the second derivatives.
  • Steepest descent. Although a reduction in the sum of squares is guaranteed when the shift vector points in the direction of steepest descent, this method often performs poorly. When the parameter values are far from optimal the direction of the steepest descent vector, which is normal (perpendicular) to the contours of the objective function, is very different from the direction of the Gauss–Newton vector. This makes divergence much more likely, especially as the minimum along the direction of steepest descent may correspond to a small fraction of the length of the steepest descent vector. When the contours of the objective function are very eccentric, due to there being high correlation between parameters, the steepest descent iterations, with shift-cutting, follow a slow, zig-zag trajectory towards the minimum.
  • Conjugate gradient search. This is an improved steepest descent based method with good theoretical convergence properties, although it can fail on finite-precision digital computers even when used on quadratic problems.[8]

Direct search methods

Direct search methods depend on evaluations of the objective function at a variety of parameter values and do not use derivatives at all. They offer alternatives to the use of numerical derivatives in the Gauss–Newton method and gradient methods.

  • Alternating variable search.[4] Each parameter is varied in turn by adding a fixed or variable increment to it and retaining the value that brings about a reduction in the sum of squares. The method is simple and effective when the parameters are not highly correlated. It has very poor convergence properties, but may be useful for finding initial parameter estimates.
  • Nelder–Mead (simplex) search. A simplex in this context is a polytope of n + 1 vertices in n dimensions; a triangle on a plane, a tetrahedron in three-dimensional space and so forth. Each vertex corresponds to a value of the objective function for a particular set of parameters. The shape and size of the simplex is adjusted by varying the parameters in such a way that the value of the objective function at the highest vertex always decreases. Although the sum of squares may initially decrease rapidly, it can converge to a nonstationary point on quasiconvex problems, by an example of M. J. D. Powell.

More detailed descriptions of these, and other, methods are available, in Numerical Recipes, together with computer code in various languages.

See also

References

  1. ^ This implies that the observations are uncorrelated. If the observations are correlated, the expression applies. In this case the weight matrix should ideally be equal to the inverse of the error variance-covariance matrix of the observations.
  2. ^ In the absence of round-off error and of experimental error in the independent variable the normal equations matrix would be singular
  3. ^ Britzger, Daniel (2022). "The Linear Template Fit". Eur. Phys. J. C. 82 (8): 731. arXiv:2112.01548. Bibcode:2022EPJC...82..731B. doi:10.1140/epjc/s10052-022-10581-w.
  4. ^ a b M.J. Box, D. Davies and W.H. Swann, Non-Linear optimisation Techniques, Oliver & Boyd, 1969
  5. ^ This technique was proposed independently by Levenberg (1944), Girard (1958), Wynne (1959), Morrison (1960) and Marquardt (1963). Marquardt's name alone is used for it in much of the scientific literature. See the main article for citation references.
  6. ^ a b C.L. Lawson and R.J. Hanson, Solving Least Squares Problems, Prentice–Hall, 1974
  7. ^ R. Fletcher, UKAEA Report AERE-R 6799, H.M. Stationery Office, 1971
  8. ^ M. J. D. Powell, Computer Journal, (1964), 7, 155.

Further reading

Read other articles:

WinongKecamatanNegara IndonesiaProvinsiJawa TengahKabupatenPatiPemerintahan • Camat-Populasi • Total- jiwaKode Kemendagri33.18.04 Kode BPS3318040 Luas- km²Desa/kelurahan-Winong Untuk kegunaan lain, lihat Winong (disambiguasi). Rumah kampung di Winong Winong (Jawa: ꦮꦶꦤꦺꦴꦁ) adalah suatu kecamatan di Kabupaten Pati, Jawa Tengah, Indonesia. Kecamatan Winong terletak di tenggara Kabupaten Pati. Sebagian wilayahnya berada di Pegunungan Kapur Utara. Dahu...

 

Lambang Valence-en-Brie. Valence-en-BrieNegaraPrancisArondisemenMelunKantonLe Châtelet-en-BrieAntarkomuneCommunauté de communes de la Région du Châtelet-en-BriePemerintahan • Wali kota (2008-2014) Serge Vaucouleur • Populasi1583Kode INSEE/pos77480 / 2 Population sans doubles comptes: penghitungan tunggal penduduk di komune lain (e.g. mahasiswa dan personil militer). Valence-en-Brie merupakan sebuah komune di departemen Seine-et-Marne di region Île-de-France d...

 

Laut Selatan beralih ke halaman ini. Untuk kegunaan lain, lihat Laut Selatan (disambiguasi). Samudra Antartika, seperti yang digambarkan oleh draf edisi ke-4 dari Limits of Oceans and Seas Organisasi Hidrografi Internasional (2002) Gambaran umum Konvergensi Antarktika, terkadang digunakan oleh para ilmuwan sebagai demarkasi Samudra Selatan Samudra di Bumi Arktik Atlantik Hindia Pasifik Selatan Samudra Dunialbs Samudra Selatan atau Samudra Antarktika atau Lautan Selatan[1] adalah massa...

Third-party Amiga hardware company Great Valley ProductsGVP A530 Turbo with EC030 @ 40 MHz, 68882, 2x 4 MB 60 ns, PC-286IndustryComputer HardwareDefunctJuly 1995FateLiquidatedSuccessorGVP-MKey peopleGerard Bucas (President)ProductsAmiga 500 and Amiga 2000 hardware, GVP A530 Turbo Great Valley Products is a former third-party Amiga hardware supplier. The company was known for CPU accelerators and SCSI host adapters for the Amiga 500 and Amiga 2000 computer series. The company liquidated itself...

 

Dalam mitologi Yunani, Horai (Yunani: Ὧραι) adalah tiga dewi yang mengendalikan ketertiban kehidupan. Mereka adalah putri dari Zeus dan Themis, dan saudara tiri dari Moirai. Dalam mitologi Eirene, salah satu Horai Para Horai disebut di Illiad sebagai penjaga gerbang awan Zeus. Sedangkan menurut Hesiodos, Zeus menikahi Themis yang kemudian melahirkan Eunomia, Dike, dan Eirene, dewi-dewi hukum dan ketertiban yang menjaga stabilitas masyarakat. Mereka disembah terutama di kota Athena, A...

 

Boing S.p.ALogo Stato Italia Forma societariaSocietà per azioni Fondazione5 novembre 2004 Sede principaleCologno Monzese GruppoMediaset (51%) Warner Bros. Discovery (49%) Persone chiaveMarcello Dolores (amministratore delegato) Silvio Carini (presidente) SettoreMedia Prodotticanali televisivi Dipendenti11-50 (2023) Sito webwww.boingtv.it/ Modifica dati su Wikidata · Manuale Boing S.p.A. è una società italiana in joint venture tra Mediaset e Turner Broadcasting System, costituita...

School in Kadayiruppu, Kolenchery, IndiaSt. Peters School, KadayiruppuLocationKadayiruppu, Kolenchery 682311IndiaCoordinates9°59′45″N 76°27′54″E / 9.9958°N 76.4651°E / 9.9958; 76.4651InformationEstablished1967 (1967)PrincipalSonia Susan VarkeyFaculty107GradesK - 12GenderCoeducationalEnrollment2500LanguageEnglish, Hindi, Malayalam, FrenchCampus size15 acres (6.1 ha), 28 centAffiliationCentral Board of Secondary EducationWebsitestpeterskadayiruppu....

 

Historic railway in Ontario, Canada Several terms redirect here. For other uses, see Great Western Railway (disambiguation). Great Western RailwayThe Great Western Railway's Spitfire locomotive.OverviewHeadquartersHamilton, OntarioLocaleSouthwestern Ontario, Niagara PeninsulaDates of operation1853 (1853)–1882 (1882)TechnicalTrack gauge4 ft 8+1⁄2 in (1,435 mm) standard gaugePrevious gaugeBuilt to 5 ft 6 in (1,676 mm) but converted b...

 

This is a list of Malayalam films that released in 2022. Malayalam cinema Before 1960 1960s 1960 1961 1962 1963 19641965 1966 1967 1968 1969 1970s 1970 1971 1972 1973 19741975 1976 1977 1978 1979 1980s 1980 1981 1982 1983 19841985 1986 1987 1988 1989 1990s 1990 1991 1992 1993 19941995 1996 1997 1998 1999 2000s 2000 2001 2002 2003 20042005 2006 2007 2008 2009 2010s 2010 2011 2012 2013 20142015 2016 2017 2018 2019 2020s 2020 2021 2022 2023 2024 vte January–March Opening Title Director Cast P...

この項目には、一部のコンピュータや閲覧ソフトで表示できない文字が含まれています(詳細)。 数字の大字(だいじ)は、漢数字の一種。通常用いる単純な字形の漢数字(小字)の代わりに同じ音の別の漢字を用いるものである。 概要 壱万円日本銀行券(「壱」が大字) 弐千円日本銀行券(「弐」が大字) 漢数字には「一」「二」「三」と続く小字と、「壱」「�...

 

London theatre Cockpit-in-Court from an engraving by Mazell in Pennant's London, reproduced in the London Topographical Record (1903) The Cockpit-in-Court (also known as the Royal Cockpit) was an early theatre in London, located at the Palace of Whitehall, next to St. James's Park, now the site of 70 Whitehall, in Westminster. The structure was originally built by Henry VIII, after he had acquired Cardinal Wolsey's York Place to the north of the Palace of Westminster, following the Cardinal's...

 

2001 single by A-TeensSugar RushSingle by A-Teensfrom the album Teen Spirit B-sideGive It UpReleased18 June 2001Recorded2000GenreLatin popLength3:03 (Album version) 3:02 (Radio edit)LabelUniversal Music GroupSongwriter(s)ThomanderWikströmProducer(s)Tremolo (Tobias Lindell and Peter Björklund)A-Teens singles chronology Halfway Around the World(2001) Sugar Rush(2001) Heartbreak Lullaby(2001) Music videoA*Teens - Sugar Rush on YouTube Sugar Rush is the third single from the album Teen Spir...

24e cérémonie des Golden Globes Golden Globes Organisée par la Hollywood Foreign Press Association Détails Date 15 février 1967 Lieu Los Angeles États-Unis Résumé Meilleur film dramatique Un homme pour l'éternité Meilleur film musical ou de comédie Les Russes arrivent Meilleure série Les Espions Chronologie 23e cérémonie des Golden Globes 25e cérémonie des Golden Globes modifier  La 24e cérémonie des Golden Globes a eu lieu le 15 février 1967, réc...

 

Shanina Shaik2008Lahir11 Februari 1991 (umur 33)Melbourne, AustraliaKebangsaanAustraliaTahun aktif2008–sekarangInformasi modelingWarna rambutCoklat GelapWarna mataHijau HazelManajerIMG Models (Australia, NY, London)Traffic Models (Barcelona) Shanina Shaik (kelahiran 11 Februari 1991)[1][2] adalah seorang peragawati dari Melbourne, Australia. Kehidupan awal Ibu Shanina adalah orang keturunan Lithuania dan Australian, sementara ayahnya merupakan keturunan Pakistan d...

 

American singer Frank CrumitBorn(1889-09-26)September 26, 1889Jackson, Ohio, U.S.DiedSeptember 7, 1943(1943-09-07) (aged 53)New York City, New York, U.S.OccupationsSingercomposervaudevillianradio entertainer Frank Crumit (September 26, 1889 – September 7, 1943) was an American singer, composer, radio entertainer and vaudeville star. He shared his radio programs with his wife, Julia Sanderson, and the two were sometimes called the ideal couple of the air.[1] Biography Crumit...

Bagian dari seriAgama di Jawa Jawa Jawa Kebudayaan Jawa Orang Jawa Agama di Indonesia Keagamaan asli Kejawen (Pangestu • Perjalanan • Sapta Darma • Subud • Sumarah • lainnya) Hinduisme Hinduisme di Jawa Buddhisme Buddhisme di Indonesia Sanghyang Adi Buddha Ashin Jinarakkhita Islam Penyebaran Islam di Indonesia Santri Abangan Wali Sanga Nahdlatul Ulama Muhammadiyah Kekristenan Kekristenan di Indonesia Misionaris Sabda Allah Gereja Ganjuran Khonghucu Agama Khonghucu pada zaman Orde Ba...

 

Questa voce sull'argomento reti televisive filippine è solo un abbozzo. Contribuisci a migliorarla secondo le convenzioni di Wikipedia. Radio Philippines NetworkPaeseFilippine LinguaInglese (principale) e filippino Tipogeneralista Data di lancio29 giugno 1960 EditoreNine Media CorporationFar East Managers and Investors (32%)Governo filippino (20%) SitoSito ufficiale Radio Philippines Network, Inc. (RPN) è una rete televisiva terrestre delle Filippine in la cui proprietà è condivisa ...

 

Richard Ayoade Richard Ellef Ayoade (IPA: [aɪ.oʊˈɑːdiː]) (Londra, 23 maggio 1977) è un attore, regista e comico britannico. È noto soprattutto per aver interpretato il ruolo di Maurice Moss nella serie televisiva IT Crowd. Indice 1 Biografia 2 Filmografia parziale 2.1 Regista 2.1.1 Cinema 2.1.2 Televisione 2.2 Attore 2.2.1 Cinema 2.2.2 Televisione 2.2.3 Videoclip 2.3 Doppiatore 3 Doppiatori italiani 4 Premi e candidature 5 Note 6 Altri progetti 7 Collegamenti esterni Biografia Nasce a...

German civil engineer and inventor Not to be confused with Heinrich Gerber (architect) (1831-1920), the German architect This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Heinrich Gerber civil engineer – news · newspapers · books · scholar · JSTOR (January 2014) (Learn how and when to remove this message)...

 

Конец доказательства ∎ Изображение ◄ ∊ ∋ ∌ ∍ ∎ ∏ ∐ ∑ − ► Характеристики Название end of proof Юникод U+220E HTML-код ∎ или ∎ UTF-16 0x220E URL-код %E2%88%8E Символ конца доказательства (∎, «символ Халмоша», англ. Halmos, tombstone «надгробный камень») — типографский симв�...