Weighted least squares

Weighted least squares (WLS), also known as weighted linear regression,[1][2] is a generalization of ordinary least squares and linear regression in which knowledge of the unequal variance of observations (heteroscedasticity) is incorporated into the regression. WLS is also a specialization of generalized least squares in which all the off-diagonal entries of the covariance matrix of the errors are zero.

Formulation

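The fit of a model to a data point is measured by its residual, $r_i$, defined as the difference between a measured value of the dependent variable, $y_i$, and the value predicted by the model, $f(x_i, \boldsymbol\beta)$:

$$r_i(\boldsymbol\beta) = y_i - f(x_i, \boldsymbol\beta).$$

If the errors are uncorrelated and have equal variance, then the function

$$S(\boldsymbol\beta) = \sum_i r_i(\boldsymbol\beta)^2$$

is minimised at $\hat{\boldsymbol\beta}$, such that $\frac{\partial S}{\partial \beta_j}(\hat{\boldsymbol\beta}) = 0$.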

The Gauss–Markov theorem shows that, when this is so, $\hat{\boldsymbol\beta}$ is a best linear unbiased estimator (BLUE). If, however, the measurements are uncorrelated but have different uncertainties, a modified approach might be adopted. Aitken showed that when a weighted sum of squared residuals is minimized, $\hat{\boldsymbol\beta}$ is the BLUE if each weight is equal to the reciprocal of the variance of the measurement:

$$S = \sum_{i=1}^{n} W_{ii} r_i^2, \qquad W_{ii} = \frac{1}{\sigma_i^2}.$$

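The gradient equations for this sum of squares are

$$-2 \sum_i W_{ii} \frac{\partial f(x_i, \boldsymbol\beta)}{\partial \beta_j} r_i = 0, \qquad j = 1, \ldots, m,$$

which, in a linear least squares system, give the modified normal equations

$$\sum_{i=1}^{n} \sum_{k=1}^{m} X_{ij} W_{ii} X_{ik} \hat\beta_k = \sum_{i=1}^{n} X_{ij} W_{ii} y_i, \qquad j = 1, \ldots, m.$$

The matrix $X$ above is as defined in the corresponding discussion of linear least squares.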

When the observational errors are uncorrelated and the weight matrix, $W = \Omega^{-1}$, is diagonal, these may be written as

$$(X^T W X)\,\hat{\boldsymbol\beta} = X^T W \mathbf{y}.$$

If the errors are correlated, the resulting estimator is the BLUE if the weight matrix is equal to the inverse of the variance-covariance matrix of the observations.

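When the errors are uncorrelated, it is convenient to simplify the calculations by factoring the weight matrix, taking $w_{ii} = \sqrt{W_{ii}}$. The normal equations can then be written in the same form as ordinary least squares:

$$(X'^T X')\,\hat{\boldsymbol\beta} = X'^T \mathbf{y}',$$

where we define the following scaled matrix and vector:

$$X' = \operatorname{diag}(\mathbf{w})\, X, \qquad \mathbf{y}' = \operatorname{diag}(\mathbf{w})\, \mathbf{y} = \mathbf{y} \oslash \boldsymbol\sigma.$$

This is a type of whitening transformation; the last expression involves an entrywise division.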
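The equivalence between the weighted normal equations and ordinary least squares on whitened data can be checked numerically. The following is a minimal sketch, assuming NumPy and a made-up straight-line data set; the names `sigma`, `X_scaled`, and `y_scaled` are illustrative and not part of the article.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a straight line whose noise level grows with x.
n = 50
x = np.linspace(1.0, 10.0, n)
sigma = 0.2 * x                        # assumed known standard deviations
y = 1.5 + 0.8 * x + rng.normal(0.0, sigma)

X = np.column_stack([np.ones(n), x])   # design matrix with a constant term
W = np.diag(1.0 / sigma**2)            # W_ii = 1 / sigma_i^2

# Route 1: solve the weighted normal equations (X^T W X) beta = X^T W y.
beta_normal = np.linalg.solve(X.T @ W @ X, X.T @ W @ y)

# Route 2: whiten the data, then solve an ordinary least squares problem.
w = 1.0 / sigma                        # w_ii = sqrt(W_ii)
X_scaled = X * w[:, None]              # diag(w) X
y_scaled = y * w                       # diag(w) y, i.e. y divided entrywise by sigma
beta_whitened, *_ = np.linalg.lstsq(X_scaled, y_scaled, rcond=None)

print(beta_normal)
print(beta_whitened)
```

Both routes should agree to within floating-point error. In practice the whitened form is often preferred because it avoids forming $X^T W X$ explicitly.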

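For non-linear least squares systems, a similar argument shows that the normal equations should be modified as follows, where $J$ is the Jacobian of the model with respect to the parameters:

$$(J^T W J)\,\Delta\boldsymbol\beta = J^T W\, \Delta\mathbf{y}.$$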

Note that for empirical tests, the appropriate $W$ is not known for sure and must be estimated. For this, feasible generalized least squares (FGLS) techniques may be used; in this case the technique is specialized to a diagonal covariance matrix, yielding a feasible weighted least squares solution.
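One common way to obtain a feasible weighted least squares fit is a two-step procedure: fit by ordinary least squares, model the variance from the residuals, then refit with the estimated weights. The sketch below is only one possible variant. It assumes, purely for illustration, that the log-variance is linear in the columns of the design matrix; the function name `feasible_wls` and the data are hypothetical.

```python
import numpy as np

def feasible_wls(X, y):
    """Two-step feasible WLS sketch (assumes log-variance is linear in X)."""
    # Step 1: ordinary least squares to obtain preliminary residuals.
    beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta_ols

    # Step 2: regress log squared residuals on X to estimate the variances.
    # The small constant guards against log(0); it is an arbitrary choice.
    log_r2 = np.log(resid**2 + 1e-12)
    gamma, *_ = np.linalg.lstsq(X, log_r2, rcond=None)
    var_hat = np.exp(X @ gamma)

    # Step 3: weighted fit with W_ii = 1 / var_hat_i, done by whitening.
    w = 1.0 / np.sqrt(var_hat)
    beta_fwls, *_ = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)
    return beta_ols, beta_fwls, var_hat

# Hypothetical heteroscedastic data.
rng = np.random.default_rng(2)
x = np.linspace(1.0, 10.0, 200)
y = 1.0 + 2.0 * x + rng.normal(0.0, 0.3 * x)
X = np.column_stack([np.ones_like(x), x])

beta_ols, beta_fwls, var_hat = feasible_wls(X, y)
print(beta_ols, beta_fwls)
```

The estimated variances are correct only up to the adequacy of the assumed variance model, so the resulting weights should be treated as approximate.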

If the uncertainty of the observations is not known from external sources, then the weights could be estimated from the given observations. This can be useful, for example, to identify outliers. After the outliers have been removed from the data set, the weights should be reset to one.[3]

Motivation

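In some cases the observations may be weighted; for example, they may not be equally reliable. In this case, one can minimize the weighted sum of squares:

$$\underset{\boldsymbol\beta}{\operatorname{arg\,min}} \sum_{i=1}^{n} w_i \left| y_i - \sum_{j=1}^{m} X_{ij}\beta_j \right|^2 = \underset{\boldsymbol\beta}{\operatorname{arg\,min}} \left\| W^{1/2} (\mathbf{y} - X\boldsymbol\beta) \right\|^2,$$

where $w_i > 0$ is the weight of the $i$th observation, and $W$ is the diagonal matrix of such weights.

The weights should, ideally, be equal to the reciprocal of the variance of the measurement. (This implies that the observations are uncorrelated. If the observations are correlated, the expression $S = \sum_k \sum_j r_k W_{kj} r_j$ applies. In this case the weight matrix should ideally be equal to the inverse of the variance-covariance matrix of the observations.)[3] The normal equations are then:

$$(X^T W X)\,\hat{\boldsymbol\beta} = X^T W \mathbf{y}.$$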

This method is used in iteratively reweighted least squares.

Solution

Parameter errors and correlation

The estimated parameter values are linear combinations of the observed values:

$$\hat{\boldsymbol\beta} = (X^T W X)^{-1} X^T W \mathbf{y}.$$

Therefore, an expression for the estimated variance-covariance matrix of the parameter estimates can be obtained by error propagation from the errors in the observations. Let the variance-covariance matrix for the observations be denoted by $M$ and that of the estimated parameters by $M^\beta$. Then

$$M^\beta = (X^T W X)^{-1} X^T W M W^T X (X^T W^T X)^{-1}.$$

When $W = M^{-1}$, this simplifies to

$$M^\beta = (X^T W X)^{-1}.$$

When unit weights are used ($W = I$, the identity matrix), it is implied that the experimental errors are uncorrelated and all equal: $M = \sigma^2 I$, where $\sigma^2$ is the a priori variance of an observation. In any case, $\sigma^2$ is approximated by the reduced chi-squared $\chi^2_\nu$:

$$M^\beta = \chi^2_\nu (X^T W X)^{-1}, \qquad \chi^2_\nu = S / \nu,$$

where $S$ is the minimum value of the weighted objective function:

$$S = \mathbf{r}^T W \mathbf{r} = \left\| W^{1/2} \mathbf{r} \right\|_2^2.$$

The denominator, $\nu = n - m$, is the number of degrees of freedom; see effective degrees of freedom for generalizations for the case of correlated observations.

In all cases, the variance of the parameter estimate $\hat\beta_i$ is given by $M^\beta_{ii}$ and the covariance between the parameter estimates $\hat\beta_i$ and $\hat\beta_j$ is given by $M^\beta_{ij}$. The standard deviation is the square root of variance, $\sigma_i = \sqrt{M^\beta_{ii}}$, and the correlation coefficient is given by $\rho_{ij} = M^\beta_{ij} / (\sigma_i \sigma_j)$. These error estimates reflect only random errors in the measurements. The true uncertainty in the parameters is larger due to the presence of systematic errors, which, by definition, cannot be quantified. Note that even though the observations may be uncorrelated, the parameters are typically correlated.
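The parameter covariance, standard deviations, and correlation coefficients described in this section can be computed together. The following is a minimal sketch assuming NumPy, uncorrelated observations with standard deviations `sigma`, and scaling by the reduced chi-squared; the function name `wls_with_errors` and the data are hypothetical.

```python
import numpy as np

def wls_with_errors(X, y, sigma):
    """WLS fit with parameter covariance scaled by the reduced chi-squared."""
    n, m = X.shape
    W = np.diag(1.0 / sigma**2)                 # W_ii = 1 / sigma_i^2
    cov_unscaled = np.linalg.inv(X.T @ W @ X)   # (X^T W X)^{-1}
    beta = cov_unscaled @ X.T @ W @ y           # estimated parameters
    r = y - X @ beta                            # residuals
    S = r @ W @ r                               # weighted objective at the minimum
    chi2_red = S / (n - m)                      # reduced chi-squared, nu = n - m
    cov = chi2_red * cov_unscaled               # parameter covariance matrix
    se = np.sqrt(np.diag(cov))                  # parameter standard deviations
    corr = cov / np.outer(se, se)               # correlation coefficients
    return beta, cov, se, corr, chi2_red

# Hypothetical data: straight line with known, unequal noise levels.
rng = np.random.default_rng(0)
x = np.linspace(1.0, 10.0, 40)
sigma = 0.1 + 0.05 * x
y = 3.0 - 0.4 * x + rng.normal(0.0, sigma)
X = np.column_stack([np.ones_like(x), x])

beta, cov, se, corr, chi2_red = wls_with_errors(X, y, sigma)
print("beta:", beta)
print("standard deviations:", se)
print("correlation matrix:\n", corr)
```

The off-diagonal entry of `corr` is generally nonzero for such data, illustrating that the parameters are correlated even though the observations are not.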

Parameter confidence limits

It is often assumed, for want of any concrete evidence but often appealing to the central limit theorem (see the occurrence and applications of the normal distribution), that the error on each observation belongs to a normal distribution with a mean of zero and standard deviation $\sigma$. Under that assumption the following probabilities can be derived for a single scalar parameter estimate $\hat\beta$ in terms of its estimated standard error $se_\beta$ (the square root of the corresponding diagonal element of the parameter variance-covariance matrix):

  • 68% that the interval $\hat\beta \pm se_\beta$ encompasses the true coefficient value
  • 95% that the interval $\hat\beta \pm 2\,se_\beta$ encompasses the true coefficient value
  • 99% that the interval $\hat\beta \pm 2.58\,se_\beta$ encompasses the true coefficient value

The assumption is not unreasonable when n ≫ m, where n is the number of observations and m is the number of parameters. If the experimental errors are normally distributed, the parameters will belong to a Student's t-distribution with n − m degrees of freedom. When n ≫ m, Student's t-distribution approximates a normal distribution. Note, however, that these confidence limits cannot take systematic error into account. Also, parameter errors should be quoted to one significant figure only, as they are subject to sampling error.[4]
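The multipliers of the standard error for a given confidence level follow from the normal quantile function. The sketch below uses only the Python standard library and assumes the normal approximation (n ≫ m); the function name `normal_interval` and the numerical values of `beta_hat` and `se` are hypothetical.

```python
from statistics import NormalDist

def normal_interval(beta_hat, se, level):
    """Two-sided interval for a parameter under the normal approximation."""
    # Quantile that leaves (1 - level) / 2 probability in each tail.
    z = NormalDist().inv_cdf(0.5 + level / 2.0)
    return beta_hat - z * se, beta_hat + z * se, z

# Hypothetical parameter estimate and standard error.
beta_hat, se = 0.80, 0.05

for level in (0.68, 0.95, 0.99):
    lower, upper, z = normal_interval(beta_hat, se, level)
    print(f"{level:.0%}: z = {z:.3f}, interval = [{lower:.3f}, {upper:.3f}]")
```

For small n − m, quantiles of Student's t-distribution would be the more appropriate choice; they are not available in the standard library and are omitted here.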

When the number of observations is relatively small, Chebyshev's inequality can be used for an upper bound on probabilities, regardless of any assumptions about the distribution of experimental errors: the maximum probabilities that a parameter will be more than 1, 2, or 3 standard deviations away from its expectation value are 100%, 25% and 11% respectively.
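The quoted percentages follow directly from the inequality. As a short worked check, writing $\sigma_\beta$ for the standard deviation of the parameter estimate (notation introduced here only for illustration):

```latex
\[
  P\bigl(|\hat\beta - \operatorname{E}[\hat\beta]| \ge k\,\sigma_\beta\bigr) \le \frac{1}{k^2},
  \qquad
  k = 1:\ \frac{1}{1} = 100\%, \quad
  k = 2:\ \frac{1}{4} = 25\%, \quad
  k = 3:\ \frac{1}{9} \approx 11.1\%.
\]
```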

Residual values and correlation

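The residuals are related to the observations by

$$\hat{\mathbf{r}} = \mathbf{y} - X\hat{\boldsymbol\beta} = \mathbf{y} - H\mathbf{y} = (I - H)\,\mathbf{y},$$

where $H$ is the idempotent matrix known as the hat matrix:

$$H = X (X^T W X)^{-1} X^T W,$$

and $I$ is the identity matrix. The variance-covariance matrix of the residuals, $M^r$, is given by

$$M^r = (I - H)\, M\, (I - H)^T.$$

Thus the residuals are correlated, even if the observations are not.

When $W = M^{-1}$,

$$M^r = (I - H)\, M.$$

The sum of weighted residual values is equal to zero whenever the model function contains a constant term. Left-multiply the expression for the residuals by $X^T W^T$ (for a symmetric weight matrix, $W^T = W$):

$$X^T W \hat{\mathbf{r}} = X^T W \mathbf{y} - X^T W X \hat{\boldsymbol\beta} = X^T W \mathbf{y} - (X^T W X)(X^T W X)^{-1} X^T W \mathbf{y} = \mathbf{0}.$$

Say, for example, that the first term of the model is a constant, so that $X_{i1} = 1$ for all $i$. In that case it follows that

$$\sum_{i=1}^{n} X_{i1} W_{ii} \hat r_i = \sum_{i=1}^{n} W_{ii} \hat r_i = 0.$$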

Thus, in the motivational example above, the fact that the sum of residual values is equal to zero is not accidental, but is a consequence of the presence of the constant term, α, in the model.
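The properties of the hat matrix and of the weighted residuals can be verified numerically. The following is a minimal sketch assuming NumPy, a diagonal weight matrix, and a made-up data set whose model includes a constant term; all variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical data with a constant term in the model.
n = 20
x = np.linspace(0.0, 5.0, n)
sigma = 0.1 + 0.05 * x
y = 2.0 - 0.5 * x + rng.normal(0.0, sigma)

X = np.column_stack([np.ones(n), x])   # first column is the constant term
W = np.diag(1.0 / sigma**2)

# Hat matrix H = X (X^T W X)^{-1} X^T W and residuals r = (I - H) y.
H = X @ np.linalg.inv(X.T @ W @ X) @ X.T @ W
r_hat = (np.eye(n) - H) @ y

# H is idempotent: applying it twice is the same as applying it once.
idempotent = np.allclose(H @ H, H)

# X^T W r vanishes; its first entry is the weighted sum of the residuals.
orthogonality = X.T @ W @ r_hat
weighted_sum = np.sum(np.diag(W) * r_hat)

print("H idempotent:", idempotent)
print("X^T W r:", orthogonality)
print("weighted residual sum:", weighted_sum)
```

The printed values of `orthogonality` and `weighted_sum` should be zero up to rounding error. The unweighted sum of residuals is generally not zero unless all weights are equal.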

If experimental error follows a normal distribution, then, because of the linear relationship between residuals and observations, so should residuals,[5] but since the observations are only a sample of the population of all possible observations, the residuals should belong to a Student's t-distribution. Studentized residuals are useful in making a statistical test for an outlier when a particular residual appears to be excessively large.

References

  1. "Weighted regression".
  2. "Visualize a weighted regression".
  3. Strutz, T. (2016). "3". Data Fitting and Uncertainty (A practical introduction to weighted least squares and beyond). Springer Vieweg. ISBN 978-3-658-11455-8.
  4. Mandel, John (1964). The Statistical Analysis of Experimental Data. New York: Interscience.
  5. Mardia, K. V.; Kent, J. T.; Bibby, J. M. (1979). Multivariate analysis. New York: Academic Press. ISBN 0-12-471250-9.
