Resampling (statistics)

In statistics, resampling is the creation of new samples based on one observed sample. Resampling methods are:

  1. Permutation tests (also re-randomization tests)
  2. Bootstrapping
  3. Cross-validation
  4. Jackknife

Permutation tests

Permutation tests rely on resampling the original data under the assumption that the null hypothesis holds. Comparing the observed test statistic with its distribution over the resampled data shows how likely a result at least as extreme as the observed one is under the null hypothesis.
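As a minimal sketch of this idea, assume a two-sample comparison where the null hypothesis says the group labels are exchangeable. The function name `permutation_test`, the choice of the absolute difference in means as test statistic, and the add-one correction are illustrative assumptions, not part of the description above.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Approximate two-sided permutation p-value for a difference in means."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    at_least_as_extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the labels are arbitrary, so reshuffle them.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed - 1e-12:  # small tolerance for float ties
            at_least_as_extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (at_least_as_extreme + 1) / (n_permutations + 1)


p_value = permutation_test([12.1, 11.4, 13.0, 12.7], [10.2, 10.9, 11.1, 10.4],
                           n_permutations=2000)
```

A small p-value here means that few random relabelings produce a difference as large as the observed one. Enumerating all permutations instead of sampling them would give an exact test for small samples.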

Bootstrap

Bootstrapping is a statistical method for estimating the sampling distribution of an estimator by sampling with replacement from the original sample. Its most common purpose is to derive robust estimates of standard errors and confidence intervals for a population parameter such as a mean, median, proportion, odds ratio, correlation coefficient or regression coefficient. It has been called the plug-in principle,[1] because it estimates functionals of a population distribution by evaluating the same functionals at the empirical distribution based on a sample.

For example,[1] when estimating the population mean, this method uses the sample mean; to estimate the population median, it uses the sample median; to estimate the population regression line, it uses the sample regression line.
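The procedure can be sketched as follows, assuming a simple percentile interval (one of several bootstrap interval constructions). The function name `bootstrap`, its parameters, and the example data are hypothetical.

```python
import random
from statistics import median, stdev


def bootstrap(sample, statistic, n_resamples=2000, alpha=0.05, seed=0):
    """Return (plug-in estimate, bootstrap standard error, percentile interval)."""
    rng = random.Random(seed)
    n = len(sample)
    # Each replicate evaluates the statistic on n draws WITH replacement.
    replicates = sorted(
        statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)
    )
    std_error = stdev(replicates)
    lower = replicates[int((alpha / 2) * n_resamples)]
    upper = replicates[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    # The plug-in estimate is the same statistic applied to the sample itself.
    return statistic(sample), std_error, (lower, upper)


data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9]
estimate, std_error, interval = bootstrap(data, median, n_resamples=500)
```

Passing a different `statistic` (for example `statistics.mean`) reuses the same resampling loop, which reflects the plug-in idea: the estimator is whatever functional is evaluated on the resampled data.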

It may also be used for constructing hypothesis tests. It is often used as a robust alternative to inference based on parametric assumptions when those assumptions are in doubt, or where parametric inference is impossible or requires very complicated formulas for the calculation of standard errors. Bootstrapping techniques are also used in the updating-selection transitions of particle filters, genetic type algorithms and related resample/reconfiguration Monte Carlo methods used in computational physics.[2][3] In this context, the bootstrap is used to sequentially replace weighted empirical probability measures with empirical measures: samples with low weights are replaced by copies of samples with high weights.
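A rough sketch of such a selection step, assuming plain multinomial resampling (practical particle filters often use lower-variance schemes), is shown below. The name `resample_particles` is hypothetical.

```python
import random


def resample_particles(particles, weights, seed=0):
    """Replace a weighted particle set by an equally weighted one."""
    rng = random.Random(seed)
    n = len(particles)
    # Draw n particles with probability proportional to their weights:
    # low-weight particles tend to vanish, high-weight ones are duplicated.
    new_particles = rng.choices(particles, weights=weights, k=n)
    new_weights = [1.0 / n] * n
    return new_particles, new_weights


particles, weights = resample_particles(["a", "b", "c", "d"], [0.05, 0.6, 0.3, 0.05])
```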

Cross-validation

Cross-validation is a statistical method for validating a predictive model. Subsets of the data are held out for use as validation sets; a model is fit to the remaining data (a training set) and used to predict for the validation set. Averaging the quality of the predictions across the validation sets yields an overall measure of prediction accuracy. Cross-validation is employed repeatedly in building decision trees.

One form of cross-validation leaves out a single observation at a time; this is similar to the jackknife. Another, K-fold cross-validation, splits the data into K subsets; each is held out in turn as the validation set.
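The splitting scheme can be sketched as follows. The helper names `k_fold_splits` and `cross_val_score`, the random shuffling, and the use of mean squared error as the quality measure are assumptions made for illustration.

```python
import random


def k_fold_splits(n, k, seed=0):
    """Yield (train_indices, validation_indices) pairs for K-fold CV."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]  # K roughly equal subsets
    for i, validation in enumerate(folds):
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, validation


def cross_val_score(xs, ys, fit, predict, k, seed=0):
    """Average squared prediction error over the K validation folds."""
    fold_errors = []
    for train, validation in k_fold_splits(len(xs), k, seed):
        model = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - predict(model, xs[i])) ** 2 for i in validation]
        fold_errors.append(sum(errors) / len(errors))
    return sum(fold_errors) / len(fold_errors)
```

Setting `k` equal to the number of observations makes every fold a single observation, which gives the leave-one-out form described above.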

This avoids "self-influence". For comparison, in regression analysis methods such as linear regression, each y value draws the regression line toward itself, making the prediction of that value appear more accurate than it really is. Cross-validation applied to linear regression predicts the y value for each observation without using that observation.
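To make the self-influence effect concrete, the following sketch compares the in-sample error of a simple least-squares line with its leave-one-out error. The function names and the small data set are made up for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


def in_sample_mse(xs, ys):
    """Error when each point is predicted by a line it helped to fit."""
    intercept, slope = fit_line(xs, ys)
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)


def leave_one_out_mse(xs, ys):
    """Error when each point is predicted by a line fitted without it."""
    total = 0.0
    for i in range(len(xs)):
        intercept, slope = fit_line(xs[:i] + xs[i + 1:], ys[:i] + ys[i + 1:])
        total += (ys[i] - (intercept + slope * xs[i])) ** 2
    return total / len(xs)


xs = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ys = [1.2, 1.9, 3.2, 3.8, 5.3, 5.9]
optimistic = in_sample_mse(xs, ys)
honest = leave_one_out_mse(xs, ys)
```

For least-squares fits the leave-one-out error is never smaller than the in-sample error, because removing a point can only move the fitted line away from it.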

This is often used for deciding how many predictor variables to use in regression. Without cross-validation, adding predictors always reduces the residual sum of squares (or possibly leaves it unchanged). In contrast, the cross-validated mean-square error will tend to decrease if valuable predictors are added, but increase if worthless predictors are added.[4]

Monte Carlo cross-validation

Subsampling is an alternative method for approximating the sampling distribution of an estimator. The two key differences from the bootstrap are:

  1. the resample size is smaller than the sample size and
  2. resampling is done without replacement.

The advantage of subsampling is that it is valid under much weaker conditions than the bootstrap. In particular, a set of sufficient conditions is that the rate of convergence of the estimator is known and that the limiting distribution is continuous. In addition, the resample (or subsample) size must tend to infinity together with the sample size but at a smaller rate, so that their ratio converges to zero. While subsampling was originally proposed for independent and identically distributed (iid) data only, the methodology has been extended to cover time series data as well; in this case, one resamples blocks of consecutive data rather than individual data points. There are many cases of applied interest where subsampling leads to valid inference whereas bootstrapping does not, for example when the rate of convergence of the estimator is not the square root of the sample size or when the limiting distribution is non-normal. When both subsampling and the bootstrap are consistent, the bootstrap is typically more accurate. RANSAC is a popular algorithm using subsampling.
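The following sketch illustrates subsampling for iid data. It assumes a square-root rate of convergence, which must be replaced by the known rate for other estimators; the function name `subsampling_ci` and the choice of subsample size `b` are illustrative.

```python
import math
import random
from statistics import mean


def subsampling_ci(sample, statistic, b, n_subsamples=2000, alpha=0.05, seed=0):
    """Confidence interval from subsamples of size b < n drawn WITHOUT replacement."""
    rng = random.Random(seed)
    n = len(sample)
    theta_n = statistic(sample)
    # Rescale by the assumed sqrt(b) rate so the subsample roots mimic
    # the distribution of sqrt(n) * (theta_n - theta).
    roots = sorted(
        math.sqrt(b) * (statistic(rng.sample(sample, b)) - theta_n)
        for _ in range(n_subsamples)
    )
    q_low = roots[int((alpha / 2) * n_subsamples)]
    q_high = roots[min(n_subsamples - 1, int((1 - alpha / 2) * n_subsamples))]
    return theta_n - q_high / math.sqrt(n), theta_n - q_low / math.sqrt(n)


data = list(range(1, 41))
low, high = subsampling_ci(data, mean, b=8, n_subsamples=500)
```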

Jackknife cross-validation

Jackknifing (jackknife cross-validation) is used in statistical inference to estimate the bias and standard error (variance) of a statistic when a random sample of observations is used to calculate it. Historically, this method preceded the invention of the bootstrap: Quenouille invented it in 1949 and Tukey extended it in 1958.[5][6] The method was foreshadowed by Mahalanobis, who in 1946 suggested repeated estimates of the statistic of interest with half the sample chosen at random.[7] He coined the name 'interpenetrating samples' for this method.

Quenouille invented this method with the intention of reducing the bias of the sample estimate. Tukey extended this method by assuming that if the replicates could be considered identically and independently distributed, then an estimate of the variance of the sample parameter could be made and that it would be approximately distributed as a t variate with n−1 degrees of freedom (n being the sample size).

The basic idea behind the jackknife variance estimator lies in systematically recomputing the statistic estimate, leaving out one or more observations at a time from the sample set. From this new set of replicates of the statistic, an estimate for the bias and an estimate for the variance of the statistic can be calculated.
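A minimal sketch of the delete-1 case, using the standard jackknife bias and variance formulas, is given below. The function name `jackknife` and the example data are hypothetical.

```python
from statistics import mean, pvariance, variance


def jackknife(sample, statistic):
    """Delete-1 jackknife estimates of the bias and variance of a statistic."""
    sample = list(sample)
    n = len(sample)
    theta_hat = statistic(sample)
    # Recompute the statistic n times, leaving out one observation each time.
    replicates = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    theta_dot = mean(replicates)
    bias = (n - 1) * (theta_dot - theta_hat)
    var = (n - 1) / n * sum((r - theta_dot) ** 2 for r in replicates)
    return bias, var


data = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0]
bias_mean, var_mean = jackknife(data, mean)      # variance matches s^2 / n
bias_pvar, var_pvar = jackknife(data, pvariance)
corrected = pvariance(data) - bias_pvar          # bias-corrected estimate
```

For the sample mean the jackknife variance reproduces the usual s²/n, and subtracting the jackknife bias from the divide-by-n variance recovers the unbiased sample variance, which illustrates the bias-reduction purpose mentioned above.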

Instead of using the jackknife to estimate the variance directly, it may be applied to the log of the variance. This transformation may result in better estimates, particularly when the distribution of the variance itself is non-normal.

For many statistical parameters the jackknife estimate of variance tends asymptotically to the true value almost surely. In technical terms one says that the jackknife estimate is consistent. The jackknife is consistent for the sample means, sample variances, central and non-central t-statistics (with possibly non-normal populations), sample coefficient of variation, maximum likelihood estimators, least squares estimators, correlation coefficients and regression coefficients.

It is not consistent for the sample median. In the case of a unimodal variate, the ratio of the jackknife variance to the sample variance tends to be distributed as one half the square of a chi-square distribution with two degrees of freedom.

The jackknife, like the original bootstrap, is dependent on the independence of the data. Extensions of the jackknife to allow for dependence in the data have been proposed.

Another extension is the delete-a-group method used in association with Poisson sampling.

The jackknife is equivalent to random (subsampling) leave-one-out cross-validation; the two differ only in their goal.[8]

Comparison of bootstrap and jackknife

Both methods, the bootstrap and the jackknife, estimate the variability of a statistic from the variability of that statistic between subsamples, rather than from parametric assumptions. For the more general jackknife, the delete-m observations jackknife, the bootstrap can be seen as a random approximation of it. Both yield similar numerical results, which is why each can be seen as an approximation to the other. Although the two methods differ greatly in their theoretical foundations, the main practical difference for statistics users is that the bootstrap gives different results when repeated on the same data, whereas the jackknife gives exactly the same result each time. Because of this, the jackknife is popular when the estimates need to be verified several times before publishing (e.g., official statistics agencies). On the other hand, when this verification feature is not crucial and the interest lies in an idea of the distribution rather than in a single number, the bootstrap is preferred (e.g., studies in physics, economics, biological sciences).
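The reproducibility difference can be illustrated with a short sketch. The helper names `jackknife_se` and `bootstrap_se`, the seeds, and the data are assumptions for illustration only.

```python
import math
import random
from statistics import mean, stdev


def jackknife_se(sample, statistic):
    """Deterministic delete-1 jackknife standard error."""
    n = len(sample)
    replicates = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    centre = mean(replicates)
    return math.sqrt((n - 1) / n * sum((r - centre) ** 2 for r in replicates))


def bootstrap_se(sample, statistic, n_resamples=1000, seed=None):
    """Monte Carlo bootstrap standard error; depends on the random draws."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = [statistic(rng.choices(sample, k=n)) for _ in range(n_resamples)]
    return stdev(replicates)


data = [2.1, 3.4, 1.9, 5.6, 4.4, 3.8, 2.7, 4.9]
jack_first = jackknife_se(data, mean)
jack_second = jackknife_se(data, mean)          # identical to jack_first
boot_first = bootstrap_se(data, mean, seed=1)
boot_second = bootstrap_se(data, mean, seed=2)  # close to, but not equal to, boot_first
```

Fixing the seed makes a bootstrap run repeatable, but the result still depends on which seed was chosen, whereas the jackknife has no such dependence.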

Whether to use the bootstrap or the jackknife may depend more on operational aspects than on statistical concerns of a survey. The jackknife, originally used for bias reduction, is more of a specialized method and only estimates the variance of the point estimator. This can be enough for basic statistical inference (e.g., hypothesis testing, confidence intervals). The bootstrap, on the other hand, first estimates the whole distribution (of the point estimator) and then computes the variance from that. While powerful and easy, this can become highly computationally intensive.

"The bootstrap can be applied to both variance and distribution estimation problems. However, the bootstrap variance estimator is not as good as the jackknife or the balanced repeated replication (BRR) variance estimator in terms of the empirical results. Furthermore, the bootstrap variance estimator usually requires more computations than the jackknife or the BRR. Thus, the bootstrap is mainly recommended for distribution estimation."[9]

There is a special consideration with the jackknife, particularly with the delete-1 observation jackknife. It should only be used with smooth, differentiable statistics (e.g., totals, means, proportions, ratios, odds ratios, regression coefficients, etc.; not with medians or quantiles). This can become a practical disadvantage, and it is usually the argument for favoring bootstrapping over jackknifing. More general jackknives than the delete-1, such as the delete-m jackknife or the delete-all-but-2 Hodges–Lehmann estimator, overcome this problem for medians and quantiles by relaxing the smoothness requirements for consistent variance estimation.
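A small sketch shows why the median is problematic for the delete-1 jackknife: with an even number of distinct observations the leave-one-out medians take only two values, so the replicates carry very little information about variability. The helper name and data are made up for illustration.

```python
from statistics import mean, median


def leave_one_out_replicates(sample, statistic):
    """Statistic recomputed with each observation removed in turn."""
    return [statistic(sample[:i] + sample[i + 1:]) for i in range(len(sample))]


data = [1.0, 2.0, 4.0, 7.0, 11.0, 16.0, 22.0, 29.0]

# Smooth statistic: every deleted point shifts the mean a little.
mean_replicates = leave_one_out_replicates(data, mean)

# Non-smooth statistic: the median jumps between the two middle values.
median_replicates = leave_one_out_replicates(data, median)
distinct_medians = sorted(set(median_replicates))  # → [7.0, 11.0]
```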

Usually the jackknife is easier to apply to complex sampling schemes than the bootstrap. Complex sampling schemes may involve stratification, multiple stages (clustering), varying sampling weights (non-response adjustments, calibration, post-stratification) and unequal-probability sampling designs. Theoretical aspects of both the bootstrap and the jackknife can be found in Shao and Tu (1995),[10] whereas a basic introduction is given in Wolter (2007).[11] The bootstrap estimate of model prediction bias is more precise than jackknife estimates with linear models such as linear discriminant function or multiple regression.[12]

References

  1. ^ a b Logan, J. David and Wolesensky, William R. Mathematical methods in biology. Pure and Applied Mathematics: a Wiley-interscience Series of Texts, Monographs, and Tracts. John Wiley & Sons, Inc. 2009. Chapter 6: Statistical inference. Section 6.6: Bootstrap methods.
  2. ^ Del Moral, Pierre (2004). Feynman-Kac formulae. Genealogical and interacting particle approximations. Probability and its Applications. Springer. p. 575. doi:10.1007/978-1-4684-9393-1. ISBN 978-1-4419-1902-1.
  3. ^ Del Moral, Pierre (2013). Mean field simulation for Monte Carlo integration. Chapman & Hall/CRC Press. p. 626. Monographs on Statistics & Applied Probability
  4. ^ Verbyla, D. (1986). "Potential prediction bias in regression and discriminant analysis". Canadian Journal of Forest Research. 16 (6): 1255–1257. doi:10.1139/x86-222.
  5. ^ Quenouille, M. H. (1949). "Approximate Tests of Correlation in Time-Series". Journal of the Royal Statistical Society, Series B. 11 (1): 68–84. doi:10.1111/j.2517-6161.1949.tb00023.x. JSTOR 2983696.
  6. ^ Tukey, J. W. (1958). "Bias and Confidence in Not-quite Large Samples (Preliminary Report)". Annals of Mathematical Statistics. 29 (2): 614. JSTOR 2237363.
  7. ^ Mahalanobis, P. C. (1946). "Proceedings of a Meeting of the Royal Statistical Society held on July 16th, 1946". Journal of the Royal Statistical Society. 109 (4): 325–370. JSTOR 2981330.
  8. ^ Encyclopedia of Bioinformatics and Computational Biology: ABC of Bioinformatics. Elsevier. 2018-08-21. p. 544. ISBN 978-0-12-811432-2.
  9. ^ Shao, J.; Tu, D. (1995). The Jackknife and Bootstrap. Springer-Verlag, Inc. p. 281.
  10. ^ Shao, J.; Tu, D. (1995). The Jackknife and Bootstrap. Springer.
  11. ^ Wolter, K. M. (2007). Introduction to Variance Estimation (Second ed.). Springer.
  12. ^ Verbyla, D.; Litvaitis, J. (1989). "Resampling methods for evaluating classification accuracy of wildlife habitat models". Environmental Management. 13 (6): 783–787. Bibcode:1989EnMan..13..783V. doi:10.1007/bf01868317. S2CID 153448048.

Literature

  • Good, P. (2006) Resampling Methods. 3rd Ed. Birkhauser.
  • Wolter, K.M. (2007). Introduction to Variance Estimation. 2nd Edition. Springer, Inc.
  • Pierre Del Moral (2004). Feynman-Kac formulae. Genealogical and Interacting particle systems with applications, Springer, Series Probability and Applications. ISBN 978-0-387-20268-6
  • Pierre Del Moral (2013). Mean field simulation for Monte Carlo integration. Chapman & Hall/CRC Press, Monographs on Statistics and Applied Probability. ISBN 9781466504059
  • Jiang W, Simon R. A comparison of bootstrap methods and an adjusted bootstrap approach for estimating the prediction error in microarray classification. Stat Med. 2007 Dec 20;26(29):5320-34. doi: 10.1002/sim.2968. PMID: 17624926. https://brb.nci.nih.gov/techreport/prederr_rev_0407.pdf
