Early stopping

In machine learning, early stopping is a form of regularization used to avoid overfitting when training a model with an iterative method, such as gradient descent. Such methods update the model to make it better fit the training data with each iteration. Up to a point, this improves the model's performance on data outside of the training set (e.g., the validation set). Past that point, however, improving the model's fit to the training data comes at the expense of increased generalization error. Early stopping rules provide guidance as to how many iterations can be run before the learner begins to overfit. Early stopping rules have been employed in many different machine learning methods, with varying amounts of theoretical foundation.

Background

This section presents some of the basic machine-learning concepts required for a description of early stopping methods.

Overfitting

Figure 1. The green line represents an overfitted model and the black line represents a regularized model. While the green line best follows the training data, it depends too heavily on that data and is likely to have a higher error rate than the black line on new, unseen data (illustrated by the black-outlined dots).

Machine learning algorithms train a model based on a finite set of training data. During this training, the model is evaluated based on how well it predicts the observations contained in the training set. In general, however, the goal of a machine learning scheme is to produce a model that generalizes, that is, that predicts previously unseen observations. Overfitting occurs when a model fits the data in the training set well, while incurring larger generalization error.

Regularization

Regularization, in the context of machine learning, refers to the process of modifying a learning algorithm so as to prevent overfitting. This generally involves imposing some sort of smoothness constraint on the learned model.[1] This smoothness may be enforced explicitly, by fixing the number of parameters in the model, or by augmenting the cost function, as in Tikhonov regularization. Tikhonov regularization, along with principal component regression and many other regularization schemes, falls under the umbrella of spectral regularization, that is, regularization characterized by the application of a filter. Early stopping also belongs to this class of methods.

Gradient descent methods

Gradient descent methods are first-order iterative optimization methods. Each iteration updates an approximate solution to the optimization problem by taking a step in the direction of the negative of the gradient of the objective function. By choosing the step size appropriately, such a method can be made to converge to a local minimum of the objective function. Gradient descent is used in machine learning by defining a loss function that reflects the error of the learner on the training set and then minimizing that function.
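
The update described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name gradient_descent, the quadratic objective, the step size 0.1 and the iteration count are all assumptions chosen for the example.

```python
def gradient_descent(grad, x0, step_size, num_iters):
    """Run fixed-step gradient descent and return the list of iterates."""
    x = list(x0)
    iterates = [x]
    for _ in range(num_iters):
        g = grad(x)
        # Step in the direction of the negative gradient.
        x = [xi - step_size * gi for xi, gi in zip(x, g)]
        iterates.append(x)
    return iterates


# Hypothetical objective: f(x, y) = (x - 3)^2 + 2 * (y + 1)^2, minimized at (3, -1).
def example_grad(p):
    x, y = p
    return [2.0 * (x - 3.0), 4.0 * (y + 1.0)]


path = gradient_descent(example_grad, x0=[0.0, 0.0], step_size=0.1, num_iters=200)
x_final, y_final = path[-1]
```

For this objective the chosen step size shrinks the distance to the minimizer at every iteration; a step size that is too large would make the iterates diverge instead.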

Early stopping based on analytical results

Early stopping in statistical learning theory

Early stopping can be used to regularize non-parametric regression problems encountered in machine learning. For a given input space $X$, output space $Y$, and samples drawn from an unknown probability measure $\rho$ on $Z = X \times Y$, the goal of such problems is to approximate a regression function $f_\rho$, given by

$$f_\rho(x) = \int_Y y \, d\rho(y \mid x), \quad x \in X,$$

where $\rho(y \mid x)$ is the conditional distribution at $x$ induced by $\rho$.[2] One common choice for approximating the regression function is to use functions from a reproducing kernel Hilbert space.[2] These spaces can be infinite-dimensional, in which case they can supply solutions that overfit training sets of arbitrary size. Regularization is, therefore, especially important for these methods. One way to regularize non-parametric regression problems is to apply an early stopping rule to an iterative procedure such as gradient descent.

The early stopping rules proposed for these problems are based on analysis of upper bounds on the generalization error as a function of the iteration number. They yield prescriptions for the number of iterations to run that can be computed prior to starting the solution process.[3] [4]

Example: Least-squares loss

(Adapted from Yao, Rosasco and Caponnetto, 2007[3])

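Let $X \subseteq \mathbb{R}^n$ and $Y = \mathbb{R}$. Given a set of samples

$$\mathbf{z} = \{(x_i, y_i) \in X \times Y : i = 1, \dots, m\} \in Z^m,$$

drawn independently from the unknown probability measure $\rho$, minimize the functional

$$\mathcal{E}(f) = \int_{X \times Y} (f(x) - y)^2 \, d\rho,$$

where $f$ is a member of the reproducing kernel Hilbert space $\mathcal{H}$. That is, minimize the expected risk for a least-squares loss function. Since $\mathcal{E}$ depends on the unknown probability measure $\rho$, it cannot be used for computation. Instead, consider the following empirical risk:

$$\mathcal{E}_{\mathbf{z}}(f) = \frac{1}{m} \sum_{i=1}^{m} (f(x_i) - y_i)^2.$$

Let $f_t$ and $f_t^{\mathbf{z}}$ be the $t$-th iterates of gradient descent applied to the expected and empirical risks, respectively, where both iterations are initialized at the origin and both use the step size $\gamma_t$. The $f_t$ form the population iteration, which converges to the regression function $f_\rho$ but cannot be used in computation, while the $f_t^{\mathbf{z}}$ form the sample iteration, which usually converges to an overfitting solution.

We want to control the difference between the expected risk of the sample iteration and the minimum expected risk, that is, the expected risk of the regression function:

$$\mathcal{E}(f_t^{\mathbf{z}}) - \mathcal{E}(f_\rho).$$

This difference can be rewritten as the sum of two terms: the difference in expected risk between the sample and population iterations, and that between the population iteration and the regression function:

$$\mathcal{E}(f_t^{\mathbf{z}}) - \mathcal{E}(f_\rho) = \left[ \mathcal{E}(f_t^{\mathbf{z}}) - \mathcal{E}(f_t) \right] + \left[ \mathcal{E}(f_t) - \mathcal{E}(f_\rho) \right].$$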

This equation presents a bias-variance tradeoff, which is then solved to give an optimal stopping rule that may depend on the unknown probability distribution. That rule has associated probabilistic bounds on the generalization error. For the analysis leading to the early stopping rule and bounds, the reader is referred to the original article.[3] In practice, data-driven methods such as cross-validation can be used to obtain an adaptive stopping rule.
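
The sample iteration can be illustrated numerically. The sketch below is an assumption-laden toy, not the procedure analyzed in the cited article: it uses a Gaussian kernel, synthetic one-dimensional data, a constant step size, and a common form of kernel gradient descent in which the iterate is a kernel expansion over the samples and constant factors are absorbed into the step size. All names are hypothetical.

```python
import math
import random


def gaussian_kernel(a, b, width=0.5):
    """Gaussian kernel on the real line; note that k(a, a) = 1."""
    return math.exp(-((a - b) ** 2) / (2.0 * width ** 2))


def sample_iteration(xs, ys, step_size, num_iters):
    """Gradient descent on the empirical least-squares risk, started at f = 0.

    The iterate is f = sum_j coef[j] * k(xs[j], .). Returns the final
    coefficients and the empirical risk of each iterate before its update.
    """
    m = len(xs)
    gram = [[gaussian_kernel(a, b) for b in xs] for a in xs]
    coef = [0.0] * m
    risks = []
    for _ in range(num_iters):
        preds = [sum(g * c for g, c in zip(row, coef)) for row in gram]
        residuals = [p - y for p, y in zip(preds, ys)]
        risks.append(sum(r * r for r in residuals) / m)
        # f <- f - (step / m) * sum_i (f(x_i) - y_i) * k(x_i, .)
        coef = [c - step_size / m * r for c, r in zip(coef, residuals)]
    return coef, risks


# Synthetic data (an assumption for illustration): noisy samples of a sine curve.
rng = random.Random(0)
xs = [i / 19 for i in range(20)]
ys = [math.sin(2 * math.pi * x) + rng.gauss(0.0, 0.3) for x in xs]
coef, risks = sample_iteration(xs, ys, step_size=1.0, num_iters=50)
```

In this toy setting the empirical risk never increases from one iteration to the next, so it cannot signal when to stop; the stopping time has to come from a bound on the expected risk or from held-out data.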

Early stopping in boosting

Boosting refers to a family of algorithms in which a set of weak learners (learners that are only slightly correlated with the true process) are combined to produce a strong learner. It has been shown, for several boosting algorithms (including AdaBoost), that regularization via early stopping can provide guarantees of consistency, that is, that the result of the algorithm approaches the true solution as the number of samples goes to infinity.[5] [6] [7] [8]

L2-boosting

Boosting methods have close ties to the gradient descent methods described above, which can be regarded as a boosting method based on the $L_2$ loss: L2Boost.[3]
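
To make the connection concrete, the following sketch shows one simple variant of boosting with the squared loss, in which each round fits a single-feature linear learner to the current residuals. The component-wise base learner, the shrinkage factor and the function name l2boost are illustrative assumptions; the number of rounds num_iters is the quantity an early stopping rule would choose.

```python
def l2boost(columns, y, num_iters, shrinkage=0.1):
    """Component-wise boosting with the squared loss.

    columns[j] holds the j-th feature as a list of m values. Returns the
    coefficient vector and the training mean squared error after each round.
    """
    m = len(y)
    coef = [0.0] * len(columns)
    residuals = list(y)
    history = []
    for _ in range(num_iters):
        best_j, best_beta, best_gain = None, 0.0, -1.0
        for j, col in enumerate(columns):
            norm_sq = sum(v * v for v in col)
            if norm_sq == 0.0:
                continue
            dot = sum(v * r for v, r in zip(col, residuals))
            gain = dot * dot / norm_sq  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_beta, best_gain = j, dot / norm_sq, gain
        if best_j is None:
            break
        # Weak learner: least-squares fit of the residuals on one feature,
        # added to the ensemble with a shrinkage factor.
        coef[best_j] += shrinkage * best_beta
        residuals = [r - shrinkage * best_beta * v
                     for r, v in zip(residuals, columns[best_j])]
        history.append(sum(r * r for r in residuals) / m)
    return coef, history


# Toy data (an assumption): y = 2 * feature_0 + 0.5 * feature_1, no noise.
features = [[1.0, 2.0, 3.0, 4.0], [1.0, -1.0, 1.0, -1.0]]
targets = [2.5, 3.5, 6.5, 7.5]
boost_coef, boost_history = l2boost(features, targets, num_iters=200, shrinkage=0.5)
```

Each round is a damped coordinate-wise gradient step on the empirical squared loss, which is why stopping after fewer rounds acts as regularization in the same way as stopping gradient descent early.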

Validation-based early stopping

These early stopping rules work by splitting the original training set into a new training set and a validation set. The error on the validation set is used as a proxy for the generalization error in determining when overfitting has begun. These methods are employed in the training of many iterative machine learning algorithms, including neural networks. Prechelt gives the following summary of a naive implementation of holdout-based early stopping:[9]

  1. Split the training data into a training set and a validation set, e.g. in a 2-to-1 proportion.
  2. Train only on the training set and evaluate the per-example error on the validation set once in a while, e.g. after every fifth epoch.
  3. Stop training as soon as the error on the validation set is higher than it was the last time it was checked.
  4. Use the weights the network had in that previous step as the result of the training run.
    — Lutz Prechelt, Early Stopping – But When?
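
The four steps above can be sketched as a generic training loop. This is a minimal illustration under stated assumptions: the model, train_one_epoch and validation_error are hypothetical placeholders supplied by the caller, the split of step 1 is assumed to have been done already, and evaluating the validation error once before training starts is a choice made here rather than part of the quoted summary.

```python
import copy


def train_with_early_stopping(model, train_one_epoch, validation_error,
                              max_epochs=1000, check_every=5):
    """Naive holdout early stopping.

    train_one_epoch(model) updates the model in place on the training set;
    validation_error(model) returns its error on the validation set.
    Returns the snapshot from the last check before the error rose, and
    the epoch at which that snapshot was taken.
    """
    best_model = copy.deepcopy(model)
    best_epoch = 0
    previous_error = validation_error(model)
    for epoch in range(1, max_epochs + 1):
        train_one_epoch(model)
        if epoch % check_every != 0:
            continue
        error = validation_error(model)
        if error > previous_error:
            # Step 3: the validation error is higher than at the last check.
            break
        # Keep the weights from this check so step 4 can return them.
        best_model, best_epoch, previous_error = copy.deepcopy(model), epoch, error
    return best_model, best_epoch


# Toy stand-in for a network: one weight that grows by 1 per epoch, with a
# validation error that is smallest at w = 12 (purely illustrative).
toy_model = {"w": 0.0}


def toy_epoch(m):
    m["w"] += 1.0


def toy_validation_error(m):
    return (m["w"] - 12.0) ** 2


kept_model, kept_epoch = train_with_early_stopping(
    toy_model, toy_epoch, toy_validation_error, max_epochs=100, check_every=5)
```

In the toy run the checks at epochs 5 and 10 improve the validation error and the check at epoch 15 does not, so the loop returns the snapshot taken at epoch 10.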

Cross-validation is an alternative that is applicable to non-time-series scenarios. It partitions the data into a training set and a validation set multiple times, instead of relying on a single partition. Even the simple holdout procedure is complicated in practice by the fact that the validation error may fluctuate during training, producing multiple local minima. This complication has led to the creation of many ad hoc rules for deciding when overfitting has truly begun.[9]
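
One widely used heuristic for coping with a fluctuating validation error is to tolerate a fixed number of checks without improvement, often called patience. The sketch below illustrates that heuristic only; it is not one of the specific criteria from the cited reference, and the function name, the parameters patience and min_delta, and the example error values are assumptions.

```python
def stopping_epoch_with_patience(validation_errors, patience=3, min_delta=0.0):
    """Return (stop_index, best_index) for a sequence of validation errors.

    Training stops once `patience` consecutive checks fail to improve on
    the best error seen so far by more than `min_delta`.
    """
    best_error = float("inf")
    best_index = -1
    checks_without_improvement = 0
    for index, error in enumerate(validation_errors):
        if error < best_error - min_delta:
            best_error, best_index = error, index
            checks_without_improvement = 0
        else:
            checks_without_improvement += 1
            if checks_without_improvement >= patience:
                return index, best_index
    # Ran out of checks without triggering the rule.
    return len(validation_errors) - 1, best_index


# Illustrative error curve with a local minimum at index 1 and a better
# minimum at index 3.
errors = [1.0, 0.8, 0.85, 0.7, 0.72, 0.71, 0.75, 0.9]
stop_index, best_index = stopping_epoch_with_patience(errors, patience=3)
```

On this curve a rule that stops at the first increase would halt at index 2 and keep the weights from index 1, whereas the patience rule continues, stops at index 6 and keeps the weights from index 3.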

References

  1. ^ Girosi, Federico; Michael Jones; Tomaso Poggio (1995-03-01). "Regularization Theory and Neural Networks Architectures". Neural Computation. 7 (2): 219–269. CiteSeerX 10.1.1.48.9258. doi:10.1162/neco.1995.7.2.219. ISSN 0899-7667. S2CID 49743910.
  2. ^ a b Smale, Steve; Ding-Xuan Zhou (2007-08-01). "Learning Theory Estimates via Integral Operators and Their Approximations". Constructive Approximation. 26 (2): 153–172. CiteSeerX 10.1.1.210.722. doi:10.1007/s00365-006-0659-y. ISSN 0176-4276. S2CID 5977083.
  3. ^ a b c d Yao, Yuan; Lorenzo Rosasco; Andrea Caponnetto (2007-08-01). "On Early Stopping in Gradient Descent Learning". Constructive Approximation. 26 (2): 289–315. CiteSeerX 10.1.1.329.2482. doi:10.1007/s00365-006-0663-2. ISSN 0176-4276. S2CID 8323954.
  4. ^ Raskutti, G.; M.J. Wainwright; Bin Yu (2011). "Early stopping for non-parametric regression: An optimal data-dependent stopping rule". 2011 49th Annual Allerton Conference on Communication, Control, and Computing (Allerton). pp. 1318–1325. doi:10.1109/Allerton.2011.6120320.
  5. ^ Wenxin Jiang (February 2004). "Process consistency for AdaBoost". The Annals of Statistics. 32 (1): 13–29. doi:10.1214/aos/1079120128. ISSN 0090-5364.
  6. ^ Bühlmann, Peter; Bin Yu (2003-06-01). "Boosting with the L₂ Loss: Regression and Classification". Journal of the American Statistical Association. 98 (462): 324–339. doi:10.1198/016214503000125. ISSN 0162-1459. JSTOR 30045243. S2CID 123059267.
  7. ^ Tong Zhang; Bin Yu (2005-08-01). "Boosting with Early Stopping: Convergence and Consistency". The Annals of Statistics. 33 (4): 1538–1579. arXiv:math/0508276. Bibcode:2005math......8276Z. doi:10.1214/009053605000000255. ISSN 0090-5364. JSTOR 3448617. S2CID 13158356.
  8. ^ Stankewitz, Bernhard (2024-04-01). "Early stopping for L2-boosting in high-dimensional linear models". The Annals of Statistics. 52 (2): 491–518. arXiv:2210.07850. doi:10.1214/24-AOS2356.
  9. ^ a b Prechelt, Lutz; Geneviève B. Orr (2012-01-01). "Early Stopping — But When?". In Grégoire Montavon; Klaus-Robert Müller (eds.). Neural Networks: Tricks of the Trade. Lecture Notes in Computer Science. Springer Berlin Heidelberg. pp. 53–67. doi:10.1007/978-3-642-35289-8_5. ISBN 978-3-642-35289-8.
