Energy distance

Energy distance is a statistical distance between probability distributions. If X and Y are independent random vectors in $\mathbb{R}^d$ with cumulative distribution functions (cdf) F and G respectively, then the energy distance between the distributions F and G is defined to be the square root of

$$D^2(F, G) = 2\operatorname{E}\|X - Y\| - \operatorname{E}\|X - X'\| - \operatorname{E}\|Y - Y'\| \geq 0,$$

where (X, X', Y, Y') are independent, the cdf of X and X' is F, the cdf of Y and Y' is G, $\operatorname{E}$ is the expected value, and $\|\cdot\|$ denotes the length of a vector. Energy distance satisfies all axioms of a metric; in particular, it characterizes the equality of distributions: D(F,G) = 0 if and only if F = G. Energy distance for statistical applications was introduced in 1985 by Gábor J. Székely, who proved that for real-valued random variables $D^2(F, G)$ is exactly twice Harald Cramér's distance:[1]

$$\int_{-\infty}^{\infty} \bigl(F(x) - G(x)\bigr)^2 \, dx.$$

For a simple proof of this equivalence, see Székely (2002).[2]

In higher dimensions, however, the two distances are different because the energy distance is rotation invariant while Cramér's distance is not. (Notice that Cramér's distance is not the same as the distribution-free Cramér–von Mises criterion.)
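The one-dimensional identity between squared energy distance and twice Cramér's distance also holds for empirical distributions, so it can be checked numerically. The following is a minimal sketch, assuming NumPy is available; the function names `energy_distance_sq` and `cramer_distance_sq` are illustrative and do not come from any cited source.

```python
import numpy as np

def energy_distance_sq(x, y):
    """Plug-in value of 2E|X-Y| - E|X-X'| - E|Y-Y'| for two 1-D samples."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    a = np.abs(x[:, None] - y[None, :]).mean()  # mean between-sample distance
    b = np.abs(x[:, None] - x[None, :]).mean()  # mean within-sample distance (x)
    c = np.abs(y[:, None] - y[None, :]).mean()  # mean within-sample distance (y)
    return float(2.0 * a - b - c)

def cramer_distance_sq(x, y):
    """Exact integral of (F_n - G_m)^2 for the two empirical cdfs."""
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    z = np.sort(np.concatenate([x, y]))
    left = z[:-1]
    # Both empirical cdfs are constant on each interval [z_k, z_{k+1}),
    # so the integral is a finite sum of (height difference)^2 * width.
    f = np.searchsorted(x, left, side="right") / x.size
    g = np.searchsorted(y, left, side="right") / y.size
    return float(np.sum((f - g) ** 2 * np.diff(z)))

rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=200)
y = rng.normal(0.5, 2.0, size=150)
d2 = energy_distance_sq(x, y)
c2 = cramer_distance_sq(x, y)
print(d2, 2.0 * c2)  # the two values agree up to rounding error
```

The pairwise-distance matrices make this quadratic in the sample size, which is fine for illustration but not for large samples.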

Generalization to metric spaces

One can generalize the notion of energy distance to probability distributions on metric spaces. Let $(M, d)$ be a metric space with its Borel sigma algebra $\mathcal{B}(M)$. Let $\mathcal{P}(M)$ denote the collection of all probability measures on the measurable space $(M, \mathcal{B}(M))$. If μ and ν are probability measures in $\mathcal{P}(M)$, then the energy distance $D$ of μ and ν can be defined as the square root of

$$D^2(\mu, \nu) = 2\operatorname{E}[d(X, Y)] - \operatorname{E}[d(X, X')] - \operatorname{E}[d(Y, Y')],$$

where X, X', Y, Y' are independent, X and X' have distribution μ, and Y and Y' have distribution ν.

This quantity is not necessarily non-negative, however. It is non-negative for all μ and ν exactly when $d$ is a negative definite kernel, a condition expressed by saying that $(M, d)$ has negative type. Negative type is not sufficient for $D$ to be a metric: $D$ is a metric if and only if $d$ is a strongly negative definite kernel,[3] a condition expressed by saying that $(M, d)$ has strong negative type. In this situation, the energy distance is zero if and only if X and Y are identically distributed. An example of a metric of negative type but not of strong negative type is the plane with the taxicab metric. All Euclidean spaces and even separable Hilbert spaces have strong negative type.[4]
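To illustrate that the squared quantity can be negative on a general metric space, the sketch below evaluates it on a five-point space. The choice of space (the shortest-path metric on the complete bipartite graph K2,3) and the function name are assumptions made for illustration; they are not taken from the cited references. For finitely supported measures, the expectations reduce to quadratic forms in the distance matrix.

```python
import numpy as np

def energy_distance_sq(dist, mu, nu):
    """2 E d(X,Y) - E d(X,X') - E d(Y,Y') for measures on a finite metric space.

    dist is the matrix of pairwise distances; mu and nu are probability vectors.
    """
    dist = np.asarray(dist, dtype=float)
    mu = np.asarray(mu, dtype=float)
    nu = np.asarray(nu, dtype=float)
    return float(2.0 * mu @ dist @ nu - mu @ dist @ mu - nu @ dist @ nu)

# Shortest-path metric on K_{2,3}: points 0-1 form one side, points 2-4 the other.
# Points on opposite sides are at distance 1, points on the same side at distance 2.
dist = np.array([
    [0, 2, 1, 1, 1],
    [2, 0, 1, 1, 1],
    [1, 1, 0, 2, 2],
    [1, 1, 2, 0, 2],
    [1, 1, 2, 2, 0],
], dtype=float)

mu = np.array([0.5, 0.5, 0.0, 0.0, 0.0])        # uniform on the two-point side
nu = np.array([0.0, 0.0, 1 / 3, 1 / 3, 1 / 3])  # uniform on the three-point side

d2 = energy_distance_sq(dist, mu, nu)
print(d2)  # 2*1 - 1 - 4/3 = -1/3, so no real square root exists
```

By hand: the mean cross distance is 1, the mean within-μ distance is 1, and the mean within-ν distance is 4/3, giving 2 − 1 − 4/3 = −1/3. This space therefore does not have negative type.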

In the literature on kernel methods for machine learning, these generalized notions of energy distance are studied under the name of maximum mean discrepancy. The equivalence of distance-based and kernel methods for hypothesis testing is covered by several authors.[5][6]

Energy statistics

A related statistical concept, the notion of the E-statistic or energy statistic,[7] was introduced by Gábor J. Székely in the 1980s when he was giving colloquium lectures in Budapest, Hungary, and at MIT, Yale, and Columbia. This concept is based on the notion of Newton’s potential energy.[8] The idea is to consider statistical observations as heavenly bodies governed by a statistical potential energy which is zero only when an underlying statistical null hypothesis is true. Energy statistics are functions of distances between statistical observations.

Energy distance and the E-statistic were considered as N-distances and N-statistics by A. A. Zinger, A. V. Kakosyan and L. B. Klebanov in "Characterization of distributions by means of mean values of some statistics in connection with some probability metrics", Stability Problems for Stochastic Models, Moscow, VNIISI, 1989, 47–55 (in Russian); English translation: "A characterization of distributions by mean values of statistics and certain probabilistic metrics", Journal of Soviet Mathematics (1992). The same paper defined strongly negative definite kernels and provided the generalization to metric spaces discussed above. The book[3] presents these results and their applications to statistical testing, along with some applications to recovering a measure from its potential.

Testing for equal distributions

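Consider the null hypothesis that two random variables, X and Y, have the same probability distribution: $H_0: F = G$, where F and G are the distributions of X and Y. For statistical samples from X and Y:

$$x_1, \dots, x_n \quad \text{and} \quad y_1, \dots, y_m,$$

the following arithmetic averages of distances are computed between the X and the Y samples:

$$A := \frac{1}{nm}\sum_{i=1}^{n}\sum_{j=1}^{m}\|x_i - y_j\|, \quad B := \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\|x_i - x_j\|, \quad C := \frac{1}{m^2}\sum_{i=1}^{m}\sum_{j=1}^{m}\|y_i - y_j\|.$$

The E-statistic of the underlying null hypothesis is defined as follows:

$$E_{n,m}(X, Y) := 2A - B - C.$$

One can prove[8][9] that $E_{n,m}(X, Y) \geq 0$ and that the corresponding population value is zero if and only if X and Y have the same distribution ($F = G$). Under this null hypothesis the test statistic

$$T = \frac{nm}{n + m}\, E_{n,m}(X, Y)$$

converges in distribution to a quadratic form of independent standard normal random variables. Under the alternative hypothesis T tends to infinity. This makes it possible to construct a consistent statistical test, the energy test for equal distributions.[10]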
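The two-sample statistic can be sketched in a few lines: compute the mean between-sample distance and the two mean within-sample distances, combine them as twice the first minus the other two, and scale by nm/(n + m). Because the limiting quadratic form is awkward to use directly, this sketch calibrates the test by permuting the pooled sample; that calibration, the function names, and the default of 199 permutations are assumptions of this example rather than something stated in the text above.

```python
import numpy as np

def mean_distance(a, b):
    """Average Euclidean distance between rows of a (n x d) and rows of b (m x d)."""
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

def e_statistic(x, y):
    """2A - B - C, with A the between-sample and B, C the within-sample mean distances."""
    return 2.0 * mean_distance(x, y) - mean_distance(x, x) - mean_distance(y, y)

def energy_test(x, y, n_perm=199, seed=0):
    """Return (T, p) with T = nm/(n+m) * E-statistic and a permutation p-value.

    The permutation calibration is an assumption of this sketch.
    """
    rng = np.random.default_rng(seed)
    n, m = len(x), len(y)
    scale = n * m / (n + m)
    t_obs = scale * e_statistic(x, y)
    pooled = np.vstack([x, y])
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(n + m)  # random relabelling of the pooled sample
        t_perm = scale * e_statistic(pooled[idx[:n]], pooled[idx[n:]])
        exceed += int(t_perm >= t_obs)
    p_value = (exceed + 1) / (n_perm + 1)
    return t_obs, p_value

rng = np.random.default_rng(1)
x = rng.normal(loc=0.0, size=(40, 2))
y = rng.normal(loc=3.0, size=(40, 2))  # clearly shifted alternative
t_obs, p_value = energy_test(x, y)
print(t_obs, p_value)
```

Inputs are assumed to be arrays of shape (sample size, dimension). Recomputing all pairwise distances for every permutation keeps the code short; a practical implementation would compute the pooled distance matrix once and index into it.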

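The E-coefficient of inhomogeneity can also be introduced. This is always between 0 and 1 and is defined as

$$H = \frac{D^2(F, G)}{2\operatorname{E}\|X - Y\|} = \frac{2\operatorname{E}\|X - Y\| - \operatorname{E}\|X - X'\| - \operatorname{E}\|Y - Y'\|}{2\operatorname{E}\|X - Y\|},$$

where $\operatorname{E}$ denotes the expected value, F and G are the distributions of X and Y, and X' and Y' are independent copies of X and Y. H = 0 exactly when X and Y have the same distribution.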
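A sample version of the coefficient replaces each expectation by the corresponding average of pairwise distances. The sketch below is a plug-in estimate under that assumption; the function names are illustrative, and returning 0 when every pairwise distance vanishes is a convention chosen here to avoid division by zero.

```python
import numpy as np

def mean_distance(a, b):
    """Average Euclidean distance between rows of a (n x d) and rows of b (m x d)."""
    diff = a[:, None, :] - b[None, :, :]
    return float(np.sqrt((diff ** 2).sum(axis=-1)).mean())

def inhomogeneity_coefficient(x, y):
    """Plug-in estimate of (2E|X-Y| - E|X-X'| - E|Y-Y'|) / (2E|X-Y|)."""
    a = mean_distance(x, y)
    if a == 0.0:
        return 0.0  # all points coincide; convention of this sketch
    b = mean_distance(x, x)
    c = mean_distance(y, y)
    return (2.0 * a - b - c) / (2.0 * a)

rng = np.random.default_rng(2)
x = rng.normal(size=(50, 3))
y_near = rng.normal(loc=0.5, size=(50, 3))
y_far = rng.normal(loc=10.0, size=(50, 3))

h_same = inhomogeneity_coefficient(x, x)       # identical samples
h_near = inhomogeneity_coefficient(x, y_near)  # mild shift
h_far = inhomogeneity_coefficient(x, y_far)    # strong shift, closer to 1
print(h_same, h_near, h_far)
```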

Goodness-of-fit

A multivariate goodness-of-fit measure is defined for distributions in arbitrary dimension (not restricted by sample size). For an observed sample $x_1, \dots, x_n$, the energy goodness-of-fit statistic is

$$Q_n = n\left(\frac{2}{n}\sum_{i=1}^{n}\operatorname{E}\|x_i - X\|^{\alpha} - \operatorname{E}\|X - X'\|^{\alpha} - \frac{1}{n^2}\sum_{i=1}^{n}\sum_{j=1}^{n}\|x_i - x_j\|^{\alpha}\right),$$

where X and X' are independent and identically distributed according to the hypothesized distribution, and $\alpha \in (0, 2)$. The only required condition is that X has a finite moment of order α under the null hypothesis. Under the null hypothesis $\operatorname{E}Q_n = \operatorname{E}\|X - X'\|^{\alpha}$, and the asymptotic distribution of $Q_n$ is a quadratic form of centered Gaussian random variables. Under an alternative hypothesis, $Q_n$ tends to infinity stochastically, and thus determines a statistically consistent test. For most applications the exponent α = 1 (Euclidean distance) can be applied. The important special case of testing multivariate normality[9] is implemented in the energy package for R. Tests have also been developed for heavy-tailed distributions such as Pareto (power law) or stable distributions, by applying exponents in (0,1).
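As a concrete instance, the statistic can be evaluated in closed form when the hypothesized distribution is the univariate standard normal and the exponent is 1. The sketch below relies on two standard facts that are supplied here and not stated in the text above: for a standard normal X and a fixed point a, E|a − X| = 2φ(a) + a(2Φ(a) − 1), and E|X − X'| = 2/√π. The function name is illustrative; critical values would still have to be obtained separately, for example by simulation under the null hypothesis.

```python
import math
import numpy as np

def normal_energy_gof(x):
    """Q_n with exponent 1 for H0: the sample comes from the standard normal."""
    x = np.asarray(x, dtype=float)
    n = x.size
    pdf = np.exp(-0.5 * x ** 2) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + np.array([math.erf(v / math.sqrt(2.0)) for v in x]))
    # (1/n) * sum_i E|x_i - X| using the closed form for a standard normal X
    mean_to_model = float((2.0 * pdf + x * (2.0 * cdf - 1.0)).mean())
    # E|X - X'| for two independent standard normals
    model_spread = 2.0 / math.sqrt(math.pi)
    # (1/n^2) * sum_{i,j} |x_i - x_j|
    sample_spread = float(np.abs(x[:, None] - x[None, :]).mean())
    return n * (2.0 * mean_to_model - model_spread - sample_spread)

rng = np.random.default_rng(3)
q_null = normal_energy_gof(rng.normal(size=200))            # sample drawn under H0
q_shift = normal_energy_gof(rng.normal(loc=5.0, size=200))  # shifted alternative
print(q_null, q_shift)  # the shifted sample yields a much larger statistic
```

Dividing the statistic by n gives the squared energy distance between the empirical distribution and the hypothesized one, which is why the value cannot be negative.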

Applications

Applications include:

  • Proper scoring rules: Gneiting and Raftery[19] apply energy distance to develop a new and very general type of proper scoring rule for probabilistic predictions, the energy score.
  • Robust statistics[20]
  • Scenario reduction[21]
  • Gene selection[22]
  • Microarray data analysis[23]
  • Material structure analysis[24]
  • Morphometric and chemometric data[25]

Applications of energy statistics are implemented in the open-source energy package[26] for R.

References

  1. ^ Cramér, H. (1928) On the composition of elementary errors, Skandinavisk Aktuarietidskrift, 11, 141–180.
  2. ^ Székely, G. J. (2002) E-Statistics: The Energy of Statistical Samples. Archived 2016-04-20 at the Wayback Machine.
  3. ^ a b Klebanov, L. B. (2005) N-distances and their Applications, Karolinum Press, Charles University, Prague.
  4. ^ Lyons, R. (2013). "Distance Covariance in Metric Spaces". The Annals of Probability. 41 (5): 3284–3305. arXiv:1106.5758. doi:10.1214/12-aop803. S2CID 73677891.
  5. ^ Sejdinovic, D.; Sriperumbudur, B.; Gretton, A. & Fukumizu, K. (2013). "Equivalence of distance-based and RKHS-based statistics in hypothesis testing". The Annals of Statistics. 41 (5): 2263–2291. arXiv:1207.6076. doi:10.1214/13-aos1140. S2CID 8308769.
  6. ^ Shen, Cencheng; Vogelstein, Joshua T. (2021). "The exact equivalence of distance and kernel methods in hypothesis testing". AStA Advances in Statistical Analysis. 105 (3): 385–403. arXiv:1806.05514. doi:10.1007/s10182-020-00378-1. S2CID 49210956.
  7. ^ G. J. Szekely and M. L. Rizzo (2013). Energy statistics: statistics based on distances. Journal of Statistical Planning and Inference, Volume 143, Issue 8, August 2013, pp. 1249–1272.
  8. ^ a b Székely, G.J. (2002) E-statistics: The Energy of Statistical Samples, Technical Report BGSU No 02-16.
  9. ^ a b c Székely, G. J.; Rizzo, M. L. (2005). "A new test for multivariate normality". Journal of Multivariate Analysis. 93 (1): 58–80. doi:10.1016/j.jmva.2003.12.002. Reprint Archived 2011-08-05 at the Wayback Machine
  10. ^ G. J. Szekely and M. L. Rizzo (2004). Testing for Equal Distributions in High Dimension, InterStat, Nov. (5). Reprint Archived 2011-08-05 at the Wayback Machine.
  11. ^ Székely, G. J. and Rizzo, M. L. (2005) Hierarchical Clustering via Joint Between-Within Distances: Extending Ward's Minimum Variance Method, Journal of Classification, 22(2) 151–183
  12. ^ Varin, T., Bureau, R., Mueller, C. and Willett, P. (2009). "Clustering files of chemical structures using the Szekely-Rizzo generalization of Ward's method". Journal of Molecular Graphics and Modelling. 28 (2): 187–195. doi:10.1016/j.jmgm.2009.06.006. PMID 19640752.
  13. ^ M. L. Rizzo and G. J. Székely (2010). DISCO Analysis: A Nonparametric Extension of Analysis of Variance, Annals of Applied Statistics Vol. 4, No. 2, 1034–1055. arXiv:1011.2288
  14. ^ Szekely, G. J. and Rizzo, M. L. (2004) Testing for Equal Distributions in High Dimension, InterStat, Nov. (5). Reprint Archived 2011-08-05 at the Wayback Machine.
  15. ^ Ledlie, Jonathan and Pietzuch, Peter and Seltzer, Margo (2006). "Stable and Accurate Network Coordinates". 26th IEEE International Conference on Distributed Computing Systems (ICDCS'06). ICDCS '06. Washington, DC, USA: IEEE Computer Society. pp. 74–83. CiteSeerX 10.1.1.68.4006. doi:10.1109/ICDCS.2006.79. ISBN 978-0-7695-2540-2. PMID 1154085. S2CID 6770731. Archived 2011-07-08 at the Wayback Machine.
  16. ^ Albert Y. Kim; Caren Marzban; Donald B. Percival; Werner Stuetzle (2009). "Using labeled data to evaluate change detectors in a multivariate streaming environment". Signal Processing. 89 (12): 2529–2536. Bibcode:2009SigPr..89.2529K. CiteSeerX 10.1.1.143.6576. doi:10.1016/j.sigpro.2009.04.011. ISSN 0165-1684. Preprint: TR534.
  17. ^ Székely, G. J., Rizzo M. L. and Bakirov, N. K. (2007). "Measuring and testing independence by correlation of distances", The Annals of Statistics, 35, 2769–2794. arXiv:0803.4101
  18. ^ Székely, G. J. and Rizzo, M. L. (2009). "Brownian distance covariance", The Annals of Applied Statistics, 3/4, 1233–1308. arXiv:1010.0297
  19. ^ T. Gneiting; A. E. Raftery (2007). "Strictly Proper Scoring Rules, Prediction, and Estimation". Journal of the American Statistical Association. 102 (477): 359–378. doi:10.1198/016214506000001437. S2CID 1878582.
  20. ^ Klebanov L.B. A class of Probability Metrics and its Statistical Applications, Statistics in Industry and Technology: Statistical Data Analysis, Yadolah Dodge, Ed. Birkhauser, Basel, Boston, Berlin, 2002,241-252.
  21. ^ F. Ziel (2021). "The energy distance for ensemble and scenario reduction". Philosophical Transactions of the Royal Society A. 379 (2202): 20190431. arXiv:2005.14670. Bibcode:2021RSPTA.37990431Z. doi:10.1098/rsta.2019.0431. ISSN 1364-503X. PMID 34092100. S2CID 219124032.
  22. ^ Statistics and Data Analysis, 2006, 50, 12, 3619-3628; Rui Hu, Xing Qiu, Galina Glazko, Lev Klebanov, Andrei Yakovlev Detecting intergene correlation changes in microarray analysis: a new approach to gene selection, BMC Bioinformatics, Vol. 10, 20 (2009), 1-15.
  23. ^ Yuanhui Xiao, Robert Frisina, Alexander Gordon, Lev Klebanov, Andrei Yakovlev Multivariate Search for Differentially Expressed Gene Combinations BMC Bioinformatics, 2004, 5:164; Antoni Almudevar, Lev Klebanov, Xing Qiu, Andrei Yakovlev Utility of correlation measures in analysis of gene expression, In: NeuroRX, 2006, 3, 3, 384-395; Klebanov Lev, Gordon Alexander, Land Hartmut, Yakovlev Andrei A permutation test motivated by microarray data analysis
  24. ^ Viktor Benes, Radka Lechnerova, Lev Klebanov, Margarita Slamova, Peter Slama Statistical comparison of the geometry of second-phase particles, Materials Characterization, Vol. 60 (2009 ), 1076 - 1081.
  25. ^ E. Vaiciukynas, A. Verikas, A. Gelzinis, M. Bacauskiene, and I. Olenina (2015) Exploiting statistical energy test for comparison of multiple groups in morphometric and chemometric data, Chemometrics and Intelligent Laboratory Systems, 146, 10-23.
  26. ^ "energy: R package version 1.6.2". Retrieved 30 January 2015.
