Earth mover's distance

In computer science, the earth mover's distance (EMD)[1] is a measure of dissimilarity between two frequency distributions, densities, or measures, over a metric space D. Informally, if the distributions are interpreted as two different ways of piling up earth (dirt) over D, the EMD captures the minimum cost of building the smaller pile using dirt taken from the larger, where cost is defined as the amount of dirt moved multiplied by the distance over which it is moved.

Over probability distributions, the earth mover's distance is also known as the Wasserstein metric , KantorovichRubinstein metric, or Mallows's distance.[2] It is the solution of the optimal transport problem, which in turn is also known as the Monge-Kantorovich problem, or sometimes the HitchcockKoopmans transportation problem;[3] when the measures are uniform over a set of discrete elements, the same optimization problem is known as minimum weight bipartite matching.

Formal definitions

The EMD between probability distributions and can be defined as an infimum over joint probabilities:

where is the set of all joint distributions whose marginals are and .

By Kantorovich-Rubinstein duality, this can also be expressed as:

where the supremum is taken over all 1-Lipschitz continuous functions, i.e. .

EMD between signatures

In some applications, it is convenient to represent a distribution as a signature, or a collection of clusters, where the -th cluster represents a feature of mass centered at . In this formulation, consider signatures and . Let be the ground distance between clusters and . Then the EMD between and is given by the optimal flow , with the flow between and , that minimizes the overall cost.

subject to the constraints:

The optimal flow is found by solving this linear optimization problem. The earth mover's distance is defined as the work normalized by the total flow:

Variants and extensions

Unequal probability mass

Some applications may require the comparison of distributions with different total masses. One approach is to allow for partial matching,[1] where dirt from the more massive distribution is rearranged to make the less massive, and any leftover "dirt" is discarded at no cost. Formally, let be the total weight of , and be the total weight of . We have:

where is the set of all measures whose projections are and . Note that this generalization of EMD is not a true distance between distributions, as it does not satisfy the triangle inequality.

An alternative approach is to allow for mass to be created or destroyed, on a global or local level, as an alternative to transportation, but with a cost penalty. In that case one must specify a real parameter , the ratio between the cost of creating or destroying one unit of "dirt", and the cost of transporting it by a unit distance. This is equivalent to minimizing the sum of the earth moving cost plus times the L1 distance between the rearranged pile and the second distribution. The resulting measure is a true distance function.[4]

More than two distributions

The EMD can be extended naturally to the case where more than two distributions are compared. In this case, the "distance" between the many distributions is defined as the optimal value of a linear program. This generalized EMD may be computed exactly using a greedy algorithm, and the resulting functional has been shown to be Minkowski additive and convex monotone.[5]

Computing the EMD

The EMD can be computed by solving an instance of transportation problem, using any algorithm for minimum-cost flow problem, e.g. the network simplex algorithm.

The Hungarian algorithm can be used to get the solution if the domain D is the set {0, 1}. If the domain is integral, it can be translated for the same algorithm by representing integral bins as multiple binary bins.

As a special case, if D is a one-dimensional array of "bins" of length n, the EMD can be efficiently computed by scanning the array and keeping track of how much dirt needs to be transported between consecutive bins. Here the bins are zero-indexed:

EMD-based similarity analysis

EMD-based similarity analysis (EMDSA) is an important and effective tool in many multimedia information retrieval[6] and pattern recognition[7] applications. However, the computational cost of EMD is super-cubic to the number of the "bins" given an arbitrary "D". Efficient and scalable EMD computation techniques for large scale data have been investigated using MapReduce,[8][9] as well as bulk synchronous parallel and resilient distributed dataset.[10]

Applications

An early application of the EMD in computer science was to compare two grayscale images that may differ due to dithering, blurring, or local deformations.[11] In this case, the region is the image's domain, and the total amount of light (or ink) is the "dirt" to be rearranged.

The EMD is widely used in content-based image retrieval to compute distances between the color histograms of two digital images.[citation needed] In this case, the region is the RGB color cube, and each image pixel is a parcel of "dirt". The same technique can be used for any other quantitative pixel attribute, such as luminance, gradient, apparent motion in a video frame, etc..

More generally, the EMD is used in pattern recognition to compare generic summaries or surrogates of data records called signatures.[1] A typical signature consists of list of pairs ((x1,m1), ... (xn,mn)), where each xi is a certain "feature" (e.g., color in an image, letter in a text, etc.), and mi is "mass" (how many times that feature occurs in the record). Alternatively, xi may be the centroid of a data cluster, and mi the number of entities in that cluster. To compare two such signatures with the EMD, one must define a distance between features, which is interpreted as the cost of turning a unit mass of one feature into a unit mass of the other. The EMD between two signatures is then the minimum cost of turning one of them into the other.

EMD analysis has been used for quantitating multivariate changes in biomarkers measured by flow cytometry, with potential applications to other technologies that report distributions of measurements.[12]

History

The concept was first introduced by Gaspard Monge in 1781,[13] in the context of transportation theory. The use of the EMD as a distance measure for monochromatic images was described in 1989 by S. Peleg, M. Werman and H. Rom.[11] The name "earth mover's distance" was proposed by J. Stolfi in 1994,[14] and was used in print in 1998 by Y. Rubner, C. Tomasi and L. G. Guibas.[1]

See also

References

  1. ^ a b c d Rubner, Y.; Tomasi, C.; Guibas, L.J. (1998). "A metric for distributions with applications to image databases". Sixth International Conference on Computer Vision (IEEE Cat. No.98CH36271). Narosa Publishing House. pp. 59–66. doi:10.1109/iccv.1998.710701. ISBN 81-7319-221-9. S2CID 18648233.
  2. ^ C. L. Mallows (1972). "A note on asymptotic joint normality". Annals of Mathematical Statistics. 43 (2): 508–515. doi:10.1214/aoms/1177692631.
  3. ^ Singiresu S. Rao (2009). Engineering Optimization: Theory and Practice (4th ed.). John Wiley & Sons. p. 221. ISBN 978-0-470-18352-6.
  4. ^ Pele, Ofir; Werman, Michael (2008). "A Linear Time Histogram Metric for Improved SIFT Matching". Computer Vision – ECCV 2008. Lecture Notes in Computer Science. Vol. 5304. Springer Berlin Heidelberg. pp. 495–508. doi:10.1007/978-3-540-88690-7_37. eISSN 1611-3349. ISBN 978-3-540-88689-1. ISSN 0302-9743.
  5. ^ Kline, Jeffery (2019). "Properties of the d-dimensional earth mover's problem". Discrete Applied Mathematics. 265: 128–141. doi:10.1016/j.dam.2019.02.042. S2CID 127962240.
  6. ^ Mark A. Ruzon; Carlo Tomasi (2001). "Edge, Junction, and Corner Detection Using Color Distributions". IEEE Transactions on Pattern Analysis and Machine Intelligence.
  7. ^ Kristen Grauman; Trevor Darrel (2004). "Fast contour matching using approximate earth mover's distance". Proceedings of CVPR 2004.
  8. ^ Jin Huang; Rui Zhang; Rajkumar Buyya; Jian Chen (2014). "MELODY-Join: Efficient Earth Mover's Distance Similarity Joins Using MapReduce". Proceedings of IEEE International Conference on Data Engineering.
  9. ^ Jia Xu; Bin Lei; Yu Gu; Winslett, M.; Ge Yu; Zhenjie Zhang (2015). "Efficient Similarity Join Based on Earth Mover's Distance Using MapReduce". IEEE Transactions on Knowledge and Data Engineering.
  10. ^ Jin Huang; Rui Zhang; Rajkumar Buyya; Jian Chen, M.; Yongwei Wu (2015). "Heads-Join: Efficient Earth Mover's Distance Join on Hadoop". IEEE Transactions on Parallel and Distributed Systems.
  11. ^ a b S. Peleg; M. Werman; H. Rom (1989). "A unified approach to the change of resolution: Space and gray-level". IEEE Transactions on Pattern Analysis and Machine Intelligence. 11 (7): 739–742. doi:10.1109/34.192468. S2CID 18415340.
  12. ^ Orlova, DY; Zimmerman, N; Meehan, C; Meehan, S; Waters, J; Ghosn, EEB (23 March 2016). "Earth Mover's Distance (EMD): A True Metric for Comparing Biomarker Expression Levels in Cell Populations". PLOS One. 11 (3): e0151859. Bibcode:2016PLoSO..1151859O. doi:10.1371/journal.pone.0151859. PMC 4805242. PMID 27008164.
  13. ^ "Mémoire sur la théorie des déblais et des remblais". Histoire de l'Académie Royale des Science, Année 1781, avec les Mémoires de Mathématique et de Physique. 1781.
  14. ^ J. Stolfi, personal communication to L. J. Guibas, 1994, as cited by Rubner, Yossi; Tomasi, Carlo; Guibas, Leonidas J. (2000). "The earth mover's distance as a metric for image retrieval" (PDF). International Journal of Computer Vision. 40 (2): 99–121. doi:10.1023/A:1026543900054. S2CID 14106275.

Read other articles:

1994 soundtrack album by Various artistsA Low Down Dirty Shame (soundtrack)Soundtrack album by Various artistsReleasedNovember 8, 1994Recorded1993–1994Length74:45LabelJiveHollywoodProducer Barry Hankerson (exec.) Adam A-Plus Carter Mike Chapman Cold 187um K. Fingers Fu-Schnickens James Jimmy Jam Harris Terry Lewis Robert Kelly Lyvio G. Organized Konfusion Pimp C Erick Sermon Trent Thomas Touré Singles from A Low Down Dirty Shame Down 4 WhatevaReleased: October 12, 1994 Get the Girl...

 

 

Lena HallengrenLena Hallengren pada tahun 2011 Menteri Kesehatan dan Urusan SosialPetahanaMulai menjabat 21 Januari 2019Penguasa monarkiCarl XVI GustafPerdana MenteriStefan Löfven PendahuluAnnika StrandhällPenggantiPetahanaMenteri Anak-Anak dan LansiaMasa jabatan8 Maret 2018 – 21 Januari 2019Penguasa monarkiCarl XVI GustafPerdana MenteriStefan Löfven PendahuluÅsa RegnérPenggantiJabatan ditiadakanMenteri Kesetaraan GenderMasa jabatan8 Maret 2018 – 21 Januari 2019...

 

 

Edith FellowsEdith Fellows pada 1937LahirEdith Marilyn Fellows(1923-05-20)20 Mei 1923Boston, Massachusetts, ASMeninggal26 Juni 2011(2011-06-26) (umur 88)Woodland Hills, California, ASPekerjaanAktrisTahun aktif1928–1994Suami/istriFreddie Fields(m. 1946–1956, bercerai)AnakKathy Fields Edith Marilyn Fellows (20 Mei 1923 – 26 Juni 2011)[1] adalah seorang aktris Amerika Serikat yang menjadi bintang cilik pada 1930an. Dikenal karena memerankan yatim piatu dan ...

Order of chivalry of Spain You can help expand this article with text translated from the corresponding article in Spanish. (December 2009) Click [show] for important translation instructions. View a machine-translated version of the Spanish article. Machine translation, like DeepL or Google Translate, is a useful starting point for translations, but translators must revise errors as necessary and confirm that the translation is accurate, rather than simply copy-pasting machine-translate...

 

 

Pour les articles homonymes, voir Louis Ier. Cet article est une ébauche concernant une personnalité monégasque. Vous pouvez partager vos connaissances en l’améliorant (comment ?) selon les recommandations des projets correspondants. Louis Ier Titre Prince de Monaco 10 janvier 1662 – 3 janvier 1701(38 ans, 11 mois et 24 jours) Prédécesseur Honoré II Successeur Antoine Ier Biographie Dynastie Maison Grimaldi Date de naissance 25 juillet 1642 Lieu de naissance Pa...

 

 

Atlético MadridNama lengkapClub Atlético de MadridJulukanLos Colchoneros (Pembuat Kasur)Los Rojiblancos (Merah dan Putih)Los Indios (Orang Indian)Nama singkatATMBerdiri26 April 1903; 120 tahun lalu (1903-04-26) sebagai Athletic Club Sucursal de MadridStadionStadion Wanda Metropolitano(Kapasitas: 68.456[1])Pemilik Miguel Ángel Gil Marin (51%) Idan Ofer (30%) Enrique Cerezo (19%)[2][3][4]Presiden Enrique CerezoPelatih kepala Diego SimeoneLigaLa Liga2022�...

Bahasa BurusuDituturkan diIndonesiaWilayahKalimantan UtaraEtnisDayak PunanPenutur4.400 (2007)[1] Rumpun bahasaAustronesia Melayu-PolinesiaKalimantan UtaraSabahan Barat DayaMurutiBagian UtaraBahasa Burusu Kode bahasaISO 639-3bqrGlottologburu1304[2]QIDQ5001028 Status konservasi C10Kategori 10Kategori ini menunjukkan bahwa bahasa telah punah (Extinct)C9Kategori 9Kategori ini menunjukkan bahwa bahasa sudah ditinggalkan dan hanya segelintir yang menuturkannya (Dormant)C8b...

 

 

Spanish politician In this Catalan name, the first or paternal surname is Nuet and the second or maternal family name is Pujals; both are generally joined by the conjunction i. Joan Josep NuetMPNuet in May 2018Member of the Congress of DeputiesIn office16 May 2019 – 4 May 2021ConstituencyBarcelonaIn office2 December 2011 – 25 October 2015Member of the Senate of SpainIn office21 December 2006 – 8 February 2011Preceded byJaume BoschSucceeded byJoan Saura...

 

 

Curry goat and riceCabritoGoat meat pepper soup served with bread This is a list of notable goat dishes, which use goat meat as a primary ingredient. Goat meat is the meat of the domestic goat (Capra aegagrus hircus). It is often called chevon or mutton when the meat comes from adults, and cabrito, capretto, or kid when from young animals. Worldwide, goat meat is less widely consumed than pork, beef, and poultry.[1] Goat dishes This is a dynamic list and may never be able to satisfy ...

Douglas B-66 Destroyer adalah pesawat bomber ringan pengintai (reconnaissance aircraft) Angkatan Udara Taktis Komando Udara AS berdasarkan A-3 Skywarrior Angkatan Laut Amerika Serikat. Hal itu dimaksudkan untuk menggantikan Douglas A-26 Invader. BPR-66 versi foto-pengintai dipesan secara bersamaan. B-66 USAF mempertahankan awak tiga orang dari USN A-3, tetapi juga dimasukkan kursi ejeksi. Referensi Baugher, Joe. Douglas B-66 Destroyer. USAAC/USAAF/USAF Bomber Aircraft: Third Series of USAAC/...

 

 

University's ice hockey club Oxford University Blues[1]CityOxford, United KingdomLeagueBUIHAConferenceSouthern ConferenceDivisionDivision 1 (Checking)Founded1885Home arenaOxford Ice RinkCapacity: 1,025Ice size: 184 x 85 feetColoursDark Blue and WhitePresidentJakub UlikCaptainMen's Blues: Shaan Baig;[2] Women's Blues: Emma Walker-Silverman and Leanne IorioWebsitehttp://oxforduniversityicehockey.com/ The Oxford University Ice Hockey Club (OUIHC) is home to the Men’s and Women�...

 

 

2016年美國總統選舉 ← 2012 2016年11月8日 2020 → 538個選舉人團席位獲勝需270票民意調查投票率55.7%[1][2] ▲ 0.8 %   获提名人 唐納·川普 希拉莉·克林頓 政党 共和黨 民主党 家鄉州 紐約州 紐約州 竞选搭档 迈克·彭斯 蒂姆·凱恩 选举人票 304[3][4][註 1] 227[5] 胜出州/省 30 + 緬-2 20 + DC 民選得票 62,984,828[6] 65,853,514[6]...

土库曼斯坦总统土库曼斯坦国徽土库曼斯坦总统旗現任谢尔达尔·别尔德穆哈梅多夫自2022年3月19日官邸阿什哈巴德总统府(Oguzkhan Presidential Palace)機關所在地阿什哈巴德任命者直接选举任期7年,可连选连任首任萨帕尔穆拉特·尼亚佐夫设立1991年10月27日 土库曼斯坦土库曼斯坦政府与政治 国家政府 土库曼斯坦宪法 国旗 国徽 国歌 立法機關(英语:National Council of Turkmenistan) ...

 

 

 本表是動態列表,或許永遠不會完結。歡迎您參考可靠來源來查漏補缺。 潛伏於中華民國國軍中的中共間諜列表收錄根據公開資料來源,曾潛伏於中華民國國軍、被中國共產黨聲稱或承認,或者遭中華民國政府調查審判,為中華人民共和國和中國人民解放軍進行間諜行為的人物。以下列表以現今可查知時間為準,正確的間諜活動或洩漏機密時間可能早於或晚於以下所歸�...

 

 

2011 film by David Yates For other uses, see Harry Potter and the Deathly Hallows – Part 2 (disambiguation). Harry Potter 8 redirects here. For the play, see Harry Potter and the Cursed Child. Harry Potter and the Deathly Hallows – Part 2Theatrical release posterDirected byDavid YatesScreenplay bySteve KlovesBased onHarry Potter and the Deathly Hallowsby J. K. RowlingProduced by David Heyman David Barron J. K. Rowling Starring Daniel Radcliffe Rupert Grint Emma Watson Helena Bonham Carter...

Change of minerals in pre-existing rocks without melting into liquid magma For other uses, see Metamorphism (disambiguation). Schematic representation of a metamorphic reaction. Abbreviations of minerals: act = actinolite; chl = chlorite; ep = epidote; gt = garnet; hbl = hornblende; plag = plagioclase. Two minerals represented in the figure do not participate in the reaction, they can be quartz and K-feldspar. This reaction takes place in nature when a mafic rock goes from amphibolite facies ...

 

 

Latvian woodwind instrument You can help expand this article with text translated from the corresponding article in Latvian. (May 2024) Click [show] for important translation instructions. View a machine-translated version of the Latvian article. Machine translation, like DeepL or Google Translate, is a useful starting point for translations, but translators must revise errors as necessary and confirm that the translation is accurate, rather than simply copy-pasting machine-translated te...

 

 

French social reformer Barthélemy-Prosper EnfantinBorn(1796-02-08)8 February 1796Paris, FranceDied1 September 1864(1864-09-01) (aged 68)Paris, Second French EmpireCitizenshipFranceAlma materÉcole PolytechniqueOccupation(s)Explorer, Engineer, Socio-Economic Movement FounderKnown forSuez Canal Proponent, Saint-Simonism Founder Barthélemy-Prosper Enfantin (8 February 1796 – 1 September 1864) was a French social reformer, one of the founders of Saint-Simonianism. ...

1957 film This article relies largely or entirely on a single source. Relevant discussion may be found on the talk page. Please help improve this article by introducing citations to additional sources.Find sources: Spring in Berlin – news · newspapers · books · scholar · JSTOR (September 2021) Spring in BerlinFilm posterGermanFrühling in Berlin Directed byArthur Maria RabenaltWritten byCurt J. BraunProduced byKarl MitschkeKurt UlrichHeinz WillegStarri...

 

 

مملكة أكسوم መነገሠ ፡ አከሰመ مملكة أكسوم 100م – 960م   أكسوم في أقصى إتساع لها خلال القرن السادس الميلادي عاصمة أكسوم نظام الحكم ملكية لغات مشتركة اللغة الجعزية اللغة السبئية اللغة اليونانية التاريخ التأسيس 100م الزوال 960م بيانات أخرى العملة العملة الأكسومية اليوم جزء...