Ball tree

In computer science, a ball tree, balltree[1] or metric tree, is a space partitioning data structure for organizing points in a multi-dimensional space. A ball tree partitions data points into a nested set of balls. The resulting data structure has characteristics that make it useful for a number of applications, most notably nearest neighbor search.

Informal description

A ball tree is a binary tree in which every node defines a D-dimensional ball containing a subset of the points to be searched. Each internal node of the tree partitions the data points into two disjoint sets which are associated with different balls. While the balls themselves may intersect, each point is assigned to one or the other ball in the partition according to its distance from the ball's center. Each leaf node in the tree defines a ball and enumerates all data points inside that ball.

Each node in the tree defines the smallest ball that contains all data points in its subtree. This gives rise to the useful property that, for a given test point t outside the ball, the distance to any point in a ball B in the tree is greater than or equal to the distance from t to the surface of the ball. Formally: [2]

Where is the minimum possible distance from any point in the ball B to some point t.

Ball-trees are related to the M-tree, but only support binary splits, whereas in the M-tree each level splits to fold, thus leading to a shallower tree structure, therefore need fewer distance computations, which usually yields faster queries. Furthermore, M-trees can better be stored on disk, which is organized in pages. The M-tree also keeps the distances from the parent node precomputed to speed up queries.

Vantage-point trees are also similar, but they binary split into one ball, and the remaining data, instead of using two balls.

Construction

A number of ball tree construction algorithms are available.[1] The goal of such an algorithm is to produce a tree that will efficiently support queries of the desired type (e.g. nearest-neighbor) in the average case. The specific criteria of an ideal tree will depend on the type of question being answered and the distribution of the underlying data. However, a generally applicable measure of an efficient tree is one that minimizes the total volume of its internal nodes. Given the varied distributions of real-world data sets, this is a difficult task, but there are several heuristics that partition the data well in practice. In general, there is a tradeoff between the cost of constructing a tree and the efficiency achieved by this metric. [2]

This section briefly describes the simplest of these algorithms. A more in-depth discussion of five algorithms was given by Stephen Omohundro.[1]

k-d construction algorithm

The simplest such procedure is termed the "k-d Construction Algorithm", by analogy with the process used to construct k-d trees. This is an offline algorithm, that is, an algorithm that operates on the entire data set at once. The tree is built top-down by recursively splitting the data points into two sets. Splits are chosen along the single dimension with the greatest spread of points, with the sets partitioned by the median value of all points along that dimension. Finding the split for each internal node requires linear time in the number of samples contained in that node, yielding an algorithm with time complexity , where n is the number of data points.

Pseudocode

function construct_balltree is
    input: D, an array of data points.
    output: B, the root of a constructed ball tree.

    if a single point remains then
        create a leaf B containing the single point in D
        return B
    else
        let c be the dimension of greatest spread
        let p be the central point selected considering c
        let L, R be the sets of points lying to the left and right of the median along dimension c
        create B with two children: 
            B.pivot := p
            B.child1 := construct_balltree(L),
            B.child2 := construct_balltree(R),
            let B.radius be maximum distance from p among children
        return B
    end if
end function

An important application of ball trees is expediting nearest neighbor search queries, in which the objective is to find the k points in the tree that are closest to a given test point by some distance metric (e.g. Euclidean distance). A simple search algorithm, sometimes called KNS1, exploits the distance property of the ball tree. In particular, if the algorithm is searching the data structure with a test point t, and has already seen some point p that is closest to t among the points encountered so far, then any subtree whose ball is further from t than p can be ignored for the rest of the search.

Description

The ball tree nearest-neighbor algorithm examines nodes in depth-first order, starting at the root. During the search, the algorithm maintains a max-first priority queue (often implemented with a heap), denoted Q here, of the k nearest points encountered so far. At each node B, it may perform one of three operations, before finally returning an updated version of the priority queue:

  1. If the distance from the test point t to the current node B is greater than the furthest point in Q, ignore B and return Q.
  2. If B is a leaf node, scan through every point enumerated in B and update the nearest-neighbor queue appropriately. Return the updated queue.
  3. If B is an internal node, call the algorithm recursively on B's two children, searching the child whose center is closer to t first. Return the queue after each of these calls has updated it in turn.

Performing the recursive search in the order described in point 3 above increases likelihood that the further child will be pruned entirely during the search.

Pseudocode

function knn_search is
    input: 
        t, the target point for the query
        k, the number of nearest neighbors of t to search for
        Q, max-first priority queue containing at most k points
        B, a node, or ball, in the tree
    output: 
        Q, containing the k nearest neighbors from within B

    if distance(t, B.pivot) - B.radius ≥ distance(t, Q.first) then
        return Q unchanged
    else if B is a leaf node then
        for each point p in B do
            if distance(t, p) < distance(t, Q.first) then
                add p to Q
                if size(Q) > k then
                    remove the furthest neighbor from Q
                end if
            end if
        repeat
    else
        let child1 be the child node closest to t
        let child2 be the child node furthest from t
        knn_search(t, k, Q, child1)
        knn_search(t, k, Q, child2)
    end if
    return Q
end function[2]

Performance

In comparison with several other data structures, ball trees have been shown to perform fairly well on the nearest-neighbor search problem, particularly as their number of dimensions grows.[3][4] However, the best nearest-neighbor data structure for a given application will depend on the dimensionality, number of data points, and underlying structure of the data.

References

  1. ^ a b c Omohundro, Stephen M. (1989) "Five Balltree Construction Algorithms"
  2. ^ a b c Liu, T.; Moore, A. & Gray, A. (2006). "New Algorithms for Efficient High-Dimensional Nonparametric Classification" (PDF). Journal of Machine Learning Research. 7: 1135–1158.
  3. ^ Kumar, N.; Zhang, L.; Nayar, S. (2008). "What is a Good Nearest Neighbors Algorithm for Finding Similar Patches in Images?". Computer Vision – ECCV 2008 (PDF). Lecture Notes in Computer Science. Vol. 5303. p. 364. CiteSeerX 10.1.1.360.7582. doi:10.1007/978-3-540-88688-4_27. ISBN 978-3-540-88685-3.
  4. ^ Kibriya, A. M.; Frank, E. (2007). "An Empirical Comparison of Exact Nearest Neighbour Algorithms". Knowledge Discovery in Databases: PKDD 2007 (PDF). Lecture Notes in Computer Science. Vol. 4702. p. 140. doi:10.1007/978-3-540-74976-9_16. ISBN 978-3-540-74975-2.

Read other articles:

Dimple KapadiaLahirDimple Chunnibhai KapadiaPekerjaanAktrisTahun aktif1973; 1984–sekarangSuami/istriRajesh Khanna (1973-1984) (cerai)AnakTwinkle Khanna Rinke Khanna Dimple Chunnibhai Kapadia (lahir 8 Juni 1957) adalah aktris Bollywood, mantan istri dari aktor Rajesh Khanna dan ibu dari aktris Twinkle Khanna. Film Yang Dibintanginya Bobby (1973) Zakhmi Sher (1984) Manzil Manzil (1984) Saagar (1985) Pataal Bhairavi (1985) Lavaa (1985) Aitbaar (1985) Arjun (1985) Vikram (1986) (Tamil fil...

 

1958 1967 Élections législatives de 1962 dans les Basses-Alpes 2 sièges de députés à l'Assemblée nationale 18 et 25 novembre 1962 Corps électoral et résultats Inscrits 59 254 Votants au 1er tour 40 718   68,72 %  7,7 Votes exprimés au 1er tour 39 375 Votants au 2d tour 41 876   70,69 % Votes exprimés au 2d tour 39 551 Majorité présidentielle Liste Union pour la nouvelle République (UDT)Républicains indépendantsModérés Vo...

 

Danish-German organist and composer (1637–1707) Dieterich BuxtehudeThe only surviving portrait of Buxtehude, playing a viol, from Musical Company by Johannes Voorhout (1674)BornDiderich Hansen BuxtehudeHelsingborg, Scania, Denmark–NorwayBaptised1637Died9 May 1707(1707-05-09) (aged 69–70)Free City of Lübeck, Holy Roman EmpireOccupationsComposerorganistWorksList of compositionsSignature Dieterich Buxtehude (German: [ˈdiːtəʁɪç bʊkstəˈhuːdə]; born Diderich Hansen Bu...

Questa voce sull'argomento centri abitati del Rondônia è solo un abbozzo. Contribuisci a migliorarla secondo le convenzioni di Wikipedia. Ji-Paranácomune Ji-Paraná – Veduta LocalizzazioneStato Brasile Stato federato Rondônia MesoregioneLeste Rondoniense MicroregioneJi-Paraná AmministrazioneSindacoJesuakdo Pires Ferreira Junior TerritorioCoordinate10°52′46″S 61°56′54″W / 10.879444°S 61.948333°W-10.879444; -61.948333 (Ji-Paraná)Coordinate: ...

 

This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: List of auxiliary NTHS Expressways – news · newspapers · books · scholar · JSTOR (June 2021) (Learn how and when to remove this message) National Trunk Highway SystemNTHS expressway markerNTHS expressway in ChinaSystem informationFormed13 January 2005;...

 

Pour les articles homonymes, voir Hrádek. Cet article est une ébauche concernant une localité tchèque. Vous pouvez partager vos connaissances en l’améliorant (comment ?) selon les recommandations des projets correspondants. Hrádek   Administration Pays Tchéquie Région Moravie-du-Sud District Znojmo Région historique Moravie Maire Ondřej Kubic Code postal 671 27 Démographie Population 930 hab. (2019) Densité 43 hab./km2 Géographie Coordonnées 48° 46�...

Questa voce sull'argomento calciatori tunisini è solo un abbozzo. Contribuisci a migliorarla secondo le convenzioni di Wikipedia. Segui i suggerimenti del progetto di riferimento. Mokhtar Hasni Nazionalità  Tunisia Calcio Ruolo Attaccante Termine carriera 1982 CarrieraSquadre di club1 1971-1976 El-Makarem Mahdia? (?)1976-1977 UBS Auvelais? (?)1977-1982 La Louvière? (?)Nazionale 1974-1978 Tunisia? (?) 1 I due numeri indicano le presenze e le reti segnate, per le sole partite...

 

Artikel ini memiliki beberapa masalah. Tolong bantu memperbaikinya atau diskusikan masalah-masalah ini di halaman pembicaraannya. (Pelajari bagaimana dan kapan saat yang tepat untuk menghapus templat pesan ini) Artikel biografi ini ditulis menyerupai resume atau daftar riwayat hidup (Curriculum Vitae). Tolong bantu perbaiki agar netral dan ensiklopedis. Biografi ini tidak memiliki referensi atau sumber sehingga isinya tidak dapat dipastikan. Bantu memperbaiki artikel ini dengan menambahkan su...

 

جزء من سلسلة مقالات حولعلم الفقه مواضيع الفقه الفروع الأصول القواعد أدلة مدارس بوابة الفقه علم فروع الفقه فقه العبادات الطهارة الصلاة الزكاة الصيام الحج والعمرة فقه المعاملات فقه المواريث علم أصول الفقه الأصول والقواعد علم أصول الفقه أدلة الفقه الأحكام الشرعية فرض فرض ك�...

Multi-sport event in London, England 2012 Olympics and London 2012 redirect here. For the Summer Paralympics, see 2012 Summer Paralympics. For the Winter Youth Olympics in Innsbruck, Austria, see 2012 Winter Youth Olympics. For the video game, see London 2012 (video game). Games of the XXX OlympiadEmblem of the 2012 Summer Olympics; other colour variants are shown belowHost cityLondon, England, United KingdomMottoInspire a GenerationNations204+2 (including 2 IOA teams)Athletes10,518 (5,863 me...

 

佩德罗·阿尔莫多瓦尔2008年攝於馬德里导演出生Pedro Almodóvar Caballero (1949-09-25) 1949年9月25日(74歲)西班牙國雷阿爾城职业電影導演活跃年代1974年-現今各地用词差异中国大陆佩德罗·阿尔莫多瓦尔佩德罗·阿莫多瓦臺灣佩卓·阿莫多瓦香港貝德羅·艾慕杜華 奖项 奥斯卡金像奖最佳外語片1999年 《我的母親》最佳原創劇本2002年 《悄悄告訴她》戛纳电影节最佳導演獎1999年 《我�...

 

British author and colonial administrator Sir William Lee-Warner Sir William Lee-Warner GCSI (18 April 1846 – 18 January 1914) was a British author and colonial administrator in the Indian Civil Service. He was Chief Commissioner of Coorg in 1895.[1][2][3] In 1907 he headed the eponymous Lee Warner Committee that examined Indians receiving education in Britain. Early life and education Lee-Warner was born in Little Walsingham[4] into a prominent Norfolk famil...

У этого термина существуют и другие значения, см. Урарту (значения). Историческое государствоУрарту Урарту в период наибольшей территориальной экспансии в 743 г. до н. э. ←   → IX век до н. э. — 590 до н. э. Столица Арзашкун, Тушпа Язык(и) урартский, хурритский, лувийский, п...

 

Mappa della Gallia prima della conquista di Cesare nel 59 a.C. I Pictoni furono un popolo gallico che, a seconda delle fonti e delle epoche, sono stati anche chiamati Pitti (che è pure il nome di un antico popolo scozzese, i Pitti), o, ben dopo la conquista romana della Gallia, Pictavi. Indice 1 Il periodo di indipendenza 1.1 Territorio 1.2 Città ed economia 1.3 Organizzazione politica e culti 1.4 Monetazione 2 La guerra di Gallia e la fine dell'indipendenza 3 La pace romana 3.1 Note 4 Bibl...

 

Type of internal combustion engine Honda MVX250F two-stroke engine The V3 engine is a V engine with two cylinders in one bank and one cylinder in the other bank. It is a rare configuration, which has been mostly used in two-stroke engines for motorcycles competing in Grand Prix motorcycle racing. The first example was the 1955 DKW 350.[1] The 1968 Suzuki RP68 was intended to compete in the 1968 season, however a rule change mandating single-cylinder engines meant that the 50 cc (...

Seasonal NFL playoffs 1995–96 NFL playoffsDatesDecember 30, 1995–January 28, 1996Season1995Teams12Games played11Super Bowl XXX siteSun Devil StadiumTempe, ArizonaDefending championsSan Francisco 49ersChampionsDallas CowboysRunners-upPittsburgh SteelersConferencerunners-upGreen Bay PackersIndianapolis Colts NFL playoffs ← 1994–95 1996–97 → The National Football League playoffs for the 1995 season began on December 30, 1995. The postseason tournament concluded with the Dallas Cowboy...

 

سفارة روسيا في أوكرانيا روسيا أوكرانيا الإحداثيات 50°26′01″N 30°28′19″E / 50.4336°N 30.471825°E / 50.4336; 30.471825   البلد أوكرانيا  المكان كييف الاختصاص أوكرانيا  الموقع الالكتروني الموقع الرسمي تعديل مصدري - تعديل   سفارة روسيا في أوكرانيا هي أرفع تمثيل دبلوماسي[1]...

 

Osborn Warren DeignanCoxswain Osborn W. DeignanBorn(1877-02-24)February 24, 1877Stuart, Iowa, U.S.DiedApril 16, 1916(1916-04-16) (aged 39)Cañon City, Colorado, U.S.Place of burialForest Lawn Memorial Park, Glendale, California, U.S.Allegiance United States of AmericaService/branch United States NavyYears of service1894–1906RankBoatswainUnitUSS MerrimacBattles/warsSpanish–American War Sinking of the Merrimac AwardsMedal of Honor Osborn Warren Deignan (February 24, 1877...

Die schärfsten Kritiker der Elche /waren früher selber welcheBronze sculpture by Hans Traxlerin front of the Caricatura Museum Frankfurt Neue Frankfurter Schule (NFS, New Frankfurt School) is a group of writers and artists which was founded by former members of the editorial staff of the German satirical magazine pardon. They have published the magazine Titanic from 1979. History Among the founding members of the group, which first had no name, were: F. W. Bernstein (1938–2018) Bernd Eile...

 

Demisse WoldeBiographieNaissance 12 juin 1932OromiaDécès 26 mai 2002 (à 69 ans)Addis-AbebaNationalité éthiopienneActivités Marathonien, coureur de fond, athlèteAutres informationsTaille 1,7 mPoids 54 kgSport AthlétismeDiscipline sportive Marathonmodifier - modifier le code - modifier Wikidata Demisse « Mamo » Wolde, né le 12 juin 1932 à Diri Jille et décédé le 26 mai 2002 à Addis-Abeba, est un athlète éthiopien, pratiquant le marathon. Biographie Il participe ...