Median of medians

Median of Medians
Illustration showing vertical rectangles with 5 circles in each. The middle circle is black and the others are white/empty. The very middle rectangle has an arrow pointing to its black circle and an 'x' at the end of the arrow.
Median of medians
ClassSelection algorithm
Data structureArray
Worst-case performance
Best-case performance
Worst-case space complexity auxiliary
OptimalYes

In computer science, the median of medians is an approximate median selection algorithm, frequently used to supply a good pivot for an exact selection algorithm, most commonly quickselect, that selects the kth smallest element of an initially unsorted array. Median of medians finds an approximate median in linear time. Using this approximate median as an improved pivot, the worst-case complexity of quickselect reduces from quadratic to linear, which is also the asymptotically optimal worst-case complexity of any selection algorithm. In other words, the median of medians is an approximate median-selection algorithm that helps building an asymptotically optimal, exact general selection algorithm (especially in the sense of worst-case complexity), by producing good pivot elements.

Median of medians can also be used as a pivot strategy in quicksort, yielding an optimal algorithm, with worst-case complexity .[citation needed] Although this approach optimizes the asymptotic worst-case complexity quite well, it is typically outperformed in practice by instead choosing random pivots for its average complexity for selection and average complexity for sorting, without any overhead of computing the pivot.

Similarly, Median of medians is used in the hybrid introselect algorithm as a fallback for pivot selection at each iteration until kth smallest is found. This again ensures a worst-case linear performance, in addition to average-case linear performance: introselect starts with quickselect (with random pivot, default), to obtain good average performance, and then falls back to modified quickselect with pivot obtained from median of medians if the progress is too slow. Even though asymptotically similar, such a hybrid algorithm will have a lower complexity than a straightforward introselect up to a constant factor (both in average-case and worst-case), at any finite length.

The algorithm was published in Blum et al. (1973), and thus is sometimes called BFPRT after the last names of the authors. In the original paper the algorithm was referred to as PICK, referring to quickselect as "FIND".

Motivation

Quickselect is linear-time on average, but it can require quadratic time with poor pivot choices. This is because quickselect is a divide and conquer algorithm, with each step taking time in the size of the remaining search set. If the search set decreases exponentially quickly in size (by a fixed proportion), this yields a geometric series times the factor of a single step, and thus linear overall time. However, if the search set decreases slowly in size, such as linearly (by a fixed number of elements, in the worst case only reducing by one element each time), then a linear sum of linear steps yields quadratic overall time (formally, triangular numbers grow quadratically). For example, the worst-case occurs when pivoting on the smallest element at each step, such as applying quickselect for the maximum element to already sorted data and taking the first element as pivot each time.

If one instead consistently chooses "good" pivots, this is avoided and one always gets linear performance even in the worst case. A "good" pivot is one for which we can establish that a constant proportion of elements fall both below and above it, as then the search set decreases at least by a constant proportion at each step, hence exponentially quickly, and the overall time remains linear. The median is a good pivot – the best for sorting, and the best overall choice for selection – decreasing the search set by half at each step. Thus if one can compute the median in linear time, this only adds linear time to each step, and thus the overall complexity of the algorithm remains linear.

The median-of-medians algorithm computes an approximate median, namely a point that is guaranteed to be between the 30th and 70th percentiles (in the middle 4 deciles). Thus the search set decreases by at least 30%. The problem is reduced to 70% of the original size, which is a fixed proportion smaller. Applying the same algorithm on the now smaller set recursively until only one or two elements remain results in a cost of

Algorithm

As stated before, median-of-medians is used as a pivot selection strategy in the quickselect algorithm, which in pseudocode looks as follows.

function nthSmallest(list, n)
    index := select(list, 1, length of list, n) // Use select(list, 0, length of list - 1, n - 1) if list is zero-based numbering
    return list[index]

Be careful to handle left, right and n when implementing. The following pseudocode assumes that left, right, and the list use one-based numbering and that select is initially called with 1 as the argument to left and the length of the list as the argument to right. Note that this returns the index of the n'th smallest number after rearranging the list, rather than the actual value of the n'th smallest number.

function select(list, left, right, n)
    loop
        if left = right then
            return left
        pivotIndex := pivot(list, left, right)
        pivotIndex := partition(list, left, right, pivotIndex, n)
        if n = pivotIndex then
            return n
        else if n < pivotIndex then
            right := pivotIndex - 1
        else
            left := pivotIndex + 1

Subroutine pivot is the actual median-of-medians algorithm. It divides its input (a list of length n) into groups of at most five elements, computes the median of each of those groups using some subroutine, then recursively computes the true median of the medians found in the previous step:.[1] Note that pivot calls select; this is an instance of mutual recursion.

function pivot(list, left, right)
    // for 5 or less elements just get median
    if right − left < 5 then
        return partition5(list, left, right)
    // otherwise move the medians of five-element subgroups to the first n/5 positions
    for i from left to right in steps of 5
        // get the median position of the i'th five-element subgroup
        subRight := i + 4
        if subRight > right then
            subRight := right
        median5 := partition5(list, i, subRight)
        swap list[median5] and list[left + floor((i − left) / 5)]

    // compute the median of the n/5 medians-of-five
    mid := floor((right − left) / 10) + left + 1
    return select(list, left, left + floor((right − left) / 5), mid)

Partition helper functions

There is a subroutine called partition that can, in linear time, group a list (ranging from indices left to right) into three parts, those less than a certain element, those equal to it, and those greater than the element (a three-way partition). The grouping into three parts ensures that the median-of-medians maintains linear execution time in a case of many or all coincident elements. Here is pseudocode that performs a partition about the element list[pivotIndex]:

function partition(list, left, right, pivotIndex, n)
    pivotValue := list[pivotIndex]
    swap list[pivotIndex] and list[right]  // Move pivot to end
    storeIndex := left
    // Move all elements smaller than the pivot to the left of the pivot
    for i from left to right − 1 do
        if list[i] < pivotValue then
            swap list[storeIndex] and list[i]
            increment storeIndex
    // Move all elements equal to the pivot right after
    // the smaller elements
    storeIndexEq := storeIndex
    for i from storeIndex to right − 1 do
        if list[i] = pivotValue then
            swap list[storeIndexEq] and list[i]
            increment storeIndexEq
    swap list[right] and list[storeIndexEq]  // Move pivot to its final place
    // Return location of pivot considering the desired location n
    if n < storeIndex then
        return storeIndex  // n is in the group of smaller elements
    if n ≤ storeIndexEq then
        return n  // n is in the group equal to pivot
    return storeIndexEq // n is in the group of larger elements

The partition5 subroutine selects the median of a group of at most five elements; an easy way to implement this is insertion sort, as shown below.[1] It can also be implemented as a decision tree.

function partition5(list, left, right)
    i := left + 1
    while i ≤ right do
        j := i
        while j > left and list[j−1] > list[j] do
            swap list[j−1] and list[j]
            j := j − 1
        i :=  i + 1

    return left + (right - left) / 2

Properties of pivot

Of the groups, half the number of groups have their median less than the pivot (Median of Medians). Also, another half the number of groups have their median greater than the pivot. In each of the groups with median less than the pivot, there are two elements that are smaller than their respective medians, which are smaller than the pivot. Thus, each of the groups have at least 3 elements that are smaller than the pivot. Similarly, in each of the groups with median greater than the pivot, there are two elements that are greater than their respective medians, which are greater than the pivot. Thus, each of the groups have at least 3 elements that are greater than the pivot. Hence, the pivot is less than elements and greater than another elements. Thus the chosen median splits the ordered elements somewhere between 30%/70% and 70%/30%, which assures worst-case linear behavior of the algorithm. To visualize:

One iteration on a randomized set of 100 elements from 0 to 99
12 15 11 2 9 5 0 7 3 21 44 40 1 18 20 32 19 35 37 39
13 16 14 8 10 26 6 33 4 27 49 46 52 25 51 34 43 56 72 79
Medians 17 23 24 28 29 30 31 36 42 47 50 55 58 60 63 65 66 67 81 83
22 45 38 53 61 41 62 82 54 48 59 57 71 78 64 80 70 76 85 87
96 95 94 86 89 69 68 97 73 92 74 88 99 84 75 90 77 93 98 91

(red = "(one of the two possible) median of medians", gray = "number < red", white = "number > red")

5-tuples are shown here sorted by median, for clarity. Sorting the tuples is not necessary because we only need the median for use as pivot element.

Note that all elements above/left of the red (30% of the 100 elements) are less, and all elements below/right of the red (another 30% of the 100 elements) are greater.

Proof of linear running time

The median-calculating recursive call does not exceed worst-case linear behavior because the list of medians has size , while the other recursive call recurses on at most 70% of the list. Let be the time it takes to run a median-of-medians Quickselect algorithm on an array of size . Then we know this time is:

where

  • the part is for finding the true median of the medians, by running an (independent) Quickselect on them (since finding the median is just a special case of selecting a k-smallest element)
  • the term is for the partitioning work to create the two sides, one of which our Quickselect will recurse (we visited each element a constant number of times in order to form them into n/5 groups and take each median in time).
  • the part is for the actual Quickselect recursion (for the worst case, in which the k-th element is in the bigger partition that can be of size maximally)

By induction:

Analysis

The key step is reducing the problem to selecting in two lists whose total length is shorter than the original list, plus a linear factor for the reduction step. This allows a simple induction to show that the overall running time is linear.

The specific choice of groups of five elements is explained as follows. Firstly, computing median of an odd list is faster and simpler; while one could use an even list, this requires taking the average of the two middle elements, which is slower than simply selecting the single exact middle element. Secondly, five is the smallest odd number such that median of medians works. With groups of only three elements, the resulting list of medians to search in is length , and reduces the list to recurse into length , since it is greater than 1/2 × 2/3 = 1/3 of the elements and less than 1/2 × 2/3 = 1/3 of the elements. Thus this still leaves elements to search in, not reducing the problem sufficiently. The individual lists are shorter, however, and one can bound the resulting complexity to by the Akra–Bazzi method, but it does not prove linearity.

Conversely, one may instead group by = seven, nine, or more elements, and this does work. This reduces the size of the list of medians to , and the size of the list to recurse into asymptotes at 3n/4 (75%), as the quadrants in the above table approximate 25%, as the size of the overlapping lines decreases proportionally. This reduces the scaling factor from 10 asymptotically to 4, but accordingly raises the term for the partitioning work. Finding the median of a larger group takes longer, but is a constant factor (depending only on ), and thus does not change the overall performance as n grows. In fact, considering the number of comparisons in the worst case, the constant factor is .

If one instead groups the other way, say dividing the element list into 5 lists, computing the median of each, and then computing the median of these – i.e., grouping by a constant fraction, not a constant number – one does not as clearly reduce the problem, since it requires computing 5 medians, each in a list of elements, and then recursing on a list of length at most . As with grouping by 3, the individual lists are shorter, but the overall length is no shorter – in fact longer – and thus one can only prove superlinear bounds. Grouping into a square of lists of length is similarly complicated.

References

  1. ^ a b Cormen, Thomas H.; Leiserson, Charles E.; Rivest, Ronald L.; Stein, Clifford (2009) [1990]. Introduction to Algorithms (3rd ed.). MIT Press and McGraw-Hill. p. 220. ISBN 0-262-03384-4.

Read other articles:

New Brompton 1894–95 football seasonNew Brompton1894–95 seasonChairmanHorace CroneenSouthern League Division Two1stFA CupThird Qualifying RoundTop goalscorerLeague: Arthur Rule (18)All: Arthur Rule (22)Highest home attendancec. 8,000 vs Chatham (3 November 1894)Lowest home attendancec. 300 vs Old St Stephen's (1 December 1894) Home colours ← 1893–941895–96 → During the 1894–95 English football season, New Brompton F.C. (since 1912 called Gillingham F.C.) compe...

 

Study of soils in their natural environment This article is about soil science. For the study of child behavior and development, see Paedology. For child medicine, see Pediatrics. For the medical specialty relating to the foot, see Podiatry. For the theory and practice of learning, see Pedagogy. Soil Profile on Chalk at Seven Sisters Country Park, England Pedology (from Greek: πέδον, pedon, soil; and λόγος, logos, study) is a discipline within soil science which focuses on understan...

 

City in Hays County, Texas, US For the city in California, see San Marcos, California. CitySan Marcos, TexasCityHays County Courthouse Historic District SealNickname: San MarvelousCoordinates: 29°52′46″N 97°56′20″W / 29.87944°N 97.93889°W / 29.87944; -97.93889Country United StatesState TexasCountiesHays, Caldwell, GuadalupeGovernment • TypeCouncil-Manager • MayorJane Hughson  • City ManagerBert LumbrerasAre...

Державний комітет телебачення і радіомовлення України (Держкомтелерадіо) Приміщення комітетуЗагальна інформаціяКраїна  УкраїнаДата створення 2003Керівне відомство Кабінет Міністрів УкраїниРічний бюджет 1 964 898 500 ₴[1]Голова Олег НаливайкоПідвідомчі ор...

 

تغير المناخ والفقر حدثان مترابطان. ففي حين يؤثر الاحتباس الحراري على البيئة الطبيعية، ولا سيما الزراعة، فهو يؤثر أيضا على البشر. يزيد تغير المناخ العالمي من الفقر، وخاصة في المجتمعات ذات الدخل المنخفض. كرس المجتمع الدولي القضايا المتشابكة المتعلقة بالتنمية الاقتصادية في...

 

Shaving of head hair as a sign of religious devotion Roman tonsure (Catholicism) Tonsure (/ˈtɒnʃər/) is the practice of cutting or shaving some or all of the hair on the scalp as a sign of religious devotion or humility. The term originates from the Latin word tonsura (meaning clipping or shearing[1]) and referred to a specific practice in medieval Catholicism, abandoned by papal order in 1972.[citation needed] Tonsure can also refer to the secular practice of shaving all ...

Jonathan BorléeJonathan Borlée ai mondiali indoor di Doha 2010Nazionalità Belgio Altezza178 cm Peso62 kg Atletica leggera SpecialitàVelocità Record 200 m 2031 (2012) 300 m 3187 (2012) 400 m 4443 (2012) Palmarès  Belgio Competizione Ori Argenti Bronzi Mondiali indoor 0 1 1 World Relays 0 0 2 Europei 3 1 2 Europei indoor 2 0 1  Vallonia Competizione Ori Argenti Bronzi Giochi della Francofonia 0 0 1 Vedi maggiori dettagliStatistiche aggiornate al 20 agosto 2022 Modifica ...

 

Historic church in Wisconsin, United States United States historic placeFirst Unitarian Society Meeting HouseU.S. National Register of Historic PlacesU.S. National Historic Landmark First Unitarian Meeting HouseShow map of WisconsinShow map of the United StatesInteractive map showing the First Unitarian Society Meeting House’s locationLocation900 University Bay Dr.,Shorewood Hills, WisconsinCoordinates43°4′33.2″N 89°26′6.65″W / 43.075889°N 89.4351806°W / ...

 

New York City Subway service New York City Subway serviceQueens Boulevard Express/Sixth Avenue LocalManhattan-bound F local train of R160s arriving at Avenue PConey Island-bound F express train of R160s passing Fourth AvenueNorthern endJamaica–179th StreetSouthern endConey Island–Stillwell Avenue or Kings Highway (limited)Stations55 (overnight)45 (local service)39 (express service)Rolling stockR160[1][2](Rolling stock assignments subject to change)DepotJamai...

Pemilihan umum Wali Kota Cimahi 20172012202215 Februari 2017Kandidat   Calon Atty Suharti Tochija Asep Hadad Didjaya Ajay Muhammad Priatna Partai Partai Golkar Partai Demokrat PDI-P Pendamping Achmad Zulkarnain Irma Indriyani Ngatiyana Suara rakyat 76.340 79.931 106.583 Persentase 29,04% 30,41% 40,55% Peta persebaran suara Peta lokasi Kota Cimahi Wali Kota petahanaAtty Suharti Tochija Golkar Wali Kota terpilih Ajay Muhammad Priatna PDI-P Pemilihan umum Wali Kota Cimahi dilaksanakan...

 

Stasiun Namezu滑津駅Stasiun Namezu, Oktober 2009Lokasi2496 Namezu, Saku-shi, Nagano-ken 385-0051 JepangKoordinat36°14′20″N 138°28′30″E / 36.2388°N 138.4751°E / 36.2388; 138.4751Ketinggian665.2 meter[1]Operator JR EastJalur■ Jalur KoumiLetak66.5 km dari KobuchizawaJumlah peron1 peron sisiInformasi lainStatusTanpa staffSitus webSitus web resmiSejarahDibuka6 Juni 1916PenumpangFY201186 Lokasi pada petaStasiun Namezu mmLokasi di Nagano Prefectu...

 

PS Lambhuk 1948Nama lengkapPersatuan Sepakbola Lambhuk 1948Berdiri1948; 76 tahun lalu (1948)StadionStadion Mini Lambhuk Banda Aceh, IndonesiaPemilikAskot PSSI Banda AcehKetuaSyahrul FuadiLigaLiga 3 PS Lambhuk 1948 (atau singkatan dari Persatuan Sepakbola Lambhuk 1948) adalah tim sepak bola amatir yang bermarkas di Stadion Mini Lambhuk, Gampong Lambhuk, Banda Aceh, Provinsi Aceh.[1] Sejarah PS Lambhuk 1948 didirikan pada tahun 1948 oleh beberapa tokoh masyarakat Gampong Lambhuk ya...

← березень → Пн Вт Ср Чт Пт Сб Нд         1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 2024 рік 17 березня — 76-й день року (77-й у високосні роки) в григоріанському календарі. До кінця року залишається 289 днів. Цей день в історії: 16 березня—17 березня—18 березня Зм�...

 

FIFAワールドカップ > 2026 この記事はサッカーに関して将来予定されるイベントを扱っています。 内容は最新の情報を反映していない可能性があります。 2026 FIFAワールドカップFIFA World Cup 26™大会概要開催国 カナダ メキシコ アメリカ合衆国日程 2026年6月11日 - 7月19日チーム数 48 (6連盟)開催地数 16 (16都市)大会統計試合数 104試合  < 20222030 >  ...

 

HanDati amministrativiPoliticaTerritorio e popolazioneLo stato di Han e gli altri Regni combattenti Evoluzione storicaOra parte di Cina Modifica dati su Wikidata · ManualeStato di Han(220 a.C.) Lo stato di Han (韓國, Hánguó) (403 a.C.-230 a.C.) fu un regno cinese del Periodo dei regni combattenti. Il territorio dello Stato di Han si collocava fra lo stato di Qin e le pianure del nord, e per questo motivo divenne bersaglio di frequenti operazioni militari. Nonostante i tentativi...

Åsa Westlund Membro del RiksdagIn caricaInizio mandato2014 Sito istituzionale EuroparlamentareLegislaturaVI-VII GruppoparlamentareAlleanza Progressista dei Socialisti e dei Democratici Dati generaliPartito politicoPartito Socialdemocratico dei Lavoratori di Svezia Åsa Westlund (Anderstorp, 19 maggio 1976) è una politica svedese, eurodeputata per il Partito Socialdemocratico Svedese dal 2004 al 2014 e deputata al Parlamento dal 2014. Indice 1 Biografia 2 Vita privata 3 Note 4 Altr...

 

King of Spain (1556–1598) and Portugal (1580–1598) Philip IIPortrait by Sofonisba Anguissola (1565)King of Spain (more...) Reign16 January 1556 – 13 September 1598PredecessorCharles ISuccessorPhilip IIIKing of Portugal (more...) Reign12 September 1580 – 13 September 1598Acclamation16 April 1581, TomarPredecessorHenry or Anthony (disputed)SuccessorPhilip III of Spain (as Philip II)King of England and Ireland (jure uxoris) Reign25 July 1554 – 17 November 1558PredecessorMary ISuccessor...

 

Dieser Artikel beschäftigt sich mit dem Automobilhersteller. Zu weiteren Bedeutungen siehe Saab (Begriffsklärung). Saab Automobile Parts AB Logo Rechtsform Aktiebolag Gründung 1947 Auflösung 2012 Auflösungsgrund Insolvenz Sitz Trollhättan, Schweden Schweden Branche Automobilhersteller Website www.saab.com Stand: 14. Juli 2024 Saab Automobile war ein schwedischer Automobilhersteller und ein Unternehmen der Automobilindustrie. Das Unternehmen wurde 1947 als Produktionssparte de...

機動戦士ガンダムさん ジャンル ギャグ 漫画 原作・原案など 矢立肇・富野由悠季 作画 大和田秀樹 出版社 KADOKAWA その他の出版社 台湾角川 Japonica Polonica Fantastica JBC出版社 掲載誌 ガンダムエース4コマnanoエース レーベル 角川コミックス・エース 発表号 2001年創刊号 - (ガンダムエース)Vol.1 - (4コマnanoエース) - 巻数 既刊21巻(2023年8月現在) アニメ:ガンダムさん �...

 

Honor society for science and engineering Sigma XiΣΞFounded1886; 138 years ago (1886)Cornell UniversityTypeHonor SocietyAffiliationHSCFormer AffiliationACHSStatusActiveEmphasisScience and EngineringScopeInternationalMottoCompanions in Zealous ResearchColors  Blue and   GoldPublicationAmerican ScientistChapters370 active, 170 inactiveMembers60,000[1] lifetimeHeadquarters3200 East NC Highway 54Ste 300Research Triangle Park, North Carolina 27709 United State...