Timsort

Timsort
ClassSorting algorithm
Data structureArray
Worst-case performance[1][2]
Best-case performance[3]
Average performance
Worst-case space complexity
OptimalNo[4]

Timsort is a hybrid, stable sorting algorithm, derived from merge sort and insertion sort, designed to perform well on many kinds of real-world data. It was implemented by Tim Peters in 2002 for use in the Python programming language. The algorithm finds subsequences of the data that are already ordered (runs) and uses them to sort the remainder more efficiently. This is done by merging runs until certain criteria are fulfilled. Timsort has been Python's standard sorting algorithm since version 2.3 (since version 3.11 using the Powersort merge policy[5]), and is used to sort arrays of non-primitive type in Java SE 7,[6] on the Android platform,[7] in GNU Octave,[8] on V8,[9] and Swift.[10]

It uses techniques from Peter McIlroy's 1993 paper "Optimistic Sorting and Information Theoretic Complexity".[11]

Operation

Timsort was designed to take advantage of runs of consecutive ordered elements that already exist in most real-world data, natural runs. It iterates over the data collecting elements into runs and simultaneously putting those runs in a stack. Whenever the runs on the top of the stack match a merge criterion, they are merged. This goes on until all data is traversed; then, all runs are merged two at a time and only one sorted run remains. The advantage of merging ordered runs instead of merging fixed size sub-lists (as done by traditional mergesort) is that it decreases the total number of comparisons needed to sort the entire list.

Each run has a minimum size, which is based on the size of the input and is defined at the start of the algorithm. If a run is smaller than this minimum run size, insertion sort is used to add more elements to the run until the minimum run size is reached.

Merge criteria

The runs are inserted in a stack. If |Z| ≤ |Y| + |X|, then X and Y are merged and replaced on the stack. In this way, merging is continued until all runs satisfy i. |Z| > |Y| + |X| and ii. |Y| > |X|.

Timsort is a stable sorting algorithm (order of elements with same key is kept) and strives to perform balanced merges (a merge thus merges runs of similar sizes).

In order to achieve sorting stability, only consecutive runs are merged. Between two non-consecutive runs, there can be an element with the same key inside the runs. Merging those two runs would change the order of equal keys. Example of this situation ([] are ordered runs): [1 2 2] 1 4 2 [0 1 2]

In pursuit of balanced merges, Timsort considers three runs on the top of the stack, X, Y, Z, and maintains the invariants:

  1. |Z| > |Y| + |X|
  2. |Y| > |X|[12]

If any of these invariants is violated, Y is merged with the smaller of X or Z and the invariants are checked again. Once the invariants hold, the search for a new run in the data can start.[13] These invariants maintain merges as being approximately balanced while maintaining a compromise between delaying merging for balance, exploiting fresh occurrence of runs in cache memory and making merge decisions relatively simple.

Merge space overhead

To merge, Timsort copies the elements of the smaller array (X in this illustration) to temporary memory, then sorts and fills elements in final order into the combined space of X and Y.

The original merge sort implementation is not in-place and it has a space overhead of N (data size). In-place merge sort implementations exist, but have a high time overhead. In order to achieve a middle term, Timsort performs a merge sort with a small time overhead and smaller space overhead than N.

First, Timsort performs a binary search to find the location where the first element of the second run would be inserted in the first ordered run, keeping it ordered. Then, it performs the same algorithm to find the location where the last element of the first run would be inserted in the second ordered run, keeping it ordered. Elements before and after these locations are already in their correct place and do not need to be merged. Then, the smaller of these shrunk runs is copied into temporary memory, and the copied elements are merged with the larger shrunk run into the now free space. If the leftmost shrunk run is smaller, the merge proceeds from left to right. If the rightmost shrunk run is smaller, merging proceeds from right to left (i.e. beginning with elements at the ends of the temporary space and leftmost run, and filling the free space from its end). This optimization reduces the number of required element movements, the running time and the temporary space overhead in the general case.

Example: two runs [1, 2, 3, 6, 10] and [4, 5, 7, 9, 12, 14, 17] must be merged. Note that both runs are already sorted individually. The smallest element of the second run is 4 and it would have to be added at the fourth position of the first run in order to preserve its order (assuming that the first position of a run is 1). The largest element of the first run is 10 and it would have to be added at the fifth position of the second run in order to preserve its order. Therefore, [1, 2, 3] and [12, 14, 17] are already in their final positions and the runs in which elements movements are required are [6, 10] and [4, 5, 7, 9]. With this knowledge, we only need to allocate a temporary buffer of size 2 instead of 4.

Merge direction

Merging can be done in both directions: left-to-right, as in the traditional mergesort, or right-to-left.

Galloping mode during merge

Elements (pointed to by blue arrow) are compared and the smaller element is moved to its final position (pointed to by red arrow).

An individual merge of runs R1 and R2 keeps the count of consecutive elements selected from a run. When this number reaches the minimum galloping threshold (min_gallop), Timsort considers that it is likely that many consecutive elements may still be selected from that run and switches into galloping mode. Let us assume that R1 is responsible for triggering it. In this mode, the algorithm performs a two-stage search for the place in the run R1 where the next element x of the run R2 would be inserted. In the first stage it performs an exponential search, also known as a galloping search, until finding a k such that R1[2k−1 − 1] < x <= R1[2k − 1], i.e. a region of uncertainty comprising 2k−1 − 1 consecutive elements of R1. The second stage performs a straight binary search of this region to find the exact location in R1 for x. Galloping mode is an attempt to adapt the merge algorithm to the pattern of intervals between elements in runs.

All red elements are smaller than blue (here, 21). Thus they can be moved in a chunk to the final array.

Galloping is not always efficient. In some cases galloping mode requires more comparisons than a simple linear search. According to benchmarks done by the developer, galloping is beneficial only when the initial element of one run is not one of the first seven elements of the other run. This implies an initial threshold of 7. To avoid the drawbacks of galloping mode, two actions are taken: (1) When galloping is found to be less efficient than binary search, galloping mode is exited. (2) The success or failure of galloping is used to adjust min_gallop. If the selected element is from the same array that returned an element previously, min_gallop is reduced by one, thus encouraging the return to galloping mode. Otherwise, the value is incremented by one, thus discouraging a return to galloping mode. In the case of random data, the value of min_gallop becomes so large that galloping mode never recurs.[14]

Descending runs

In order to also take advantage of data sorted in descending order, Timsort reverses strictly descending runs when it finds them and adds them to the stack of runs. Since descending runs are later blindly reversed, excluding runs with equal elements maintains the algorithm's stability; i.e., equal elements won't be reversed.

Minimum run size

Timsort algorithm searches for minimum-size ordered sequences, minruns, to perform its sort.

Because merging is most efficient when the number of runs is equal to, or slightly less than, a power of two, and notably less efficient when the number of runs is slightly more than a power of two, Timsort chooses minrun to try to ensure the former condition.[12]

Minrun is chosen from the range 32 to 64 inclusive, such that the size of the data, divided by minrun, is equal to, or slightly less than, a power of two. The final algorithm takes the six most significant bits of the size of the array, adds one if any of the remaining bits are set, and uses that result as the minrun. This algorithm works for all arrays, including those smaller than 64; for arrays of size 63 or less, this sets minrun equal to the array size and Timsort reduces to an insertion sort.[12]

Analysis

In the worst case, Timsort takes comparisons to sort an array of n elements. In the best case, which occurs when the input is already sorted, it runs in linear time, meaning that it is an adaptive sorting algorithm.[3]

It is superior to Quicksort for sorting object references or pointers because these require expensive memory indirection to access data and perform comparisons and Quicksort's cache coherence benefits are greatly reduced.

Formal verification

In 2015, Dutch and German researchers in the EU FP7 ENVISAGE project found a bug in the standard implementation of Timsort.[15] It was fixed in 2015 in Python, Java and Android.

Specifically, the invariants on stacked run sizes ensure a tight upper bound on the maximum size of the required stack. The implementation preallocated a stack sufficient to sort 264 bytes of input, and avoided further overflow checks.

However, the guarantee requires the invariants to apply to every group of three consecutive runs, but the implementation only checked it for the top three.[15] Using the KeY tool for formal verification of Java software, the researchers found that this check is not sufficient, and they were able to find run lengths (and inputs which generated those run lengths) which would result in the invariants being violated deeper in the stack after the top of the stack was merged.[16]

As a consequence, for certain inputs the allocated size is not sufficient to hold all unmerged runs. In Java, this generates for those inputs an array-out-of-bound exception. The smallest input that triggers this exception in Java and Android v7 is of size 67108864 (226). (Older Android versions already triggered this exception for certain inputs of size 65536 (216))

The Java implementation was corrected by increasing the size of the preallocated stack based on an updated worst-case analysis. The article also showed by formal methods how to establish the intended invariant by checking that the four topmost runs in the stack satisfy the two rules above. This approach was initially adopted by Python[17] until it switched to Powersort in 2022 with the release of Python 3.11.[5]

References

  1. ^ Peters, Tim (20 July 2002). "[Python-Dev] Sorting". Python Developers Mailinglist. Retrieved 24 February 2011. [Timsort] also has good aspects: It's stable (items that compare equal retain their relative order, so, e.g., if you sort first on zip code, and a second time on name, people with the same name still appear in order of increasing zip code; this is important in apps that, e.g., refine the results of queries based on user input). ... It has no bad cases (O(N log N) is worst case; N−1 compares is best).
  2. ^ Auger, Nicolas; Jugé, Vincent; Nicaud, Cyril; Pivoteau, Carine (2018). [DROPS]. doi:10.4230/LIPIcs.ESA.2018.4. ISBN 9783959770811. S2CID 44091254. Retrieved 1 September 2018. TimSort is an intriguing sorting algorithm designed in 2002 for Python, whose worst-case complexity was announced, but not proved until our recent preprint.
  3. ^ a b Chandramouli, Badrish; Goldstein, Jonathan (2014). Patience is a Virtue: Revisiting Merge and Sort on Modern Processors. SIGMOD/PODS.
  4. ^ Munro, J. Ian; Wild, Sebastian (2018). Nearly-Optimal Mergesorts: Fast, Practical Sorting Methods That Optimally Adapt to Existing Runs. Schloss Dagstuhl – Leibniz-Zentrum für Informatik. pp. 63:1–63:16. arXiv:1805.04154. doi:10.4230/lipics.esa.2018.63. S2CID 21678081.
  5. ^ a b James, Mike. "Python Now Uses Powersort". I Programmer. Retrieved 21 June 2024.
  6. ^ "[#JDK-6804124] (coll) Replace "modified mergesort" in java.util.Arrays.sort with timsort". JDK Bug System. Retrieved 11 June 2014.
  7. ^ "Class: java.util.TimSort<T>". Android Gingerbread Documentation. Archived from the original on 16 July 2015. Retrieved 24 February 2011.
  8. ^ "liboctave/util/oct-sort.cc". Mercurial repository of Octave source code. Lines 23-25 of the initial comment block. Retrieved 18 February 2013. Code stolen in large part from Python's, listobject.c, which itself had no license header. However, thanks to Tim Peters for the parts of the code I ripped-off.
  9. ^ "Getting things sorted in V8 · V8". v8.dev. Retrieved 21 December 2018.
  10. ^ "Is sort() stable in Swift 5?". Swift Forums. 4 July 2019. Retrieved 4 July 2019.
  11. ^ McIlroy, Peter (January 1993). "Optimistic Sorting and Information Theoretic Complexity". Proceedings of the Fourth Annual ACM-SIAM Symposium on Discrete Algorithms. pp. 467–474. ISBN 0-89871-313-7.
  12. ^ a b c "listsort.txt". Python source code. 18 May 2022. Archived from the original on 28 January 2016.
  13. ^ MacIver, David R. (11 January 2010). "Understanding timsort, Part 1: Adaptive Mergesort". Retrieved 5 December 2015.
  14. ^ Peters, Tim. "listsort.txt". CPython git repository. Retrieved 5 December 2019.
  15. ^ a b de Gouw, Stijn; Rot, Jurriaan; de Boer, Frank S.; Bubel, Richard; Hähnle, Reiner (July 2015). "OpenJDK's Java.utils.Collection.sort() is Broken: The Good, the Bad and the Worst Case". Computer Aided Verification (PDF). Lecture Notes in Computer Science. Vol. 9206. pp. 273–289. doi:10.1007/978-3-319-21690-4_16. ISBN 978-3-319-21689-8.
  16. ^ de Gouw, Stijn (24 February 2015). "Proving that Android's, Java's and Python's sorting algorithm is broken (and showing how to fix it)". Retrieved 6 May 2017.
  17. ^ Python Issue Tracker – Issue 23515: Bad logic in timsort's merge_collapse

Further reading

Read other articles:

Культура татар-мишарей — культура субэтноса татар, наряду с казанскими и сибирскими татарами образует культуру татарского народа. Содержание 1 Расселение мишарей 2 Хозяйство 3 Поселения и жилища 4 Национальная одежда 4.1 Мужская одежда 4.2 Женская одежда 5 Кухня 6 Язык 7 С�...

 

 

ميلبورت الإحداثيات 42°16′03″N 76°50′10″W / 42.2675°N 76.8361°W / 42.2675; -76.8361   [1] تقسيم إداري  البلد الولايات المتحدة[2]  التقسيم الأعلى مقاطعة تشيمونغ  خصائص جغرافية  المساحة 0.901138 كيلومتر مربع0.901403 كيلومتر مربع (1 أبريل 2010)  ارتفاع 219 متر  عدد السكان ...

 

 

  لمعانٍ أخرى، طالع جيه (توضيح). هذه المقالة يتيمة إذ تصل إليها مقالات أخرى قليلة جدًا. فضلًا، ساعد بإضافة وصلة إليها في مقالات متعلقة بها. (يناير 2015) جيهلقطة شاشة لمحرر النصوص جيه بالإصدارة 0.23.0معلومات عامةنوع محرر نصوصنظام التشغيل جميع أنظمة التشغيل التي تدعم لغة جافا�...

PS BPD JatengNama lengkapPersatuan Sepak Bola Bank Pembangunan Daerah Jawa TengahJulukanKepodang / The Blue ForceStadionStadion JatidiriSemarang, Jawa Tengah, Indonesia(Kapasitas: 21.000)PemilikBPD JatengKetua UmumMuhammad IsmailPelatihSungkowo SudiartoLigaDivisi Utama Indonesia1994–95Peringkat 14 PS BPD Jateng singkatan dari (Persatuan Sepak bola Bank Pembangunan Daerah Jawa Tengah) merupakan klub yang didirikan untuk menjadi wakil Jawa tengah di kompetisi Galatama. PS BPD Jateng masih ber...

 

 

Book by Abbé Augustin Barruel Memoirs Illustrating the History of Jacobinism Title Page of the First English Edition of Memoirs Illustrating the History of Jacobinism, 1799.AuthorAbbé Augustin BarruelTranslatorRobert CliffordCountryEnglandLanguageEnglishPublisherHudson & GoodwinPublication date1798–1799 Memoirs Illustrating the History of Jacobinism (French: Mémoires pour servir à l’histoire du Jacobinisme) is a book by Abbé Augustin Barruel, a French Jesuit priest. It was writte...

 

 

Early history of the Albanians Part of a series onAlbanians By country Native Albania Kosovo Croatia Greece Italy Montenegro North Macedonia Serbia Diaspora Australia Bulgaria Denmark Egypt Finland Germany Norway Romania South America Spain Sweden Switzerland Turkey Ukraine United Kingdom United States Culture Architecture Art Cuisine Dance Dress Literature Music Mythology Politics Religion Symbols Traditions Fis Religion Christianity Catholicism Italo-Albanian Church Albanian Greek-Catholic ...

Sayid SyuhadaLingkunganNegara Arab SaudiProvinsiProvinsi MadinahKotaMadinahZona waktuUTC+3 (EAT) • Musim panas (DST)UTC+3 (EAT) Sayid Syuhada (Arab: سيد شهدى) adalah sebuah lingkungan di kota suci Madinah di Provinsi Madinah, tepatnya di sebelah barat Arab Saudi.[1] Referensi ^ National Geospatial-Intelligence Agency. GeoNames database entry. (search Diarsipkan 2017-03-18 di Wayback Machine.) Accessed 12 May 2011. lbsLingkungan sekitar Masjid Nabawi, Madina...

 

 

МифологияРитуально-мифологическийкомплекс Система ценностей Сакральное Миф Мономиф Теория основного мифа Ритуал Обряд Праздник Жречество Мифологическое сознание Магическое мышление Низшая мифология Модель мира Цикличность Сотворение мира Мировое яйцо Мифическое �...

 

 

Argentine footballer Federico Mancuello Mancuello at Independiente.Personal informationFull name Federico Andrés MancuelloDate of birth (1989-03-26) 26 March 1989 (age 35)Place of birth Reconquista, ArgentinaHeight 1.75 m (5 ft 9 in)Position(s) Central midfielderTeam informationCurrent team IndependienteNumber 11Youth career2004–2008 IndependienteSenior career*Years Team Apps (Gls)2008–2015 Independiente 152 (20)2011–2012 → Belgrano (loan) 21 (1)2016–2017 Flamen...

United States Navy heavy cruiser For other ships with the same name, see USS Columbus. USS Columbus on 12 July 1948 History United States NameColumbus NamesakeCity of Columbus, Ohio Ordered9 September 1940 Laid down28 June 1943 Launched30 November 1944 Sponsored byMrs. E. G. Meyers Commissioned8 June 1945 Decommissioned31 January 1975 ReclassifiedCG-12, 30 September 1959 Stricken9 August 1976 Identification Callsign: NBZW Hull number: CA-74 MottoAd Frontes Mundi Nickname(s)The Tall Lady Honor...

 

 

Federal constituency of Kelantan, Malaysia Tumpat (P019) Kelantan constituencyFederal constituencyLegislatureDewan RakyatMPMumtaz Md. NawiPNConstituency created1958First contested1959Last contested2022DemographicsPopulation (2020)[1]179,944Electors (2023)[2]150,248Area (km²)[3]180Pop. density (per km²)999.7 Tumpat is a federal constituency in Tumpat District, Kelantan, Malaysia, that has been represented in the Dewan Rakyat since 1959. The federal constituency was cr...

 

 

周處除三害The Pig, The Snake and The Pigeon正式版海報基本资料导演黃精甫监制李烈黃江豐動作指導洪昰顥编剧黃精甫主演阮經天袁富華陳以文王淨李李仁謝瓊煖配乐盧律銘林孝親林思妤保卜摄影王金城剪辑黃精甫林雍益制片商一種態度電影股份有限公司片长134分鐘产地 臺灣语言國語粵語台語上映及发行上映日期 2023年10月6日 (2023-10-06)(台灣) 2023年11月2日 (2023-11-02)(香�...

Argentine footballer Alberto Sainz Personal informationFull name Alberto SainzDate of birth (1937-12-13)13 December 1937Place of birth Buenos Aires, ArgentinaDate of death 16 January 2016(2016-01-16) (aged 78)[1]Height 1.62 m (5 ft 4 in)Position(s) DefenderSenior career*Years Team Apps (Gls)1958–1961 Argentinos Juniors 1962–1967 River Plate 1968–1969 San Lorenzo Mar de Plata International career Argentina *Club domestic league appearances and goals Alberto Ni...

 

 

Chronologies Données clés 1677 1678 1679  1680  1681 1682 1683Décennies :1650 1660 1670  1680  1690 1700 1710Siècles :XVe XVIe  XVIIe  XVIIIe XIXeMillénaires :-Ier Ier  IIe  IIIe Chronologies thématiques Art Architecture, Arts plastiques (Dessin, Gravure, Peinture et Sculpture), (), Littérature (), Musique (Classique) et Théâtre   Ingénierie (), Architecture, () et ()   Politique Droit et ()   Religion (,)  ...

 

 

Chiesa di San Maurizio al Monastero MaggioreStato Italia RegioneLombardia LocalitàMilano IndirizzoCorso Magenta, 13 - 20123 Milano (MI) e Corso Magenta 13 Coordinate45°27′56.38″N 9°10′44.11″E45°27′56.38″N, 9°10′44.11″E Religionecattolica di rito ambrosiano Titolaresan Maurizio Arcidiocesi Milano Consacrazione1518 ArchitettoGian Giacomo Dolcebuono e Giovanni Antonio Amadeo Stile architettonicorinascimentale Sito webwww.museoarcheologicomilano.it/oltre-il-museo/la-chie...

1950 military campaign during the Korean WarThis article may be too long to read and navigate comfortably. Consider splitting content into sub-articles, condensing it, or adding subheadings. Please discuss this issue on the article's talk page. (August 2022)UN offensive into North KoreaPart of the Korean WarDate30 September – 25 November 1950(1 month, 3 weeks and 5 days)LocationNorth KoreaResult United Nations victory Chinese interventionBelligerents  United Nations ...

 

 

For culture before the division, see Culture of Korea. For the culture of the South, see Culture of South Korea. This article's tone or style may not reflect the encyclopedic tone used on Wikipedia. See Wikipedia's guide to writing better articles for suggestions. (September 2019) (Learn how and when to remove this message) At the Pyongyang Embroidery Institute Lapel pins from North Korea Part of a series on theCulture of Korea Society History People Diaspora Language Names of Korea Religion...

 

 

Slovak tennis player Magdaléna RybárikováRybáriková at the 2018 Birmingham ClassicCountry (sports) SlovakiaResidenceBratislava, SlovakiaBorn (1988-10-04) 4 October 1988 (age 35)Piešťany, Czechoslovakia (now Slovakia)Height1.80 m (5 ft 11 in)Turned pro2005[1]Retired2020PlaysRight-handed (two-handed backhand)CoachPeter HuberPrize moneyUS$5,252,043SinglesCareer record426–307Career titles4Highest rankingNo. 17 (5 March 2018)Grand ...

National academy in New Delhi Not to be confused with Indian Academy of Sciences or National Academy of Sciences, India. Indian National Science AcademyEstablished7 January 1935; 89 years ago (1935-01-07)FounderLewis Leigh FermorLocationNew Delhi, IndiaCoordinates28°37′43.8″N 77°14′26.7″E / 28.628833°N 77.240750°E / 28.628833; 77.240750PresidentAshutosh SharmaWebsitewww.insaindia.res.in The Indian National Science Academy (INSA) is a natio...

 

 

مقالات ممثلين غير موجودة الصفحة الرئيسةنقاش المشروعدليل التقديراتالمحتوى المميزالمقالات المقترحة سجل المهام هذه قائمة من ويكي بيانات تعرض اقتراحاتٍ بمقالاتٍ لممثلين وممثلات في موسوعات أخرى ولكن ليس في ويكيبيديا العربيّة. القائمة آخر تحديث: 2021-03-06 13:47 ويتم التحديث يوميً...