Perfect hash function

A perfect hash function for the four names shown
A minimal perfect hash function for the four names shown

In computer science, a perfect hash function h for a set S is a hash function that maps distinct elements in S to a set of m integers, with no collisions. In mathematical terms, it is an injective function.

Perfect hash functions may be used to implement a lookup table with constant worst-case access time. A perfect hash function can, as any hash function, be used to implement hash tables, with the advantage that no collision resolution has to be implemented. In addition, if the keys are not in the data and if it is known that queried keys will be valid, then the keys do not need to be stored in the lookup table, saving space.

Disadvantages of perfect hash functions are that S needs to be known for the construction of the perfect hash function. Non-dynamic perfect hash functions need to be re-constructed if S changes. For frequently changing S dynamic perfect hash functions may be used at the cost of additional space.[1] The space requirement to store the perfect hash function is in O(n) where n is the number of keys in the structure.

The important performance parameters for perfect hash functions are the evaluation time, which should be constant, the construction time, and the representation size.

Application

A perfect hash function with values in a limited range can be used for efficient lookup operations, by placing keys from S (or other associated values) in a lookup table indexed by the output of the function. One can then test whether a key is present in S, or look up a value associated with that key, by looking for it at its cell of the table. Each such lookup takes constant time in the worst case.[2] With perfect hashing, the associated data can be read or written with a single access to the table.[3]

Performance of perfect hash functions

The important performance parameters for perfect hashing are the representation size, the evaluation time, the construction time, and additionally the range requirement (average number of buckets per key in the hash table).[4] The evaluation time can be as fast as O(1), which is optimal.[2][4] The construction time needs to be at least O(n), because each element in S needs to be considered, and S contains n elements. This lower bound can be achieved in practice.[4]

The lower bound for the representation size depends on m and n. Let m = (1+ε) n and h a perfect hash function. A good approximation for the lower bound is Bits per element. For minimal perfect hashing, ε = 0, the lower bound is log e ≈ 1.44 bits per element.[4]

Construction

A perfect hash function for a specific set S that can be evaluated in constant time, and with values in a small range, can be found by a randomized algorithm in a number of operations that is proportional to the size of S. The original construction of Fredman, Komlós & Szemerédi (1984) uses a two-level scheme to map a set S of n elements to a range of O(n) indices, and then map each index to a range of hash values. The first level of their construction chooses a large prime p (larger than the size of the universe from which S is drawn), and a parameter k, and maps each element x of S to the index

If k is chosen randomly, this step is likely to have collisions, but the number of elements ni that are simultaneously mapped to the same index i is likely to be small. The second level of their construction assigns disjoint ranges of O(ni2) integers to each index i. It uses a second set of linear modular functions, one for each index i, to map each member x of S into the range associated with g(x).[2]

As Fredman, Komlós & Szemerédi (1984) show, there exists a choice of the parameter k such that the sum of the lengths of the ranges for the n different values of g(x) is O(n). Additionally, for each value of g(x), there exists a linear modular function that maps the corresponding subset of S into the range associated with that value. Both k, and the second-level functions for each value of g(x), can be found in polynomial time by choosing values randomly until finding one that works.[2]

The hash function itself requires storage space O(n) to store k, p, and all of the second-level linear modular functions. Computing the hash value of a given key x may be performed in constant time by computing g(x), looking up the second-level function associated with g(x), and applying this function to x. A modified version of this two-level scheme with a larger number of values at the top level can be used to construct a perfect hash function that maps S into a smaller range of length n + o(n).[2]

A more recent method for constructing a perfect hash function is described by Belazzougui, Botelho & Dietzfelbinger (2009) as "hash, displace, and compress". Here a first-level hash function g is also used to map elements onto a range of r integers. An element xS is stored in the Bucket Bg(x).[4]

Then, in descending order of size, each bucket's elements are hashed by a hash function of a sequence of independent fully random hash functions 1, Φ2, Φ3, ...), starting with Φ1. If the hash function does not produce any collisions for the bucket, and the resulting values are not yet occupied by other elements from other buckets, the function is chosen for that bucket. If not, the next hash function in the sequence is tested.[4]

To evaluate the perfect hash function h(x) one only has to save the mapping σ of the bucket index g(x) onto the correct hash function in the sequence, resulting in h(x) = Φσ(g(x)).[4]

Finally, to reduce the representation size, the (σ(i))0 ≤ i < r are compressed into a form that still allows the evaluation in O(1).[4]

This approach needs linear time in n for construction, and constant evaluation time. The representation size is in O(n), and depends on the achieved range. For example, with m = 1.23n Belazzougui, Botelho & Dietzfelbinger (2009) achieved a representation size between 3.03 bits/key and 1.40 bits/key for their given example set of 10 million entries, with lower values needing a higher computation time. The space lower bound in this scenario is 0.88 bits/key.[4]

Pseudocode

algorithm hash, displace, and compress is
(1) Split S into buckets Bi := g−1({i})∩S,0 ≤ i < r
(2) Sort buckets Bi in falling order according to size |Bi|
(3) Initialize array T[0...m-1] with 0's
(4) for all i∈[r], in the order from (2), do
(5)     for l1,2,...
(6)         repeat forming Ki{Φl(x)|xBi}
(6)         until |Ki|=|Bi| and Ki∩{j|T[j]=1}=∅
(7)     let σ(i):= the successful l
(8)     for all jKi let T[j]:=1
(9) Transform (σi)0≤i<r into compressed form, retaining O(1) access.

Space lower bounds

The use of O(n) words of information to store the function of Fredman, Komlós & Szemerédi (1984) is near-optimal: any perfect hash function that can be calculated in constant time requires at least a number of bits that is proportional to the size of S.[5]

For minimal perfect hash functions the information theoretic space lower bound is

bits/key.[4]

For perfect hash functions, it is first assumed that the range of h is bounded by n as m = (1+ε) n. With the formula given by Belazzougui, Botelho & Dietzfelbinger (2009) and for a universe whose size |U| = u tends towards infinity, the space lower bounds is

bits/key, minus log(n) bits overall.[4]

Extensions

Memory address identity

A trivial but pervasive example of perfect hashing is implicit in the (virtual) memory address space of a computer. Since each byte of virtual memory is a distinct, unique, directly addressable storage location, the value of the starting address where any object is stored in memory can be considered a de facto perfect hash of that object into the entire memory address range.[6]

Dynamic perfect hashing

Using a perfect hash function is best in situations where there is a frequently queried large set, S, which is seldom updated. This is because any modification of the set S may cause the hash function to no longer be perfect for the modified set. Solutions which update the hash function any time the set is modified are known as dynamic perfect hashing,[1] but these methods are relatively complicated to implement.

Minimal perfect hash function

A minimal perfect hash function is a perfect hash function that maps n keys to n consecutive integers – usually the numbers from 0 to n − 1 or from 1 to n. A more formal way of expressing this is: Let j and k be elements of some finite set S. Then h is a minimal perfect hash function if and only if h(j) = h(k) implies j = k (injectivity) and there exists an integer a such that the range of h is a..a + |S| − 1. It has been proven that a general purpose minimal perfect hash scheme requires at least bits/key.[4] Assuming that is a set of size containing integers in the range , it is known how to efficiently construct an explicit minimal perfect hash function from to that uses space bits and that supports constant evaluation time.[7] In practice, there are minimal perfect hashing schemes that use roughly 1.56 bits/key if given enough time.[8]

k-perfect hashing

A hash function is k-perfect if at most k elements from S are mapped onto the same value in the range. The "hash, displace, and compress" algorithm can be used to construct k-perfect hash functions by allowing up to k collisions. The changes necessary to accomplish this are minimal, and are underlined in the adapted pseudocode below:

(4) for all i∈[r], in the order from (2), do
(5)     for l1,2,...
(6)         repeat forming Ki{Φl(x)|xBi}
(6)         until |Ki|=|Bi| and Ki∩{j|T[j]=k}=∅
(7)     let σ(i):= the successful l
(8)     for all jKi set T[j]←T[j]+1

Order preservation

A minimal perfect hash function F is order preserving if keys are given in some order a1, a2, ..., an and for any keys aj and ak, j < k implies F(aj) < F(ak).[9] In this case, the function value is just the position of each key in the sorted ordering of all of the keys. A simple implementation of order-preserving minimal perfect hash functions with constant access time is to use an (ordinary) perfect hash function to store a lookup table of the positions of each key. This solution uses bits, which is optimal in the setting where the comparison function for the keys may be arbitrary.[10] However, if the keys a1, a2, ..., an are integers drawn from a universe , then it is possible to construct an order-preserving hash function using only bits of space.[11] Moreover, this bound is known to be optimal.[12]

While well-dimensioned hash tables have amortized average O(1) time (amortized average constant time) for lookups, insertions, and deletion, most hash table algorithms suffer from possible worst-case times that take much longer. A worst-case O(1) time (constant time even in the worst case) would be better for many applications (including network router and memory caches).[13]: 41 

Few hash table algorithms support worst-case O(1) lookup time (constant lookup time even in the worst case). The few that do include: perfect hashing; dynamic perfect hashing; cuckoo hashing; hopscotch hashing; and extendible hashing.[13]: 42–69 

A simple alternative to perfect hashing, which also allows dynamic updates, is cuckoo hashing. This scheme maps keys to two or more locations within a range (unlike perfect hashing which maps each key to a single location) but does so in such a way that the keys can be assigned one-to-one to locations to which they have been mapped. Lookups with this scheme are slower, because multiple locations must be checked, but nevertheless take constant worst-case time.[14]

References

  1. ^ a b Dietzfelbinger, Martin; Karlin, Anna; Mehlhorn, Kurt; Meyer auf der Heide, Friedhelm; Rohnert, Hans; Tarjan, Robert E. (1994), "Dynamic perfect hashing: upper and lower bounds", SIAM Journal on Computing, 23 (4): 738–761, doi:10.1137/S0097539791194094, MR 1283572.
  2. ^ a b c d e Fredman, Michael L.; Komlós, János; Szemerédi, Endre (1984), "Storing a Sparse Table with O(1) Worst Case Access Time", Journal of the ACM, 31 (3): 538, doi:10.1145/828.1884, MR 0819156, S2CID 5399743
  3. ^ Lu, Yi; Prabhakar, Balaji; Bonomi, Flavio (2006), "Perfect Hashing for Network Applications", 2006 IEEE International Symposium on Information Theory, pp. 2774–2778, doi:10.1109/ISIT.2006.261567, ISBN 1-4244-0505-X, S2CID 1494710
  4. ^ a b c d e f g h i j k l Belazzougui, Djamal; Botelho, Fabiano C.; Dietzfelbinger, Martin (2009), "Hash, displace, and compress" (PDF), Algorithms - ESA 2009 (PDF), Lecture Notes in Computer Science, vol. 5757, Berlin: Springer, pp. 682–693, CiteSeerX 10.1.1.568.130, doi:10.1007/978-3-642-04128-0_61, ISBN 978-3-642-04127-3, MR 2557794.
  5. ^ Fredman, Michael L.; Komlós, János (1984), "On the size of separating systems and families of perfect hash functions", SIAM Journal on Algebraic and Discrete Methods, 5 (1): 61–68, doi:10.1137/0605009, MR 0731857.
  6. ^ Witold Litwin; Tadeusz Morzy; Gottfried Vossen (19 August 1998). Advances in Databases and Information Systems. Springer Science+Business Media. p. 254. ISBN 9783540649243.
  7. ^ Hagerup, Torben; Tholey, Torsten (2001), "Efficient Minimal Perfect Hashing in Nearly Minimal Space", STACS 2001, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 317–326, doi:10.1007/3-540-44693-1_28, ISBN 978-3-540-41695-1, retrieved 2023-11-12
  8. ^ Esposito, Emmanuel; Mueller Graf, Thomas; Vigna, Sebastiano (2020), "RecSplit: Minimal Perfect Hashing via Recursive Splitting", 2020 Proceedings of the Symposium on Algorithm Engineering and Experiments (ALENEX), Proceedings, pp. 175–185, arXiv:1910.06416, doi:10.1137/1.9781611976007.14.
  9. ^ Jenkins, Bob (14 April 2009), "order-preserving minimal perfect hashing", in Black, Paul E. (ed.), Dictionary of Algorithms and Data Structures, U.S. National Institute of Standards and Technology, retrieved 2013-03-05
  10. ^ Fox, Edward A.; Chen, Qi Fan; Daoud, Amjad M.; Heath, Lenwood S. (July 1991), "Order-preserving minimal perfect hash functions and information retrieval" (PDF), ACM Transactions on Information Systems, 9 (3), New York, NY, USA: ACM: 281–308, doi:10.1145/125187.125200, S2CID 53239140.
  11. ^ Belazzougui, Djamal; Boldi, Paolo; Pagh, Rasmus; Vigna, Sebastiano (November 2008), "Theory and practice of monotone minimal perfect hashing", Journal of Experimental Algorithmics, 16, Art. no. 3.2, 26pp, doi:10.1145/1963190.2025378, S2CID 2367401.
  12. ^ Assadi, Sepehr; Farach-Colton, Martín; Kuszmaul, William (January 2023), "Tight Bounds for Monotone Minimal Perfect Hashing", Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Philadelphia, PA: Society for Industrial and Applied Mathematics, pp. 456–476, arXiv:2207.10556, doi:10.1137/1.9781611977554.ch20, ISBN 978-1-61197-755-4, retrieved 2023-04-27
  13. ^ a b Timothy A. Davis. "Chapter 5 Hashing": subsection "Hash Tables with Worst-Case O(1) Access"
  14. ^ Pagh, Rasmus; Rodler, Flemming Friche (2004), "Cuckoo hashing", Journal of Algorithms, 51 (2): 122–144, doi:10.1016/j.jalgor.2003.12.002, MR 2050140.

Further reading

  • gperf is an open source C and C++ perfect hash generator (very fast, but only works for small sets)
  • Minimal Perfect Hashing (bob algorithm) by Bob Jenkins
  • cmph: C Minimal Perfect Hashing Library, open source implementations for many (minimal) perfect hashes (works for big sets)
  • Sux4J: open source monotone minimal perfect hashing in Java
  • MPHSharp: perfect hashing methods in C#
  • BBHash: minimal perfect hash function in header-only C++
  • Perfect::Hash, perfect hash generator in Perl that makes C code. Has a "prior art" section worth looking at.

Read other articles:

Macrinus (bahasa Latin: Marcus Opellius Severus Macrinus Augustus;[1] 165 – June 218), adalah Kaisar Romawi yang berkuasa dari tahun 217 sampai dengan tahun 218. Macrinus berasal dari Mauretanian,[2] kemungkinan merupakan percampuran dengan Punic. Macrinus digulingkan dan dieksekusi pada tahun 218. Sebagai salah satu anggota keluarga dari kelas equestrian, dia menjadi kaisar pertama yang tidak berasal dari kalangan senat.[3] Lahir di Caesarea, Aljazair, provinsi...

 

 

Bandar Udara SilampariSilampari AirportIATA: LLJICAO: WIPBInformasiJenisSipilMelayaniLubuklinggau, Sumatera SelatanLokasiLubuklinggau, Sumatera SelatanKetinggian dpl125 mdplKoordinat03°16′48″S 102°55′02″E / 3.28000°S 102.91722°E / -3.28000; 102.91722Koordinat: 03°16′48″S 102°55′02″E / 3.28000°S 102.91722°E / -3.28000; 102.91722Situs webllg.informasibandara.orgPetaBandar Udara SilampariLokasi di IndonesiaLandasan...

 

 

Pour les articles homonymes, voir Monarchie espagnole (homonymie). Empire espagnol(es) Imperio español 1492–1975Drapeau de l'Espagne de 1506 à 1701 (en haut) et de 1785 à 1931 (en bas). Petites armoiries de l'Espagne de 1700 à 1931. Devise en latin : Plus ultra (« Plus loin ») Hymne Marcha RealHimno de Riego (pendant les périodes républicaines) Carte représentant les différentes possessions de l'Empire espagnol tout au long de son existence.Informati...

Former German state (1495-1806) Duchy of WürttembergHerzogtum Württemberg (German)1495–1803 Flag Coat of arms Duchy of Württemberg within the Holy Roman Empire (1618)CapitalStuttgartCommon languagesSwabian GermanReligion Roman CatholicLutheranDemonym(s)WürttembergerGovernmentDuchyDuke • 1495–1496 Eberhard I (first)• 1797–1803 Frederick II (last) Historical eraEarly modernNapoleonic• Diet of Worms 21 July 1495• Poor Conrad May 1514• R...

 

 

Academic journalNaval War College ReviewDisciplinePublic policyLanguageEnglishPublication detailsFormer name(s)Information Service for OfficersHistory1948-presentPublisherNaval War College (United States)FrequencyQuarterlyOpen accessYesLicensePublic domainStandard abbreviationsISO 4 (alt) · Bluebook (alt1 · alt2)NLM (alt) · MathSciNet (alt )ISO 4Nav. War Coll. Rev.IndexingCODEN (alt · alt2) · JSTOR (alt) ·...

 

 

这是马来族人名,“阿都沙末”是父名,不是姓氏,提及此人时应以其自身的名“卡立”为主。 卡立·阿都沙末Khalid bin Abdul Samad2019年8月15日,卡立阿都沙末与美国驻马大使雷荷花(英语:Kamala Shirin Lakhdhir)会面 马来西亚联邦直辖区部长任期2018年5月21日—2020年2月24日君主最高元首端姑莫哈末五世最高元首苏丹阿都拉首相马哈迪·莫哈末副职沙鲁丁前任东姑安南继任安努...

Australian judge The Right HonourableSir Frank KittoAC, KBE, QCHigh Court in 1952, Kitto far right, back rowJustice of the High Court of AustraliaIn office10 May 1950 – 1 August 1970Nominated byRobert MenziesPreceded bySir George RichSucceeded bySir Harry Gibbs Personal detailsBorn30 July 1903Melbourne, Victoria, AustraliaDied15 February 1994Armidale, New South Wales, Australia Sir Frank Walters Kitto, AC, KBE, QC (30 July 1903 – 15 February 19...

 

 

2014 British spy thriller television series For the Ugandan satire television series, see The Honourables. The Honourable WomanGenrePolitical thrillerSpy thrillerWritten byHugo BlickDirected byHugo BlickStarringMaggie GyllenhaalPhilip ArdittiLubna AzabalAndrew BuchanEve BestLindsay DuncanJanet McTeerTobias MenziesIgal NaorGenevieve O'ReillyKatherine ParkinsonStephen ReaComposerMartin PhippsCountry of originUnited KingdomOriginal languageEnglishNo. of episodes8 (list of episodes)ProductionExec...

 

 

Human settlement in EnglandAldrethAerial view of AldrethAldrethLocation within CambridgeshireOS grid referenceTL446735• London62 mi (100 km) SCivil parishHaddenhamDistrictEast CambridgeshireShire countyCambridgeshireRegionEastCountryEnglandSovereign stateUnited KingdomPost townELYPostcode districtCB6Dialling code01353PoliceCambridgeshireFireCambridgeshireAmbulanceEast of England UK ParliamentSouth East CambridgeshireWebsiteECDC List of...

1934 film by Charles Reisner The Show-OffLobby cardDirected byCharles ReisnerWritten byHerman J. MankiewiczBased onThe Show-Offby George KellyProduced byLucien HubbardStarringSpencer Tracy Madge Evans Henry WadsworthCinematographyJames Wong HoweEdited byWilliam S. GrayProductioncompanyMetro-Goldwyn-MayerDistributed byMetro-Goldwyn-MayerRelease date March 9, 1934 (1934-03-09) Running time77 minutesCountryUnited StatesLanguageEnglishBudget$162,000[1][2]Box office$...

 

 

For the 1961 British variety television programme, see The Jo Stafford Show (1961 TV series). American TV series or program The Jo Stafford ShowJo Stafford and husband/conductor Paul Weston on The Jo Stafford Show.GenreVarietyStarringJo StaffordCountry of originUnited StatesOriginal languageEnglishNo. of seasons1ProductionCamera setupMulti-cameraRunning time15 minutesOriginal releaseNetworkCBSRelease1954 (1954) –1955 (1955)Related Douglas Edwards with the News (7:30 pm E...

 

 

Association football club in England Football clubKempston RoversFull nameKempston Rovers Football ClubNickname(s)The Walnut BoysFounded1884GroundHillgrounds Leisure, KempstonCapacity2,000 (152 seated)[1]ChairmanAndy KirbyManagerChris NunnLeagueSpartan South Midlands League Premier Division2023–24Southern League Division One Central, 19th of 19 (relegated) Home colours Away colours Kempston Rovers Football Club is a football club based in Kempston, Bedfordshire, England. Affiliated ...

42°20′00″N 83°03′00″W / 42.3333°N 83.05°W / 42.3333; -83.05 الكأس الذهبي للكونكاكاف 2011Copa de Oro de la CONCACAF 2011 (بالإسبانية)تفاصيل المسابقةالبلد المضيف الولايات المتحدةالتواريخ5–25 يونيو 2011الفرق12 (من 1 اتحاد كونفدرالي)الأماكن13 (في 13 مدينة مضيفة)المراكز النهائيةالبطل المكسيك ...

 

 

Ethnic group of Central Europe This article is about the Central European ethnic group. For the genus of ungulates, see Goral. Gorale redirects here. For Polish place names, see Górale (disambiguation). A Goral with bagpipes from the region of Podhale in Poland The Gorals (Polish: Górale; Goral ethnolect: Górole; Slovak: Gorali; Cieszyn Silesian: Gorole), also known as the Highlanders (in Poland as the Polish Highlanders, a subethnic group of the Polish nation) and historically also as Vla...

 

 

LGBT rights in New York StateNew York (US)StatusLegal since 1980(New York v. Onofre)Legislative repeal in 2000Gender identitySex reassignment surgery not a requirement for changing birth certificatesDiscrimination protectionsSexual orientation and gender identity or expression protections (see below)Family rightsRecognition of relationshipsSame-sex marriage since 2011AdoptionYes The U.S. state of New York has generally been seen as socially liberal in regard to lesbian, gay, bisexual, and tr...

Disruption to science, space and technology projects globally This article has multiple issues. Please help improve it or discuss these issues on the talk page. (Learn how and when to remove these template messages) This article needs to be updated. Please help update this article to reflect recent events or newly available information. (June 2021) The examples and perspective in this article may not represent a worldwide view of the subject. You may improve this article, discuss the issue on...

 

 

American steamboat Mountain Gem at Eureka Bar, on the Snake River, circa 1906. History NameMountain Gem In service1904 Out of service1912 IdentificationU.S. #201045 FateAbandoned 1924 NotesMachinery to Grahamona in 1912. General characteristics TypeRiverine all-purpose Tonnage469 GT; 282 NT Length150 ft (45.72 m) Beam26.5 ft (8.08 m) Depth5 ft (1.52 m) depth of hold Installed powertwin steam engines, horizontally mounted, each with bore of 13 in (33.0 c...

 

 

Cristian NúñezNazionalità Paraguay Altezza166 cm Peso69 kg Calcio RuoloCentrocampista Squadra Banfield CarrieraGiovanili 20??-2013 Los Andes2013-2016 Juventud Loma Pytá2016-2018 Vélez Sarsfield Squadre di club1 2018-2019 Vélez Sarsfield2 (0)2020 Lanús0 (0)2021→  Platense2 (0)2022→  Sol de América39 (0)2023 Godoy Cruz20 (0)2024 Tacuary5 (0)2024- Banfield5 (0) Nazionale 2020 Paraguay U-232 (0) 1 I due numeri indicano le presenze...

Mục từ này liên quan đến chủ đề giáo dục giới tính và tình dục. Thông tin ở đây có thể không phù hợp với một số đối tượng độc giả hoặc khi truy cập ở những nơi công cộng. Wikipedia không chịu trách nhiệm về những nội dung có thể không phù hợp cho một số người xem, xem chi tiết tại Wikipedia:Phủ nhận về nội dung. Kama Sutra Bế tinh được hiểu là việc giữ tránh xuất tinh sớm ...

 

 

American politician from North Carolina Frances JacksonMember of the North Carolina House of Representativesfrom the 45th districtIncumbentAssumed office January 1, 2023Preceded byJohn Szoka Personal detailsBornFrances VinellPolitical partyDemocraticResidenceFayetteville, North CarolinaEducationNorth Carolina A&T State University (BS) Fayetteville State University (MA) Walden University (PhD)OccupationEducatorWebsiteOfficial website Frances Vinell Jackson is a Democratic m...