Succinct data structure

In computer science, a succinct data structure is a data structure which uses an amount of space that is "close" to the information-theoretic lower bound, but (unlike other compressed representations) still allows for efficient query operations. The concept was originally introduced by Jacobson[1] to encode bit vectors, (unlabeled) trees, and planar graphs. Unlike general lossless data compression algorithms, succinct data structures retain the ability to use them in-place, without decompressing them first. A related notion is that of a compressed data structure, insofar as the size of the stored or encoded data similarly depends upon the specific content of the data itself.

Suppose that is the information-theoretical optimal number of bits needed to store some data. A representation of this data is called:

  • implicit if it takes bits of space,
  • succinct if it takes bits of space, and
  • compact if it takes bits of space.

For example, a data structure that uses bits of storage is compact, bits is succinct, bits is also succinct, and bits is implicit.

Implicit structures are thus usually reduced to storing information using some permutation of the input data; the most well-known example of this is the heap.

Succinct indexable dictionaries

Succinct indexable dictionaries, also called rank/select dictionaries, form the basis of a number of succinct representation techniques, including binary trees, -ary trees and multisets,[2] as well as suffix trees and arrays.[3] The basic problem is to store a subset of a universe , usually represented as a bit array where iff An indexable dictionary supports the usual methods on dictionaries (queries, and insertions/deletions in the dynamic case) as well as the following operations:

for .

In other words, returns the number of elements equal to up to position while returns the position of the -th occurrence of .

There is a simple representation[4] which uses bits of storage space (the original bit array and an auxiliary structure) and supports rank and select in constant time. It uses an idea similar to that for range-minimum queries; there are a constant number of recursions before stopping at a subproblem of a limited size. The bit array is partitioned into large blocks of size bits and small blocks of size bits. For each large block, the rank of its first bit is stored in a separate table ; each such entry takes bits for a total of bits of storage. Within a large block, another directory stores the rank of each of the small blocks it contains. The difference here is that it only needs bits for each entry, since only the differences from the rank of the first bit in the containing large block need to be stored. Thus, this table takes a total of bits. A lookup table can then be used that stores the answer to every possible rank query on a bit string of length for ; this requires bits of storage space. Thus, since each of these auxiliary tables take space, this data structure supports rank queries in time and bits of space.

To answer a query for in constant time, a constant time algorithm computes:

In practice, the lookup table can be replaced by bitwise operations and smaller tables that can be used to find the number of bits set in the small blocks. This is often beneficial, since succinct data structures find their uses in large data sets, in which case cache misses become much more frequent and the chances of the lookup table being evicted from closer CPU caches becomes higher.[5] Select queries can be easily supported by doing a binary search on the same auxiliary structure used for rank; however, this takes time in the worst case. A more complicated structure using bits of additional storage can be used to support select in constant time.[6] In practice, many of these solutions have hidden constants in the notation which dominate before any asymptotic advantage becomes apparent; implementations using broadword operations and word-aligned blocks often perform better in practice.[7]

Entropy-compressed solutions

The space approach can be improved by noting that there are distinct -subsets of (or binary strings of length with exactly 1’s), and thus is an information theoretic lower bound on the number of bits needed to store . There is a succinct (static) dictionary which attains this bound, namely using space.[8] This structure can be extended to support rank and select queries and takes space.[2] Correct rank queries in this structure are however limited to elements contained in the set, analogous to how minimal perfect hashing functions work. This bound can be reduced to a space/time tradeoff by reducing the storage space of the dictionary to with queries taking time.[9]

It is also possible to construct a indexible dictionary supporting rank (but not select) that uses fewer than bits. Such a dictionary is called a monotone minimal perfect hash function, and can be implemented using as few as bits.[10][11]

Succinct hash tables

A succinct hash table, also known as a succinct unordered dictionary, is a data structure that stores keys from a universe using space bits, and while supporting membership queries in constant expected time. If a succinct hash table also supports insertions and deletions in constant expected time, then it is referred to as dynamic, and otherwise it is referred to as static.

The first dynamic succinct hash table was due to Raman and Rao in 2003.[12] In the case where , their solution uses space bits. Subsequently, it was shown that this space bound could be improved to bits for any constant number of logarithms[13] and a little after that this bound was also optimal.[14][15] The latter solution supports all operations in worst-case constant time with high probability.

The first static succinct hash table was due to Pagh in 1999.[16][17] In the case where , their solution uses space bits, and supports worst-case constant-time queries. This bound was subsequently improved to bits,[18] and then to bits.[19] Whereas the first two solutions[17][18] support worst-case constant-time queries, the final one supports constant expected-time queries.[19] The final solution also requires access to a lookup table of size , but this lookup table is independent of the set of elements being stored.[19]

Other examples

A string with an arbitrary length (Pascal string) takes Z + log(Z) space, and is thus succinct. If there is a maximum length – which is the case in practice, since 232 = 4 GiB of data is a very long string, and 264 = 16 EiB of data is larger than any string in practice – then a string with a length is also implicit, taking Z + k space, where k is the number of data to represent the maximum length (e.g., 64 bits).

When a sequence of variable-length items (such as strings) needs to be encoded, there are various possibilities. A direct approach is to store a length and an item in each record – these can then be placed one after another. This allows efficient next, but not finding the kth item. An alternative is to place the items in order with a delimiter (e.g., null-terminated string). This uses a delimiter instead of a length, and is substantially slower, since the entire sequence must be scanned for delimiters. Both of these are space-efficient. An alternative approach is out-of-band separation: the items can simply be placed one after another, with no delimiters. Item bounds can then be stored as a sequence of length, or better, offsets into this sequence. Alternatively, a separate binary string consisting of 1s in the positions where an item begins, and 0s everywhere else is encoded along with it. Given this string, the function can quickly determine where each item begins, given its index.[20] This is compact but not succinct, as it takes 2Z space, which is O(Z).

Another example is the representation of a binary tree: an arbitrary binary tree on nodes can be represented in bits while supporting a variety of operations on any node, which includes finding its parent, its left and right child, and returning the size of its subtree, each in constant time. The number of different binary trees on nodes is . For large , this is about ; thus we need at least about bits to encode it. A succinct binary tree therefore would occupy only bits per node.

See also

References

  1. ^ Jacobson, G. J (1988). Succinct static data structures (Ph.D. thesis). Pittsburgh, PA: Carnegie Mellon University.
  2. ^ a b Raman, R.; V. Raman; S. S Rao (2002). "Succinct indexable dictionaries with applications to encoding k-ary trees and multisets". Proceedings of the thirteenth annual ACM-SIAM symposium on Discrete algorithms. pp. 233–242. arXiv:0705.0552. CiteSeerX 10.1.1.246.3123. doi:10.1145/1290672.1290680. ISBN 0-89871-513-X.
  3. ^ Sadakane, K.; R. Grossi (2006). "Squeezing succinct data structures into entropy bounds" (PDF). Proceedings of the seventeenth annual ACM-SIAM symposium on Discrete algorithm. pp. 1230–1239. ISBN 0-89871-605-5. Archived from the original (PDF) on 2011-09-29.
  4. ^ Jacobson, G. (1 November 1989). Space-efficient static trees and graphs (PDF). 30th IEEE Symposium on Foundations of Computer Science. doi:10.1109/SFCS.1989.63533. Archived from the original (PDF) on 2016-03-12.
  5. ^ González, R.; S. Grabowski; V. Mäkinen; G. Navarro (2005). "Practical implementation of rank and select queries" (PDF). Poster Proceedings Volume of 4th Workshop on Efficient and Experimental Algorithms (WEA). pp. 27–38.
  6. ^ Clark, David (1996). Compact pat trees (PDF) (Ph.D. thesis). University of Waterloo.
  7. ^ Vigna, S. (2008). "Broadword Implementation of Rank/Select Queries" (PDF). Experimental Algorithms. Lecture Notes in Computer Science. Vol. 5038. pp. 154–168. CiteSeerX 10.1.1.649.8950. doi:10.1007/978-3-540-68552-4_12. ISBN 978-3-540-68548-7. S2CID 13963489.
  8. ^ Brodnik, A.; J. I Munro (1999). "Membership in constant time and almost-minimum space" (PDF). SIAM J. Comput. 28 (5): 1627–1640. CiteSeerX 10.1.1.530.9223. doi:10.1137/S0097539795294165.
  9. ^ Pătraşcu, M. (2008). "Succincter" (PDF). Foundations of Computer Science, 2008. FOCS'08. IEEE 49th Annual IEEE Symposium on. pp. 305–313.
  10. ^ Belazzougui, Djamal; Boldi, Paolo; Pagh, Rasmus; Vigna, Sebastiano (2009-01-04). "Monotone Minimal Perfect Hashing: Searching a Sorted Table with O(1) Accesses". Proceedings of the Twentieth Annual ACM-SIAM Symposium on Discrete Algorithms. Philadelphia, PA: Society for Industrial and Applied Mathematics: 785–794. doi:10.1137/1.9781611973068.86. ISBN 978-0-89871-680-1.
  11. ^ Assadi, Sepehr; Farach-Colton, Martín; Kuszmaul, William (January 2023), "Tight Bounds for Monotone Minimal Perfect Hashing", Proceedings of the 2023 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), Philadelphia, PA: Society for Industrial and Applied Mathematics, pp. 456–476, arXiv:2207.10556, doi:10.1137/1.9781611977554.ch20, ISBN 978-1-61197-755-4, retrieved 2023-04-28
  12. ^ Raman, Rajeev; Rao, Satti Srinivasa (2003), "Succinct Dynamic Dictionaries and Trees", Automata, Languages and Programming, Berlin, Heidelberg: Springer Berlin Heidelberg, pp. 357–368, doi:10.1007/3-540-45061-0_30, ISBN 978-3-540-40493-4, retrieved 2023-04-28
  13. ^ Bender, Michael A.; Farach-Colton, Martín; Kuszmaul, John; Kuszmaul, William; Liu, Mingmou (2022-06-09). "On the optimal time/Space tradeoff for hash tables". Proceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing. New York, NY, USA: ACM. pp. 1284–1297. arXiv:2111.00602. doi:10.1145/3519935.3519969. hdl:1721.1/146419. ISBN 9781450392648. S2CID 240354692.
  14. ^ Li, Tianxiao; Liang, Jingxun; Yu, Huacheng; Zhou, Renfei (2023). "Tight Cell-Probe Lower Bounds for Dynamic Succinct Dictionaries". arXiv:2306.02253 [cs.DS].
  15. ^ Nadis, Steve (8 February 2024). "Scientists Find Optimal Balance of Data Storage and Time". Quanta Magazine.
  16. ^ Pagh, Rasmus (1998-01-28). "Low Redundancy in Dictionaries with O(1) Worst Case Lookup Time". BRICS Report Series. 5 (28). doi:10.7146/brics.v5i28.19434. ISSN 1601-5355.
  17. ^ a b Pagh, Rasmus (January 2001). "Low Redundancy in Static Dictionaries with Constant Query Time". SIAM Journal on Computing. 31 (2): 353–363. doi:10.1137/s0097539700369909. ISSN 0097-5397.
  18. ^ a b Patrascu, Mihai (October 2008). "Succincter". 2008 49th Annual IEEE Symposium on Foundations of Computer Science. IEEE. pp. 305–313. doi:10.1109/focs.2008.83. ISBN 978-0-7695-3436-7. S2CID 257721481.
  19. ^ a b c Yu, Huacheng (2020-06-22). "Nearly optimal static Las Vegas succinct dictionary". Proceedings of the 52nd Annual ACM SIGACT Symposium on Theory of Computing. New York, NY, USA: ACM. pp. 1389–1401. arXiv:1911.01348. doi:10.1145/3357713.3384274. ISBN 9781450369794. S2CID 207780523.
  20. ^ Belazzougui, Djamal. "Hash, displace, and compress" (PDF).

Read other articles:

  لمعانٍ أخرى، طالع واشنطن (توضيح). واشنطن     الإحداثيات 40°42′13″N 89°24′24″W / 40.703611111111°N 89.406666666667°W / 40.703611111111; -89.406666666667  تاريخ التأسيس 1834  سبب التسمية جورج واشنطن  تقسيم إداري  البلد الولايات المتحدة[1][2]  التقسيم الأعلى مقاطعة تيزويل...

William Hall William Hall 7º Governador do Tennessee Período 1829 Antecessor(a) Sam Houston Sucessor(a) William Carroll Membro da Câmara de Representantes dos Estados Unidos Período 1831 - 1833 Membro do Senado de Tennessee Período 1827 - 1829[1] Dados pessoais Nascimento 11 de fevereiro de 1775 Condado de Surry, Carolina do Norte Morte 7 de outubro de 1856 (81 anos) Condado de Sumner, Tennessee Nacionalidade Americano Cônjuge Mary Alexander Partido Partido Democrático Profis...

Este artículo o sección necesita referencias que aparezcan en una publicación acreditada.Este aviso fue puesto el 5 de diciembre de 2009. María Aura Información personalNombre de nacimiento María Aura BoullosaNacimiento 25 de septiembre de 1982 (41 años) Ciudad de México, MéxicoNacionalidad mexicanaFamiliaPadres Alejandro Aura Carmen Boullosa Cónyuge Alonso BarreraInformación profesionalOcupación Actriz, actriz de televisión, actriz de cine y actriz de teatro Años activa de...

Bintang MedanNama lengkapBintang MedanJulukanBuaya Sumatra Bintang Sore KejoraBerdiri2010StadionStadion Teladan, Kota Medan, Provinsi Sumatera Utara(Kapasitas: 25.000)Ketua Umum Avian TumengkolPelatih Fabio LopezAsisten Pelatih M. Khaidir Syahril Nasution Hasyim Asy'arieLigaLiga Primer Indonesia2011Peringkat 12 Kostum kandang Kostum tandang Bintang Medan adalah sebuah tim sepak bola Indonesia yang berbasis di Kota Medan. Klub yang didirikan tahun 2010 ini merupakan klub bentukan PSMS Medan ya...

Magical Circle Guru Guru魔法陣グルグル(Mahōjin Guru Guru)GenrePetualangan, Komedi, Fantasi, Percintaan MangaPengarangHiroyuki EtōPenerbitEnixMajalahMonthly Shōnen GanganDemografiShōnenTerbit1992 – 2003Volume16 Seri animeSutradaraNobuaki NakanishiSkenarioChika HojoChinatsu HoujouHideki MitsuiMayumi KoyamaYasuhiro KomatsuzakiStudioNippon AnimationSaluranasliTV AsahiTayang 13 Oktober 1994 – 14 September 1995Episode45 Film animeSutradaraNobuaki NakanishiSkenarioHideki MitsuiStudioN...

NGC 43NGC 43 oleh 2MASS Kredit: Two Micron All-Sky SurveyData pengamatan (J2000 epos)Rasi bintangAndromedaAsensio rekta 00j 10m 24.95dDeklinasi +30° 38′ 14.2″Pergeseran merah-4785 ± 10 km/sJarak65,0 ± 4,6 Mpc (212 ± 15,1 juta tc)Magnitudo semu (V)13,6Ciri-ciriJenisSB0Ukuran semu (V)1,6′ × 1,5'Penamaan lainUGC 120 • PGC 875 • CGCG 499-79 • MCG +05-01-054 • 2MASX J00130073+3054551 • GC 21 • h 9 NGC 43 adalah galaksi lentikular yang ...

Artikel ini bukan mengenai Persibu Ibu. Persibu BulelengNama lengkapPersatuan Sepakbola Indonesia BulelengJulukanPasukan Singa Ambara RajaBerdiri1 Agustus 1955; 68 tahun lalu (1955-08-01)StadionStadion Mayor Metra, Kabupaten Buleleng, Provinsi Bali(Kapasitas: 5.000)PemilikAskab PSSI BulelengLigaLiga 3 Bali Persatuan Sepakbola Indonesia Buleleng disingkat Persibu adalah klub sepak bola Indonesia yang berasal dari Kabupaten Buleleng, Provinsi Bali. Mereka berkompetisi di Liga 3 Bali.[1...

Town in Wisconsin, United StatesLedgeview, WisconsinTownTown Hall and Fire Department SignMotto: Set Your Sights HighLocation in Brown County and the state of WisconsinCoordinates: 44°25′26″N 87°59′10″W / 44.42389°N 87.98611°W / 44.42389; -87.98611Country United StatesState WisconsinCountyBrownArea • Total17.5 sq mi (45.4 km2) • Land17.4 sq mi (45.0 km2) • Water0.2 sq ...

Mythical three-legged crow This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Yatagarasu – news · newspapers · books · scholar · JSTOR (February 2022) (Learn how and when to remove this template message) Not to be confused with Three-legged crow. For the novel series, see Yatagarasu (novel series). Statue of Ya...

George William FeatherstonhaughBorn(1780-04-09)9 April 1780London, EnglandDied28 September 1866(1866-09-28) (aged 86)Le Havre, FranceResting placeTunbridge Wells, EnglandOccupation(s)Farmer, geologist and surveyorKnown forExplorer; railway pioneerSpouse(s)Sarah Duane (1808-11-06 – 1826),Charlotte Williams Carter (m. 1831-01-28)ChildrenBy Sarah: James, Ann d1826, George, Jr.,[1] and Georgianna d1826;By Charlotte: Albany, Georgiannia, and HenryParent(s)George and Dorothy Sim...

Irregular militia in Russia and the Soviet Union Reenactors dressed in the uniforms of the World War II-era People's Militia during the 2015 Moscow Victory Day Parade. The People's Militia (Russian: Народное ополчение, tr. Narodnoe opolcheniye, IPA: [nɐˈrodnəjə ɐpɐlˈtɕenʲɪjə], lit. 'popular regimentation') was the name given to irregular troops formed from the population in the Russian Empire and later the Soviet Union. They fought behind front lines an...

Indigenous Australian convicted of murder in 1959 Rupert Maxwell (Max) Stuart (c. 1932[notes 1] – 21 November 2014[1]) was an Indigenous Australian who was convicted of murder in 1959. His conviction was subject to several appeals to higher courts,[2][3] the Judicial Committee of the Privy Council, and a Royal Commission,[4] all of which upheld the verdict. Newspapers campaigned successfully against the death sentence being imposed. After servin...

Latin Catholic ecclesiastical jurisdiction in Iowa, USA Diocese of Des MoinesDiœcesis DesmoinensisSt. Ambrose CathedralCoat of armsLocationCountry United StatesTerritory23 counties in the Southwest quadrant of IowaEcclesiastical provinceDubuqueCoordinates41°35′19″N 93°37′32″W / 41.58861°N 93.62556°W / 41.58861; -93.62556StatisticsArea12,446 sq mi (32,230 km2)Population- Total- Catholics(as of 2013)837,773103,430 (12.3%)Pa...

Tan Lioe IeLahir(1958-06-01)1 Juni 1958 Denpasar, BaliPekerjaanSastrawanTahun aktif1980 - sekarangSuami/istriIda Ayu Nyoman Suwiti Tan Lioe Ie Hanzi tradisional: 陳六一 Alih aksara Mandarin - Hanyu Pinyin: Chén Liù Yī Min Nan - Romanisasi POJ: Tân Lio̍k-it Nama Indonesia Indonesia: Tan Lioe Ie Tan Lioe Ie (lahir 1 Juni 1958) adalah seorang penyair Indonesia terkenal asal Bali. Ia merupakan penyair pertama Indonesia yang melakukan eksplorasi atas ritual dan mitologi Tionghoa dalam...

Characters in Harry Potter books This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Hogwarts staff – news · newspapers · books · scholar · JSTOR (May 2023) (Learn how and when to remove this template message) The following is a list of Hogwarts staff in the Harry Potter books written by J. K. Rowling. The staff...

Tabletop space opera role-playing game For other types of Star Wars role-playing game, see Star Wars role-playing games. Star WarsThe Roleplaying GameCover based on the original movie poster by Tom Cantrell, 1977.DesignersGreg CostikyanPublishersWest End GamesFantasy Flight GamesPublication1987 1st edition1992 2nd edition1996 2nd edition Revised and Expanded2018 30th Anniversary editionGenresSpace opera[1]SystemsD6 SystemThe core system is based on that of WEG's Ghostbusters RPG. Star...

2001 film by Vanessa Middleton 30 Years to LifeDVD coverDirected byVanessa MiddletonWritten byVanessa MiddletonProduced byAndrew LeBlancFelice SchachterGingi RochelleJulie Ann LucasTim MosleyVanessa MiddletonStarringAllen PayneErika AlexanderMelissa De SousaPaula Jai ParkerTracy MorganCinematographyCliff CharlesEdited byGershon HinksonMusic byTimbalandDistributed byExodus EntertainmentScreen Media FilmsRelease dates January 21, 2001 (2001-01-21) (Sundance) June 7, ...

Noa MikićDatos personalesNacimiento Varaždin, Croacia27 de enero de 2007 (16 años)Nacionalidad(es) CroataAltura 1,75 m (5′ 9″)Carrera deportivaDeporte FútbolClub profesionalDebut deportivo 2023(G. N. K. Dinamo de Zagreb II)Club G. N. K. Dinamo de Zagreb IILiga 3. HNLPosición DefensaDorsal(es) 67Goles en clubes 0Trayectoria G. N. K. Dinamo de Zagreb II (2023-presente)[editar datos en Wikidata] Noa Mikić (Varaždin, Croacia, 27 de enero de 2007) es un futbolista croata ...

Paghimo ni bot Lsjbot. Angraecum madagascariense Siyentipikinhong Pagklasipikar Kaginharian: Plantae Kabahig: Tracheophyta Kahutong: Liliopsida Kahanay: Asparagales Kabanay: Orchidaceae Kahenera: 'Angraecum' Espesye: ''Angraecum madagascariense'' Siyentipikinhong Ngalan Angraecum madagascariense(Finet) Schltr. Laing Ngalan Angraecum madagascariensis (Finet) Szlach., Mytnik & Grochocka Kaliwatan sa tanom nga asparagos ang Angraecum madagascariense.[1] Una ning gihulagway ni Achille...

Paghimo ni bot Lsjbot. 18°15′11″S 126°14′46″E / 18.25303°S 126.246°E / -18.25303; 126.246 Leopold River Suba Nasod  Ostralya Estado State of Western Australia Gitas-on 157 m (515 ft) Tiganos 18°15′11″S 126°14′46″E / 18.25303°S 126.246°E / -18.25303; 126.246 Timezone AWST (UTC+8) GeoNames 8608095 Suba ang Leopold River sa Ostralya.[1] Nahimutang ni sa estado sa State of Western Australia, sa amihanan-kasad...