Partitioned Elias–Fano indexesPartitioned Elias–Fano (PEF) indexes are a compressed data structure designed for efficiently representing sorted integer sequences, notably inverted indexes in information retrieval. Introduced by Giuseppe Ottaviano and Rossano Venturini in 2014, PEF indexes enhance classic Elias–Fano encoding by dividing sequences into partitions or chunks to leverage local clustering, thus achieving superior compression without sacrificing query speed.[1] BackgroundElias–Fano encoding, proposed by Peter Elias and Robert Fano in the 1970s, provides an efficient method for representing monotonically increasing sequences of integers. This encoding splits each integer into high and low bits, storing the low bits explicitly and the high bits in a compressed bit vector. Elias–Fano encoding is quasi-succinct, using very close to the theoretical minimum number of bits required.[2] It supports constant-time random access, fast skipping, and efficient searching within the compressed data, making it especially suitable for information retrieval tasks such as intersection and merging of posting lists.[3] Partitioned Elias–Fano techniqueThe partitioned Elias–Fano method extends the standard Elias–Fano encoding by dividing integer sequences into smaller chunks. Each chunk is compressed individually, capturing local clustering more effectively. This two-level approach consists of:
Chunk partitioning can be fixed-length or variable-length, with the latter providing superior compression by adaptively setting chunk boundaries based on data distribution. The optimal variable partitioning is approximated efficiently using heuristics, avoiding prohibitive computational costs.[4] PerformancePartitioned Elias–Fano indexes significantly improve compression over standard Elias–Fano indexes, achieving up to double the compression efficiency. Benchmarks demonstrate that PEF maintains competitive query performance, especially for intersection and skip-heavy queries common in web search applications. When compared to other compression schemes, such as gamma-delta coding, variable-byte coding, and frame-of-reference encoding (PForDelta), partitioned Elias–Fano consistently shows favorable trade-offs between compression size and query speed.[3] Applications and adoptionPartitioned Elias–Fano indexes are widely adopted in both academic research tools and commercial search infrastructures. Academic implementations include the PISA toolkit (Performant Indexes and Search for Academia) and MG4J (Managing Gigabytes for Java).[4] Commercially, PEF influences systems like Apache Lucene, which underpins search platforms Elasticsearch and Solr, and Facebook’s Graph Search system (Unicorn), which utilizes Elias–Fano encoding for rapid graph queries at scale.[5] Further developments include clustered Elias–Fano indexes, which improve upon PEF by exploiting redundancy across multiple sequences, and hybrid indexing techniques combining PEF with other methods like BitFunnel, achieving notable performance gains in large-scale information retrieval systems.[6] References
External links
|