Partitioned Elias–Fano indexes

Partitioned Elias–Fano (PEF) indexes are a compressed data structure designed for efficiently representing sorted integer sequences, notably inverted indexes in information retrieval. Introduced by Giuseppe Ottaviano and Rossano Venturini in 2014, PEF indexes enhance classic Elias–Fano encoding by dividing sequences into partitions or chunks to leverage local clustering, thus achieving superior compression without sacrificing query speed.^[1]

Background

Elias–Fano encoding, proposed by Peter Elias and Robert Fano in the 1970s, provides an efficient method for representing monotonically increasing sequences of integers. This encoding splits each integer into high and low bits, storing the low bits explicitly and the high bits in a compressed bit vector. Elias–Fano encoding is quasi-succinct, using very close to the theoretical minimum number of bits required.^[2] It supports constant-time random access, fast skipping, and efficient searching within the compressed data, making it especially suitable for information retrieval tasks such as intersection and merging of posting lists.^[3]

Partitioned Elias–Fano technique

The partitioned Elias–Fano method extends the standard Elias–Fano encoding by dividing integer sequences into smaller chunks. Each chunk is compressed individually, capturing local clustering more effectively. This two-level approach consists of:

Top-level sequence: Encodes chunk descriptors using Elias–Fano encoding to allow quick navigation between chunks.
Chunk-level sequences: Compresses each chunk individually, choosing optimal encoding strategies based on the local structure of data.^[1]

Chunk partitioning can be fixed-length or variable-length, with the latter providing superior compression by adaptively setting chunk boundaries based on data distribution. The optimal variable partitioning is approximated efficiently using heuristics, avoiding prohibitive computational costs.^[4]

Performance

Partitioned Elias–Fano indexes significantly improve compression over standard Elias–Fano indexes, achieving up to double the compression efficiency. Benchmarks demonstrate that PEF maintains competitive query performance, especially for intersection and skip-heavy queries common in web search applications. When compared to other compression schemes, such as gamma-delta coding, variable-byte coding, and frame-of-reference encoding (PForDelta), partitioned Elias–Fano consistently shows favorable trade-offs between compression size and query speed.^[3]

Applications and adoption

Partitioned Elias–Fano indexes are widely adopted in both academic research tools and commercial search infrastructures. Academic implementations include the PISA toolkit (Performant Indexes and Search for Academia) and MG4J (Managing Gigabytes for Java).^[4] Commercially, PEF influences systems like Apache Lucene, which underpins search platforms Elasticsearch and Solr, and Facebook’s Graph Search system (Unicorn), which utilizes Elias–Fano encoding for rapid graph queries at scale.^[5]

Further developments include clustered Elias–Fano indexes, which improve upon PEF by exploiting redundancy across multiple sequences, and hybrid indexing techniques combining PEF with other methods like BitFunnel, achieving notable performance gains in large-scale information retrieval systems.^[6]

References

^ ^a ^b Ottaviano, Giuseppe; Venturini, Rossano (2014). "Partitioned Elias–Fano indexes". Proceedings of the 37th International ACM SIGIR Conference on Research & Development in Information Retrieval. pp. 273–282. doi:10.1145/2600428.2609615. ISBN 9781450322577.
^ "EliasFanoEncoder) (Lucene 4.8.0 API)". Apache Lucene.
^ ^a ^b Aher, Gaiti (May 9, 2021). "Survey of Data Structures for Large Scale Information Retrieval".
^ ^a ^b "gioaudino / PartitionedEliasFano". GitHub.
^ Curtiss, Matthew (2013). "Unicorn: A System for Searching the Social Graph" (PDF). Proceedings of the VLDB Endowment. 6 (11): 1150–1161. doi:10.14778/2536222.2536239.
^ Pibiri, Giulio Ermanno; Venturini, Rossano (2017). "Clustered Elias–Fano Indexes". ACM Transactions on Information Systems. 36 (1) 2. doi:10.1145/3086689 (inactive 1 July 2025).{{cite journal}}: CS1 maint: DOI inactive as of July 2025 (link)