Variable-length code

In coding theory, a variable-length code is a code that maps source symbols to a variable number of bits. The equivalent concept in computer science is the bit string.

Variable-length codes can allow sources to be compressed and decompressed with zero error (lossless data compression) and still be read back symbol by symbol. With the right coding strategy, an independent and identically distributed source may be compressed almost arbitrarily close to its entropy. This is in contrast to fixed-length coding methods, for which data compression is only possible for large blocks of data, and any compression beyond the logarithm of the total number of possibilities comes with a finite (though perhaps arbitrarily small) probability of failure.

Some examples of well-known variable-length coding strategies are Huffman coding, Lempel–Ziv coding, arithmetic coding, and context-adaptive variable-length coding.

Codes and their extensions

The extension of a code is the mapping of finite-length source sequences to finite-length bit strings obtained by concatenating, for each symbol of the source sequence, the corresponding codeword produced by the original code.

Using terms from formal language theory, the precise mathematical definition is as follows: Let $S$ and $T$ be two finite sets, called the source and target alphabets, respectively. A code $C : S \to T^*$ is a total function[1] mapping each symbol from $S$ to a sequence of symbols over $T$. The extension of $C$ to a homomorphism of $S^*$ into $T^*$, which naturally maps each sequence of source symbols to a sequence of target symbols, is referred to as its extension.
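The definition above can be illustrated with a minimal Python sketch. The function name `extend` and the dictionary `example_code` are assumptions introduced here for illustration only; a code is modelled as a dictionary from source symbols to bit strings, and its extension simply concatenates codewords.

```python
def extend(code, message):
    """Apply a symbol-level code to a whole sequence by concatenating codewords."""
    return "".join(code[symbol] for symbol in message)


# Hypothetical code: source alphabet {a, b, c, d}, target alphabet {0, 1}.
example_code = {"a": "0", "b": "10", "c": "110", "d": "111"}

encoded = extend(example_code, "abad")

# Homomorphism property: the extension of a concatenation equals
# the concatenation of the extensions.
left = extend(example_code, "ab" + "ad")
right = extend(example_code, "ab") + extend(example_code, "ad")
assert left == right
```

Because the extension is defined symbol by symbol, it is total on every finite sequence over the source alphabet, including the empty sequence, which maps to the empty string.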

Classes of variable-length codes

Variable-length codes can be strictly nested in order of decreasing generality as non-singular codes, uniquely decodable codes and prefix codes. Prefix codes are always uniquely decodable, and these in turn are always non-singular:

Non-singular codes

A code is non-singular if each source symbol is mapped to a different non-empty bit string, i.e. the mapping from source symbols to bit strings is injective.

  • For example, the mapping $M_1 = \{a \mapsto 0,\ b \mapsto 0,\ c \mapsto 1\}$ is not non-singular because both "a" and "b" map to the same bit string "0"; any extension of this mapping will generate a lossy coding. Such singular coding may still be useful when some loss of information is acceptable (for example, when such a code is used in audio or video compression, where a lossy coding becomes equivalent to source quantization).
  • However, the mapping $M_2 = \{a \mapsto 1,\ b \mapsto 011,\ c \mapsto 01110,\ d \mapsto 1110,\ e \mapsto 10011\}$ is non-singular; its extension will generate a lossless coding, which is useful for general data transmission (although this feature is not always required). Note that a non-singular code need not be more compact than the source. In many applications a larger code is useful, for example as a way to detect or recover from encoding or transmission errors, or in security applications to protect a source from undetectable tampering.
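The non-singularity condition amounts to checking that the symbol-to-codeword map is injective and never produces the empty string. The following Python sketch shows one way to test this; the function name and the two dictionaries are illustrative assumptions (the first sends both "a" and "b" to "0", the second uses the codewords that appear in the decoding example of the next section).

```python
def is_non_singular(code):
    """Return True if every symbol maps to a distinct, non-empty codeword."""
    codewords = list(code.values())
    all_non_empty = all(codewords)  # an empty string is falsy
    all_distinct = len(set(codewords)) == len(codewords)
    return all_non_empty and all_distinct


# Assumed example mappings, for illustration only.
singular_code = {"a": "0", "b": "0", "c": "1"}
non_singular_code = {"a": "1", "b": "011", "c": "01110", "d": "1110", "e": "10011"}

singular_result = is_non_singular(singular_code)          # False
non_singular_result = is_non_singular(non_singular_code)  # True
```

Note that this check only concerns single symbols; it says nothing about whether sequences of symbols can be decoded unambiguously.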

Uniquely decodable codes

A code is uniquely decodable if its extension is non-singular. Whether a given code is uniquely decodable can be decided with the Sardinas–Patterson algorithm.

  • The mapping $M_3 = \{a \mapsto 0,\ b \mapsto 01,\ c \mapsto 011\}$ is uniquely decodable. This can be demonstrated by looking at the follow-set after each target bit string in the map: each bit string is terminated as soon as a 0 bit is seen, because a 0 cannot follow any existing codeword to create a longer valid codeword in the map, and so it unambiguously starts a new codeword.
  • Consider again the non-singular code from the previous section, $M_2 = \{a \mapsto 1,\ b \mapsto 011,\ c \mapsto 01110,\ d \mapsto 1110,\ e \mapsto 10011\}$.[1] This code is not uniquely decodable, since the string 011101110011 can be interpreted as the sequence of codewords 01110 – 1110 – 011, but also as the sequence of codewords 011 – 1 – 011 – 10011. Two possible decodings of this encoded string are thus cdb and babe. However, such a code can still be useful when the set of all possible source symbols is completely known and finite, or when there are restrictions (for example a formal syntax) that determine whether source elements of this extension are acceptable. Such restrictions permit the decoding of the original message by checking which of the candidate source sequences mapped to the same encoded string are valid under those restrictions.

Prefix codes

A code is a prefix code if no target bit string in the mapping is a prefix of the target bit string of a different source symbol in the same mapping. This means that symbols can be decoded instantaneously after their entire codeword is received. Other commonly used names for this concept are prefix-free code, instantaneous code, or context-free code.

  • The uniquely decodable example from the previous section, $M_3 = \{a \mapsto 0,\ b \mapsto 01,\ c \mapsto 011\}$, is not a prefix code: after reading the bit string "0", we do not know whether it encodes an "a" source symbol or is the prefix of the encoding of a "b" or "c" symbol.
  • An example of a prefix code is shown below.
    Symbol   Codeword
    a        0
    b        10
    c        110
    d        111

    Example of encoding and decoding:
    aabacdab → 00100110111010 → |0|0|10|0|110|111|0|10| → aabacdab
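The prefix condition and the resulting instantaneous decoding can be sketched in Python as follows. The function names are assumptions made for this illustration; the code table is the one given above.

```python
def is_prefix_code(code):
    """Return True if no codeword is a prefix of a different symbol's codeword."""
    words = sorted(code.values())
    # In lexicographic order, a word that is a prefix of any other word
    # is also a prefix of its immediate successor.
    return all(not nxt.startswith(cur) for cur, nxt in zip(words, words[1:]))


def encode(code, message):
    """Concatenate the codewords of the symbols in `message`."""
    return "".join(code[symbol] for symbol in message)


def decode_prefix(code, bits):
    """Decode bit by bit, emitting a symbol as soon as a codeword is complete."""
    inverse = {word: symbol for symbol, word in code.items()}
    symbols, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in inverse:  # safe only because the code is prefix-free
            symbols.append(inverse[buffer])
            buffer = ""
    if buffer:
        raise ValueError("trailing bits do not form a complete codeword")
    return "".join(symbols)


prefix_code = {"a": "0", "b": "10", "c": "110", "d": "111"}

bits = encode(prefix_code, "aabacdab")
roundtrip = decode_prefix(prefix_code, bits)
```

The decoder never needs to look ahead: because no codeword is a prefix of another, the first match found while reading left to right is the only possible one.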

Block codes, in which all codewords have the same length, are a special case of prefix codes. They are not very useful in the context of source coding, but often serve as forward error correction in the context of channel coding.

Another special case of prefix codes are LEB128 and variable-length quantity (VLQ) codes, which encode arbitrarily large integers as a sequence of octets, so that the length of every codeword is a multiple of 8 bits.
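As a concrete illustration of an octet-based prefix code, the sketch below implements the unsigned variant of LEB128 in Python: each octet carries seven payload bits, least significant group first, and the high bit signals that more octets follow. The function names are hypothetical, and signed LEB128 and VLQ (which orders the groups differently) are not covered.

```python
def uleb128_encode(value):
    """Encode a non-negative integer as unsigned LEB128 octets."""
    if value < 0:
        raise ValueError("unsigned LEB128 cannot encode negative integers")
    out = bytearray()
    while True:
        byte = value & 0x7F          # low seven bits
        value >>= 7
        if value:
            out.append(byte | 0x80)  # continuation bit set: more octets follow
        else:
            out.append(byte)         # final octet has the high bit clear
            return bytes(out)


def uleb128_decode(data):
    """Decode one integer from the start of `data`; return (value, octets_used)."""
    result = 0
    for index, byte in enumerate(data):
        result |= (byte & 0x7F) << (7 * index)
        if not byte & 0x80:
            return result, index + 1
    raise ValueError("input ended before the final octet")


encoded = uleb128_encode(624485)
stream = uleb128_encode(300) + uleb128_encode(5)
first, used = uleb128_decode(stream)
second, _ = uleb128_decode(stream[used:])
```

Because only the last octet of a codeword has its high bit clear, no codeword can be a proper prefix of another, which is why consecutive values in `stream` can be separated without any delimiter.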

Advantages

The advantage of a variable-length code is that unlikely source symbols can be assigned longer codewords and likely source symbols can be assigned shorter codewords, thus giving a low expected codeword length. For the above example, if the probabilities of (a, b, c, d) were $\left(\tfrac{1}{2}, \tfrac{1}{4}, \tfrac{1}{8}, \tfrac{1}{8}\right)$, the expected number of bits used to represent a source symbol using the code above would be:

$1 \times \tfrac{1}{2} + 2 \times \tfrac{1}{4} + 3 \times \tfrac{1}{8} + 3 \times \tfrac{1}{8} = \tfrac{7}{4}$ bits.

As the entropy of this source is 1.75 bits per symbol, this code compresses the source as much as possible so that the source can be recovered with zero error.
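The comparison between expected codeword length and entropy can be reproduced numerically. The Python sketch below assumes the probabilities 1/2, 1/4, 1/8, 1/8 for a, b, c, d, which are consistent with the stated entropy of 1.75 bits per symbol; the variable names are illustrative.

```python
from fractions import Fraction
from math import log2

# Prefix code from the example table.
code = {"a": "0", "b": "10", "c": "110", "d": "111"}

# Assumed source distribution (consistent with an entropy of 1.75 bits).
probabilities = {
    "a": Fraction(1, 2),
    "b": Fraction(1, 4),
    "c": Fraction(1, 8),
    "d": Fraction(1, 8),
}

# Expected codeword length: sum over symbols of p(s) * len(codeword(s)).
expected_length = sum(p * len(code[s]) for s, p in probabilities.items())

# Shannon entropy in bits: -sum over symbols of p(s) * log2(p(s)).
entropy = -sum(float(p) * log2(float(p)) for p in probabilities.values())

# A fixed-length code for four symbols would need 2 bits per symbol.
fixed_length = 2
```

Here each codeword length equals $-\log_2$ of the corresponding probability, which is why the expected length meets the entropy exactly; for distributions whose probabilities are not powers of 1/2, a symbol-by-symbol prefix code generally leaves a gap.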

References

  1. This code is based on an example found in Berstel et al. (2009), Example 2.3.1, p. 63.

Further reading

  • Salomon, David (September 2007). Variable-Length Codes for Data Compression (1 ed.). Springer Verlag. ISBN 978-1-84628-958-3. (xii+191 pages)
  • Berstel, Jean; Perrin, Dominique; Reutenauer, Christophe (2010). Codes and automata. Encyclopedia of Mathematics and its Applications. Vol. 129. Cambridge, UK: Cambridge University Press. ISBN 978-0-521-88831-8. Zbl 1187.94001.
