Shannon's source coding theorem

In information theory, Shannon's source coding theorem (or noiseless coding theorem) establishes the statistical limits to possible data compression for data whose source is an independent identically-distributed random variable, and the operational meaning of the Shannon entropy.

Named after Claude Shannon, the source coding theorem shows that, in the limit, as the length of a stream of independent and identically-distributed random variable (i.i.d.) data tends to infinity, it is impossible to compress such data such that the code rate (average number of bits per symbol) is less than the Shannon entropy of the source, without it being virtually certain that information will be lost. However it is possible to get the code rate arbitrarily close to the Shannon entropy, with negligible probability of loss.

The source coding theorem for symbol codes places an upper and a lower bound on the minimal possible expected length of codewords as a function of the entropy of the input word (which is viewed as a random variable) and of the size of the target alphabet.

Note that, for data that exhibits more dependencies (whose source is not an i.i.d. random variable), the Kolmogorov complexity, which quantifies the minimal description length of an object, is more suitable to describe the limits of data compression. Shannon entropy takes into account only frequency regularities while Kolmogorov complexity takes into account all algorithmic regularities, so in general the latter is smaller. On the other hand, if an object is generated by a random process in such a way that it has only frequency regularities, entropy is close to complexity with high probability (Shen et al. 2017).[1]

Statements

Source coding is a mapping from (a sequence of) symbols from an information source to a sequence of alphabet symbols (usually bits) such that the source symbols can be exactly recovered from the binary bits (lossless source coding) or recovered within some distortion (lossy source coding). This is one approach to data compression.

Source coding theorem

In information theory, the source coding theorem (Shannon 1948)[2] informally states that (MacKay 2003, pg. 81,[3] Cover 2006, Chapter 5[4]):

N i.i.d. random variables each with entropy H(X) can be compressed into more than N H(X) bits with negligible risk of information loss, as N → ∞; but conversely, if they are compressed into fewer than N H(X) bits it is virtually certain that information will be lost.

The coded sequence represents the compressed message in a biunivocal way, under the assumption that the decoder knows the source. From a practical point of view, this hypothesis is not always true. Consequently, when the entropy encoding is applied the transmitted message is . Usually, the information that characterizes the source is inserted at the beginning of the transmitted message.

Source coding theorem for symbol codes

Let Σ1, Σ2 denote two finite alphabets and let Σ
1
and Σ
2
denote the set of all finite words from those alphabets (respectively).

Suppose that X is a random variable taking values in Σ1 and let f be a uniquely decodable code from Σ
1
to Σ
2
where 2| = a. Let S denote the random variable given by the length of codeword f (X).

If f is optimal in the sense that it has the minimal expected word length for X, then (Shannon 1948):

Where denotes the expected value operator.

Proof: source coding theorem

Given X is an i.i.d. source, its time series X1, ..., Xn is i.i.d. with entropy H(X) in the discrete-valued case and differential entropy in the continuous-valued case. The Source coding theorem states that for any ε > 0, i.e. for any rate H(X) + ε larger than the entropy of the source, there is large enough n and an encoder that takes n i.i.d. repetition of the source, X1:n, and maps it to n(H(X) + ε) binary bits such that the source symbols X1:n are recoverable from the binary bits with probability of at least 1 − ε.

Proof of Achievability. Fix some ε > 0, and let

The typical set, Aε
n
, is defined as follows:

The asymptotic equipartition property (AEP) shows that for large enough n, the probability that a sequence generated by the source lies in the typical set, Aε
n
, as defined approaches one. In particular, for sufficiently large n, can be made arbitrarily close to 1, and specifically, greater than (See AEP for a proof).

The definition of typical sets implies that those sequences that lie in the typical set satisfy:


  • The probability of a sequence being drawn from Aε
    n
    is greater than 1 − ε.
  • , which follows from the left hand side (lower bound) for .
  • , which follows from upper bound for and the lower bound on the total probability of the whole set Aε
    n
    .

Since bits are enough to point to any string in this set.

The encoding algorithm: the encoder checks if the input sequence lies within the typical set; if yes, it outputs the index of the input sequence within the typical set; if not, the encoder outputs an arbitrary n(H(X) + ε) digit number. As long as the input sequence lies within the typical set (with probability at least 1 − ε), the encoder does not make any error. So, the probability of error of the encoder is bounded above by ε.

Proof of converse: the converse is proved by showing that any set of size smaller than Aε
n
(in the sense of exponent) would cover a set of probability bounded away from 1.

Proof: Source coding theorem for symbol codes

For 1 ≤ in let si denote the word length of each possible xi. Define , where C is chosen so that q1 + ... + qn = 1. Then

where the second line follows from Gibbs' inequality and the fifth line follows from Kraft's inequality:

so log C ≤ 0.

For the second inequality we may set

so that

and so

and

and so by Kraft's inequality there exists a prefix-free code having those word lengths. Thus the minimal S satisfies

Extension to non-stationary independent sources

Fixed rate lossless source coding for discrete time non-stationary independent sources

Define typical set Aε
n
as:

Then, for given δ > 0, for n large enough, Pr(Aε
n
) > 1 − δ
. Now we just encode the sequences in the typical set, and usual methods in source coding show that the cardinality of this set is smaller than . Thus, on an average, Hn(X) + ε bits suffice for encoding with probability greater than 1 − δ, where ε and δ can be made arbitrarily small, by making n larger.

See also

References

  1. ^ Shen, A. and Uspensky, V.A. and Vereshchagin, N. (2017). "Chapter 7.3. : Complexity and entropy". Kolmogorov Complexity and Algorithmic Randomness. American Mathematical Society. p. 226. ISBN 9781470431822.{{cite book}}: CS1 maint: multiple names: authors list (link)
  2. ^ C.E. Shannon, "A Mathematical Theory of Communication Archived 2009-02-16 at the Wayback Machine", Bell System Technical Journal, vol. 27, pp. 379–423, 623-656, July, October, 1948
  3. ^ David J. C. MacKay. Information Theory, Inference, and Learning Algorithms Cambridge: Cambridge University Press, 2003. ISBN 0-521-64298-1
  4. ^ Cover, Thomas M. (2006). "Chapter 5: Data Compression". Elements of Information Theory. John Wiley & Sons. pp. 103–142. ISBN 0-471-24195-4.

Read other articles:

National Soccer LeagueNegara AustraliaKlub lain dari Selandia BaruDibentuk1977Musim perdana1977Dibubarkan2004Jumlah tim42 (total)Juara bertahan ligaPerth Glory (gelar ke-2)Klub tersuksesMarconi StallionsSouth MelbourneSydney City(masing-masing 4 gelar)Gelar musimreguler terbanyakMelbourne Knights (5 gelar)Televisi penyiarSeven Network & C7 Sport (1998–2002)ABC (2000)SBS (2002–2004) National Soccer League (NSL) adalah liga sepak bola tingkat teratas di Australia yang dijalank...

 

 

Aksara Han Tulisan Perintis Tulang Ramalan Perunggu Segel (cacing-burungbesarkecil) Klerikal Reguler Semikursif Kursif Kuas datar Gaya dan Jenis Huruf Song Imitasi Ming Sans-serif Karakteristik Goresan (urutan) Radikal Klasifikasi Varian Standar bentuk aksara Kamus KangxiXin Zixing Aksara Han Standar Umum (RRT) Grafem Aksara Han Umum Digunakan (Hong Kong) Jenis Huruf Standar untuk Aksara Han (Taiwan) Grafem-standar penggunaan Varian Grafemis Aksara Standar Umum (RRT) Jōyō kanji (J...

 

 

Katedral Salib Suci adalah sebuah situs sejarah dari kompleks keagamaan Gereja Apostolik Armenia yang terletak di Pulau Aghtamar, Danau Van, Turki. Katedral tersebut dibangun antara tahun 915-921 oleh arsitek Manuel, mengikuti perintah Raja Armenia Gagik I Ardzruni, yang merupakan penguasa Kerajaan Vaspurakan. Katedral tersebut dibuka untuk peribadatan satu tahun sekali dengan izin khusus Kementerian Kebudayaan dan Pariwisata Turki. Katedral tersebut masuk ke Daftar Warisan Dunia Sementara U...

العلاقات المجرية الطاجيكستانية المجر طاجيكستان   المجر   طاجيكستان تعديل مصدري - تعديل   العلاقات المجرية الطاجيكستانية هي العلاقات الثنائية التي تجمع بين المجر وطاجيكستان.[1][2][3][4][5] مقارنة بين البلدين هذه مقارنة عامة ومرجعية للدولتين: و�...

 

 

American politician Lorena GonzalezMember of the California State Assemblyfrom the 80th districtIn officeMay 28, 2013 – January 5, 2022Preceded byBen HuesoSucceeded byDavid Alvarez Personal detailsBornLorena Sofia Gonzalez (1971-09-16) September 16, 1971 (age 52)Oceanside, California, U.S.Political partyDemocraticSpouseNathan Fletcher (m. 2017)Children2EducationStanford University (BA)Georgetown University (MA)University of California, Los Angeles (JD) Lorena Sofia Gonzalez Fl...

 

 

County in Florida, United States This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Gulf County, Florida – news · newspapers · books · scholar · JSTOR (April 2021) (Learn how and when to remove this template message) County in FloridaGulf CountyCountyGulf County Courthouse SealLocation within the U.S. state of ...

استخدم البشر عبر التاريخ في معظم حروبهم تشكيلات قتالية تماثل في فكرتها سلاح الفرسان، ونتيجة لذلك فقد تطورت تكتيكات سلاح الفرسان مع مرور الوقت. من الناحية التكتيكية، تفوق سلاح الفرسان على قوات المشاة بمزايا عدة منها القدرة الأكبر على الحركة والتأثير الأوسع والموقع الأعلى...

 

 

Hospital in Jidhafs, BahrainInternational Hospital of Bahrain W.L.L.IHB Main ReceptionGeographyLocationJidhafs, BahrainCoordinates26°13′9″N 50°31′35″E / 26.21917°N 50.52639°E / 26.21917; 50.52639OrganisationTypePrivateServicesEmergency department24-Hour Accident & EmergencyHistoryOpenedOctober 1978Closed15 June 2018LinksWebsitehttp://www.ihb.net The International Hospital of Bahrain (Abbreviation: IHB; Arabic: مستشفى البحرين الدولي ) ...

 

 

Lokasi IJsselmeer IJsselmeer atau Danau IJssel adalah sebuah danau dangkal di Belanda. Danau ini merupakan danau buatan dan tercipta ketika laut Zuiderzee atau Laut Selatan dibendung dengan diberi tanggul Afsluitdijk pada tahun 1932. Ketika itu laut ini hilang dan berubah menjadi sebuah danau air tawar. Nama IJsselmeer diambil dari nama sungai IJssel yang bermuara di sini. Danau ini luasnya adalah 1100 km² di tengah-tengah Belanda dan berbatasan dengan provinsi Flevoland, Noord Holland dan F...

Fictional location This article is about the fictional location. For other uses, see Back room (disambiguation). A typical depiction of the Backrooms, digitally rendered The Backrooms are a fictional concept first mentioned on a 2019 4chan thread. One of the best known examples of the liminal space aesthetic, the Backrooms are commonly depicted as an extradimensional space containing impossibly large expanses of empty rooms accessed by no-clipping out of reality in certain areas. Internet use...

 

 

NahuatlnāhuatlàtōlliParlato in Messico RegioniAmerica del Nord LocutoriTotale1.725.620 (2015) Classificanon nelle prime 100 Altre informazioniScritturaAlfabeto latino Tipopolisintetica TassonomiaFilogenesilingue uto-azteche Nahuatl Codici di classificazioneISO 639-2nah ISO 639-5nah Glottologazte1234 (EN) Estratto in linguaDichiarazione universale dei diritti umani, art. 1Nochi tlakamej uan siuamej kipiaj manoj kuali tlakatisej, nochi san se totlatechpouiltilis uan titlatepanitalo...

 

 

NeoNeo mentre blocca le pallottole degli agenti in una scena di Matrix (1999) UniversoMatrix AutoreFratelli Wachowski 1ª app. inMatrix (1999) Ultima app. inMatrix Resurrections (2021) Interpretato daKeanu Reeves Voce italianaLuca Ward Caratteristiche immaginarieAlter egoThomas A. Anderson[5] SoprannomeSig. Anderson[1]L'Eletto[2] SessoMaschio ProfessioneProgrammatore[3]Hacker[3] Oppositore delle macchine[4] Poteri Prestazioni fisiche sovrum...

American football player and coach (1932–2014) American football player Chuck NollNoll with the Cleveland Browns in 1954No. 65Position:GuardLinebackerPersonal informationBorn:(1932-01-05)January 5, 1932Cleveland, Ohio, U.S.Died:June 13, 2014(2014-06-13) (aged 82)Sewickley, Pennsylvania, U.S.Height:6 ft 1 in (1.85 m)Weight:220 lb (100 kg)Career informationHigh school:Benedictine (Cleveland, Ohio)College:DaytonNFL draft:1953 / Round: 20 / Pick:...

 

 

Notoryctemorphia Notoryctes typhlops TaksonomiKerajaanAnimaliaFilumChordataKelasMammaliaOrdoNotoryctemorphia Famili dan genus Notoryctidae Notoryctes Stirling, 1891 †Naraboryctes Archer dkk., 2010 lbs Notoryctemorphia adalah ordo dalam kelas mamalia. Ordo ini hanya memiliki satu famili, yaitu Notoryctidae, dan dua genus, yaitu Naraboryctes yang telah punah dan Notoryctes atau tikus mondok marsupial [1] Hewan ini tergolong marsupialia australidelphia yang terdiri atas dua spesies, ya...

 

 

この記事は検証可能な参考文献や出典が全く示されていないか、不十分です。出典を追加して記事の信頼性向上にご協力ください。(このテンプレートの使い方)出典検索?: コルク – ニュース · 書籍 · スカラー · CiNii · J-STAGE · NDL · dlib.jp · ジャパンサーチ · TWL(2017年4月) コルクを打ち抜いて作った瓶の栓 コルク(木栓、�...

Belgic tribe Map of Gaul with tribes, 1st century BC; the Mediomatrici are circled. Civitas of the Mediomatrici City scape of Divodurum Mediomatricum (ca. 2nd century AD), ancestor of present-day Metz, capital of the Mediomatrici. The Mediomatrici (Gaulish: *Medio-māteres) were according to Caesar a Gaulish tribe at the frontier to the Belgicae dwelling in the present-day regions Lorraine, Upper Moselle during the Iron Age and the Roman period. Name They are mentioned as Mediomatricorum and ...

 

 

土库曼斯坦总统土库曼斯坦国徽土库曼斯坦总统旗現任谢尔达尔·别尔德穆哈梅多夫自2022年3月19日官邸阿什哈巴德总统府(Oguzkhan Presidential Palace)機關所在地阿什哈巴德任命者直接选举任期7年,可连选连任首任萨帕尔穆拉特·尼亚佐夫设立1991年10月27日 土库曼斯坦土库曼斯坦政府与政治 国家政府 土库曼斯坦宪法 国旗 国徽 国歌 立法機關(英语:National Council of Turkmenistan) ...

 

 

Law of Taiwan Anti-Infiltration ActLegislative YuanCitationAnti-Infiltration ActPassed31 December 2019Signed byPresident Tsai Ing-wenSigned15 January 2020Effective17 January 2020Legislative historySecond reading29 November 2019Third reading31 December 2019Status: In force The Anti-Infiltration Act (Chinese: 反滲透法) is a law regulating the influence of entities deemed foreign hostile forces on the political processes of the Republic of China (commonly known as Taiwan), including...

此条目序言章节没有充分总结全文内容要点。 (2019年3月21日)请考虑扩充序言,清晰概述条目所有重點。请在条目的讨论页讨论此问题。 哈萨克斯坦總統哈薩克總統旗現任Қасым-Жомарт Кемелұлы Тоқаев卡瑟姆若马尔特·托卡耶夫自2019年3月20日在任任期7年首任努尔苏丹·纳扎尔巴耶夫设立1990年4月24日(哈薩克蘇維埃社會主義共和國總統) 哈萨克斯坦 哈萨克斯坦政府...

 

 

Військово-музичне управління Збройних сил України Тип військове формуванняЗасновано 1992Країна  Україна Емблема управління Військово-музичне управління Збройних сил України — структурний підрозділ Генерального штабу Збройних сил України призначений для планува...