Speech coding

Speech coding is an application of data compression to digital audio signals containing speech. It uses speech-specific parameter estimation, based on audio signal processing techniques, to model the speech signal, and combines this with generic data compression algorithms that represent the resulting model parameters in a compact bitstream.[1]

Common applications of speech coding are mobile telephony and voice over IP (VoIP).[2] The most widely used speech coding technique in mobile telephony is linear predictive coding (LPC), while the most widely used in VoIP applications are the LPC and modified discrete cosine transform (MDCT) techniques.[citation needed]

The techniques employed in speech coding are similar to those used in audio data compression and audio coding, where knowledge of psychoacoustics is used to transmit only the data that is relevant to the human auditory system. For example, in voiceband speech coding, only information in the frequency band 400 to 3500 Hz is transmitted, yet the reconstructed signal retains adequate intelligibility.

Speech coding differs from other forms of audio coding in that speech is a simpler signal than other audio signals, and statistical information is available about the properties of speech. As a result, some auditory information that is relevant in general audio coding can be unnecessary in the speech coding context. Speech coding stresses the preservation of intelligibility and pleasantness of speech while using a constrained amount of transmitted data.[3] In addition, most speech applications require low coding delay, as latency interferes with speech interaction.[4]

Categories

Speech coders are of two classes:[5]

  1. Waveform coders
  2. Vocoders

Sample companding viewed as a form of speech coding

The A-law and μ-law algorithms used in G.711 PCM digital telephony can be seen as an early precursor of speech coding, requiring only 8 bits per sample but giving effectively 12 bits of resolution.[7] Logarithmic companding is consistent with human hearing perception in that low-amplitude noise is audible alongside a low-amplitude speech signal but is masked by a high-amplitude one. Although this would produce unacceptable distortion in a music signal, the peaky nature of speech waveforms, combined with the simple frequency structure of speech as a periodic waveform with a single fundamental frequency and occasional added noise bursts, makes these very simple instantaneous compression algorithms acceptable for speech.[citation needed][dubious – discuss]
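The effect of logarithmic companding can be illustrated with a short Python sketch. It assumes the continuous μ-law curve with μ = 255 followed by uniform 8-bit quantization; G.711 itself uses a segmented, piecewise-linear approximation of this curve, so the sketch shows the principle rather than the bit-exact standard. All function names are hypothetical. Small-amplitude samples get much finer effective step sizes than with plain 8-bit linear quantization, at the cost of coarser steps near full scale.

```python
import math

MU = 255.0  # mu parameter of the continuous mu-law curve (assumed value)


def mu_law_compress(x: float) -> float:
    """Map a sample in [-1, 1] onto the continuous mu-law curve (also in [-1, 1])."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def mu_law_expand(y: float) -> float:
    """Inverse of mu_law_compress."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def quantize_8bit(v: float) -> int:
    """Uniformly quantize a value in [-1, 1] to one of 256 codes (0..255)."""
    return max(0, min(255, round((v + 1.0) / 2.0 * 255)))


def dequantize_8bit(code: int) -> float:
    """Map an 8-bit code back to a value in [-1, 1]."""
    return code / 255 * 2.0 - 1.0


def mu_law_roundtrip(x: float) -> float:
    """Compress, quantize to 8 bits, dequantize, then expand."""
    return mu_law_expand(dequantize_8bit(quantize_8bit(mu_law_compress(x))))


def linear_roundtrip(x: float) -> float:
    """8-bit uniform quantization without companding, for comparison."""
    return dequantize_8bit(quantize_8bit(x))


if __name__ == "__main__":
    for x in (0.01, 0.1, 0.9):
        print(f"x={x}: mu-law error={abs(mu_law_roundtrip(x) - x):.5f}, "
              f"linear error={abs(linear_roundtrip(x) - x):.5f}")
```

Running the loop shows the trade-off: for a quiet sample such as 0.01 the companded error is several times smaller than the linear one, while near full scale the linear quantizer wins.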

A wide variety of other algorithms, mostly delta modulation variants, were tried at the time, but after careful consideration the designers of the early digital telephony systems chose the A-law/μ-law algorithms. At the time of their design, their 33% bandwidth reduction at very low complexity made an excellent engineering compromise. Their audio performance remains acceptable, and there has been no need to replace them in the fixed-line telephone network.[citation needed]
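Assuming the 33% figure compares 8-bit companded samples with the 12-bit linear resolution mentioned above, at the 8 kHz sampling rate used in G.711 telephony, the arithmetic works out as follows:

```latex
\begin{aligned}
R_{\text{linear}} &= 8000\ \text{samples/s} \times 12\ \text{bits} = 96\ \text{kbit/s} \\
R_{\text{companded}} &= 8000\ \text{samples/s} \times 8\ \text{bits} = 64\ \text{kbit/s} \\
\frac{R_{\text{linear}} - R_{\text{companded}}}{R_{\text{linear}}} &= \frac{96 - 64}{96} = \frac{1}{3} \approx 33\%
\end{aligned}
```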

In 2008, ITU-T standardized G.711.1, a wideband extension of G.711 with a scalable (embedded) structure; its input sampling rate is 16 kHz.[8]

Modern speech compression

Much of the later work in speech compression was motivated by military research into digital communications for secure military radios, where very low data rates were used to achieve effective operation in a hostile radio environment. At the same time, far more processing power was available in the form of VLSI circuits than had been available for earlier compression techniques. As a result, modern speech compression algorithms could use far more complex techniques than those of the 1960s to achieve far higher compression ratios.

The most widely used speech coding algorithms are based on linear predictive coding (LPC).[9] In particular, the most common speech coding scheme is LPC-based code-excited linear prediction (CELP) coding, which is used, for example, in the GSM standard. In CELP, the modeling is divided into two stages: a linear predictive stage that models the spectral envelope, and a codebook-based model of the residual of the linear predictive model. The linear prediction coefficients (LPC) are computed and quantized, usually as line spectral pairs (LSPs). In addition to the speech coding of the signal itself, channel coding is often necessary for transmission, to avoid losses due to transmission errors. To get the best overall coding results, speech coding and channel coding methods are chosen in pairs, with the more important bits in the speech data stream protected by more robust channel coding.
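The two CELP stages described above can be sketched in Python. This is a simplified illustration under stated assumptions, not the GSM algorithm: the helper names are hypothetical, the LPC coefficients come from the Levinson–Durbin recursion on a plain (unwindowed) autocorrelation, and the codebook search passes each codevector through the all-pole synthesis filter 1/A(z) and keeps the one that best matches a target frame with a least-squares gain. Real CELP coders add analysis windows, perceptual weighting, an adaptive (pitch) codebook, and LSP quantization of the coefficients, none of which appear here.

```python
import math
from typing import List, Sequence, Tuple


def autocorrelation(x: Sequence[float], max_lag: int) -> List[float]:
    """r[k] = sum_n x[n] * x[n - k] for k = 0..max_lag (biased, unwindowed)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(max_lag + 1)]


def levinson_durbin(r: Sequence[float], order: int) -> Tuple[List[float], float]:
    """Solve the normal equations for the predictor x[n] ~ sum_i a[i] * x[n - i].

    Returns (a[1..order], final prediction-error energy).
    """
    a = [0.0] * (order + 1)  # a[0] is unused
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient for this order
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k
    return a[1:], err


def synthesize(excitation: Sequence[float], a: Sequence[float]) -> List[float]:
    """All-pole filter 1/A(z): y[n] = e[n] + sum_i a[i] * y[n - i], zero initial state."""
    y: List[float] = []
    for n, e in enumerate(excitation):
        acc = e
        for i, coef in enumerate(a, start=1):
            if n - i >= 0:
                acc += coef * y[n - i]
        y.append(acc)
    return y


def search_codebook(target: Sequence[float], codebook: Sequence[Sequence[float]],
                    a: Sequence[float]) -> Tuple[int, float]:
    """Return (index, gain) of the codevector whose synthesized output best fits target."""
    best_index, best_gain, best_score = -1, 0.0, -1.0
    for idx, code in enumerate(codebook):
        y = synthesize(code, a)
        corr = sum(t * v for t, v in zip(target, y))
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        # With the optimal gain corr/energy, squared error = |target|^2 - corr^2/energy,
        # so maximizing corr^2/energy minimizes the error.
        score = corr * corr / energy
        if score > best_score:
            best_index, best_gain, best_score = idx, corr / energy, score
    return best_index, best_gain


if __name__ == "__main__":
    # Toy frame: two sinusoids plus a small deterministic pseudo-noise term.
    frame = [math.sin(0.25 * n) + 0.3 * math.sin(1.1 * n) + 0.05 * ((n * 37) % 11 - 5)
             for n in range(160)]
    lpc, residual_energy = levinson_durbin(autocorrelation(frame, 8), 8)
    print("LPC coefficients:", [round(c, 3) for c in lpc])
```

Evaluating the candidates through the synthesis filter, rather than comparing raw excitations, is what makes this an analysis-by-synthesis search: the error is measured on the reconstructed speech, not on the residual.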

The modified discrete cosine transform (MDCT) is the basis of the LD-MDCT technique used in the AAC-LD format, introduced in 1999.[10] MDCT has since been widely adopted in voice over IP (VoIP) applications, such as the G.729.1 wideband audio codec introduced in 2006,[11] Apple's FaceTime (using AAC-LD) introduced in 2010,[12] and the CELT codec introduced in 2011.[13]
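The property that makes the MDCT attractive for low-delay coding is that 50%-overlapping frames can be transformed to exactly N coefficients each and still be reconstructed perfectly, because the time-domain aliasing introduced in each frame cancels in the overlap-add (TDAC). The Python sketch below shows this with a direct O(N²) transform and a sine window. The function names, the 2/N scaling placed on the inverse transform (texts differ on where the normalization goes), and the choice of window are illustrative assumptions; AAC-LD and CELT use their own windows, frame sizes, and fast algorithms.

```python
import math
from typing import List, Sequence


def sine_window(length: int) -> List[float]:
    """Sine window; satisfies the Princen-Bradley condition w[n]^2 + w[n+N]^2 = 1."""
    return [math.sin(math.pi / length * (n + 0.5)) for n in range(length)]


def mdct(frame: Sequence[float]) -> List[float]:
    """Direct O(N^2) MDCT: 2N input samples -> N coefficients."""
    n_coeffs = len(frame) // 2
    n0 = 0.5 + n_coeffs / 2
    return [
        sum(frame[n] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
            for n in range(2 * n_coeffs))
        for k in range(n_coeffs)
    ]


def imdct(coeffs: Sequence[float]) -> List[float]:
    """Inverse MDCT: N coefficients -> 2N time-aliased samples (scaled by 2/N here)."""
    n_coeffs = len(coeffs)
    n0 = 0.5 + n_coeffs / 2
    return [
        2.0 / n_coeffs * sum(coeffs[k] * math.cos(math.pi / n_coeffs * (n + n0) * (k + 0.5))
                             for k in range(n_coeffs))
        for n in range(2 * n_coeffs)
    ]


def mdct_analysis(signal: Sequence[float], hop: int) -> List[List[float]]:
    """Window 50%-overlapping 2*hop-sample frames and take the MDCT of each."""
    assert len(signal) % hop == 0, "sketch assumes len(signal) is a multiple of hop"
    window = sine_window(2 * hop)
    padded = [0.0] * hop + list(signal) + [0.0] * hop
    frames = []
    for start in range(0, len(padded) - 2 * hop + 1, hop):
        block = padded[start:start + 2 * hop]
        frames.append(mdct([w * s for w, s in zip(window, block)]))
    return frames


def overlap_add_synthesis(frames: Sequence[Sequence[float]], hop: int, length: int) -> List[float]:
    """IMDCT each frame, window it again, and overlap-add; the aliasing cancels."""
    window = sine_window(2 * hop)
    out = [0.0] * (len(frames) * hop + hop)
    for i, coeffs in enumerate(frames):
        for n, v in enumerate(imdct(coeffs)):
            out[i * hop + n] += window[n] * v
    return out[hop:hop + length]


if __name__ == "__main__":
    sig = [math.sin(0.3 * i) + 0.5 * math.cos(1.7 * i) for i in range(32)]
    rec = overlap_add_synthesis(mdct_analysis(sig, 8), 8, len(sig))
    print("max reconstruction error:", max(abs(a - b) for a, b in zip(sig, rec)))
```

A codec would quantize the coefficients between analysis and synthesis; the overlap-add then also smooths the quantization error across frame boundaries.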

Opus is a free software audio coder. It combines the speech-oriented LPC-based SILK algorithm and the lower-latency MDCT-based CELT algorithm, switching between or combining them as needed for maximal efficiency.[14][15] It is widely used for VoIP calls in WhatsApp.[16][17][18] The PlayStation 4 video game console also uses Opus for its PlayStation Network system party chat.[19]

A number of codecs with even lower bit rates have been demonstrated. Codec2, which operates at bit rates as low as 450 bit/s, sees use in amateur radio.[20] NATO currently uses MELPe, offering intelligible speech at 600 bit/s and below.[21] Neural vocoder approaches have also emerged: Lyra by Google gives an "almost eerie" quality at 3 kbit/s.[22] Microsoft's Satin also uses machine learning, but uses a higher tunable bitrate and is wideband.[23]

Sub-fields

Wideband audio coding
Narrowband audio coding

See also

References

  1. ^ Arjona Ramírez, M.; Minami, M. (2003). "Low bit rate speech coding". In J. G. Proakis (ed.), Wiley Encyclopedia of Telecommunications, vol. 3. New York: Wiley, pp. 1299–1308.
  2. ^ M. Arjona Ramírez and M. Minami, "Technology and standards for low-bit-rate vocoding methods," in The Handbook of Computer Networks, H. Bidgoli, Ed., New York: Wiley, 2011, vol. 2, pp. 447–467.
  3. ^ P. Kroon, "Evaluation of speech coders," in Speech Coding and Synthesis, W. Bastiaan Kleijn and K. K. Paliwal, Ed., Amsterdam: Elsevier Science, 1995, pp. 467-494.
  4. ^ J. H. Chen, R. V. Cox, Y.-C. Lin, N. S. Jayant, and M. J. Melchner, A low-delay CELP coder for the CCITT 16 kb/s speech coding standard. IEEE J. Select. Areas Commun. 10(5): 830-849, June 1992.
  5. ^ "Soo Hyun Bae, ECE 8873 Data Compression & Modeling, Georgia Institute of Technology, 2004". Archived from the original on 7 September 2006.
  6. ^ Zeghidour, Neil; Luebs, Alejandro; Omran, Ahmed; Skoglund, Jan; Tagliasacchi, Marco (2022). "SoundStream: An End-to-End Neural Audio Codec". IEEE/ACM Transactions on Audio, Speech, and Language Processing. 30: 495–507. arXiv:2107.03312. doi:10.1109/TASLP.2021.3129994. S2CID 236149944.
  7. ^ Jayant, N. S.; Noll, P. (1984). Digital coding of waveforms. Englewood Cliffs: Prentice-Hall.
  8. ^ G.711.1 : Wideband embedded extension for G.711 pulse code modulation, ITU-T, 2012, retrieved 2022-12-24
  9. ^ Gupta, Shipra (May 2016). "Application of MFCC in Text Independent Speaker Recognition" (PDF). International Journal of Advanced Research in Computer Science and Software Engineering. 6 (5): 805–810 (806). ISSN 2277-128X. S2CID 212485331. Archived from the original (PDF) on 2019-10-18. Retrieved 18 October 2019.
  10. ^ Schnell, Markus; Schmidt, Markus; Jander, Manuel; Albert, Tobias; Geiger, Ralf; Ruoppila, Vesa; Ekstrand, Per; Grill, Bernhard (October 2008). MPEG-4 Enhanced Low Delay AAC - A New Standard for High Quality Communication (PDF). 125th AES Convention. Fraunhofer IIS. Audio Engineering Society. Retrieved 20 October 2019.
  11. ^ Nagireddi, Sivannarayana (2008). VoIP Voice and Fax Signal Processing. John Wiley & Sons. p. 69. ISBN 9780470377864.
  12. ^ Daniel Eran Dilger (June 8, 2010). "Inside iPhone 4: FaceTime video calling". AppleInsider. Retrieved June 9, 2010.
  13. ^ Presentation of the CELT codec Archived 2011-08-07 at the Wayback Machine by Timothy B. Terriberry (65 minutes of video, see also presentation slides in PDF)
  14. ^ "Opus Codec". Opus (Home page). Xiph.org Foundation. Retrieved July 31, 2012.
  15. ^ Valin, Jean-Marc; Maxwell, Gregory; Terriberry, Timothy B.; Vos, Koen (October 2013). High-Quality, Low-Delay Music Coding in the Opus Codec. 135th AES Convention. Audio Engineering Society. arXiv:1602.04845.
  16. ^ Leyden, John (27 October 2015). "WhatsApp laid bare: Info-sucking app's innards probed". The Register. Retrieved 19 October 2019.
  17. ^ Hazra, Sudip; Mateti, Prabhaker (September 13–16, 2017). "Challenges in Android Forensics". In Thampi, Sabu M.; Pérez, Gregorio Martínez; Westphall, Carlos Becker; Hu, Jiankun; Fan, Chun I.; Mármol, Félix Gómez (eds.). Security in Computing and Communications: 5th International Symposium, SSCC 2017. Springer. pp. 286–299 (290). doi:10.1007/978-981-10-6898-0_24. ISBN 9789811068980.
  18. ^ Srivastava, Saurabh Ranjan; Dube, Sachin; Shrivastaya, Gulshan; Sharma, Kavita (2019). "Smartphone Triggered Security Challenges: Issues, Case Studies and Prevention". In Le, Dac-Nhuong; Kumar, Raghvendra; Mishra, Brojo Kishore; Chatterjee, Jyotir Moy; Khari, Manju (eds.). Cyber Security in Parallel and Distributed Computing: Concepts, Techniques, Applications and Case Studies. John Wiley & Sons. pp. 187–206 (200). doi:10.1002/9781119488330.ch12. ISBN 9781119488057. S2CID 214034702.
  19. ^ "Open Source Software used in PlayStation4". Sony Interactive Entertainment Inc. Retrieved 2017-12-11.[failed verification]
  20. ^ "GitHub - Codec2". GitHub. November 2019.
  21. ^ Alan McCree, “A scalable phonetic vocoder framework using joint predictive vector quantization of MELP parameters,” in Proc. IEEE Int. Conf. Acoust., Speech, Signal Processing, 2006, pp. I 705–708, Toulouse, France
  22. ^ Buckley, Ian (2021-04-08). "Google Makes Its Lyra Low Bitrate Speech Codec Public". MakeUseOf. Retrieved 2022-07-21.
  23. ^ Levent-Levi, Tsahi (2021-04-19). "Lyra, Satin and the future of voice codecs in WebRTC". BlogGeek.me. Retrieved 2022-07-21.
  24. ^ "LPCNet: Efficient neural speech synthesis". Xiph.Org Foundation. 8 August 2023.
