Language model

A language model is a probabilistic model of a natural language^[1] that can assign a probability to a sequence of words, based on the text corpora in one or more languages on which it was trained. In 1980, the first statistical language model was proposed, and throughout that decade IBM ran "Shannon-style" experiments, which identified potential sources of improvement for language modeling by observing and analyzing how well human subjects predict or correct text.^[2]
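As a rough illustration of assigning a probability to a word sequence, the sketch below trains a bigram model on a toy corpus. The corpus, the function names, the `<s>`/`</s>` boundary markers, and the add-one smoothing are all assumptions made for this example; they are not taken from the cited sources.

```python
from collections import Counter


def train_bigram(corpus):
    """Count unigrams and bigrams over sentences padded with <s> and </s>."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus:
        tokens = ["<s>"] + sentence.split() + ["</s>"]
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sequence_probability(sentence, unigrams, bigrams):
    """Multiply add-one-smoothed bigram probabilities over the sentence."""
    vocab_size = len(unigrams)  # vocabulary size, boundary markers included
    tokens = ["<s>"] + sentence.split() + ["</s>"]
    prob = 1.0
    for prev, word in zip(tokens, tokens[1:]):
        # Add-one smoothing keeps unseen bigrams from zeroing the product.
        prob *= (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)
    return prob


# Hypothetical toy corpus, used only for illustration.
toy_corpus = ["the cat sat", "the dog sat", "the cat ran"]
unigrams, bigrams = train_bigram(toy_corpus)

p_seen = sequence_probability("the cat sat", unigrams, bigrams)
p_odd = sequence_probability("sat the the", unigrams, bigrams)
print(p_seen > p_odd)  # the sentence seen in training is the more probable one
```

On this toy corpus the sentence that appears in the training data receives a higher probability than the scrambled one, which is the behavior the definition describes.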

Language models are useful for many tasks, including speech recognition^[3] (where they help prevent the prediction of low-probability sequences, such as nonsensical ones), machine translation,^[4] natural language generation, optical character recognition, handwriting recognition,^[5] and information retrieval.^[6]^[7]
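One way a language model can steer a recognizer away from low-probability sequences is to rescore candidate transcriptions. The sketch below is a minimal illustration under stated assumptions: the bigram probabilities, the floor value for unseen bigrams, the acoustic scores, and the names `lm_log_score` and `rescore` are all hypothetical, and it does not describe the method of any cited system.

```python
import math

# Hypothetical toy bigram probabilities; a real system would estimate
# these from a training corpus.
BIGRAM_PROB = {
    ("recognize", "speech"): 0.2,
    ("wreck", "a"): 0.05,
    ("a", "nice"): 0.1,
    ("nice", "beach"): 0.02,
}
UNSEEN_PROB = 1e-4  # assumed floor for bigrams missing from the table


def lm_log_score(words):
    """Sum of log bigram probabilities for a list of words."""
    return sum(
        math.log(BIGRAM_PROB.get(pair, UNSEEN_PROB))
        for pair in zip(words, words[1:])
    )


def rescore(hypotheses, lm_weight=1.0):
    """Return the (text, acoustic_log_score) pair with the best combined score."""
    return max(
        hypotheses,
        key=lambda h: h[1] + lm_weight * lm_log_score(h[0].split()),
    )


# Hypothetical candidates with made-up acoustic log scores.
candidates = [
    ("speech recognize", -3.9),
    ("wreck a nice beach", -3.8),
    ("recognize speech", -4.0),
]
best_text, _ = rescore(candidates)
print(best_text)
```

Although "recognize speech" has the lowest acoustic score of the three candidates, the language-model term penalizes the other two more heavily, so it is selected.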

Large language models, currently the most advanced form, combine larger datasets (often drawn from the public internet), feedforward neural networks, and transformers. They have superseded models based on recurrent neural networks, which had in turn superseded purely statistical models such as the N-gram model.

Notes

[1] Jurafsky, Dan; Martin, James H. (2021). "N-gram Language Models". Speech and Language Processing (3rd ed.). Archived from the original on 22 May 2022. Retrieved 24 May 2022.
[2] Rosenfeld, Ronald (2000). "Two decades of statistical language modeling: Where do we go from here?". Proceedings of the IEEE. 88 (8).
[3] Kuhn, Roland; De Mori, Renato (1990). "A cache-based natural language model for speech recognition". IEEE Transactions on Pattern Analysis and Machine Intelligence. 12 (6): 570–583.
[4] Andreas, Jacob; Vlachos, Andreas; Clark, Stephen (2013). "Semantic parsing as machine translation". Archived 15 August 2020 at the Wayback Machine. Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers).
[5] Pham, Vu; et al. (2014). "Dropout improves recurrent neural networks for handwriting recognition". Archived 11 November 2020 at the Wayback Machine. 14th International Conference on Frontiers in Handwriting Recognition. IEEE.
[6] Ponte, Jay M.; Croft, W. Bruce (1998). A language modeling approach to information retrieval. Proceedings of the 21st ACM SIGIR Conference. Melbourne, Australia: ACM. pp. 275–281. doi:10.1145/290941.291008.
[7] Hiemstra, Djoerd (1998). A linguistically motivated probabilistic model of information retrieval. Proceedings of the 2nd European Conference on Research and Advanced Technology for Digital Libraries. LNCS, Springer. pp. 569–584. doi:10.1007/3-540-49653-X_34.
