Maximum-entropy Markov model

In statistics, a maximum-entropy Markov model (MEMM), or conditional Markov model (CMM), is a graphical model for sequence labeling that combines features of hidden Markov models (HMMs) and maximum entropy (MaxEnt) models. An MEMM is a discriminative model that extends a standard maximum entropy classifier by assuming that the unknown values to be learnt are connected in a Markov chain rather than being conditionally independent of each other. MEMMs find applications in natural language processing, specifically in part-of-speech tagging[1] and information extraction.[2]

Model

Suppose we have a sequence of observations $O_1, \dots, O_n$ that we seek to tag with the labels $S_1, \dots, S_n$ that maximize the conditional probability $P(S_1, \dots, S_n \mid O_1, \dots, O_n)$. In an MEMM, this probability is factored into Markov transition probabilities, where the probability of transitioning to a particular label depends only on the observation at that position and the previous position's label[citation needed]:

$$P(S_1, \dots, S_n \mid O_1, \dots, O_n) = \prod_{t=1}^{n} P(S_t \mid S_{t-1}, O_t)$$

Each of these transition probabilities comes from the same general distribution $P(s \mid s', o)$. For each possible label value of the previous label $s'$, the probability of a certain label $s$ is modeled in the same way as a maximum entropy classifier:[3]

$$P(s \mid s', o) = P_{s'}(s \mid o) = \frac{1}{Z(o, s')} \exp\left(\sum_a \lambda_a f_a(o, s)\right)$$

Here, the $f_a(o, s)$ are real-valued or categorical feature-functions, and $Z(o, s')$ is a normalization term ensuring that the distribution sums to one. This form for the distribution corresponds to the maximum entropy probability distribution satisfying the constraint that the empirical expectation for the feature is equal to the expectation given the model:

$$\operatorname{E}_e\left[f_a(o, s)\right] = \operatorname{E}_p\left[f_a(o, s)\right] \quad \text{for all } a$$
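As a minimal sketch of the maximum entropy transition distribution described above, the following Python computes the probability of each label for one fixed previous label by exponentiating a weighted sum of feature functions and normalizing. The function name, the two toy features, the label names and the weight values are all assumptions made for illustration; they do not come from the cited papers.

```python
import math

def transition_distribution(weights, features, labels, observation):
    """Return {label: P(label | previous label, observation)} for one
    fixed previous label, given that label's weights and feature functions.

    weights  : list of floats (one weight per feature function)
    features : list of callables f(observation, label) -> float
    """
    scores = {
        s: sum(w * f(observation, s) for w, f in zip(weights, features))
        for s in labels
    }
    # Subtracting the maximum score avoids overflow in exp(); it rescales
    # numerator and normalizer equally, so the probabilities are unchanged.
    top = max(scores.values())
    unnormalized = {s: math.exp(v - top) for s, v in scores.items()}
    z = sum(unnormalized.values())  # normalization term
    return {s: u / z for s, u in unnormalized.items()}

# Hypothetical features: each one fires for an (observation, label) pair.
features = [
    lambda o, s: 1.0 if o[0].isupper() and s == "NAME" else 0.0,
    lambda o, s: 1.0 if o.islower() and s == "OTHER" else 0.0,
]
weights = [2.0, 1.0]  # illustrative values, not trained

dist = transition_distribution(weights, features, ["NAME", "OTHER"], "Acme")
```

Because the normalizer is computed per previous label and per observation, each call yields a proper distribution over the next label.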

The parameters $\lambda_a$, the weights on the feature functions, can be estimated using generalized iterative scaling.[4] Furthermore, a variant of the Baum–Welch algorithm, which is used for training HMMs, can be used to estimate parameters when training data has incomplete or missing labels.[2]

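The optimal state sequence $S_1, \dots, S_n$ can be found using a Viterbi algorithm very similar to the one used for HMMs. The dynamic program uses the forward probability:

$$\alpha_{t+1}(s) = \sum_{s' \in S} \alpha_t(s') P_{s'}(s \mid o_{t+1})$$

where $P_{s'}(s \mid o_{t+1})$ is the probability of moving from label $s'$ to label $s$ given the observation $o_{t+1}$.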
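The decoding step can be sketched as follows. Viterbi decoding keeps, for each label, the score of the best path ending there, taking a maximum over previous labels where the forward probability takes a sum. This is an illustrative sketch, not a reference implementation: the `trans_prob` callback, the `<START>` pseudo-label, and the toy probabilities are assumptions introduced here.

```python
import math

def safe_log(p):
    # log(0) is treated as negative infinity so impossible paths never win.
    return math.log(p) if p > 0 else -math.inf

def viterbi(observations, labels, trans_prob, start="<START>"):
    """trans_prob(prev_label, obs) -> {label: P(label | prev_label, obs)}."""
    if not observations:
        return []
    first = trans_prob(start, observations[0])
    delta = {s: safe_log(first[s]) for s in labels}  # best log-score per label
    backpointers = []
    for obs in observations[1:]:
        dists = {prev: trans_prob(prev, obs) for prev in labels}
        new_delta, pointers = {}, {}
        for s in labels:
            best_prev = max(
                labels, key=lambda prev: delta[prev] + safe_log(dists[prev][s])
            )
            pointers[s] = best_prev
            new_delta[s] = delta[best_prev] + safe_log(dists[best_prev][s])
        backpointers.append(pointers)
        delta = new_delta
    # Follow the backpointers from the best final label.
    path = [max(labels, key=lambda s: delta[s])]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path

def toy_trans_prob(prev, obs):
    # Hypothetical hand-set probabilities standing in for a trained model.
    if obs[0].isupper():
        if prev == "NAME":
            return {"NAME": 0.6, "OTHER": 0.4}
        return {"NAME": 0.9, "OTHER": 0.1}
    return {"NAME": 0.1, "OTHER": 0.9}

path = viterbi(["the", "Acme", "company"], ["NAME", "OTHER"], toy_trans_prob)
# path == ["OTHER", "NAME", "OTHER"]
```

Working in log space avoids underflow on long sequences; the recursion is otherwise the same as multiplying the transition probabilities directly.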

Strengths and weaknesses

An advantage of MEMMs over HMMs for sequence tagging is that they offer increased freedom in choosing features to represent observations. In sequence tagging situations, it is useful to use domain knowledge to design special-purpose features. In the original paper introducing MEMMs, the authors write that "when trying to extract previously unseen company names from a newswire article, the identity of a word alone is not very predictive; however, knowing that the word is capitalized, that is a noun, that it is used in an appositive, and that it appears near the top of the article would all be quite predictive (in conjunction with the context provided by the state-transition structure)."[2] Useful sequence tagging features, such as these, are often non-independent. Maximum entropy models do not assume independence between features, but generative observation models used in HMMs do.[2] Therefore, MEMMs allow the user to specify many correlated but informative features.
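To make the idea of overlapping, hand-designed features concrete, here is a small sketch in Python. The predicate names, the `near_top` cutoff of 20 tokens, and the helper `make_feature` are hypothetical choices for illustration only; they are not the feature set used in the cited paper.

```python
def observation_predicates(tokens, position):
    """Binary properties of the token at `position`. Several of them are
    deliberately correlated (for example, capitalized and all-caps)."""
    word = tokens[position]
    return {
        "is_capitalized": word[:1].isupper(),
        "is_all_caps": word.isupper(),
        "has_digit": any(ch.isdigit() for ch in word),
        "near_top": position < 20,  # assumed cutoff, purely illustrative
        "suffix_inc": word.lower().rstrip(".").endswith("inc"),
    }

def make_feature(predicate_name, label):
    """Build a feature function that fires when the named predicate holds
    for the observation and the candidate label equals `label`."""
    def feature(predicates, candidate_label):
        if predicates[predicate_name] and candidate_label == label:
            return 1.0
        return 0.0
    return feature

tokens = ["Shares", "of", "Acme", "Inc.", "rose"]
preds = observation_predicates(tokens, 3)
cap_and_company = make_feature("is_capitalized", "COMPANY")
value = cap_and_company(preds, "COMPANY")
```

Each feature pairs an observation predicate with a label, so correlated predicates simply contribute separate weighted terms rather than requiring an independence assumption.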

Another advantage of MEMMs versus HMMs and conditional random fields (CRFs) is that training can be considerably more efficient. In HMMs and CRFs, one needs to use some version of the forward–backward algorithm as an inner loop in training[citation needed]. However, in MEMMs, estimating the parameters of the maximum-entropy distributions used for the transition probabilities can be done for each transition distribution in isolation.
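The claim that each transition distribution can be estimated in isolation can be illustrated by how the training data is partitioned. In the sketch below, labeled sequences are split into one set of (observation, label) pairs per previous label, so that a separate maximum entropy classifier could be fitted to each set independently. The function name and the `<START>` pseudo-label are assumptions; the fitting step itself is omitted.

```python
from collections import defaultdict

def split_by_previous_label(tagged_sequences, start="<START>"):
    """Group (observation, label) pairs by the label that preceded them.

    tagged_sequences : iterable of sequences of (observation, label) pairs
    Returns a dict mapping previous label -> list of (observation, label).
    """
    groups = defaultdict(list)
    for sequence in tagged_sequences:
        prev = start
        for obs, label in sequence:
            groups[prev].append((obs, label))
            prev = label  # the gold label becomes the next previous label
    return dict(groups)

data = [[("the", "OTHER"), ("Acme", "NAME"), ("company", "OTHER")]]
groups = split_by_previous_label(data)
```

Each list in `groups` is an ordinary classification data set, which is why no forward–backward pass is needed when the labels are fully observed.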

A drawback of MEMMs is that they potentially suffer from the "label bias problem," where states with low-entropy transition distributions "effectively ignore their observations." Conditional random fields were designed to overcome this weakness,[5] which had already been recognised in the context of neural network-based Markov models in the early 1990s.[5][6] Another source of label bias is that training is always done with respect to known previous tags, so the model struggles at test time when there is uncertainty in the previous tag.
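A toy calculation shows how per-state normalization produces the label bias effect. In this sketch the scores and successor sets are invented for illustration: when a state has only one possible successor, that successor receives probability 1 however poorly it matches the observation.

```python
import math

def normalize_over_successors(scores, successors):
    """Normalize exponentiated scores over the labels reachable from
    the current state only (local, per-state normalization)."""
    total = sum(math.exp(scores[s]) for s in successors)
    return {s: math.exp(scores[s]) / total for s in successors}

# Hypothetical scores: the observation strongly favours "A" over "B".
scores = {"A": 5.0, "B": -5.0}

# A state that can only move to "B" ignores the observation entirely.
single = normalize_over_successors(scores, ["B"])

# A state with both successors does respond to the observation.
both = normalize_over_successors(scores, ["A", "B"])
```

Here `single["B"]` equals 1.0 even though the observation scores "B" very low, whereas `both` puts almost all of its mass on "A".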

References

  1. Toutanova, Kristina; Manning, Christopher D. (2000). "Enriching the Knowledge Sources Used in a Maximum Entropy Part-of-Speech Tagger". Proc. J. SIGDAT Conf. on Empirical Methods in NLP and Very Large Corpora (EMNLP/VLC-2000). pp. 63–70.
  2. McCallum, Andrew; Freitag, Dayne; Pereira, Fernando (2000). "Maximum Entropy Markov Models for Information Extraction and Segmentation" (PDF). Proc. ICML 2000. pp. 591–598.
  3. Berger, A.L.; Pietra, V.J.D.; Pietra, S.A.D. (1996). "A maximum entropy approach to natural language processing". Computational Linguistics. 22 (1). MIT Press: 39–71.
  4. Darroch, J.N.; Ratcliff, D. (1972). "Generalized iterative scaling for log-linear models". The Annals of Mathematical Statistics. 43 (5). Institute of Mathematical Statistics: 1470–1480. doi:10.1214/aoms/1177692379.
  5. Lafferty, John; McCallum, Andrew; Pereira, Fernando (2001). "Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data". Proc. ICML 2001.
  6. Bottou, Léon (1991). Une Approche théorique de l'Apprentissage Connexionniste: Applications à la Reconnaissance de la Parole (Ph.D.). Université de Paris XI.
