Lesk algorithm

The Lesk algorithm is a classical algorithm for word sense disambiguation introduced by Michael E. Lesk in 1986.[1] It operates on the premise that words within a given context are likely to share a common topic. The algorithm compares the dictionary definitions of an ambiguous word with the words in its surrounding context to determine the most appropriate sense. Variations, such as the Simplified Lesk algorithm, have demonstrated improved precision and efficiency. However, the Lesk algorithm has been criticized for its sensitivity to the exact wording of definitions and its reliance on brief glosses. Researchers have sought to improve its accuracy by incorporating additional resources such as thesauruses and syntactic models.

Overview

The Lesk algorithm is based on the assumption that words in a given "neighborhood" (section of text) will tend to share a common topic. A simplified version of the Lesk algorithm compares the dictionary definition of each sense of an ambiguous word with the terms contained in its neighborhood. Versions have been adapted to use WordNet.[2] An implementation might look like this:

  1. For every sense of the word being disambiguated, count the number of words that occur both in the neighborhood of that word and in the dictionary definition of that sense.
  2. Choose the sense with the highest count.

A frequently used example illustrating this algorithm is for the context "pine cone". The following dictionary definitions are used:

PINE 
1. kinds of evergreen tree with needle-shaped leaves
2. waste away through sorrow or illness
CONE 
1. solid body which narrows to a point
2. something of this shape whether solid or hollow
3. fruit of certain evergreen trees

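The best intersection is Pine #1 ∩ Cone #3 = 2: these two definitions share the words "evergreen" and "tree(s)", while every other pair of definitions has no content words in common.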
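The pairwise comparison in the "pine cone" example can be sketched in Python as follows. This is a minimal illustration, not Lesk's original implementation: the stop list, the crude plural stripping, and the function names `content_words` and `best_sense_pair` are assumptions made for this example only.

```python
from itertools import product

# Glosses copied from the example above. The stop list and the crude plural
# stripping below are assumptions made only for this illustration.
GLOSSES = {
    "pine": [
        "kinds of evergreen tree with needle-shaped leaves",
        "waste away through sorrow or illness",
    ],
    "cone": [
        "solid body which narrows to a point",
        "something of this shape whether solid or hollow",
        "fruit of certain evergreen trees",
    ],
}
STOP_WORDS = {"of", "to", "a", "or", "with", "this", "which", "through"}


def content_words(gloss):
    """Lowercase a gloss, drop stop words, and strip a trailing plural 's'."""
    words = set()
    for token in gloss.lower().split():
        if token in STOP_WORDS:
            continue
        if token.endswith("s") and not token.endswith("ss") and len(token) > 3:
            token = token[:-1]  # "trees" -> "tree"
        words.add(token)
    return words


def best_sense_pair(word_a, word_b):
    """Return (overlap, sense number of word_a, sense number of word_b)."""
    best = (-1, 0, 0)
    pairs = product(
        enumerate(GLOSSES[word_a], start=1),
        enumerate(GLOSSES[word_b], start=1),
    )
    for (i, gloss_a), (j, gloss_b) in pairs:
        overlap = len(content_words(gloss_a) & content_words(gloss_b))
        if overlap > best[0]:
            best = (overlap, i, j)
    return best


print(best_sense_pair("pine", "cone"))  # (2, 1, 3)
```

Without some normalization such as the plural stripping above, "tree" and "trees" would not match and the overlap would drop to 1, which illustrates how sensitive the method is to the exact wording of the definitions.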

Simplified Lesk algorithm

In the Simplified Lesk algorithm,[3] the correct meaning of each word in a given context is determined individually by finding the sense whose dictionary definition overlaps the most with the given context. Rather than simultaneously determining the meanings of all words in the context, this approach handles each word on its own, independently of the meanings of the other words occurring in the same context.

"A comparative evaluation performed by Vasilescu et al. (2004)[4] has shown that the simplified Lesk algorithm can significantly outperform the original definition of the algorithm, both in terms of precision and efficiency. By evaluating the disambiguation algorithms on the Senseval-2 English all words data, they measure a 58% precision using the simplified Lesk algorithm compared to the only 42% under the original algorithm.

Note: Vasilescu et al. implementation considers a back-off strategy for words not covered by the algorithm, consisting of the most frequent sense defined in WordNet. This means that words for which all their possible meanings lead to zero overlap with current context or with other word definitions are by default assigned sense number one in WordNet."[5]

Simplified LESK Algorithm with smart default word sense (Vasilescu et al., 2004)[6]

function SIMPLIFIED LESK(word, sentence) returns best sense of word
    best-sense <- most frequent sense for word
    max-overlap <- 0
    context <- set of words in sentence
    for each sense in senses of word do
        signature <- set of words in the gloss and examples of sense
        overlap <- COMPUTEOVERLAP(signature, context)
        if overlap > max-overlap then
            max-overlap <- overlap
            best-sense <- sense
    end
    return (best-sense)

The COMPUTEOVERLAP function returns the number of words in common between two sets, ignoring function words or other words on a stop list. The original Lesk algorithm defines the context in a more complex way.
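The pseudocode above could be rendered in Python roughly as follows. This is a sketch under stated assumptions: the sense inventory `SENSES` is a hypothetical stand-in for a real dictionary (it reuses the "pine" definitions from the earlier example and assumes the first listed sense is the most frequent one), and the stop list and `tokenize` helper are illustrative rather than part of any published implementation.

```python
# Illustrative stop list; a real system would use a fuller one.
STOP_WORDS = {"a", "an", "the", "of", "to", "or", "with", "from",
              "this", "which", "through", "in"}

# Hypothetical sense inventory. The glosses reuse the "pine" definitions from
# the example above, with the (assumed) most frequent sense listed first.
SENSES = {
    "pine": [
        ("pine#1", "kinds of evergreen tree with needle-shaped leaves"),
        ("pine#2", "waste away through sorrow or illness"),
    ],
}


def tokenize(text):
    """Split on whitespace, strip surrounding punctuation, and lowercase."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - {""}


def compute_overlap(signature, context):
    """Count shared words, ignoring words on the stop list."""
    return len((signature & context) - STOP_WORDS)


def simplified_lesk(word, sentence):
    senses = SENSES[word]
    best_sense = senses[0][0]  # back-off: the most frequent sense
    max_overlap = 0
    context = tokenize(sentence)
    for name, gloss in senses:
        signature = tokenize(gloss)
        overlap = compute_overlap(signature, context)
        if overlap > max_overlap:
            max_overlap = overlap
            best_sense = name
    return best_sense


print(simplified_lesk("pine", "They pine away with sorrow after the illness."))
print(simplified_lesk("pine", "I like pine."))
```

The first call selects pine#2 because its gloss shares "away", "sorrow" and "illness" with the sentence. The second call has no overlap with either gloss, so it falls back to the first listed sense, mirroring the back-off strategy described above.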

Criticisms

Unfortunately, Lesk’s approach is very sensitive to the exact wording of definitions, so the absence of a certain word can radically change the results. Further, the algorithm determines overlaps only among the glosses of the senses being considered. This is a significant limitation in that dictionary glosses tend to be fairly short and do not provide sufficient vocabulary to relate fine-grained sense distinctions.

Many modifications of the algorithm have been proposed. These works draw on additional resources for analysis (thesauruses, synonym dictionaries, or morphological and syntactic models): for instance, they may use information such as synonyms, derived word forms, or words from the definitions of the words that appear in a definition.[7]

Lesk variants

  • Original Lesk (Lesk, 1986)
  • Adapted/Extended Lesk (Banerjee and Pedersen, 2002/2003): In the adapted Lesk algorithm, a word vector is created for every content word in the WordNet gloss. Glosses of related concepts in WordNet can be concatenated to augment this vector. The vector for a word w contains the counts of words co-occurring with w in a large corpus. Adding the word vectors of all the content words in a concept's gloss creates the gloss vector g for that concept. Relatedness is determined by comparing gloss vectors using the cosine similarity measure.[8]
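
The gloss-vector comparison described in the last item can be sketched as follows. The co-occurrence counts in `COOCCURRENCE` are made up for illustration and stand in for statistics gathered from a large corpus; the function names `gloss_vector` and `cosine` are likewise assumptions of this sketch, not the authors' code.

```python
import math
from collections import Counter

# Made-up co-occurrence counts standing in for statistics from a large corpus.
COOCCURRENCE = {
    "evergreen": Counter({"forest": 4, "needle": 3, "leaf": 2}),
    "tree": Counter({"forest": 5, "leaf": 4, "wood": 3}),
    "fruit": Counter({"seed": 4, "leaf": 1, "sweet": 3}),
    "sorrow": Counter({"grief": 5, "tears": 3}),
    "illness": Counter({"doctor": 4, "grief": 1}),
}


def gloss_vector(content_words):
    """Sum the co-occurrence vectors of the content words of a gloss."""
    total = Counter()
    for word in content_words:
        total.update(COOCCURRENCE.get(word, {}))
    return total


def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[key] for key, count in u.items() if key in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


pine_tree = gloss_vector(["evergreen", "tree"])          # from Pine #1
pine_grieve = gloss_vector(["sorrow", "illness"])        # from Pine #2
cone_fruit = gloss_vector(["fruit", "evergreen", "tree"])  # from Cone #3

print(cosine(pine_tree, cone_fruit) > cosine(pine_grieve, cone_fruit))  # True
```

Because the vectors are built from co-occurrence statistics rather than from the gloss words themselves, two glosses can be judged related even when they share no words directly.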

Many studies have examined Lesk and its extensions:[9]

  • Wilks and Stevenson, 1998, 1999;
  • Mahesh et al., 1997;
  • Cowie et al., 1992;
  • Yarowsky, 1992;
  • Pook and Catlett, 1988;
  • Kilgarriff and Rosenzweig, 2000;
  • Kwong, 2001;
  • Nastase and Szpakowicz, 2001;
  • Gelbukh and Sidorov, 2004.

References

  1. ^ Lesk, M. (1986). Automatic sense disambiguation using machine readable dictionaries: how to tell a pine cone from an ice cream cone. In SIGDOC '86: Proceedings of the 5th annual international conference on Systems documentation, pages 24-26, New York, NY, USA. ACM.
  2. ^ Satanjeev Banerjee and Ted Pedersen. An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet, Lecture Notes in Computer Science; Vol. 2276, Pages: 136 - 145, 2002. ISBN 3-540-43219-1
  3. ^ A. Kilgarriff and J. Rosenzweig. 2000. English SENSEVAL: Report and Results. In Proceedings of the 2nd International Conference on Language Resources and Evaluation, LREC, Athens, Greece.
  4. ^ Florentina Vasilescu, Philippe Langlais, and Guy Lapalme. 2004. Evaluating Variants of the Lesk Approach for Disambiguating Words. LREC, Portugal.
  5. ^ Agirre, Eneko & Philip Edmonds (eds.). 2006. Word Sense Disambiguation: Algorithms and Applications. Dordrecht: Springer. www.wsdbook.org
  6. ^ Florentina Vasilescu, Philippe Langlais, and Guy Lapalme. 2004. Evaluating Variants of the Lesk Approach for Disambiguating Words. LREC, Portugal.
  7. ^ Alexander Gelbukh, Grigori Sidorov. Automatic resolution of ambiguity of word senses in dictionary definitions (in Russian). J. Nauchno-Tehnicheskaya Informaciya (NTI), ISSN 0548-0027, ser. 2, N 3, 2004, pp. 10–15.
  8. ^ Banerjee, Satanjeev; Pedersen, Ted (2002-02-17). "An Adapted Lesk Algorithm for Word Sense Disambiguation Using WordNet". Computational Linguistics and Intelligent Text Processing. Lecture Notes in Computer Science. Vol. 2276. Springer, Berlin, Heidelberg. pp. 136–145. CiteSeerX 10.1.1.118.8359. doi:10.1007/3-540-45715-1_11. ISBN 978-3540457152.
  9. ^ Roberto Navigli. Word Sense Disambiguation: A Survey, ACM Computing Surveys, 41(2), 2009, pp. 1–69.
