Query understanding

Query understanding is the process of inferring the intent of a search engine user by extracting semantic meaning from the searcher’s keywords.[1] Query understanding methods generally take place before the search engine retrieves and ranks results. It is related to natural language processing but specifically focused on the understanding of search queries.

Methods

Stemming and lemmatization

Many languages inflect words to reflect their role in the utterance they appear in. The variation between various forms of a word is likely to be of little importance for the relatively coarse-grained model of meaning involved in a retrieval system, and for this reason the task of conflating the various forms of a word is a potentially useful technique to increase recall of a retrieval system.[2]

Stemming algorithms, also known as stemmers, typically apply a collection of simple suffix-removal rules that approximate the language’s inflection rules.[3]
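As a rough illustration of rule-based suffix removal, the following minimal Python sketch applies a short, ordered list of suffix rules to each query term. The rule list, the minimum stem length, and the function names are assumptions made for this example; they are not the rules of any published stemmer.

```python
# Hypothetical suffix rules for illustration only; published stemmers use
# much larger, carefully ordered rule sets with extra conditions.
SUFFIX_RULES = [
    ("sses", "ss"),  # classes -> class
    ("ies", "y"),    # queries -> query
    ("ss", "ss"),    # class   -> class (protects the final "ss")
    ("ing", ""),     # searching -> search
    ("ed", ""),      # searched  -> search
    ("s", ""),       # engines   -> engine
]
MIN_STEM_LENGTH = 3  # assumed guard against stripping very short words


def naive_stem(word: str) -> str:
    """Apply the first suffix rule that matches and leaves a long enough stem."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: len(word) - len(suffix)] + replacement
            if len(stem) >= MIN_STEM_LENGTH:
                return stem
    return word


def stem_query(query: str) -> list[str]:
    """Stem every whitespace-separated term of a query."""
    return [naive_stem(term) for term in query.split()]
```

Because both the query and the indexed documents would be passed through the same function, "searching" and "searched" conflate to the same term. A rule set this small makes obvious mistakes (for example, "running" becomes "runn"), which is one reason practical stemmers carry additional conditions.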

For some languages, simple lemmatization methods suffice to reduce a word in a query to its lemma (root form) or its stem; for others, this operation involves non-trivial string processing and may require recognizing the word's part of speech or referencing a lexical database.

The effectiveness of stemming and lemmatization varies across languages.[4][5]

Query segmentation

Query segmentation is a key component of query understanding that aims to divide a query into meaningful segments. Traditional approaches, such as the bag-of-words (BOW) model, treat individual words as independent units, which can limit interpretative accuracy. For languages like Chinese, where words are not separated by spaces, segmentation is essential, as individual characters often lack standalone meaning. Even in English, the bag-of-words model may not capture the full meaning, because certain phrases, such as "New York", carry significance as a whole rather than as isolated terms. By identifying phrases or entities within queries, query segmentation lets search engines apply proximity and ordering constraints, which can improve search accuracy and user satisfaction.[6]
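One simple way to picture segmentation of a space-delimited query is greedy longest-match against a list of known phrases. The sketch below assumes a small hand-written phrase list (KNOWN_PHRASES) and a hypothetical function name; production systems typically derive such phrases from query logs or statistical models rather than a fixed set.

```python
# Hypothetical phrase list, stored as token tuples for exact lookup.
KNOWN_PHRASES = {
    ("new", "york"),
    ("new", "york", "city"),
    ("pizza", "delivery"),
}


def segment_query(query: str, phrases=KNOWN_PHRASES) -> list[str]:
    """Split a query into segments, preferring the longest known phrase."""
    tokens = query.lower().split()
    max_len = max((len(p) for p in phrases), default=1)
    segments = []
    i = 0
    while i < len(tokens):
        # Try the longest candidate first, down to two-token phrases.
        for n in range(min(max_len, len(tokens) - i), 1, -1):
            candidate = tuple(tokens[i : i + n])
            if candidate in phrases:
                segments.append(" ".join(candidate))
                i += n
                break
        else:
            # No phrase starts here, so the token is its own segment.
            segments.append(tokens[i])
            i += 1
    return segments
```

For the query "cheap hotels new york city", this yields the segments "cheap", "hotels", and "new york city", so a retrieval system could require the last three words to appear together and in order.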

Entity recognition

Entity recognition is the process of locating and classifying entities within a text string. Named-entity recognition specifically focuses on named entities, such as names of people, places, and organizations. In addition, entity recognition includes identifying concepts in queries that may be represented by multi-word phrases. Entity recognition systems typically use grammar-based linguistic techniques or statistical machine learning models.[7]
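The locate-and-classify step can be illustrated with a plain dictionary (gazetteer) lookup, which is a simpler stand-in for the grammar-based and statistical systems mentioned above. The gazetteer entries, labels, and names in this sketch are invented for the example.

```python
from typing import NamedTuple


class Entity(NamedTuple):
    text: str
    label: str
    start: int  # index of the first token (inclusive)
    end: int    # index after the last token (exclusive)


# Hypothetical gazetteer mapping token tuples to entity types.
GAZETTEER = {
    ("new", "york"): "PLACE",
    ("ada", "lovelace"): "PERSON",
    ("acme",): "ORGANIZATION",
}


def find_entities(query: str, gazetteer=GAZETTEER) -> list[Entity]:
    """Locate gazetteer entries in a query and attach their type labels."""
    tokens = query.lower().split()
    max_len = max((len(key) for key in gazetteer), default=0)
    entities = []
    i = 0
    while i < len(tokens):
        match = None
        # Prefer the longest entry that starts at position i.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            key = tuple(tokens[i : i + n])
            if key in gazetteer:
                match = Entity(" ".join(key), gazetteer[key], i, i + n)
                break
        if match:
            entities.append(match)
            i = match.end
        else:
            i += 1
    return entities
```

A lookup like this cannot recognize entities that are missing from the list or resolve ambiguous names, which is where grammar-based or learned models come in.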

Query rewriting

Query rewriting is the process of automatically reformulating a search query to more accurately capture its intent. Query expansion adds additional query terms, such as synonyms, in order to retrieve more documents and thereby increase recall. Query relaxation removes query terms to reduce the requirements for a document to match the query, thereby also increasing recall. Other forms of query rewriting, such as automatically converting consecutive query terms into phrases and restricting query terms to specific fields, aim to increase precision.
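The effect of expansion and relaxation on recall can be sketched with a toy matcher in which a document matches when every term group has at least one member in the document. The synonym table, the set of droppable terms, and all function names here are assumptions for illustration.

```python
# Hypothetical resources; real systems usually mine these from logs or thesauri.
SYNONYMS = {"laptop": ["notebook"], "cheap": ["inexpensive", "budget"]}
OPTIONAL_TERMS = {"best", "cheap", "new"}


def expand_query(terms, synonyms=SYNONYMS):
    """Query expansion: turn each term into a group of acceptable alternatives."""
    return [[term] + synonyms.get(term, []) for term in terms]


def relax_query(terms, optional=OPTIONAL_TERMS):
    """Query relaxation: drop optional terms, but never return an empty query."""
    kept = [term for term in terms if term not in optional]
    return kept or list(terms)


def matches(document_terms, term_groups):
    """A document matches if each group has at least one term in the document."""
    return all(any(t in document_terms for t in group) for group in term_groups)
```

With the query terms "cheap" and "laptop", a document containing only "budget" and "notebook" fails the original query but matches the expanded one, and relaxing the query to "laptop" alone admits any document that mentions laptops regardless of price wording.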

Spelling correction

Automatic spelling correction is a critical feature of modern search engines, designed to address common spelling errors in user queries. Such errors are especially frequent as users often search for unfamiliar topics. By correcting misspelled queries, search engines enhance their understanding of user intent, thereby improving the relevance and quality of search results and overall user experience.[8]
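A common textbook approach to correction, shown here only as a sketch, replaces an unknown term with the closest vocabulary word by edit distance and breaks ties by word frequency. The vocabulary, its counts, and the distance threshold are made-up values; the source does not specify which method search engines use.

```python
# Hypothetical vocabulary with made-up frequency counts.
VOCABULARY = {"search": 120, "engine": 90, "query": 75, "queue": 20}


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def correct_term(term: str, vocabulary=VOCABULARY, max_distance: int = 2) -> str:
    """Return the nearest vocabulary word, or the term itself if none is close."""
    term = term.lower()
    if not vocabulary or term in vocabulary:
        return term
    # Closest word first; among equally close words, prefer the more frequent.
    best = min(vocabulary, key=lambda w: (edit_distance(term, w), -vocabulary[w]))
    return best if edit_distance(term, best) <= max_distance else term


def correct_query(query: str) -> str:
    """Correct each whitespace-separated term independently."""
    return " ".join(correct_term(term) for term in query.split())
```

Correcting terms one at a time ignores context, so a sketch like this cannot fix a misspelling that happens to be another valid word.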

References

  1. ^ "Association for Computing Machinery (ACM) Special Interest Group on Information Retrieval (SIGIR) 2010 Workshop on Query Representation and Understanding" (PDF).
  2. ^ Lowe, Thomas; Roberts, David; Kurtz, Peter (1973). Additional Text Processing for On-Line Retrieval (The RADCOL System). Volume 1. DTIC Document. Lennon, Martin; Peirce, David; Tarry, Brian D; Willett, Peter (1981). "An evaluation of some conflation algorithms for information retrieval". Information Scientist. 3 (4). SAGE.
  3. ^ Lovins, Julie (1968). Development of a stemming algorithm. MIT Information Processing Group.
  4. ^ Harman, Donna (1991). "How Effective is Suffixing?". Journal of the American Society for Information Science. 42 (1): 7–15. doi:10.1002/(SICI)1097-4571(199101)42:1<7::AID-ASI2>3.0.CO;2-P.
  5. ^ Popovic, Mirko; Willett, Peter (1981). "The effectiveness of stemming for natural-language access to Slovene textual data". Information Scientist. 3 (4). SAGE.
  6. ^ Li, Hang; Xu, Jun; Zhang, Min (2021). Query Understanding for Search Engines. Springer.
  7. ^ "A Survey of Named Entity Recognition and Classification" (PDF).
  8. ^ Li, Hang; Xu, Jun; Zhang, Min (2021). Query Understanding for Search Engines. Springer.
