A question-answering implementation, usually a computer program, may construct its answers by querying a structured database of knowledge or information, usually a knowledge base. More commonly, question-answering systems can pull answers from an unstructured collection of natural language documents.
Some examples of natural language document collections used for question answering systems include:
Question-answering research attempts to develop ways of answering a wide range of question types, including fact, list, definition, how, why, hypothetical, semantically constrained, and cross-lingual questions.
Answering questions related to an article in order to evaluate reading comprehension is one of the simpler forms of question answering, since a given article is relatively short compared to the domains of other types of question-answering problems. An example of such a question is "What did Albert Einstein win the Nobel Prize for?" posed after an article about this subject is given to the system.
Closed-book question answering is when a system has memorized some facts during training and can answer questions without explicitly being given a context. This is similar to humans taking closed-book exams.
Closed-domain question answering deals with questions under a specific domain (for example, medicine or automotive maintenance) and can exploit domain-specific knowledge frequently formalized in ontologies. Alternatively, "closed-domain" might refer to a situation where only limited types of questions are accepted, such as questions asking for descriptive rather than procedural information. Question answering systems for machine reading applications have also been constructed in the medical domain, for instance for Alzheimer's disease.[3]
Open-domain question answering deals with questions about nearly anything and can only rely on general ontologies and world knowledge. Systems designed for open-domain question answering usually have much more data available from which to extract the answer. An example of an open-domain question is "What did Albert Einstein win the Nobel Prize for?" while no article about this subject is given to the system.
Another way to categorize question-answering systems is by the technical approach used. There are a number of different types of QA systems, including rule-based, statistical, and hybrid systems.
Rule-based systems use a set of rules to determine the correct answer to a question. Statistical systems use statistical methods to find the most likely answer to a question. Hybrid systems use a combination of rule-based and statistical methods.
History
Two early question answering systems were BASEBALL[4] and LUNAR.[5] BASEBALL answered questions about Major League Baseball over a period of one year. LUNAR answered questions about the geological analysis of rocks returned by the Apollo Moon missions. Both question answering systems were very effective in their chosen domains. LUNAR was demonstrated at a lunar science convention in 1971 and was able to answer 90% of the questions in its domain that were posed by people untrained on the system. Further restricted-domain question answering systems were developed in the following years. The common feature of all these systems is that they had a core database or knowledge system that was hand-written by experts in the chosen domain. The language abilities of BASEBALL and LUNAR used techniques similar to ELIZA and DOCTOR, the first chatterbot programs.
SHRDLU was a successful question-answering program developed by Terry Winograd in the late 1960s and early 1970s. It simulated the operation of a robot in a toy world (the "blocks world"), and it offered the possibility of asking the robot questions about the state of the world. The strength of this system was the choice of a very specific domain and a very simple world with rules of physics that were easy to encode in a computer program.
In the 1970s, knowledge bases were developed that targeted narrower domains of knowledge. The question answering systems developed to interface with these expert systems produced more repeatable and valid responses to questions within an area of knowledge. These expert systems closely resembled modern question answering systems except in their internal architecture. Expert systems rely heavily on expert-constructed and organized knowledge bases, whereas many modern question answering systems rely on statistical processing of a large, unstructured, natural language text corpus.
The 1970s and 1980s saw the development of comprehensive theories in computational linguistics, which led to the development of ambitious projects in text comprehension and question answering. One example was the Unix Consultant (UC), developed by Robert Wilensky at U.C. Berkeley in the late 1980s. The system answered questions pertaining to the Unix operating system. It had a comprehensive, hand-crafted knowledge base of its domain, and it aimed at phrasing the answer to accommodate various types of users. Another project was LILOG, a text-understanding system that operated on the domain of tourism information in a German city. The systems developed in the UC and LILOG projects never went past the stage of simple demonstrations, but they helped the development of theories on computational linguistics and reasoning.
Specialized natural-language question answering systems have been developed, such as EAGLi for health and life scientists.[6]
Applications
QA systems are used in a variety of applications, including
Fact-checking: verifying whether a fact holds by posing a question such as "Is fact X true or false?"
As of 2001, question-answering systems typically included a question classifier module that determined the type of question and the type of answer.[7]
Different types of question-answering systems employ different architectures. For example, modern open-domain question answering systems may use a retriever-reader architecture. The retriever retrieves documents relevant to a given question, while the reader infers the answer from the retrieved documents. Systems such as GPT-3, T5,[8] and BART[9] use an end-to-end architecture in which a transformer-based model stores large-scale textual data in its underlying parameters. Such models can answer questions without accessing any external knowledge sources.
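The retriever-reader split described above can be illustrated with a minimal sketch. It assumes a toy in-memory corpus and plain word overlap for both stages; the function names (`tokenize`, `retrieve`, `read`, `answer`) and the example documents are hypothetical, and real systems typically use learned retrievers and neural readers instead.

```python
import re


def tokenize(text):
    """Lower-case the text and return its set of alphanumeric tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Retriever: keep the k documents sharing the most tokens with the question."""
    q = tokenize(question)
    ranked = sorted(documents, key=lambda d: len(q & tokenize(d)), reverse=True)
    return ranked[:k]


def read(question, documents):
    """Reader: return the sentence with the highest token overlap, or None."""
    q = tokenize(question)
    sentences = [
        s.strip()
        for d in documents
        for s in re.split(r"(?<=[.!?])\s+", d)
        if s.strip()
    ]
    return max(sentences, key=lambda s: len(q & tokenize(s)), default=None)


def answer(question, documents, k=2):
    """Chain the two stages: retrieve first, then read only what was retrieved."""
    return read(question, retrieve(question, documents, k))


# Toy corpus, made up for illustration only.
corpus = [
    "Einstein was born in Ulm. He won the Nobel Prize for the photoelectric effect.",
    "The Apollo missions returned lunar rocks.",
    "Baseball is played in the summer.",
]
print(answer("What did Einstein win the Nobel Prize for?", corpus))
```

The point of the design is that the reader only ever sees the small set of documents the retriever passes on, so the two stages can be improved or replaced independently.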
Question answering methods
Question answering is dependent on a good search corpus; without documents containing the answer, there is little any question answering system can do. Larger collections generally mean better question answering performance, unless the question domain is orthogonal to the collection. Data redundancy in massive collections, such as the web, means that nuggets of information are likely to be phrased in many different ways in differing contexts and documents,[10] leading to two benefits:
If the right information appears in many forms, the question answering system needs to perform fewer complex NLP techniques to understand the text.
Correct answers can be filtered from false positives because the system can rely on versions of the correct answer appearing more times in the corpus than incorrect ones.
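The second benefit can be sketched as simple frequency voting over candidate answers extracted from many documents. This is a minimal illustration, assuming candidates have already been extracted as strings; the function name `most_redundant_answer` and the normalization (trimming and lower-casing) are assumptions, not part of any particular system.

```python
from collections import Counter


def most_redundant_answer(candidates):
    """Return (answer, count) for the most frequent normalized candidate.

    Returns None when there are no non-empty candidates.
    """
    # Normalize so trivially different spellings count as the same answer.
    normalized = [c.strip().lower() for c in candidates if c.strip()]
    if not normalized:
        return None
    return Counter(normalized).most_common(1)[0]


# Hypothetical candidates extracted from four different documents.
extracted = ["1 October", "1 october", "10 October", "1 October "]
print(most_redundant_answer(extracted))
```

A real system would need stronger normalization (for example, treating "1st Oct." and "1 October" as the same date) before counting.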
The system takes a natural language question as an input rather than a set of keywords, for example: "When is the national day of China?" It then transforms this input sentence into a query in its logical form. Accepting natural language questions makes the system more user-friendly, but harder to implement, as there are a variety of question types and the system will have to identify the correct one in order to give a sensible answer. Assigning a question type to the question is a crucial task; the entire answer extraction process relies on finding the correct question type and hence the correct answer type.
Keyword extraction is the first step in identifying the input question type.[14] In some cases, words clearly indicate the question type, e.g., "Who", "Where", "When", or "How many"; these words might suggest to the system that the answers should be of type "Person", "Location", "Date", or "Number", respectively. POS (part-of-speech) tagging and syntactic parsing techniques can also determine the answer type. In the example above, the subject is "Chinese National Day", the predicate is "is" and the adverbial modifier is "when", therefore the answer type is "Date". Unfortunately, some interrogative words like "Which", "What", or "How" do not correspond to unambiguous answer types: each can represent more than one type. In situations like this, other words in the question need to be considered. A lexical dictionary such as WordNet can be used for understanding the context.
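The mapping from unambiguous interrogative words to answer types can be sketched as a lookup on the start of the question. This is a minimal illustration; the table `ANSWER_TYPE_BY_CUE` and the function `guess_answer_type` are hypothetical names, and ambiguous cues such as "Which", "What", or "How" are deliberately left unresolved.

```python
# Hypothetical table mapping an interrogative cue to the expected answer type.
ANSWER_TYPE_BY_CUE = {
    "how many": "Number",
    "who": "Person",
    "where": "Location",
    "when": "Date",
}


def guess_answer_type(question):
    """Return the answer type for an unambiguous leading cue, else None."""
    text = question.strip().lower()
    # Try longer cues first so multi-word cues such as "how many" are matched whole.
    for cue in sorted(ANSWER_TYPE_BY_CUE, key=len, reverse=True):
        if text == cue or text.startswith(cue + " "):
            return ANSWER_TYPE_BY_CUE[cue]
    # "Which", "What" and plain "How" need further analysis of the other words.
    return None


print(guess_answer_type("When is the national day of China?"))
```

Questions for which the function returns None are the cases where the rest of the question, POS tagging, or a lexical resource would have to be consulted.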
Once the system identifies the question type, it uses an information retrieval system to find a set of documents that contain the correct keywords. A tagger and NP/Verb Group chunker can verify whether the correct entities and relations are mentioned in the found documents. For questions such as "Who" or "Where", a named-entity recognizer finds relevant "Person" and "Location" names from the retrieved documents. Only the relevant paragraphs are selected for ranking.
A vector space model can classify the candidate answers. The system then checks whether each candidate answer is of the correct type, as determined in the question type analysis stage. An inference technique can validate the candidate answers. A score is then given to each of these candidates according to the number of question words it contains and how close these words are to the candidate (the more words and the closer they are, the better). The answer is then translated by parsing into a compact and meaningful representation. In the previous example, the expected output answer is "1st Oct."
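The scoring step can be sketched as follows. The source only says that more question words, and closer ones, are better; the specific formula used here (each distinct question word contributes the reciprocal of its token distance to the candidate) is an assumption chosen for illustration, as is the function name `score_candidate`.

```python
def score_candidate(tokens, candidate_index, question_words):
    """Score the candidate at tokens[candidate_index].

    Each distinct question word found in the passage adds
    1 / (token distance to its nearest occurrence), so more matching
    words and closer words both raise the score.
    """
    wanted = {w.lower() for w in question_words}
    nearest = {}  # question word -> smallest distance seen so far
    for i, tok in enumerate(tokens):
        word = tok.lower()
        if word in wanted and i != candidate_index:
            dist = abs(i - candidate_index)
            if word not in nearest or dist < nearest[word]:
                nearest[word] = dist
    return sum(1.0 / d for d in nearest.values())


# Hypothetical passage; the candidate answer is the token "October".
passage = "The national day of China is October 1".split()
print(score_candidate(passage, 6, ["national", "day", "China"]))
```

Candidates from all retrieved passages would be scored this way and the highest-scoring one of the correct answer type kept.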
Mathematical question answering
An open-source, math-aware, question answering system called MathQA, based on Ask Platypus and Wikidata, was published in 2018.[15] MathQA takes an English or Hindi natural language question as input and returns a mathematical formula retrieved from Wikidata as a succinct answer, translated into a computable form that allows the user to insert values for the variables. The system retrieves names and values of variables and common constants from Wikidata if those are available. It is claimed that the system outperforms a commercial computational mathematical knowledge engine on a test set.[15] MathQA is hosted by Wikimedia at https://mathqa.wmflabs.org/. In 2022, it was extended to answer 15 math question types.[16]
MathQA methods need to combine natural and formula language. One possible approach is to perform supervised annotation via Entity Linking. The "ARQMath Task" at CLEF 2020[17] was launched to address the problem of linking newly posted questions from the platform Math Stack Exchange to existing ones that were already answered by the community. Providing hyperlinks to already answered, semantically related questions helps users to get answers earlier but is a challenging problem because semantic relatedness is not trivial.[18] The lab was motivated by the fact that 20% of mathematical queries in general-purpose search engines are expressed as well-formed questions.[19] The challenge contained two separate sub-tasks. Task 1: "Answer retrieval" matching old post answers to newly posed questions, and Task 2: "Formula retrieval" matching old post formulae to new questions. Starting with the domain of mathematics, which involves formula language, the goal is to later extend the task to other domains (e.g., STEM disciplines, such as chemistry, biology, etc.), which employ other types of special notation (e.g., chemical formulae).[17][18]
The inverse of mathematical question answering, mathematical question generation, has also been researched. The PhysWikiQuiz physics question generation and test engine retrieves mathematical formulae from Wikidata together with semantic information about their constituting identifiers (names and values of variables).[20] The formulae are then rearranged to generate a set of formula variants. Subsequently, the variables are substituted with random values to generate a large number of different questions suitable for individual student tests. PhysWikiQuiz is hosted by Wikimedia at https://physwikiquiz.wmflabs.org/.
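The generation procedure (formula variants plus random values) can be illustrated with a small sketch. It is not PhysWikiQuiz's implementation: the variants of a single formula, v = s / t, are written by hand here rather than retrieved from Wikidata and rearranged automatically, and the names `VARIANTS` and `generate_question` are hypothetical.

```python
import random

# Hand-written variants of one formula, v = s / t (an assumption for this sketch).
# Each entry maps the unknown to (given variable names, function solving for it).
VARIANTS = {
    "v": (("s", "t"), lambda s, t: s / t),
    "s": (("v", "t"), lambda v, t: v * t),
    "t": (("s", "v"), lambda s, v: s / v),
}


def generate_question(rng):
    """Pick a variant, substitute random values, and return (question text, answer)."""
    unknown = rng.choice(sorted(VARIANTS))
    givens, solve = VARIANTS[unknown]
    # Random non-zero integers avoid division by zero in the variants above.
    values = {name: rng.randint(1, 20) for name in givens}
    given_text = ", ".join(f"{name} = {value}" for name, value in values.items())
    return f"Given {given_text}, find {unknown}.", solve(**values)


rng = random.Random(0)  # seeded so a run can be reproduced
question, expected = generate_question(rng)
print(question, expected)
```

Because each call draws a variant and fresh values, repeated calls yield many different questions from the same underlying formula.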
Progress
Question answering systems have been extended in recent years to encompass additional domains of knowledge.[21] For example, systems have been developed to automatically answer temporal and geospatial questions, questions of definition and terminology, biographical questions, multilingual questions, and questions about the content of audio, images,[22] and video.[23] Current question answering research topics include:
Large language models (LLMs)[36] such as GPT-4[37] and Gemini[38] are examples of successful QA systems that enable more sophisticated understanding and generation of text. When coupled with multimodal[39] QA systems, which can process and understand information from various modalities such as text, images, and audio, LLMs significantly improve the capabilities of QA systems.
Woods, William A; Kaplan, R. (1977). "Lunar rocks in natural English: Explorations in natural language question answering". Linguistic Structures Processing. 5: 521–569.
Raffel, Colin; Shazeer, Noam; Roberts, Adam; Lee, Katherine; Narang, Sharan; Matena, Michael; Zhou, Yanqi; Li, Wei; Liu, Peter J. (2019). "Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer". arXiv:1910.10683 [cs.LG].
Lewis, Mike; Liu, Yinhan; Goyal, Naman; Ghazvininejad, Marjan; Mohamed, Abdelrahman; Levy, Omer; Stoyanov, Ves; Zettlemoyer, Luke (2019). "BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension". arXiv:1910.13461 [cs.CL].
Moldovan, Dan, et al. "Cogex: A logic prover for question answering." Proceedings of the 2003 Conference of the North American Chapter of the Association for Computational Linguistics on Human Language Technology-Volume 1. Association for Computational Linguistics, 2003.
Zhu, Linchao; Xu, Zhongwen; Yang, Yi; Hauptmann, Alexander G. (2015). "Uncovering Temporal Context for Video Question and Answering". arXiv:1511.04670 [cs.CV].
Yih, Wen-tau, Xiaodong He, and Christopher Meek. "Semantic parsing for single-relation question answering." Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 2014.
"BitCrawl by Hobson Lane". Archived from the original on October 27, 2012. Retrieved 2012-05-29.
John Prager, Eric Brown, Anni Coden, and Dragomir Radev. Question-answering by predictive annotation. In Proceedings, 23rd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, Athens, Greece, July 2000.
L. Fortnow, Steve Homer (2002/2003). A Short History of Computational Complexity. In D. van Dalen, J. Dawson, and A. Kanamori, editors, The History of Mathematical Logic. North-Holland, Amsterdam.