Parser combinator

In computer programming, a parser combinator is a higher-order function that accepts several parsers as input and returns a new parser as its output. In this context, a parser is a function accepting strings as input and returning some structure as output, typically a parse tree or a set of indices representing locations in the string where parsing stopped successfully. Parser combinators enable a recursive descent parsing strategy that facilitates modular piecewise construction and testing. This parsing technique is called combinatory parsing.

Parsers using combinators have been used extensively in the prototyping of compilers and processors for domain-specific languages such as natural-language user interfaces to databases, where complex and varied semantic actions are closely integrated with syntactic processing. In 1989, Richard Frost and John Launchbury demonstrated[1] use of parser combinators to construct natural-language interpreters. Graham Hutton also used higher-order functions for basic parsing in 1992[2] and monadic parsing in 1996.[3] S. D. Swierstra also exhibited the practical aspects of parser combinators in 2001.[4] In 2008, Frost, Hafiz and Callaghan[5] described a set of parser combinators in the functional programming language Haskell that solve the long-standing problem of accommodating left recursion, and work as a complete top-down parsing tool in polynomial time and space.

Basic idea

In any programming language that has first-class functions, parser combinators can be used to combine basic parsers to construct parsers for more complex rules. For example, a production rule of a context-free grammar (CFG) may have one or more alternatives and each alternative may consist of a sequence of non-terminal(s) and/or terminal(s), or the alternative may consist of a single non-terminal or terminal or the empty string. If a simple parser is available for each of these alternatives, a parser combinator can be used to combine each of these parsers, returning a new parser which can recognise any or all of the alternatives.

In languages that support operator overloading, a parser combinator can take the form of an infix operator, used to glue different parsers to form a complete rule. Parser combinators thereby enable parsers to be defined in an embedded style, in code which is similar in structure to the rules of the formal grammar. As such, implementations can be thought of as executable specifications with all the associated advantages such as readability.

The combinators

To keep the discussion relatively straightforward, we discuss parser combinators in terms of recognizers only. If the input string is of length #input and its members are accessed through an index j, a recognizer is a parser which returns, as output, a set of indices representing indices at which the parser successfully finished recognizing a sequence of tokens that begin at index j. An empty result set indicates that the recognizer failed to recognize any sequence beginning at index j.

  • The empty recognizer recognizes the empty string. This parser always succeeds, returning a singleton set containing the input index:
  • A recognizer term x recognizes the terminal x. If the token at index j in the input string is x, this parser returns a singleton set containing j + 1; otherwise, it returns the empty set.

Given two recognizers p and q, we can define two major parser combinators, one for matching alternative rules and one for sequencing rules:

  • The ‘alternative’ parser combinator, ⊕, applies each of the recognizers on the same index j and returns the union of the finishing indices of the recognizers:
  • The 'sequence' combinator, ⊛, applies the first recognizer p to the input index j, and for each finishing index applies the second recognizer q with that as a starting index. It returns the union of the finishing indices returned from all invocations of q:

There may be multiple distinct ways to parse a string while finishing at the same index, indicating an ambiguous grammar. Simple recognizers do not acknowledge these ambiguities; each possible finishing index is listed only once in the result set. For a more complete set of results, a more complicated object such as a parse tree must be returned.

Examples

Consider a highly ambiguous context-free grammar, s ::= ‘x’ s s | ε. Using the combinators defined earlier, we can modularly define executable notations of this grammar in a modern functional programming language (e.g., Haskell) as s = term ‘x’ <*> s <*> s <+> empty. When the recognizer s is applied at index 2 of the input sequence x x x x x it would return a result set {2,3,4,5}, indicating that there were matches starting at index 2 and finishing at any index between 2 and 5 inclusive.

Shortcomings and solutions

Parser combinators, like all recursive descent parsers, are not limited to the context-free grammars and thus do no global search for ambiguities in the LL(k) parsing Firstk and Followk sets. Thus, ambiguities are not known until run-time if and until the input triggers them. In such cases, the recursive descent parser may default (perhaps unknown to the grammar designer) to one of the possible ambiguous paths, resulting in semantic confusion (aliasing) in the use of the language. This leads to bugs by users of ambiguous programming languages, which are not reported at compile-time, and which are introduced not by human error, but by ambiguous grammar. The only solution that eliminates these bugs is to remove the ambiguities and use a context-free grammar.

The simple implementations of parser combinators have some shortcomings, which are common in top-down parsing. Naïve combinatory parsing requires exponential time and space when parsing an ambiguous context-free grammar. In 1996, Frost and Szydlowski demonstrated how memoization can be used with parser combinators to reduce the time complexity to polynomial.[6] Later Frost used monads to construct the combinators for systematic and correct threading of memo-table throughout the computation.[7]

Like any top-down recursive descent parsing, the conventional parser combinators (like the combinators described above) will not terminate while processing a left-recursive grammar (e.g. s ::= s <*> term ‘x’|empty). A recognition algorithm that accommodates ambiguous grammars with direct left-recursive rules is described by Frost and Hafiz in 2006.[8] The algorithm curtails the otherwise ever-growing left-recursive parse by imposing depth restrictions. That algorithm was extended to a complete parsing algorithm to accommodate indirect as well as direct left-recursion in polynomial time, and to generate compact polynomial-size representations of the potentially exponential number of parse trees for highly ambiguous grammars by Frost, Hafiz and Callaghan in 2007.[9] This extended algorithm accommodates indirect left recursion by comparing its ‘computed context’ with ‘current context’. The same authors also described their implementation of a set of parser combinators written in the Haskell language based on the same algorithm.[5][10]

Notes

  1. ^ Frost & Launchbury 1989.
  2. ^ Hutton 1992.
  3. ^ Hutton, Graham; Meijer, Erik. Monadic Parser Combinators (PDF) (Report). University of Nottingham. Retrieved 13 February 2023.
  4. ^ Swierstra 2001.
  5. ^ a b Frost, Hafiz & Callaghan 2008.
  6. ^ Frost & Szydlowski 1996.
  7. ^ Frost 2003.
  8. ^ Frost & Hafiz 2006.
  9. ^ Frost, Hafiz & Callaghan 2007.
  10. ^ cf. X-SAIGA — executable specifications of grammars

References

Read other articles:

Artikel ini sebatang kara, artinya tidak ada artikel lain yang memiliki pranala balik ke halaman ini.Bantulah menambah pranala ke artikel ini dari artikel yang berhubungan atau coba peralatan pencari pranala.Tag ini diberikan pada Desember 2023. Kontainer Linux adalah implementasi virtualisasi tingkat sistem operasi untuk sistem operasi Linux. Beberapa implementasi yang ada semuanya berbasis pada virtualisasi, isolasi, dan mekanisme manajemen sumber daya yang disediakan oleh kernel Linux, khu...

 

 

American politician and businessman (1922–2019) This article is about American politician and governor of Michigan. For other uses, see William Milliken (disambiguation). William Milliken44th Governor of MichiganIn officeJanuary 22, 1969 – January 1, 1983LieutenantThomas F. Schweigert (acting)James H. BrickleyJames DammanJames H. BrickleyPreceded byGeorge W. RomneySucceeded byJim BlanchardChair of the National Governors AssociationIn officeSeptember 9, 1977 – August 29...

 

 

1 Raja-raja 15Kitab Raja-raja (Kitab 1 & 2 Raja-raja) lengkap pada Kodeks Leningrad, dibuat tahun 1008.KitabKitab 1 Raja-rajaKategoriNevi'imBagian Alkitab KristenPerjanjian LamaUrutan dalamKitab Kristen11← pasal 14 pasal 16 → 1 Raja-raja 15 (atau I Raja-raja 15, disingkat 1Raj 15) adalah pasal kelima belas Kitab 1 Raja-raja dalam Alkitab Ibrani dan Perjanjian Lama di Alkitab Kristen. Dalam Alkitab Ibrani termasuk Nabi-nabi Awal atau Nevi'im Rishonim [נביאים ראשוני�...

1992 studio album by Tangerine DreamRockoon1992 Miramar original cover.Studio album by Tangerine DreamReleasedMarch 1992RecordedMarch 1991 - January 1992StudioEastgate Studios (Vienna, Austria) and The Cave (Berlin)GenreBerlin School, electronica, rock,[1] new-ageLength57:48LabelVirgin (Europe), Miramar (US)ProducerEdgar FroeseTangerine Dream chronology Quinoa(1992) Rockoon(1992) Deadly Care(1992) Professional ratingsReview scoresSourceRatingAllMusic[1] Rockoon is the...

 

 

Синелобый амазон Научная классификация Домен:ЭукариотыЦарство:ЖивотныеПодцарство:ЭуметазоиБез ранга:Двусторонне-симметричныеБез ранга:ВторичноротыеТип:ХордовыеПодтип:ПозвоночныеИнфратип:ЧелюстноротыеНадкласс:ЧетвероногиеКлада:АмниотыКлада:ЗавропсидыКласс:Пт�...

 

 

Hö'elünPatung Hö'elün yang berada di dekat patung berkuda putranya di Tsonjin Boldog, Mongolia.Nama lengkap aksara Mongol: ᠥᠭᠡᠯᠦᠨ Ö’elün Üjin Kiril Mongolia: Өэлүн Nama anumertaPermaisuri Xuanyi (宣懿皇后) Hö'elün Hanzi tradisional: 訶額侖 Hanzi sederhana: 诃额仑 Alih aksara Mandarin - Hanyu Pinyin: Hēélún Hö'elün (Mongolia: ᠥᠭᠡᠯᠦᠨ, Ö’elün Üjin, terj. har. 'Lady Ö’elün'; fl. 1162–1210) adalah seorang bangsaw...

Literature from Wales in Welsh Welsh-language literature (Welsh: Llenyddiaeth Gymraeg) has been produced continuously since the emergence of Welsh from Brythonic as a distinct language in around the 5th century AD. [1] The earliest Welsh literature was poetry, which was extremely intricate in form from its earliest known examples, a tradition sustained today. Poetry was followed by the first British prose literature in the 11th century (such as that contained in the Mabinogion). Welsh...

 

 

River in Pennsylvania and New York, United States For other uses of Allegheny, see Allegheny (disambiguation) and Allegany (disambiguation). Allegheny RiverThe Allegheny River with Freeport, Pennsylvania in the backgroundNative nameAlikehane (Unami)LocationCountryUnited StatesStatePennsylvania, New YorkPhysical characteristicsSource  • locationAllegany Township, Pennsylvania, near Coudersport, Pennsylvania at the corner of Ben Green and Cobb Hill Roads R...

 

 

Israel Keanggotaan Perserikatan Bangsa-BangsaKeanggotaanAnggota penuhSejak1949 (1949)Kursi DK PBBNon-permanen, tak pernah terpilihPerwakilan PermanenDanny Danon Masalah-masalah terkait Israel dan aspek-aspek konflik Arab–Israel dan konflik Iran-Israel berilang kali menjadi debat, resolusi dan penyarian tahunan di Perserikatan Bangsa-Bangsa. Sejak didirikan pada 1948, Dewan Keamanan Perserikatan Bangsa-Bangsa telah mengadopsi 79 resolusi yang berkaitan langsung dengan konflik Arab-Israe...

2016年美國總統選舉 ← 2012 2016年11月8日 2020 → 538個選舉人團席位獲勝需270票民意調查投票率55.7%[1][2] ▲ 0.8 %   获提名人 唐納·川普 希拉莉·克林頓 政党 共和黨 民主党 家鄉州 紐約州 紐約州 竞选搭档 迈克·彭斯 蒂姆·凱恩 选举人票 304[3][4][註 1] 227[5] 胜出州/省 30 + 緬-2 20 + DC 民選得票 62,984,828[6] 65,853,514[6]...

 

 

Antimon triselenida Nama Nama lain antimonselit selenoksiantimon Penanda Nomor CAS 1315-05-5 Y Model 3D (JSmol) Gambar interaktif 3DMet {{{3DMet}}} ChemSpider 11483776 Y Nomor EC PubChem CID 6391662 Nomor RTECS {{{value}}} CompTox Dashboard (EPA) DTXSID30895002 InChI InChI=1S/2Sb.3Se/q2*+3;3*-2 YKey: WWUNXXBCFXOXHC-UHFFFAOYSA-N YInChI=1S/2Sb.3Se/q2*+3;3*-2Key: WWUNXXBCFXOXHC-UHFFFAOYSA-N SMILES [SbH3+3].[SbH3+3].[Se-2].[Se-2].[Se-2] Sifat Rumus kimia Sb2Se3 ...

 

 

Частина серії проФілософіяLeft to right: Plato, Kant, Nietzsche, Buddha, Confucius, AverroesПлатонКантНіцшеБуддаКонфуційАверроес Філософи Епістемологи Естетики Етики Логіки Метафізики Соціально-політичні філософи Традиції Аналітична Арістотелівська Африканська Близькосхідна іранська Буддій�...

First American lesbian civil rights group This article is about the American organization. For the Australian organization, see Daughters of Bilitis (Australia). Daughters of BilitisThe Ladder, set up by the Daughters of Bilitis, was published from 1956 to 1972.Formation1955 (1955)Dissolved1995 (1995) (last chapter)TypeGrassrootsPurposeLesbian civil and political rightsHeadquartersSan Francisco, California, United StatesOfficial language EnglishKey peopleDel Martin and Phyllis Lyon...

 

 

List of events ← 2018 2017 2016 2019 in the State of Palestine → 2020 2021 2022 Decades: 1990s 2000s 2010s 2020s See also: Other events of 2019 Timeline of Palestinian state history Events of 2019 in the State of Palestine. Incumbents State of Palestine (UN observer non-member State) Mahmoud Abbas (PLO), President, 8 May 2005-current Rami Hamdallah, Prime Minister, 6 June 2013 – 13 April 2019 Mohammad Shtayyeh, Prime Minister, 13 April 2019-current Gaza Strip (Hamas administrati...

 

 

Swiss film director (1929–2022) Alain TannerBorn(1929-12-06)6 December 1929Geneva, SwitzerlandDied11 September 2022(2022-09-11) (aged 92)Geneva, SwitzerlandOccupationFilmmakerYears active1957–2012SpouseJanineChildren2 Alain Tanner (6 December 1929 – 11 September 2022)[1] was a Swiss film director. Early years and education Tanner was born in Geneva, and studied economics at the University of Geneva.[2] In 1951, he joined the film club which Claude Goretta had r...

Raya Airways IATA ICAO Kode panggil TH RMY Raya Express Didirikan1993; 31 tahun lalu (1993) (sebagai Transmile Air Services)PenghubungBandar Udara Sultan Abdul Aziz ShahArmada5Tujuan9SloganDelivering Trust And ValueKantor pusatBandar Udara Sultan Abdul Aziz ShahSubang, MalaysiaSitus webrayaairways.com Raya Airways Sdn Bhd adalah maskapai penerbangan kargo dengan kantor pusat di Raya Airways Centre di Kompleks Kargo di Bandar Udara Sultan Abdul Aziz Shah di Subang, Selangor, Malaysia.[...

 

 

Tintoretto: Dogaresa, tahun 1590-an. Dogaresa merupakan sebuah gelar resmi dari istri Doge Venesia. Sejarah Posisi dogaresa diatur oleh undang-undang republik, yang menentukan tugas dan hak yang dimilikinya, dan apa yang dilarang oleh pemegang gelar. Hak-hak ini berubah beberapa kali selama sejarah republik ini. Pemegang gelar dogaresa pertama dilaporkan adalah Carola pada tahun 800-an, dan yang terakhir adalah Elisabetta Querini pada tahun 1790-an. Referensi Staley, Edgcumbe: The dogaressas ...

 

 

Electronic keyboard instrument For the organ found in electric fish, see Electric organ (biology). For pipe organs activated by electricity, see Pipe organ § Action. This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Electric organ – news · newspapers · books · scholar · JSTOR (August 2009) (Learn how and...

Period of political repression in Morocco This article needs additional citations for verification. Please help improve this article by adding citations to reliable sources. Unsourced material may be challenged and removed.Find sources: Years of Lead Morocco – news · newspapers · books · scholar · JSTOR (May 2013) (Learn how and when to remove this message) This article is part of a series aboutHassan II of Morocco Early life Exile Death and funer...

 

 

File:Kelebek.gif This is a Wikipedia user talk page.This is not an encyclopedia article or the talk page for an encyclopedia article. If you find this page on any site other than Wikipedia, you are viewing a mirror site. Be aware that the page may be outdated and that the user whom this page is about may have no personal affiliation with any site other than Wikipedia. The original talk page is located at https://en.wikipedia.org/wiki/User_talk:7%266%3Dthirteen. Beware! This user's talk page i...