Chomsky normal form

In formal language theory, a context-free grammar, G, is said to be in Chomsky normal form (first described by Noam Chomsky)[1] if all of its production rules are of the form:[2][3]

A → BC,   or
A → a,   or
S → ε,

where A, B, and C are nonterminal symbols, the letter a is a terminal symbol (a symbol that represents a constant value), S is the start symbol, and ε denotes the empty string. Also, neither B nor C may be the start symbol, and the third production rule can only appear if ε is in L(G), the language produced by the context-free grammar G.[4]: 92–93, 106 
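The three rule shapes can be checked mechanically. The following Python sketch is illustrative only. It assumes a hypothetical grammar representation: a dict mapping each nonterminal to a list of right-hand sides, each a tuple of symbols, with the empty tuple () standing for ε and every symbol that is not a dict key treated as a terminal. The function name is_cnf is likewise made up for this example.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def is_cnf(grammar: Grammar, start: str) -> bool:
    """Return True if every rule has the form A -> BC, A -> a, or S -> ε.

    Assumed representation: nonterminals are the dict keys, right-hand
    sides are tuples of symbols, and () denotes the empty string ε.
    """
    nonterminals = set(grammar)
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) == 0:
                # S -> ε is only permitted for the start symbol.
                if lhs != start:
                    return False
            elif len(rhs) == 1:
                # A -> a: the single symbol must be a terminal.
                if rhs[0] in nonterminals:
                    return False
            elif len(rhs) == 2:
                # A -> BC: both symbols are nonterminals other than the start symbol.
                if not all(s in nonterminals and s != start for s in rhs):
                    return False
            else:
                # Right-hand sides longer than two symbols are never allowed.
                return False
    return True


# S -> AB | ε, A -> a, B -> b is in Chomsky normal form.
print(is_cnf({"S": [("A", "B"), ()], "A": [("a",)], "B": [("b",)]}, "S"))  # True
# S -> SS | a is not, because the start symbol occurs on a right-hand side.
print(is_cnf({"S": [("S", "S"), ("a",)]}, "S"))  # False
```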

Every grammar in Chomsky normal form is context-free, and conversely, every context-free grammar can be transformed into an equivalent one[note 1] which is in Chomsky normal form and has a size no larger than the square of the original grammar's size.

Converting a grammar to Chomsky normal form

To convert a grammar to Chomsky normal form, a sequence of simple transformations is applied in a certain order; this procedure is described in most textbooks on automata theory.[4]: 87–94 [5][6][7] The presentation here follows Hopcroft and Ullman (1979), but is adapted to use the transformation names from Lange and Leiß (2009).[8][note 2] Each of the following transformations establishes one of the properties required for Chomsky normal form.

START: Eliminate the start symbol from right-hand sides

Introduce a new start symbol S₀, and a new rule

S₀ → S,

where S is the previous start symbol. This does not change the language produced by the grammar, and S₀ will not occur on any rule's right-hand side.
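A minimal sketch of START, using the same assumed dict representation (nonterminal mapped to a list of tuple right-hand sides, () for ε). The ASCII name "S0" stands in for S₀, and the function name start_step is hypothetical; a fuller implementation would generate a guaranteed-fresh name instead of rejecting a clash.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def start_step(grammar: Grammar, old_start: str, new_start: str = "S0") -> Grammar:
    """START: add a fresh start symbol new_start with the single rule new_start -> old_start.

    new_start is a hypothetical name; the caller must pick one not already in use.
    """
    if new_start in grammar:
        raise ValueError(f"{new_start!r} is already a nonterminal")
    result: Grammar = {new_start: [(old_start,)]}
    # Copy the old rules unchanged; new_start occurs on no right-hand side.
    for lhs, rhss in grammar.items():
        result[lhs] = list(rhss)
    return result


g = {"S": [("a", "S", "b"), ()]}
print(start_step(g, "S"))
# {'S0': [('S',)], 'S': [('a', 'S', 'b'), ()]}
```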

TERM: Eliminate rules with nonsolitary terminals

To eliminate each rule

A → X₁ ... a ... Xₙ

in which a terminal symbol a is not the only symbol on the right-hand side, introduce, for every such terminal, a new nonterminal symbol Nₐ and a new rule

Nₐ → a.

Change every rule

A → X₁ ... a ... Xₙ

to

A → X₁ ... Nₐ ... Xₙ.

If several terminal symbols occur on the right-hand side, simultaneously replace each of them by its associated nonterminal symbol. This does not change the language produced by the grammar.[4]: 92 
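TERM can be sketched as follows, again under the assumed dict representation. The naming scheme N_a (the string "N_" followed by the terminal) and the function name term_step are hypothetical; the sketch simply raises an error if a generated name is already taken.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def term_step(grammar: Grammar) -> Grammar:
    """TERM: replace every terminal a in a right-hand side of length >= 2 by N_a.

    The naming scheme "N_" + a is a hypothetical choice and is assumed
    not to clash with existing nonterminals.
    """
    nonterminals = set(grammar)
    fresh: dict[str, str] = {}  # terminal a -> its new nonterminal N_a

    def nonterminal_for(a: str) -> str:
        if a not in fresh:
            name = "N_" + a
            if name in nonterminals:
                raise ValueError(f"{name!r} is already a nonterminal")
            fresh[a] = name
        return fresh[a]

    result: Grammar = {}
    for lhs, rhss in grammar.items():
        new_rhss = []
        for rhs in rhss:
            if len(rhs) >= 2:
                # Solitary terminals (A -> a) are left alone; all others are replaced.
                rhs = tuple(s if s in nonterminals else nonterminal_for(s) for s in rhs)
            new_rhss.append(rhs)
        result[lhs] = new_rhss
    # Add one rule N_a -> a per terminal that was replaced.
    for a, name in fresh.items():
        result[name] = [(a,)]
    return result


g = {"A": [("B", "a", "C", "a"), ("a",)], "B": [("b",)], "C": [("c",)]}
print(term_step(g)["A"])  # [('B', 'N_a', 'C', 'N_a'), ('a',)]
```

Because all occurrences of the same terminal share one Nₐ, the example introduces only the single new rule N_a → a.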

BIN: Eliminate right-hand sides with more than 2 nonterminals

Replace each rule

A → X₁ X₂ ... Xₙ

with more than 2 nonterminals X₁, ..., Xₙ by rules

A → X₁ A₁,
A₁ → X₂ A₂,
... ,
Aₙ₋₂ → Xₙ₋₁ Xₙ,

where the Aᵢ are new nonterminal symbols. Again, this does not change the language produced by the grammar.[4]: 93 
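A sketch of BIN under the same assumptions. Unlike the description above, it splits any right-hand side longer than two symbols, which coincides with the text once TERM has left only nonterminals in such rules. The fresh names (built from the left-hand side and a running counter, such as A_1, A_2) and the function name bin_step are hypothetical.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def bin_step(grammar: Grammar) -> Grammar:
    """BIN: split every right-hand side X1 ... Xn with n > 2 into a chain of
    binary rules A -> X1 A_1, A_1 -> X2 A_2, ..., A_{n-2} -> X_{n-1} Xn.

    Fresh names of the form "<lhs>_<k>" are a hypothetical scheme and are
    assumed not to clash with existing nonterminals.
    """
    result: Grammar = {lhs: [] for lhs in grammar}
    counter = 0
    for lhs, rhss in grammar.items():
        for rhs in rhss:
            if len(rhs) <= 2:
                result[lhs].append(rhs)
                continue
            current = lhs
            for symbol in rhs[:-2]:
                counter += 1
                fresh = f"{lhs}_{counter}"
                if fresh in grammar:
                    raise ValueError(f"{fresh!r} is already a nonterminal")
                result.setdefault(current, []).append((symbol, fresh))
                current = fresh
            # The last fresh nonterminal derives the final two symbols.
            result.setdefault(current, []).append(rhs[-2:])
    return result


g = {"A": [("X1", "X2", "X3", "X4")], "X1": [("a",)], "X2": [("b",)],
     "X3": [("c",)], "X4": [("d",)]}
b = bin_step(g)
print(b["A"], b["A_1"], b["A_2"])
# [('X1', 'A_1')] [('X2', 'A_2')] [('X3', 'X4')]
```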

DEL: Eliminate ε-rules

An ε-rule is a rule of the form

A → ε,

where A is not S₀, the grammar's start symbol.

To eliminate all rules of this form, first determine the set of all nonterminals that derive ε. Hopcroft and Ullman (1979) call such nonterminals nullable, and compute them as follows:

  • If a rule A → ε exists, then A is nullable.
  • If a rule A → X₁ ... Xₙ exists, and every single Xᵢ is nullable, then A is nullable, too.

Obtain an intermediate grammar by replacing each rule

A → X₁ ... Xₙ

by all versions with some nullable Xᵢ omitted. The transformed grammar is then obtained by deleting every ε-rule from this intermediate grammar, unless its left-hand side is the start symbol.[4]: 90 
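Both parts of DEL, the nullable fixed point and the generation of versions, can be sketched in Python under the assumed dict representation (() stands for ε). The function names nullable and del_step are hypothetical. The usage lines apply the sketch to the example grammar given next, with ASCII "S0" for S₀; the printed rules should agree with the grammar obtained there after the ε-rules are deleted.

```python
from itertools import product

Grammar = dict[str, list[tuple[str, ...]]]


def nullable(grammar: Grammar) -> set[str]:
    """Fixed-point computation of the nonterminals that derive ε."""
    result: set[str] = set()
    changed = True
    while changed:
        changed = False
        for lhs, rhss in grammar.items():
            # all() is True for the empty right-hand side, covering A -> ε;
            # terminals are never in result, so rules containing one never count.
            if lhs not in result and any(all(s in result for s in rhs) for rhs in rhss):
                result.add(lhs)
                changed = True
    return result


def del_step(grammar: Grammar, start: str) -> Grammar:
    """DEL: add all versions of each rule with some nullable symbols omitted,
    then drop every ε-rule whose left-hand side is not the start symbol."""
    null = nullable(grammar)
    result: Grammar = {}
    for lhs, rhss in grammar.items():
        new_rhss: list[tuple[str, ...]] = []
        for rhs in rhss:
            # A nullable symbol may be kept or omitted; any other symbol is kept.
            options = [((s,), ()) if s in null else ((s,),) for s in rhs]
            for choice in product(*options):
                version = tuple(sym for part in choice for sym in part)
                if (version or lhs == start) and version not in new_rhss:
                    new_rhss.append(version)
        result[lhs] = new_rhss
    return result


g = {"S0": [("A", "b", "B"), ("C",)], "B": [("A", "A"), ("A", "C")],
     "C": [("b",), ("c",)], "A": [("a",), ()]}
print(sorted(nullable(g)))  # ['A', 'B']
for lhs, rhss in del_step(g, "S0").items():
    print(lhs, "->", " | ".join("".join(r) for r in rhss))
# S0 -> AbB | Ab | bB | b | C
# B -> AA | A | AC | C
# C -> b | c
# A -> a
```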

For example, in the following grammar, with start symbol S₀,

S₀ → AbB | C
B → AA | AC
C → b | c
A → a | ε

the nonterminal A, and hence also B, is nullable, while neither C nor S₀ is. Hence the following intermediate grammar is obtained, in which an omitted nonterminal is shown in square brackets:[note 3]

S₀ → AbB | [A]bB | Ab[B] | [A]b[B]   |   C
B → AA | [A]A | A[A] | [A]ε[A]   |   AC | [A]C
C → b | c
A → a | ε

In this grammar, all ε-rules have been "inlined at the call site".[note 4] In the next step, they can hence be deleted, yielding the grammar:

S₀ → AbB | Ab | bB | b   |   C
B → AA | A   |   AC | C
C → b | c
A → a

This grammar produces the same language as the original example grammar, viz. {ab,aba,abaa,abab,abac,abb,abc,b,ba,baa,bab,bac,bb,bc,c}, but has no ε-rules.
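Because both grammars in this example generate finite languages, the claim can be checked by brute force. The sketch below (hypothetical function finite_language, same assumed dict representation, ASCII "S0" for S₀) computes the language of a non-recursive grammar symbol by symbol and compares the grammars before and after DEL. It does not handle recursive grammars, whose languages may be infinite.

```python
from functools import lru_cache
from itertools import product

Grammar = dict[str, list[tuple[str, ...]]]


def finite_language(grammar: Grammar, start: str) -> set[str]:
    """All terminal strings derivable from start.

    Assumes the grammar is non-recursive (no nonterminal can derive a string
    containing itself), so the language is finite; otherwise the recursion
    below would not terminate normally.
    """
    @lru_cache(maxsize=None)
    def lang(symbol: str) -> frozenset[str]:
        if symbol not in grammar:  # terminal symbol
            return frozenset({symbol})
        words: set[str] = set()
        for rhs in grammar[symbol]:
            # Concatenate one word for each symbol of the right-hand side.
            for parts in product(*(lang(s) for s in rhs)):
                words.add("".join(parts))
        return frozenset(words)

    return set(lang(start))


before = {"S0": [("A", "b", "B"), ("C",)], "B": [("A", "A"), ("A", "C")],
          "C": [("b",), ("c",)], "A": [("a",), ()]}
after = {"S0": [("A", "b", "B"), ("A", "b"), ("b", "B"), ("b",), ("C",)],
         "B": [("A", "A"), ("A",), ("A", "C"), ("C",)],
         "C": [("b",), ("c",)], "A": [("a",)]}
print(finite_language(before, "S0") == finite_language(after, "S0"))  # True
print(sorted(finite_language(after, "S0")))
# ['ab', 'aba', 'abaa', 'abab', 'abac', 'abb', 'abc', 'b', 'ba', 'baa', 'bab', 'bac', 'bb', 'bc', 'c']
```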

UNIT: Eliminate unit rules

A unit rule is a rule of the form

A → B,

where A, B are nonterminal symbols. To remove it, for each rule

B → X₁ ... Xₙ,

where X₁ ... Xₙ is a string of nonterminals and terminals, add the rule

A → X₁ ... Xₙ

unless this is a unit rule that has already been (or is being) removed. Skipping the nonterminal B in this way is possible because B is a member of the unit closure of A, that is, A derives B using unit rules only.[9]
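A sketch of UNIT based on the unit closure: every nonterminal A receives all non-unit rules of every nonterminal in its closure, and unit rules are dropped. This closure formulation is one common way to realize the rule-by-rule description above, not necessarily the exact procedure of the cited sources; the function names and the dict representation are assumptions. The usage applies it to the arithmetic-expression grammar of the Example section as it stands after step BIN, with ASCII "-" in place of "−" and "S0" in place of S₀.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def is_unit(grammar: Grammar, rhs: tuple[str, ...]) -> bool:
    """A unit rule has a single nonterminal as its right-hand side."""
    return len(rhs) == 1 and rhs[0] in grammar


def unit_closure(grammar: Grammar, a: str) -> list[str]:
    """Nonterminals reachable from a via unit rules only, including a itself."""
    closure, stack = [a], [a]
    while stack:
        x = stack.pop()
        for rhs in grammar[x]:
            if is_unit(grammar, rhs) and rhs[0] not in closure:
                closure.append(rhs[0])
                stack.append(rhs[0])
    return closure


def unit_step(grammar: Grammar) -> Grammar:
    """UNIT: give each A every non-unit rule of each B in its unit closure."""
    result: Grammar = {}
    for a in grammar:
        new_rhss: list[tuple[str, ...]] = []
        for b in unit_closure(grammar, a):
            for rhs in grammar[b]:
                if not is_unit(grammar, rhs) and rhs not in new_rhss:
                    new_rhss.append(rhs)
        result[a] = new_rhss
    return result


g = {
    "S0": [("Expr",)],
    "Expr": [("Term",), ("Expr", "AddOp_Term"), ("AddOp", "Term")],
    "Term": [("Factor",), ("Term", "MulOp_Factor")],
    "Factor": [("Primary",), ("Factor", "PowOp_Primary")],
    "Primary": [("number",), ("variable",), ("Open", "Expr_Close")],
    "AddOp": [("+",), ("-",)],
    "MulOp": [("*",), ("/",)],
    "PowOp": [("^",)],
    "Open": [("(",)],
    "Close": [(")",)],
    "AddOp_Term": [("AddOp", "Term")],
    "MulOp_Factor": [("MulOp", "Factor")],
    "PowOp_Primary": [("PowOp", "Primary")],
    "Expr_Close": [("Expr", "Close")],
}
u = unit_step(g)
print(" | ".join(" ".join(r) for r in u["Term"]))
# Term MulOp_Factor | Factor PowOp_Primary | number | variable | Open Expr_Close
```

Up to the order of alternatives, the result for Term matches the Term rule of the final Chomsky normal form in the Example section.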

Order of transformations

Mutual preservation of transformation results: transformation X (row) always preserves ("Yes") or may destroy ("No") the result of transformation Y (column).

X \ Y    START   TERM   BIN   DEL    UNIT
START    -       Yes    Yes   No     No
TERM     Yes     -      No    Yes    Yes
BIN      Yes     Yes    -     Yes    Yes
DEL      Yes     Yes    Yes   -      No
UNIT     Yes     Yes    Yes   Yes*   -

* UNIT preserves the result of DEL if START has been applied before.

When choosing the order in which to apply the above transformations, one has to consider that some transformations may destroy the result achieved by others. For example, START will re-introduce a unit rule if it is applied after UNIT. The table shows which orderings are admissible.

Moreover, the worst-case bloat in grammar size[note 5] depends on the transformation order. Using |G| to denote the size of the original grammar G, the size blow-up in the worst case may range from |G|² to 2^(2|G|), depending on the transformation algorithm used.[8]: 7  The blow-up in grammar size depends on the order between DEL and BIN. It may be exponential when DEL is done first, but is linear otherwise. UNIT can incur a quadratic blow-up in the size of the grammar.[8]: 5  The orderings START, TERM, BIN, DEL, UNIT and START, BIN, DEL, UNIT, TERM lead to the least (i.e. quadratic) blow-up.
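The difference between applying DEL before or after BIN shows up already on a single rule A → X₁ ... Xₙ in which every Xᵢ is nullable. The sketch below is an illustration, not a size measurement from the cited paper: it counts rules rather than symbols, uses hypothetical helper names, and treats the new nonterminals created by BIN as nullable, which they are here because each one derives only nullable symbols.

```python
from itertools import product


def del_versions(rhs: tuple[str, ...], nullable: set[str]) -> set[tuple[str, ...]]:
    """Non-empty versions of a right-hand side with some nullable symbols omitted."""
    options = [((s,), ()) if s in nullable else ((s,),) for s in rhs]
    versions = {tuple(x for part in choice for x in part) for choice in product(*options)}
    versions.discard(())
    return versions


def bin_chain(lhs: str, rhs: tuple[str, ...]) -> list[tuple[str, tuple[str, ...]]]:
    """BIN for one rule; the names lhs_1, lhs_2, ... are hypothetical fresh nonterminals."""
    rules, current = [], lhs
    for i, symbol in enumerate(rhs[:-2], start=1):
        fresh = f"{lhs}_{i}"
        rules.append((current, (symbol, fresh)))
        current = fresh
    rules.append((current, rhs[-2:]))
    return rules


n = 10
xs = tuple(f"X{i}" for i in range(1, n + 1))  # A -> X1 ... X10, every Xi nullable
nullable = set(xs)

# DEL first: the single long rule turns into 2**n - 1 rules.
del_first = len(del_versions(xs, nullable))

# BIN first: n - 1 binary rules; every new A_i is nullable as well,
# so DEL turns each binary rule into at most 3 rules.
chain = bin_chain("A", xs)
nullable_after_bin = nullable | {lhs for lhs, _ in chain}
bin_first = sum(len(del_versions(rhs, nullable_after_bin)) for _, rhs in chain)

print(del_first, bin_first)  # 1023 27
```

Doubling n to 20 would raise the first count to 1,048,575 while the second grows only to 57, which is the exponential-versus-linear contrast described above.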

Example

Figure: Abstract syntax tree of the arithmetic expression "a^2+4*b" with respect to the example grammar (top) and its Chomsky normal form (bottom)

The following grammar, with start symbol Expr, describes a simplified version of the set of all syntactically valid arithmetic expressions in programming languages like C or Algol60. Both number and variable are considered terminal symbols here for simplicity, since in a compiler front end their internal structure is usually not considered by the parser. The terminal symbol "^" denoted exponentiation in Algol60.

Expr → Term | Expr AddOp Term | AddOp Term
Term → Factor | Term MulOp Factor
Factor → Primary | Factor ^ Primary
Primary → number | variable | ( Expr )
AddOp → + | −
MulOp → * | /

In step "START" of the above conversion algorithm, only the rule S₀ → Expr is added to the grammar. After step "TERM", the grammar looks like this:

S₀ → Expr
Expr → Term | Expr AddOp Term | AddOp Term
Term → Factor | Term MulOp Factor
Factor → Primary | Factor PowOp Primary
Primary → number | variable | Open Expr Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )

After step "BIN", the following grammar is obtained:

S₀ → Expr
Expr → Term | Expr AddOp_Term | AddOp Term
Term → Factor | Term MulOp_Factor
Factor → Primary | Factor PowOp_Primary
Primary → number | variable | Open Expr_Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )
AddOp_Term → AddOp Term
MulOp_Factor → MulOp Factor
PowOp_Primary → PowOp Primary
Expr_Close → Expr Close

Since there are no ε-rules, step "DEL" does not change the grammar. After step "UNIT", the following grammar is obtained, which is in Chomsky normal form:

S₀ → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor | Expr AddOp_Term | AddOp Term
Expr → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor | Expr AddOp_Term | AddOp Term
Term → number | variable | Open Expr_Close | Factor PowOp_Primary | Term MulOp_Factor
Factor → number | variable | Open Expr_Close | Factor PowOp_Primary
Primary → number | variable | Open Expr_Close
AddOp → + | −
MulOp → * | /
PowOp → ^
Open → (
Close → )
AddOp_Term → AddOp Term
MulOp_Factor → MulOp Factor
PowOp_Primary → PowOp Primary
Expr_Close → Expr Close

The Nₐ introduced in step "TERM" are PowOp, Open, and Close. The Aᵢ introduced in step "BIN" are AddOp_Term, MulOp_Factor, PowOp_Primary, and Expr_Close.

Alternative definition

Chomsky reduced form

Another way[4]: 92 [10] to define the Chomsky normal form is:

A formal grammar is in Chomsky reduced form if all of its production rules are of the form:

A → BC,   or
A → a,

where A, B, and C are nonterminal symbols, and a is a terminal symbol. When using this definition, B or C may be the start symbol. Only those context-free grammars which do not generate the empty string can be transformed into Chomsky reduced form.

Floyd normal form

In a letter in which he proposed the term Backus–Naur form (BNF), Donald E. Knuth suggested that a BNF "syntax in which all definitions have such a form may be said to be in 'Floyd Normal Form'",

⟨A⟩ ::= ⟨B⟩ | ⟨C⟩   or
⟨A⟩ ::= ⟨B⟩ ⟨C⟩   or
⟨A⟩ ::= a,

where ⟨A⟩, ⟨B⟩, and ⟨C⟩ are nonterminal symbols, and a is a terminal symbol, because Robert W. Floyd had found in 1961 that any BNF syntax can be converted to this form.[11] But he withdrew this term, "since doubtless many people have independently used this simple fact in their own work, and the point is only incidental to the main considerations of Floyd's note."[12] While Floyd's note cites Chomsky's original 1959 article, Knuth's letter does not.

Application

Besides its theoretical significance, CNF conversion is used in some algorithms as a preprocessing step, e.g., the CYK algorithm, a bottom-up parsing algorithm for context-free grammars, and its probabilistic variant, probabilistic CKY.[13]
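A minimal CYK recognizer, shown to illustrate why the binary rule shape matters: every substring is split into two parts that are matched against rules A → BC. It is a plain recognizer (no parse trees, no probabilities), written for the assumed dict representation used in the other sketches; the grammar is the Chomsky normal form from the Example section, with ASCII "-" in place of "−" and "S0" in place of S₀.

```python
Grammar = dict[str, list[tuple[str, ...]]]


def cyk(grammar: Grammar, start: str, tokens: list[str]) -> bool:
    """CYK recognition for a grammar in Chomsky normal form."""
    n = len(tokens)
    if n == 0:
        # Only the rule S -> ε can derive the empty string.
        return () in grammar.get(start, [])
    # table[i][k] holds the nonterminals deriving tokens[i : i + k + 1].
    table = [[set() for _ in range(n)] for _ in range(n)]
    for i, tok in enumerate(tokens):
        for lhs, rhss in grammar.items():
            if (tok,) in rhss:
                table[i][0].add(lhs)
    for length in range(2, n + 1):
        for i in range(n - length + 1):
            for split in range(1, length):
                left = table[i][split - 1]
                right = table[i + split][length - split - 1]
                for lhs, rhss in grammar.items():
                    if any(len(r) == 2 and r[0] in left and r[1] in right for r in rhss):
                        table[i][length - 1].add(lhs)
    return start in table[0][n - 1]


# The Chomsky normal form of the arithmetic-expression example.
primary = [("number",), ("variable",), ("Open", "Expr_Close")]
factor = primary + [("Factor", "PowOp_Primary")]
term = factor + [("Term", "MulOp_Factor")]
expr = term + [("Expr", "AddOp_Term"), ("AddOp", "Term")]
cnf = {
    "S0": expr, "Expr": expr, "Term": term, "Factor": factor, "Primary": primary,
    "AddOp": [("+",), ("-",)], "MulOp": [("*",), ("/",)], "PowOp": [("^",)],
    "Open": [("(",)], "Close": [(")",)],
    "AddOp_Term": [("AddOp", "Term")], "MulOp_Factor": [("MulOp", "Factor")],
    "PowOp_Primary": [("PowOp", "Primary")], "Expr_Close": [("Expr", "Close")],
}
print(cyk(cnf, "S0", "number + variable * number".split()))  # True
print(cyk(cnf, "S0", "( number ) ^ variable".split()))  # True
print(cyk(cnf, "S0", "number +".split()))  # False
```

The three nested loops over length, start position, and split point give the familiar cubic running time in the input length; this relies on every rule having either one terminal or exactly two nonterminals on its right-hand side.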

See also

Notes

  1. ^ that is, one that produces the same language
  2. ^ For example, Hopcroft and Ullman (1979) merged TERM and BIN into a single transformation.
  3. ^ indicating a kept and an omitted nonterminal N by N and [N], respectively
  4. ^ If the grammar had a rule S₀ → ε, it could not be "inlined", since it had no "call sites". Therefore it could not be deleted in the next step.
  5. ^ i.e. written length, measured in symbols

References

  1. ^ Chomsky, Noam (1959). "On Certain Formal Properties of Grammars". Information and Control. 2 (2): 137–167. doi:10.1016/S0019-9958(59)90362-6. Here: Sect. 6, p. 152ff.
  2. ^ D'Antoni, Loris. "Page 7, Lecture 9: Bottom-up Parsing Algorithms" (PDF). CS536-S21 Intro to Programming Languages and Compilers. University of Wisconsin-Madison. Archived (PDF) from the original on 2021-07-19.
  3. ^ Sipser, Michael (2006). Introduction to the theory of computation (2nd ed.). Boston: Thomson Course Technology. Definition 2.8. ISBN 0-534-95097-3. OCLC 58544333.
  4. ^ a b c d e f Hopcroft, John E.; Ullman, Jeffrey D. (1979). Introduction to Automata Theory, Languages and Computation. Reading, Massachusetts: Addison-Wesley Publishing. ISBN 978-0-201-02988-8.
  5. ^ Hopcroft, John E.; Motwani, Rajeev; Ullman, Jeffrey D. (2006). Introduction to Automata Theory, Languages, and Computation (3rd ed.). Addison-Wesley. ISBN 978-0-321-45536-9. Section 7.1.5, p. 272.
  6. ^ Rich, Elaine (2007). "11.8 Normal Forms". Automata, Computability, and Complexity: Theory and Applications (PDF) (1st ed.). Prentice-Hall. p. 169. ISBN 978-0132288064. Archived from the original (PDF) on 2023-01-17.
  7. ^ Wegener, Ingo (1993). Theoretische Informatik - Eine algorithmenorientierte Einführung. Leitfäden und Monographien der Informatik (in German). Stuttgart: B. G. Teubner. ISBN 978-3-519-02123-0. Section 6.2 "Die Chomsky-Normalform für kontextfreie Grammatiken", pp. 149–152.
  8. ^ a b c Lange, Martin; Leiß, Hans (2009). "To CNF or not to CNF? An Efficient Yet Presentable Version of the CYK Algorithm" (PDF). Informatica Didactica. 8. Archived (PDF) from the original on 2011-07-19.
  9. ^ Allison, Charles D. (2022). Foundations of Computing: An Accessible Introduction to Automata and Formal Languages. Fresh Sources, Inc. p. 176. ISBN 9780578944173.
  10. ^ Hopcroft et al. (2006)[page needed]
  11. ^ Floyd, Robert W. (1961). "Note on mathematical induction in phrase structure grammars" (PDF). Information and Control. 4 (4): 353–358. doi:10.1016/S0019-9958(61)80052-1. Archived (PDF) from the original on 2021-03-05. Here: p. 354.
  12. ^ Knuth, Donald E. (December 1964). "Backus Normal Form vs. Backus Naur Form". Communications of the ACM. 7 (12): 735–736. doi:10.1145/355588.365140. S2CID 47537431.
  13. ^ Jurafsky, Daniel; Martin, James H. (2008). Speech and Language Processing (2nd ed.). Pearson Prentice Hall. p. 465. ISBN 978-0-13-187321-6.
