In many programming languages, one may write conditionally executed code in two forms: the if-then form, or the if-then-else form. (The else clause is optional.):
if a then s
if b then s1 else s2
Ambiguous interpretation becomes possible when there are nested statements; specifically when an if-then-else form replaces the statement s inside the above if-then construct:
if a thenif b then s1 else s2
In this example, s1 gets executed if and only if a is true andb is true. But what about s2? One person might be sure that s2 gets executed whenever a is false (by attaching the else to the first if), while another person might be sure that s2 gets executed only when a is true andb is false (by attaching the else to the second if). In other words, someone could interpret the previous statement as being equivalent to either of the following unambiguous statements:
if a then { if b then s1 } else s2
if a then { if b then s1 else s2 }
The dangling-else problem dates back to ALGOL 60,[1] and subsequent languages have resolved it in various ways. In LR parsers, the dangling else is the archetypal example of a shift-reduce conflict.
Avoiding ambiguity while keeping the syntax
This problem often comes up in compiler construction, especially scannerless parsing. The convention when dealing with the dangling else is to attach the else to the nearby if statement,[2] allowing for unambiguous context-free grammars, in particular. Programming languages like Pascal,[3] C,[4] and Java[5] follow this convention, so there is no ambiguity in the semantics of the language, though the use of a parser generator may lead to ambiguous grammars. In these cases alternative grouping is accomplished by explicit blocks, such as begin...end in Pascal[6] and {...} in C.
Depending on the compiler construction approach, one may take different corrective actions to avoid ambiguity:
If the parser is produced by an SLR, LR(1), or LALR LR parser generator, the programmer will often rely on the generated parser feature of preferring shift over reduce whenever there is a conflict.[2] Alternatively, the grammar can be rewritten to remove the conflict, at the expense of an increase in grammar size (see below).
If the parser is hand-written, the programmer may use a non-ambiguous context-free grammar. Alternatively, one may rely on a non-context-free grammar or a parsing expression grammar.
Avoiding ambiguity by changing the syntax
The problem can also be solved by making explicit the link between an else and its if, within the syntax. This usually helps avoid human errors.[7]
Disallowing the statement following a "then" to be an "if" itself (it may however be a pair of statement brackets containing only an if-then-clause). ALGOL 60 follows this approach.[8]
Requiring braces (parentheses) when an "else" follows an "if".[9]
Requiring every "if" to be paired with an "else". To avoid a similar problem concerning semantics rather than syntax, Racket deviates from Scheme by considering an if without a fallback clause to be an error, effectively distinguishing conditional expressions (i.e if) from conditional statements (i.e when and unless, which do not have fallback clauses).
Using different keywords for the one-alternative and two-alternative "if" statements. S-algol uses if e do s for the one-alternative case and if e1 then e2 else e3 for the general case.[10]
Requiring braces unconditionally, like Swift. This is effectively true in Python as its indentation rules delimit every block, not just those in "if" statements.
Any other statement-related grammar rules may also have to be duplicated in this way if they may directly or indirectly end with a statement or selection-statement non-terminal.
However, we give grammar that includes both of if and while statements.
statement: open_statement
| closed_statement
;
open_statement: IF '(' expression ')' statement
| IF '(' expression ')' closed_statement ELSE open_statement
| WHILE '(' expression ')' open_statement
;
closed_statement: simple_statement
| IF '(' expression ')' closed_statement ELSE closed_statement
| WHILE '(' expression ')' closed_statement
;
simple_statement: ...
;
Finally, we give the grammar that forbids ambiguous IF statements.
statement: open_statement
| closed_statement
;
open_statement: IF '(' expression ')' statement
| IF '(' expression ')' closed_statement ELSE open_statement
| WHILE '(' expression ')' open_statement
;
closed_statement: simple_statement
| IF '(' expression ')' closed_statement ELSE closed_statement
| WHILE '(' expression ')' closed_statement
;
simple_statement: ...
;
With this grammar the statement if (a) if (b) c else d can only be parsed one way, because the other interpretation (if (a) {if (b) c} else d) is produced as
and then the parsing fails trying to match closed_statement to "if (b) c". An attempt with closed_statement fails in the same way. The other parse, if (a) {if (b) c else d}) succeeds:
statement
open_statement
IF '(' expression ')' statement
IF '(' expression ')' closed_statement
IF '(' a ')' (IF '(' expression ')' closed_statement ELSE closed_statement)
IF '(' a ')' (IF '(' b ')' c ELSE 'd')
^ISO 7185:1990 (Pascal) 6.8.3.4: An if-statement without an else-part shall not be immediately followed by the token else.
^ abISO 9899:1999 (C): 6.8.4.1(3): "An else is associated with the lexically nearest preceding if that is allowed by the syntax.", available at WG14 N1256, p. 134
^Davie, Antony J. T.; Ronald Morrison (1981), Brian Meek (ed.), Recursive Descent Compiling, Ellis Horwood series in computers and their applications, Chichester, West Sussex: Ellis Horwood, p. 20, ISBN0-470-27270-8