A homomorphism is based on a function which maps symbols from an alphabet to words over another alphabet ; If the domain of this function is extended to words over in the natural way, by letting for all words and , this yields a homomorphism.
A matched alphabet is an alphabet with two equal-sized sets; it is convenient to think of it as a set of parentheses types, where contains the opening parenthesis symbols, whereas the symbols in contains the closing parenthesis symbols. For a matched alphabet , the typed Dyck language is given by
For example, the following is a valid sentence in the 3-typed Dyck language:
( [ [ ] { } ] ( ) { ( ) } )
Theorem
A language L over the alphabet is context-free if and only if there exists
a matched alphabet
a regular language over ,
and a homomorphism
such that .
We can interpret this as saying that any CFG language can be generated by first generating a typed Dyck language, filtering it by a regular grammar, and finally converting each bracket into a word in the CFG language.
Davis, Martin D.; Sigal, Ron; Weyuker, Elaine J. (1994). Computability, Complexity, and Languages: Fundamentals of Theoretical Computer Science (2nd ed.). Elsevier Science. p. 306. ISBN0-12-206382-1.