Sipser Exercise 2.35

Theorem: Let be a CFG in Chomsky normal form with variables. If generates a string under a derivation of or more steps, then is infinite.


Proof: Note from 2.26, any derivation in a CNF grammar has odd length, so suppose generates some string with a derivation of at least steps. Then, also from 2.26, it must be that . That is, to generate , we must first generate a string containing at least variables. Write to denote this final string containing only variables that will then generate .

The parse tree for the derivation of is strictly a binary tree, because is CNF, and consists only of variables. There are (at least) leaf nodes on this tree. Note that necessarily, the tree is full because every node has either 0 or 2 children. The height of the tree is minimal (that is, can be bounded below) if it also balanced. In that case, the height of a full, balanced binary tree is . We have that:

So is strictly greater than , which means .

But, if the height is greater than , then by the pigeonhole principle we can conclude that in any particular path (walk) from the root node to a leaf node, we will encounter a repeated variable. That is, there's some variable such that can derive another string containing . In other words, where are strings of variables. Then, in particular, , which is an infinite family of variable strings that can be generated by , and thus an infinite family of strings in , which makes infinite.