Sipser Exercise 2.23 | Chase Meadors

Theorem: The language $D = \setbuilder{xy}{x,y \in \set{0,1}^* \tand \abs{x} = \abs{y} \tand x \neq y}$ is context-free.

Proof:

$D$$ is the language of binary strings that can be split into unequal halves. We will show the context-free grammar, and then prove that it is correct, since the construction is not very intuitive. Let $$G$$ be:$

\begin{align} S &\rightarrow X_0 X_1 \mid X_1 X_0 \ X_0 &\rightarrow 0X0 \mid 0X1 \mid 1X0 \mid 1X1 \mid 0 \ X_1 &\rightarrow 0Y0 \mid 0Y1 \mid 1Y0 \mid 1Y1 \mid 1 \end{align}

$First, it is a relatively simple observation that a CFG with starting variable $$X_i$$ generates the language of binary strings symmetric about $$i$$ where $$i$$ is 1 or 0. Now, we show that $$D$$ is the language generated by $$G$$: --- *Claim*: Every string generated by $$G$$ is in $$D$$. We can see from the observation above that every string generated by $$G$$ is the concatenation of two strings, one symmetric about 0 and the other symmetric about 1. WLOG let the first one by symmetric about 0 and the second about 1 (that is, the first derivation step is $$S \rightarrow X_0 X_1$$). Then the word $$z$$ will be of the form $$z = a0bc1d$$, with $$|a| = |b| = n \geq 0$$ and $$|c| = |d| = m \geq 0$$. If $$n = m$$ then clearly, $$z$$ can be split in two halves, one containing the 0 at the middle index, and one with the 1 at the middle index, and $$z \in D$$. If not, suppose they are unequal and WLOG that $$n > m$$. The length of $$z$$ is $$2n+1+2m+1$$, so its even and we have that $$z = xy$$ where $$|x| = |y| = n + m + 1$$. Note the following indices: * The index of the 0 in $$z$$ is $$n$

The index of the $1$ is $2n+1+m$
The index of the start of $y$ is $n + m + 1$ .

So the index of the $1$ with respect to $y$ is $(2n+1+m) - (n+m+1) = n$ . So $x$ and $y$ differ in the $n$ 'th spot, and $z \in D$ .

Claim: Every word in $D$ can be generated by $G$ :

Consider some word $z \in D$ . We can write $z=xy$ , where $x \neq y$ and $|x| = |y|$ (note the length of $z$ is even). Then there is some index $i$ such that the corresponding positions in $x$ and $y$ are unequal, that is, $z_i \neq z_{i+|x|}$ . Let $z_i = p$ and $z_{i+|x|} = q$ with $p, q \in \set{0, 1}$ . There are $i$ characters in $z$ before $z_i$ , so label $a = z_0 \dots z_{i-1}$ and $b = z_{i+1} \dots z_{2i}$ , note $|a| = |b|$ . At this point $z$ can be written $z = apb \dots$ .

Note we have $|abp| = 2i+1$ . Therefore, we have $|z| - (2i+1)$ characters left in the string, an odd number. This odd-length remainder is symmetric about $i+|x|$ . So we can split the remaining odd length string around the middle again. That is, let $c = z_{2i+1} \dots z_{i+|x|-1}$ and $d = z_{i+|x|+1} \dots z_{|z|-1}$ .

Now we can write $z = (apb)(cqd)$ where $|a| = |b|$ and $|c| = |d|$ . This is the concatenation of two binary strings symmetric about $p$ and $q$ , one of which is 1 and the other 0. Then, this string can be generated by $G$ .