We have shown how to convert a regular expression to an NFA; the proof that
these represent the same language is straightforward (by induction). However,
we have asserted that there is a correspondence in *both* directions; this
is formalised in **Kleene's Theorem**:

KLEENE'S THEOREM, PART 1: To each regular expression there corresponds a NFA

Strategy: The corresponding NFA is the one constructed by Thompson's Algorithm;
proof that it is *equivalent* is by induction over regular expressions.

Proof: Boring!

KLEENE'S THEOREM, PART 2: To each NFA there corresponds a
regular expression

Strategy: Proof by induction over the states of
the NFA (sort of).

The language represented by a NFA can be
partitioned into the union of a number of smaller languages, each
defined as follows:

Let the states of the NFA be numbered from 1 to .

Let and be states, and let be a number such that .

Then:

Note that if is the start state, then the union of all for all finish states is the language accepted by the NFA.

Equally, if we can show that has a corresponding regular
expression for all , and , we will have proved the theorem.
The proof of this will proceed by induction over .

Proof: (Kleene's Theorem, part 2)

- Base Case:

We must be able to get from to without passing through*any*states; therefore either =, or we go directly in one transition from to . The only strings that can be recognised in this case are and single characters from . The rules and give us a regular expression corresponding to these situations. - Inductive Step:

The induction hypothesis is that for some such that , the language has a corresponding regular expression. Now we must prove that, based on this assumption, has a corresponding regular expression.Suppose the machine consumes some string ; we must find a regular expression corresponding to . Now suppose also that it passes through state on its way (if it doesn't, then is in , which has a corresponding regular expression by the induction hypothesis). Since the machine can ``loop'' on arbitrarily many times, we can split up into three substrings:

- a
- which moves the machine from state to state
- b
- which causes the machine to ``loop'' on state
- c
- which moves the machine from state to state

*through*. Thus we have:By the induction hypothesis each of these have corresponding regular expressions, say

`A`,`B`and`C`respectively, and thus the regular expression for is`A(B*)C`.Since our choice of was arbitrary, we have proved the theorem.