## The order of elements in a free semigroup

Let $X$ be a set and let $S = \mathsf{F}(X)$ denote the free semigroup on $X$. Find the orders of elements in $S$. Does $S$ have a kernel?

To answer this question the Right Way, we really need to have a more rigorous definition for $\mathsf{F}(X)$. To accomplish this, we will introduce ‘lists’. The next several paragraphs borrow heavily from the book “Set Theory” by Monk and from the paper “Functional Programming with Bananas, Lenses, Envelopes and Barbed Wire” by Meijer, Fokkinga, and Paterson (primarily from the latter, with a bit of influence from the former). Both of those publications are very good.

This post was originally written for an experimental blog I started writing a while ago but quickly abandoned, so the style is a bit different.

One of the most basic “data structures” we use is the list; that is, a finite sequence of elements from some set. In this post we will give a precise definition for lists and prove some basic facts about them. The definition of list that we will use might seem strange, but it lends itself well to recursive proofs and to implementation in software. This will allow us to implement our proofs in a computer program, reasonably confident that the program does as we expect.

Definition: Let $A$ be a set. We define a function $L$ on $\mathbb{N}$ recursively as follows: $L_A(0) = \{\emptyset\}$ and $L_A(k+1) = A \times L_A(k)$. Then we define the set of lists in $A$ to be the set $\mathcal{L}(A) = \bigcup_{k \in \mathbb{N}} L_A(k)$.

Note that for all sets $A$, $\emptyset \in \mathcal{L}(A)$. We will call this the empty list and denote it by $\varnothing$. By definition, every element of $\mathcal{L}(A)$ is either $\varnothing$ or of the form $(a,\ell)$ for some $a \in A$ and $\ell \in \mathcal{L}(A)$.

Theorem (Basic facts about $\mathcal{L}(A)$):

1. $L_A(k) \neq \emptyset$ for all nonzero $k \in \mathbb{N}$ if and only if $A \neq \emptyset$.
2. If $A$ is nonempty and $\ell \in \mathcal{L}(A)$, then there exists a unique $k \in \mathbb{N}$ such that $\ell \in L_A(k)$. We call this unique $k$ the length of $\ell$ and denote it $\mathsf{len}(\ell)$.
3. If $\ell \in \mathcal{L}(A)$ with $\mathsf{len}(\ell) = k+1$, then $\ell = (a,\ell^\prime)$ for some $a \in A$ and $\ell^\prime \in \mathcal{L}(A)$ with $\mathsf{len}(\ell^\prime) = k$.

Proof:

1. ($\Rightarrow$) Since in particular $L_A(1) = A \times \{\varnothing\} \neq \emptyset$, we have $A \neq \emptyset$. ($\Leftarrow$) Suppose to the contrary, and choose $k$ minimal such that $L_A(k+1)$ is empty. Then $A \times L_A(k)$ is empty, so that $L_A(k)$ is empty, a contradiction.
2. Let $S \subseteq \mathcal{L}(A)$ be the set of all lists which are not contained in a unique $L_A(k)$. Suppose $S$ is not empty. Note that each $\ell \in S$ is contained in a smallest $L_A(k)$, say $L_A(k_\ell)$. Now let $k \in \mathbb{N}$ be the unique least element among the $k_\ell$, and let $\ell$ be in $L_A(k)$. Then we have $\ell \in L_A(n)$ for some $n > k$. If $k = 0$, then we have $\ell = \varnothing$ since $L_A(0) = \{\varnothing\}$. On the other hand, we have $n = m+1$ for some $m$, and $\ell \in A \times L_A(m)$. This is a contradiction because no element of $A \times L_A(m)$ is the empty set. Suppose now that $k \neq 0$; then we have $k = t+1$ and $n = m+1$. Now $\ell = (a,\ell_1)$, where $\ell_1 \in L_A(t) \cap L_A(m)$. This is a contradiction since $k$ was chosen to be minimal.
3. This follows from the definition of $L_A(k+1)$. $\square$

Note that by definition, every list has finite length. Elements of $\mathcal{L}(A)$ are ordered pairs which may be nested on the right; for instance, $(1,\varnothing)$, $(2,(1,\varnothing))$, and $(4,(2,(1,\varnothing)))$ are lists. To cut down on the number of parentheses we have to write, we will frequently use bracket notation as a shorthand. For instance, $[1]$ means $(1,\varnothing)$ and $[4,2,1]$ means $(4,(2,(1,\varnothing)))$.

One of the more useful things we can do with a list is deconstruct it. Functions that do this in a certain well-behaved way are called catamorphisms, and they have a nice uniqueness property.

Theorem (List catamorphisms): Let $A$ and $B$ be sets with $A$ nonempty, let $e \in B$, and let $\oplus : A \times B \rightarrow B$ be a function. Then there exists a unique function $h : \mathcal{L}(A) \rightarrow B$ such that $h(\varnothing) = e$ and $h(a,\ell) = a \oplus h(\ell)$. We will denote this function by $(\!|\oplus,e|\!)$, and call functions of this kind list catamorphisms.

Proof:
(Existence) Define a function $h_k : L_A(k) \rightarrow B$ recursively as follows: $h_0(\varnothing) = e$ and $h_{k+1}(a,\ell) = a \oplus h_k(\ell)$. Certainly then $h = \bigcup_{k \in \mathbb{N}} h_k$ is a function from $\mathcal{L}(A)$ to $B$. Moreover, we have $h(\varnothing) = h_0(\varnothing) = e$, and if $(a,\ell) \in L_A(k+1)$, then $h(a,\ell) = h_{k+1}(a,\ell)$ $= a \oplus h_k(\ell)$ $= a \oplus h(\ell)$. (Uniqueness) Suppose $\theta : \mathcal{L}(A) \rightarrow B$ is a function such that $\theta(\varnothing) = e$ and $\theta(a,\ell) = a \oplus \theta(\ell)$. We prove that $\theta = h$ by induction on the length. Certainly $\theta(\varnothing) = e = h(\varnothing)$. Now suppose that for all lists $\ell$ of length $k$ we have $\theta(\ell) = h(\ell)$, and let $\tau = (a,\ell)$ have length $k+1$. Then $\theta(a,\ell)$ $= a \oplus \theta(\ell)$ $= a \oplus h(\ell) = h(a,\ell)$. Thus $h$ is unique. $\square$

A catamorphism takes a binary operator and an element of $B$, and returns a function which takes lists over $A$ and deconstructs them into a single element of $B$ using the operator. More literally, a catamorphism takes a list an replaces all commas with the operator $\oplus$ and replaces the empty list with the fixed element $e$. This may seem strange at first, but as we will see, several useful functions which take lists as input can be written as a catamorphism or a chain of catamorphisms composed together. One advantage of this is that certain functional transformation rules can be used to rewrite a given function to be “more efficient”, whatever that means. Two basic rules are called the Fusion law and the Split law; roughtly speaking, the Fusion law says when the composition of a function and a catamorphism is a catamorphism, and the Split law says that catamorphisms interact with direct products in a nice way.

Theorem:

1. (Fusion law) Let $A$, $B$, and $C$ be nonempty sets, let $e_B \in B$ and $e_C \in C$, and let $\oplus : A \times B \rightarrow B$ and $\otimes : A \times C \rightarrow C$. Finally, let $f : B \rightarrow C$ be a function. Then $f \circ (\!|\oplus,e_B|\!) = (\!|\otimes,e_C|\!)$ if and only if $f(e_B) = e_C$ and for all $a \in A$ and $b \in B$ such that $b \in \mathsf{im}\ (\!|\oplus,e_B|\!)$, $f(a \oplus b) = a \otimes f(b)$.
2. (Split law) Let $A$, $B$, and $C$ be nonempty sets, let $b \in B$ and $c \in C$, and let $\oplus : A \times B \rightarrow B$ and $\otimes : A \times C \rightarrow C$. Define $\star : A \times (B \times C) \rightarrow B \times C$ by $a \star (b,c) = (a \oplus b, a \otimes c)$, and let $\pi_1$ and $\pi_2$ denote the coordinate projections. Then $\pi_1 \circ (\!|\star,(b,c)|\!) = (\!|\oplus,b|\!)$ and $\pi_2 \circ (\!|\star,(b,c)|\!) = (\!|\otimes,c|\!)$.

Proof:

1. ($\Rightarrow$) Note that $c = (\!|\otimes,c|\!)(\varnothing) = (f \circ (\!|\oplus,b|\!))(\varnothing)$ $= f((\!|\oplus,b|\!)(\varnothing) = f(b)$. Moreover, if $a \in A$ and $b = (\!|\oplus,e_B|\!)(\ell)$, then $f(a \oplus b) = f(a \oplus (\!|\oplus,e_B|\!)(\ell))$ $= f((\!|\oplus,e_B|\!)(a,\ell))$ $= (f \circ (\!|\oplus,e_B|\!))(a,\ell) = (\!|\otimes,e_C|\!)(a,\ell)$ $= a \otimes (\!|\otimes,e_C|\!)(\ell)$ $= a \otimes (f \circ (\!|\oplus,e_B|\!))(\ell)$ $= a \otimes f((\!|\oplus,e_B|\!)(\ell)) = a \otimes f(b)$. ($\Leftarrow$) Note that $(f \circ (\!|\oplus,e_B|\!))(\varnothing) = f((\!|\oplus,e_B|\!)(\varnothing))$ $= f(e_B) = e_C$ and that $(f \circ (\!|\oplus,e_B|\!))(a,\ell) = f((\!|\oplus,e_B|\!)(a,\ell))$ $= f(a \oplus (\!|\oplus,e_B|\!)(\ell))$ $= a \otimes f((\!|\oplus,e_B|\!)(\ell))$ $= a \otimes (f \circ (\!|\oplus,e_B|\!))(\ell)$. By the uniqueness of catamorphisms, $f \circ (\!|\oplus,e_B|\!) = (\!|\otimes,e_C|\!)$.
2. We proceed by induction on the length of the input $\ell$. If $\ell = \varnothing$, then $(\!|\star,(e_B,e_C)|\!)(\varnothing) = (e_B,e_C)$ $= ((\!|\oplus,e_B|\!), (\!|\otimes,e_C|\!))$. Suppose now that the result holds for all lists of length $k$, and suppose $\ell = (a, \ell^\prime)$ has length $k+1$. Then $(\!|\star,(e_B,e_C)|\!)(a,\ell^\prime) = a \star (\!|\star,(e_B,e_C)|\!)(\ell^\prime)$ $= a \star ((\!|\oplus,e_B|\!)(\ell^\prime), (\!|\otimes,e_C|\!)(\ell^\prime))$ $= (a \star (\!|\oplus,e_B|\!)(\ell^\prime), a \star (\!|\otimes,e_C|\!)(\ell^\prime))$ $= ((\!|\oplus,e_B|\!)(\ell^\prime), (\!|\otimes,e_C|\!)(\ell^\prime))$. $\square$

The fusion and split laws for catamorphisms will be useful both for proving theorems about functions defined in terms of catamorphisms (as we will see) and for developing efficient programs for performing explicit computations.

Definition: (List Concatenation) Let $A$ be a nonempty set. We define a binary operator $\ddagger : \mathcal{L}(A) \times \mathcal{L}(A) \rightarrow \mathcal{L}(A)$ recursively as follows: $\varnothing \ddagger \ell = \ell$ and $(a,\ell^\prime) \ddagger \ell = (a, \ell^\prime \ddagger \ell)$. This oeprator is called concatenation.

Lemma: $\ddagger$ is associative. Proof: Let $(\alpha,\beta,\gamma)$ be a triple of lists in $\mathcal{L}(A)$. We proceed by induction on the length of $\alpha$. For the base case, we have $(\varnothing \ddagger \beta) \ddagger \gamma = \beta \ddagger \gamma$ $= \varnothing \ddagger (\beta \ddagger \gamma)$. For the inductive step, suppose $\ddagger$ is associative on all triples $(\alpha^\prime,\beta,\gamma)$ where $\alpha^\prime$ has length $k$, and that $\alpha$ has length $k+1$. Now $((a,\alpha^\prime) \ddagger \beta) \ddagger \gamma$ $= (a, \alpha^\prime \ddagger \beta) \ddagger \gamma$ $= (a, (\alpha^\prime \ddagger \beta) \ddagger \gamma)$ $= (a, \alpha^\prime \ddagger (\beta \ddagger \gamma))$ $= (a,\alpha^\prime) \ddagger (\beta \ddagger \gamma)$. So $\ddagger$ is associative. $\square$

Lemma: $\varnothing \ddagger \alpha = \alpha \ddagger \varnothing = \alpha$ for all lists $\alpha$. Proof: $\varnothing \ddagger \alpha = \alpha$ by definition. To see the second equality, we use induction on the length of $\alpha$. Certainly $\varnothing \ddagger \varnothing = \varnothing$. If the result holds for $\alpha^\prime$, then $(a,\alpha^\prime) \ddagger \varnothing = (a,\alpha^\prime \ddagger \varnothing)$ $= (a,\alpha^\prime)$ as desired. $\square$

So if $X$ is nonempty, the set $\mathcal {L}(X)$ with the operator $\ddagger$ is a semigroup, and in fact is a monoid with identity element $\varnothing$. We define the free monoid on $X$ to be $\mathsf{F}_1(X) = (\mathcal{L}(X), \ddagger)$.

Lemma: $\mathsf{len}(\alpha \ddagger \beta) = \mathsf{len}(\alpha) + \mathsf{len}(\beta)$. Proof: By induction on the length of $\alpha$. Certainly $\mathsf{len}(\varnothing \ddagger \beta) = \mathsf{len}(\beta)$ $= \mathsf{len}(\varnothing) + \mathsf{len}(\beta)$. If the result holds for $\alpha^\prime$, then $\mathsf{len}((a,\alpha^\prime) \ddagger \beta) = \mathsf{len}((a, \alpha^\prime \ddagger \beta))$ $= 1 + \mathsf{len}(\alpha^\prime \ddagger \beta)$ $= 1 + \mathsf{len}(\alpha^\prime) + \mathsf{len}(\beta)$ $= \mathsf{len}((a,\alpha^\prime)) + \mathsf{len}(\beta)$. $\square$

Lemma: $\mathsf{F}_0(X)$ has no zero divisors. Proof: If $\alpha \ddagger \beta = \varnothing$, then comparing lengths, $\alpha = \beta = \varnothing$. $\square$

Since $\mathsf{F}_0(X)$ has no zero divisors, the set $\mathsf{F}(X) = \mathsf{F}_0(X) \setminus \{\varnothing\}$ is a subsemigroup. We call $\mathsf{F}(X)$ the free semigroup on $X$.

We are finally in a position to answer the original problem: what are the orders of elements in $\mathsf{F}(X)$? Since every element of $\mathsf{F}(X)$ has positive length, and since length is additive over products, the powers of any element $\alpha \in \mathsf{F}(X)$ have monotonically increasing length. That is, the subsemigroup generated by $\alpha$ is infinite. So every element of $\mathsf{F}(X)$ has infinite order.

Next we can consider the kernel of $\mathsf{F}(X)$. Note that for any word $\alpha$ of length $k$, every element of the two sided ideal $J(\alpha)$ has length at least $k$. As a consequence, for any element $\alpha \in \mathsf{F}(X)$ of length $k$, we can construct an ideal in which every element has length at least $k+1$– and so the ideal does not include $\alpha$. In particular, the intersection of all the ideals in $\mathsf{F}(X)$ is empty. So this semigroup has no kernel.