Problem: Determine if a linked list contains a cycle, using O(1) space.

Robert Floyd invented an algorithm for solving this problem in the late 60s, which is called “The Tortoise and the Hare”[1].

(This is supposedly a popular question to ask in technical interviews. I’m not a fan of expecting interviewees to re-invent the algorithms of famous computer scientists on the spot).

As an exercise, I implemented the algorithm in TLA+, using PlusCal.

The algorithm itself is very simple: the hardest part was deciding how to model linked lists in TLA+. We want to model *the set of all linked lists, *to express that algorithm should work for any element of the set. However, the model checker can only work with finite sets. The typical approach is to do something like “the set of all linked lists which contain up to *N* nodes”, and then run the checker against different values of *N*.

What I ended up doing was generating *N* nodes, giving each node a successor

(a *next* element, which could be the terminal value `NIL`

), and then selecting

the start of the list from the set of nodes:

```
CONSTANTS N
ASSUME N \in Nat
Nodes == 1..N
NIL == CHOOSE NIL : NIL \notin Nodes
start \in Nodes
succ \in [Nodes -> Nodes \union {NIL}]
```

*(Note: I originally defined NIL as a constant, but this is incorrect, see the comment by Ron Pressler* for more details).

This is not the most efficient way from a model checker point of view, because the model checker will generate nodes that are irrelevant because they aren’t in the list. However, it does generate all possible linked lists up to length *N*.

Superficially, this doesn’t look like the pointer-and-structure linked list you’d see in C, but it behaves the same way at a high level. It’s possible to model a memory address space and pointers in TLA+, but I chose not to do so.

In addition, the nodes of a linked list typically have an associated value, but Floyd’s algorithm doesn’t use this value, so I didn’t model it.

Here’s my implementation of the algorithm:

```
EXTENDS Naturals
CONSTANTS N
ASSUME N \in Nat
Nodes == 1..N
NIL == CHOOSE NIL : NIL \notin Nodes
(*
--fair algorithm TortoiseAndHare
variables
start \in Nodes,
succ \in [Nodes -> Nodes \union {NIL}],
cycle, tortoise, hare, done;
begin
h0: tortoise := start;
hare := start;
done := FALSE;
h1: while ~done do
h2: tortoise := succ[tortoise];
hare := LET hare1 == succ[hare] IN
IF hare1 \in DOMAIN succ THEN succ[hare1] ELSE NIL;
h3: if tortoise = NIL \/ hare = NIL then
cycle := FALSE;
done := TRUE;
elsif tortoise = hare then
cycle := TRUE;
done := TRUE;
end if;
end while;
end algorithm
*)
```

I wanted to use the model checker to verify the the implementation was correct:

```
PartialCorrectness == pc="Done" => (cycle <=> HasCycle(start))
```

(See the Correctness wikipedia page for why this is called “partial correctness”).

To check correctness, I needed to implement my `HasCycle`

operator (without resorting to Floyd’s algorithm). I used the transitive closure of the successor function for this, which is called `TC`

here. If the transitive closure contains the pair `(node, NIL)`

, then the list that starts with `node`

does not contain a cycle:

```
HasCycle(node) == LET R == {<<s, t>> \in Nodes \X (Nodes \union {NIL}): succ[s] = t }
IN <<node, NIL>> \notin TC(R)
```

The above definition is “cheating” a bit, since we’ve defined a cycle by what it isn’t. It also wouldn’t work if we allowed infinite lists, since those don’t have cycles nor do they have *NIL* nodes.

Here’s a better definition: a linked list that starts with *node* contains a cycle if there exists some node *n* that is reachable from *node* and from itself.

```
HasCycle2(node) ==
LET R == {<<s, t>> \in Nodes \X (Nodes \union {NIL}): succ[s] = t }
IN \E n \in Nodes : /\ <<node, n>> \in TC(R)
/\ <<n, n>> \in TC(R)
```

To implement the transitive closure in TLA+, I used an existing implementation

from the TLA+ repository itself:

```
TC(R) ==
LET Support(X) == {r[1] : r \in X} \cup {r[2] : r \in X}
S == Support(R)
RECURSIVE TCR(_)
TCR(T) == IF T = {}
THEN R
ELSE LET r == CHOOSE s \in T : TRUE
RR == TCR(T \ {r})
IN RR \cup {<<s, t>> \in S \X S :
<<s, r>> \in RR /\ <<r, t>> \in RR}
IN TCR(S)
```

The full model is the lorin/tla-tortoise-hare repo on GitHub.

- Thanks to Reginald Braithwaite for the reference in his excellent book Javascript Allongé. ↩

In case you’re interested, I looked at total correctness here: https://davecturner.github.io/2018/04/11/tortoise-and-hare.html

One nitpick: defining NIL as a CONSTANT like that is incorrect (even though in this particular case, TLC would work, but that’s only because TLC doesn’t check the actual spec, but a derived one). The reason is that a CONSTANT is an unbound variable, so your partial correctness theorem is essentially: \A N, NIL : … You’d see that if you tried proving it in TLAPS. You have two options: either adding an `ASSUME NIL \notin Nodes`, in which case your theorem becomes \A N, NIL : NIL \notin Nodes => (…), or, better yet, define NIL directly as `NIL == CHOOSE NIL : NIL \notin Nodes`. Recent versions of the Toolbox would automatically generate a model value, NIL, to override that definition, so it has the advantage of being not only more correct, but less work 🙂

Thanks for pointing that out, I’ve corrected the spec with a note that links to your comment.

Also, you should add `ASSUME N ∈ Nat`, or else Nodes == 1..N could be undefined.