A data structure that I've been more and more interested in recently is the disjoint set data structure or union-find. https://en.wikipedia.org/wiki/Disjoint-set_data_structure It's used in egraphs, unification, prolog, and graph connectivity.

I realized a cute representation that is easy to use and prove stuff about. It is however very inefficient compared to the usual version of disjoint set (outrageously so since the inverse Ackermann complexity of union find is a crown jewel of algorithms), hence the expectation lowering title of the post. It's linear time in the number of unions I believe. Oh well. It uses a simple functional representation based off the observation that the preimages of a function form disjoint sets. This representation is somewhat analogous to using functions to represent Maps in Coq like in Software Foundations. This is another inefficient but very convenient representation. https://softwarefoundations.cis.upenn.edu/lf-current/Maps.html

The nice thing about this is that it really avoids any termination stickiness. Termination of the find_root operation of a union find in an explicit array or tree is not at all obvious and requires side proof or a refined type. The termination ultimately comes from the fact that you used a finite number of unions to construct the disjoint sets.

Code of post here https://gist.github.com/philzook58/82f4a22c194587e3f63904d33c0f2b3d

Definitions

Require Import Arith.

For convenience we define disjoint sets of nats. See comments at end of post for some other options

Definition ds := nat -> nat.

The completely disjoint set is the identity function

Definition init_ds : ds := fun x => x.

find_root is just application

Definition find_root (g : ds) x := g x.

in_same_set is just nat equality checking

Definition in_same_set (g : ds) x y := 
    Nat.eq_dec (g x) (g y).

And finally the only interesting operation is union. Some comments: It is useful to lift the find_root operations out of the body of the returned function to compute them eagerly and share the result. This version unfortunately means that the cost of a find_root operation becomes proportional to the number of union operations used to construct the ds

Definition union (g : ds) x y : ds :=
    let px := find_root g x in
    let py := find_root g y in
    fun z =>
    let pz := find_root g z in
    if Nat.eq_dec px pz
    then py
    else pz.

Some proofs

I couldn't find this useful lemma in the standard library. Maybe I just missed it?


forall x : nat, exists p : x = x, Nat.eq_dec x x = left p
x:nat

exists p : x = x, Nat.eq_dec x x = left p
x:nat
e:x = x

exists p : x = x, left e = left p
x:nat
n:x <> x
exists p : x = x, right n = left p
x:nat
e:x = x

left e = left ?p
x:nat
n:x <> x
exists p : x = x, right n = left p
x:nat
n:x <> x

exists p : x = x, right n = left p
x:nat
n:x = x -> False

exists p : x = x, right n = left p
x:nat
n:x = x -> False

False
x:nat
n:x = x -> False

x = x
reflexivity. Qed.

A useful definition is In_Same_Set, which states that two nats x and y are in the same set if the in_same_set function returns a left

Definition In_Same_Set g x y := exists p, 
   in_same_set g x y = left p.

An element is always in the same set with itself


forall (g : ds) (x : nat), In_Same_Set g x x
g:ds
x:nat

In_Same_Set g x x
g:ds
x:nat

exists p : g x = g x, Nat.eq_dec (g x) (g x) = left p
g:ds
x:nat
e:g x = g x

exists p : g x = g x, left e = left p
g:ds
x:nat
n:g x <> g x
exists p : g x = g x, right n = left p
g:ds
x:nat
e:g x = g x

exists p : g x = g x, left e = left p
g:ds
x:nat
e:g x = g x

left e = left ?p
reflexivity.
g:ds
x:nat
n:g x <> g x

exists p : g x = g x, right n = left p
g:ds
x:nat
n:g x <> g x

False
g:ds
x:nat
n:g x <> g x

g x = g x
reflexivity. Qed.

The In_Same_Set g relation is symmettric.


forall (g : ds) (x y : nat), In_Same_Set g x y -> In_Same_Set g y x

forall (g : ds) (x y : nat), (exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p) -> exists p : g y = g x, Nat.eq_dec (g y) (g x) = left p
g:ds
x, y:nat
H:exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p

exists p : g y = g x, Nat.eq_dec (g y) (g x) = left p
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0

exists p : g y = g x, Nat.eq_dec (g y) (g x) = left p
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
e:g y = g x
E:Nat.eq_dec (g y) (g x) = left e

exists p : g y = g x, left e = left p
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
n:g y <> g x
E:Nat.eq_dec (g y) (g x) = right n
exists p : g y = g x, right n = left p
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
e:g y = g x
E:Nat.eq_dec (g y) (g x) = left e

exists p : g y = g x, left e = left p
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
e:g y = g x
E:Nat.eq_dec (g y) (g x) = left e

left e = left ?p
reflexivity.
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
n:g y <> g x
E:Nat.eq_dec (g y) (g x) = right n

exists p : g y = g x, right n = left p
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
n:g y <> g x
E:Nat.eq_dec (g y) (g x) = right n

False
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
n:g y <> g x
E:Nat.eq_dec (g y) (g x) = right n

g y = g x
g:ds
x, y:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
n:g y <> g x
E:Nat.eq_dec (g y) (g x) = right n

g y = g y
reflexivity. Qed.

The In_Same_Set g relation is transtive.


forall (g : ds) (x y z : nat), In_Same_Set g x y -> In_Same_Set g y z -> In_Same_Set g x z

forall (g : ds) (x y z : nat), (exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p) -> (exists p : g y = g z, Nat.eq_dec (g y) (g z) = left p) -> exists p : g x = g z, Nat.eq_dec (g x) (g z) = left p
g:ds
x, y, z:nat
H:exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
H0:exists p : g y = g z, Nat.eq_dec (g y) (g z) = left p

exists p : g x = g z, Nat.eq_dec (g x) (g z) = left p
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
H0:exists p : g y = g z, Nat.eq_dec (g y) (g z) = left p

exists p : g x = g z, Nat.eq_dec (g x) (g z) = left p
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1

exists p : g x = g z, Nat.eq_dec (g x) (g z) = left p
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
e:g x = g z
E:Nat.eq_dec (g x) (g z) = left e

exists p : g x = g z, left e = left p
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
n:g x <> g z
E:Nat.eq_dec (g x) (g z) = right n
exists p : g x = g z, right n = left p
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
e:g x = g z
E:Nat.eq_dec (g x) (g z) = left e

exists p : g x = g z, left e = left p
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
e:g x = g z
E:Nat.eq_dec (g x) (g z) = left e

left e = left ?p
reflexivity.
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
n:g x <> g z
E:Nat.eq_dec (g x) (g z) = right n

exists p : g x = g z, right n = left p
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
n:g x <> g z
E:Nat.eq_dec (g x) (g z) = right n

False
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
n:g x <> g z
E:Nat.eq_dec (g x) (g z) = right n

g x = g z
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
n:g x <> g z
E:Nat.eq_dec (g x) (g z) = right n

g y = g z
g:ds
x, y, z:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0
x1:g y = g z
H0:Nat.eq_dec (g y) (g z) = left x1
n:g x <> g z
E:Nat.eq_dec (g x) (g z) = right n

g z = g z
reflexivity. Qed.

The initial disjoint set only has equal elements in the same set


forall x y : nat, x <> y -> exists p : init_ds x <> init_ds y, in_same_set init_ds x y = right p
x, y:nat
H:x <> y

exists p : init_ds x <> init_ds y, in_same_set init_ds x y = right p
x, y:nat
H:x <> y
e:init_ds x = init_ds y

exists p : init_ds x <> init_ds y, left e = right p
x, y:nat
H:x <> y
n:init_ds x <> init_ds y
exists p : init_ds x <> init_ds y, right n = right p
x, y:nat
H:x <> y
e:init_ds x = init_ds y

False
x, y:nat
H:x <> y
n:init_ds x <> init_ds y
exists p : init_ds x <> init_ds y, right n = right p
x, y:nat
H:x <> y
e:init_ds x = init_ds y

x = y
x, y:nat
H:x <> y
n:init_ds x <> init_ds y
exists p : init_ds x <> init_ds y, right n = right p
x, y:nat
H:x <> y
e:x = y

x = y
x, y:nat
H:x <> y
n:init_ds x <> init_ds y
exists p : init_ds x <> init_ds y, right n = right p
x, y:nat
H:x <> y
n:init_ds x <> init_ds y

exists p : init_ds x <> init_ds y, right n = right p
x, y:nat
H:x <> y
n:init_ds x <> init_ds y

right n = right ?p
reflexivity. Qed.

After unioning x and y, they are now in the same set. This proof could be tightened up and automated more.


forall (g : ds) (x y : nat), In_Same_Set (union g x y) x y
g:ds
x, y:nat

In_Same_Set (union g x y) x y
g:ds
x, y:nat

exists p : (if Nat.eq_dec (g x) (g x) then g y else g x) = (if Nat.eq_dec (g x) (g y) then g y else g y), Nat.eq_dec (if Nat.eq_dec (g x) (g x) then g y else g x) (if Nat.eq_dec (g x) (g y) then g y else g y) = left p
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
e0:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e0
e1:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e1

exists p : g y = g y, left e1 = left p
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
e0:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e0
n:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n
exists p : g y = g y, right n = left p
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
n:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n
e0:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e0
exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
n:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n
n0:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n0
exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
e:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e
e0:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e0
exists p : g y = g y, left e0 = left p
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
e:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e
n0:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n0
exists p : g y = g y, right n0 = left p
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
n0:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n0
e:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e
exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
n0:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n0
n1:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n1
exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
e0:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e0
e1:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e1

exists p : g y = g y, left e1 = left p
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
e0:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e0
e1:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e1

left e1 = left ?p
reflexivity.
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
e0:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e0
n:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n

exists p : g y = g y, right n = left p
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
e0:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e0
n:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n

False
auto.
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
n:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n
e0:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e0

exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
firstorder.
g:ds
x, y:nat
e:g x = g y
E1:Nat.eq_dec (g x) (g y) = left e
n:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n
n0:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n0

exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
firstorder.
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
e:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e
e0:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e0

exists p : g y = g y, left e0 = left p
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
e:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e
e0:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e0

left e0 = left ?p
reflexivity.
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
e:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e
n0:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n0

exists p : g y = g y, right n0 = left p
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
e:g x = g x
E2:Nat.eq_dec (g x) (g x) = left e
n0:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n0

False
auto.
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
n0:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n0
e:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e

exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
n0:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n0
e:g y = g y
E3:Nat.eq_dec (g y) (g y) = left e

False
auto.
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
n0:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n0
n1:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n1

exists p : g x = g y, Nat.eq_dec (g x) (g y) = left p
g:ds
x, y:nat
n:g x <> g y
E1:Nat.eq_dec (g x) (g y) = right n
n0:g x <> g x
E2:Nat.eq_dec (g x) (g x) = right n0
n1:g y <> g y
E3:Nat.eq_dec (g y) (g y) = right n1

False
auto. Qed.

Things remain in the same set after unioning any other elements


forall (g : ds) (x y z w : nat), In_Same_Set g x y -> In_Same_Set (union g z w) x y
g:ds
x, y, z, w:nat
H:In_Same_Set g x y

In_Same_Set (union g z w) x y
g:ds
x, y, z, w:nat
x0:g x = g y
H:in_same_set g x y = left x0

In_Same_Set (union g z w) x y
g:ds
x, y, z, w:nat
x0:g x = g y
H:in_same_set g x y = left x0

exists p : (if Nat.eq_dec (find_root g z) (find_root g x) then find_root g w else find_root g x) = (if Nat.eq_dec (find_root g z) (find_root g y) then find_root g w else find_root g y), in_same_set (fun z0 : nat => if Nat.eq_dec (find_root g z) (find_root g z0) then find_root g w else find_root g z0) x y = left p
g:ds
x, y, z, w:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0

exists p : (if Nat.eq_dec (find_root g z) (find_root g x) then find_root g w else find_root g x) = (if Nat.eq_dec (find_root g z) (find_root g y) then find_root g w else find_root g y), in_same_set (fun z0 : nat => if Nat.eq_dec (find_root g z) (find_root g z0) then find_root g w else find_root g z0) x y = left p
g:ds
x, y, z, w:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0

exists p : (if Nat.eq_dec (find_root g z) (find_root g x) then find_root g w else find_root g x) = (if Nat.eq_dec (find_root g z) (find_root g y) then find_root g w else find_root g y), Nat.eq_dec (if Nat.eq_dec (find_root g z) (find_root g x) then find_root g w else find_root g x) (if Nat.eq_dec (find_root g z) (find_root g y) then find_root g w else find_root g y) = left p
g:ds
x, y, z, w:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0

exists p : (if Nat.eq_dec (g z) (g x) then g w else g x) = (if Nat.eq_dec (g z) (g y) then g w else g y), Nat.eq_dec (if Nat.eq_dec (g z) (g x) then g w else g x) (if Nat.eq_dec (g z) (g y) then g w else g y) = left p
g:ds
x, y, z, w:nat
x0:g x = g y
H:Nat.eq_dec (g x) (g y) = left x0

exists p : (if Nat.eq_dec (g z) (g y) then g w else g y) = (if Nat.eq_dec (g z) (g y) then g w else g y), Nat.eq_dec (if Nat.eq_dec (g z) (g y) then g w else g y) (if Nat.eq_dec (g z) (g y) then g w else g y) = left p
apply nat_eq_refl. Qed.

Here's a small enhancement. We can define a new version of union that checks if the two elements being unioned are already in the same set.

Definition union2 (g : ds) x y : ds :=
        let px := find_root g x in
        let py := find_root g y in
        if Nat.eq_dec px py then
            g
        else
        fun z =>
        let pz := find_root g z in
        if Nat.eq_dec px pz
        then py
        else pz.
in_same_set : forall (g : ds) (x y : nat), {g x = g y} + {g x <> g y}

forall (g : ds) (x y z : nat), union2 g x y z = union g x y z
g:ds
x, y, z:nat

union2 g x y z = union g x y z
g:ds
x, y, z:nat

(if Nat.eq_dec (g x) (g y) then g else fun z : nat => if Nat.eq_dec (g x) (g z) then g y else g z) z = (if Nat.eq_dec (g x) (g z) then g y else g z)
destruct (Nat.eq_dec (g x) (g y)) eqn:E2; destruct (Nat.eq_dec (g x) (g z)) eqn:E1; congruence. Qed.

Possible Refinements

You could abstract over the things that you're forming disjoint sets over This means you need an enumeration to define init_ds

Definition ds' (a : Type) := a -> nat.

Or you can use a self map. This requires decidable equality in for checking canonical elements in the codomain

Definition ds'' (a : Type) := a -> a.

Or you could do a fun proof relevant version for equivalence relations R. Now to perform a union you'll need to supply a proof R x y

Definition ds_pf (a : Type) (R : a -> a -> Prop) := 
    forall x : a,  {y : a | R x y}.

Another possibility is to use the fairly new feature of Coq, persistent arrays

This may indeed why these were added. They do require dealing with termination though. Here's one suggestion of an encoding. I dunno if this will bite you in the butt.

Require Import PArray.

Open Scope array_scope.
Fixpoint find_root' gas (a : array (sum nat Int63.int)) 
         (i : Int63.int) :=
    match gas with
    | O => None
    | S gas' => match a .[i] with
        | inl x => Some i
        | inr j => find_root' gas' a j
    end
    end.

Record disjoint_set := {
    gas : nat; (* unions, or height *)
    parents  : array (sum nat Int63.int) ;
    term_pf : forall i, exists n, 
              (find_root' gas parents i = Some n)
}.

Built using Alectryon https://github.com/cpitclaudel/alectryon

A stack overflow question https://stackoverflow.com/questions/66630519/how-to-implement-a-union-find-disjoint-set-data-structure-in-coq/66875872#66875872