Stateful Testing
So far, we've only seem examples of very simple & trivial properties, doing little more than showcasing syntax. However, what if we're operating on some more complicated datastructure and want to check whether the operations we can perform on it uphold the invariants we expect? This too can, in a for now basic form, be done with Supposition.jl.
Juggling Jugs
Consider this example from the movie Die Hard With A Vengeance:
The problem John McClane & Zeus Carver have to solve is the well known 3L & 5L variation on the water pouring puzzle. You have two jugs, one that can hold 3L of liquid and one that can hold 5L. The task is to measure out precisely 4L of liquid, using nothing but those two jugs. Let's model the problem and have Supposition.jl solve it for us:
struct Jugs
small::Int
large::Int
end
Jugs() = Jugs(0,0)
Main.Jugs
We start out with a struct holding our two jugs; one Int
for the small jug and one Int
for the large jug. Next, we need the operations we can perform on these jugs. These are
- Filling a jug to the brim
- No partial filling! That's not accurate enough.
- Emptying a jug
- No partial emptying! That's not accurate enough.
- Pouring one jug into the other
- Any leftover liquid stays in the jug we poured from - don't spill anything!
Defining them as functions returning a new Jugs
, we get:
# filling
fill_small(j::Jugs) = Jugs(3, j.large)
fill_large(j::Jugs) = Jugs(j.small, 5)
# emptying
empty_small(j::Jugs) = Jugs(0, j.large)
empty_large(j::Jugs) = Jugs(j.small, 0)
# pouring
function pour_small_into_large(j::Jugs)
nlarge = min(5, j.large + j.small)
nsmall = j.small - (nlarge - j.large)
Jugs(nsmall, nlarge)
end
function pour_large_into_small(j::Jugs)
nsmall = min(3, j.small + j.large)
nlarge = j.large - (nsmall - j.small)
Jugs(nsmall, nlarge)
end
From the top, we have filling either jug (note that we can only fill the small jug up to 3L, and the large up to 5L), emptying either jug, and finally pouring one into the other, taking care not to spill anything (i.e., any leftovers stay in the jug we poured out of).
We can very easily now generate a sequence of operations:
using Supposition
raw_ops = (fill_small, fill_large, empty_small, empty_large, pour_small_into_large, pour_large_into_small)
gen_ops = Data.Vectors(Data.SampledFrom(raw_ops))
example(gen_ops)
5-element Vector{Function}:
empty_small (generic function with 1 method)
empty_small (generic function with 1 method)
empty_small (generic function with 1 method)
empty_large (generic function with 1 method)
empty_small (generic function with 1 method)
Generating a sequence of operations is simply generating a vector from all possible ones! This is the input to our property. We declare that for all sequences of operations we can do with a Jug
, all invariants we expect must hold true.
So far, we've only seen properties (i.e., predicate functions that return a Bool
) that we'd like to test with @check
. Here, we use an invariant, which is a predicate function that must always be true
for all inputs, no matter what we do with that input. If we can ever find an input where the invariant isn't true
, we have found a counterexample for the overall property!
Speaking of invariants, we need three of them that must be preserved at all times:
- The small jug must ALWAYS have a fill level between 0 and 3 (inclusive).
- The large jug must ALWAYS have a fill level between 0 and 5 (inclusive).
- The large just must NEVER have a fill level of exactly 4.
The first two predicates are a result of the problem statement, which defines that we have two jugs of different sizes with some fill level. The upper bound is from either being large or small, and the lower bound of 0
is from not being able to have a negative fill level.
The last invariant may look a bit odd, but remember that Supposition.jl is trying to find a falsifying example. The first two invariants are sanity checks to make sure that our pouring functions are well behaved; the last invariant is the solution we want to find, by combining the operations above in an arbitrary order. Let's translate these into functions as well:
small_jug_invariant(j::Jugs) = 0 <= j.small <= 3
large_jug_invariant(j::Jugs) = 0 <= j.large <= 5
level_invariant(j::Jugs) = j.large != 4
invariants = (small_jug_invariant, large_jug_invariant, level_invariant)
And now, to finally combine all of these:
@check function solve_die_hard(ops = gen_ops)
jugs = Jugs()
for op in ops
# apply the rule
jugs = op(jugs)
# check invariants
for f in invariants
f(jugs) || return false
end
end
return true
end
Found counterexample
Context: solve_die_hard
Arguments:
ops::Vector{Function} = Function[Main.fill_large, Main.pour_large_into_small, Main.empty_small, Main.pour_large_into_small, Main.fill_large, Main.pour_large_into_small]
This pattern is very extensible, and a good candidate for the next UX overhaul (getting a reported failure for the target we actually want to find is quite bad UX). Nevertheless, it already works right now!
Balancing a heap
The previous example showed how we can check these kinds of operations based invariants on an immutable struct. There is no reason why we can't do the same with a mutable struct (or at least, a struct containing a mutable object) though, so let's look at another example: ensuring a heap observes its heap property. As a quick reminder, the heap property for a binary heap is that each child of a node is <=
than that node, resulting in what's called a "Max-Heap" (due to the maximum being at the root). Similarly, if the property for children is >=
, we get a "Min-Heap". Here, we're going to implement a Min-Heap.
First, we need to define our datastructure:
struct Heap{T}
data::Vector{T}
end
Heap{T}() where T = Heap{T}(T[])
as well as the usual operations (isempty
, push!
, pop!
) on that heap:
isempty
: Whether the heap has elementspush!
: Put an element onto the heappop!
: Retrieve the smallest element of the heap (i.e., remove the root)
Written in code, this might look like this:
Base.isempty(heap::Heap) = isempty(heap.data)
function Base.push!(heap::Heap{T}, value::T) where T
data = heap.data
push!(data, value)
index = lastindex(data)
while index > firstindex(data)
parent = index >> 1
if data[parent] > data[index]
data[parent], data[index] = data[index], data[parent]
index = parent
else
break
end
end
heap
end
Base.pop!(heap::Heap) = popfirst!(heap.data)
In this implementation, we're simply using an array as the backing store for our heap. The first element is the root, followed by the left subtree, followed by the right subtree. As implemented, pop!
will return the correct element if the heap is currently balanced, but because pop!
doesn't rebalance the heap after removing the root, pop!
may leave it in an invalid state. A subsequent pop!
may then remove an element that is not the smallest currently stored.
We can very easily test this manually:
using Supposition
intvec = Data.Vectors(Data.Integers{UInt8}())
@check function test_pop_in_sorted_order(ls=intvec)
h = Heap{eltype(ls)}()
# push all items
for l in ls
push!(h, l)
end
# pop! all items
r = eltype(ls)[]
while !isempty(h)
push!(r, pop!(h))
end
# the pop!ed items should be sorted
r == sort(ls)
end
Found counterexample
Context: test_pop_in_sorted_order
Arguments:
ls::Vector{UInt8} = UInt8[0x00, 0x01, 0x00]
And as expected, the minimal counterexample is [0x0, 0x1, 0x0]
. We first pop!
0x0
, followed by 0x1
while it should be 0x0
again, and only then 0x1
, resulting in [0x0, 0x0, 0x1]
instead of [0x0, 0x1, 0x0]
.
Replacing this with a (presumably) correct implementation looks like this:
function fixed_pop!(h::Heap)
isempty(h) && throw(ArgumentError("Heap is empty!"))
data = h.data
isone(length(data)) && return popfirst!(data)
result = first(data)
data[1] = pop!(data)
index = 0
while (index * 2 + 1) < length(data)
children = [ index*2+1, index*2+2 ]
children = [ i for i in children if i < length(data) ]
@assert !isempty(children)
sort!(children; by=x -> data[x+1])
broke = false
for c in children
if data[index+1] > data[c+1]
data[index+1], data[c+1] = data[c+1], data[index+1]
index = c
broke = true
break
end
end
!broke && break
end
return result
end
fixed_pop! (generic function with 1 method)
Me telling you that this is correct though should only be taken as well-intentioned, but not necessarily as true. There might be more bugs that have sneaked in after all, that aren't caught by our naive "pop in order and check that it's sorted" test. There could be a nasty bug waiting for us that only happens when various push!
and pop!
are interwoven in just the right way. Using stateful testing techniques and the insight that we can generate sequences of operations on our Heap
with Supposition.jl too! We're first going to try with the existing, known broken pop!
:
gen_push = map(Data.Integers{UInt}()) do i
(push!, i)
end
gen_pop = Data.Just((pop!, nothing))
gen_ops = Data.Vectors(Data.OneOf(gen_push, gen_pop); max_size=10_000)
We either push!
an element, or we pop!
from the heap. Using (pop!, nothing)
here will make it a bit easier to actually define our test. Note how the second element acts as the eventual argument to pop!
.
There's also an additional complication - because we don't have the guarantee anymore that the Heap
contains elements, we have to guard the use of pop!
behind a precondition check. In case the heap is empty, we can just consume the operation and treat it as a no-op, continuing with the next operation:
@check function test_heap(ops = gen_ops)
heap = Heap{UInt}()
for (op, val) in ops
if op === push!
# we can always push
heap = op(heap, val)
else
# check our precondition!
isempty(heap) && continue
# the popped minimum must always == the minimum
# of the backing array, so retrieve the minimum
# through alternative internals
correct = minimum(heap.data)
val = op(heap)
# there's only one invariant this time around
# and it only needs checking in this branch:
val != correct && return false
end
end
# by default, we pass the test!
# this happens if our `ops` is empty or all operations
# worked successfully
return true
end
Found counterexample
Context: test_heap
Arguments:
ops::Vector{Union{Tuple{typeof(pop!), Nothing}, Tuple{typeof(push!), UInt64}}} = Union{Tuple{typeof(pop!), Nothing}, Tuple{typeof(push!), UInt64}}[(push!, 0x0000000000000001), (push!, 0x0000000000000000), (push!, 0x0000000000000000), (pop!, nothing), (pop!, nothing)]
Once again, we find our familiar example UInt[0x0, 0x1, 0x0]
, though this time in the form of operations done on the heap:
ops = Union{Tuple{typeof(pop!), Nothing}, Tuple{typeof(push!), UInt64}}[
(push!, 0x0000000000000001),
(push!, 0x0000000000000000),
(push!, 0x0000000000000000),
(pop!, nothing),
(pop!, nothing)
]
We push three elements (0x1, 0x0 and 0x0) and when popping two, the second doesn't match the expected minimum anymore!
Now let's try the same property with our (hopefully correct) fixed_pop!
:
gen_fixed_pop = Data.Just((fixed_pop!, nothing))
gen_fixed_ops = Data.Vectors(Data.OneOf(gen_push, gen_fixed_pop); max_size=10_000)
@check test_heap(gen_fixed_ops)
Test passed
Context: test_heap
Now this is much more thorough testing!