Performance Tuning Clojure Generative Tests
Generative testing means using semi-random data instead of preset inputs and outputs in tests. Clojure’s QuickCheck port, test.check, provides this functionality[1]. While test.check makes it quite easy to write generative tests, it takes a bit of hand optimization to avoid exponentially increased run times.
Before we get into the how, let’s think about why we would want to optimize a generative test’s performance. Consider this test:
(deftest zero-is-identity-element-for-addition
  (is (= 0 (+ 0 0)))
  (is (= 1 (+ 1 0)))
  (is (= 2 (+ 2 0)))
  (is (= 5 (+ 5 0)))
  (is (= 10 (+ 10 0)))
  (is (= -1 (+ -1 0)))
  (is (= -10 (+ -10 0))))
We can turn this into a test.check generative test like so:
(defspec zero-is-identity-element-for-addition
  1000
  (prop/for-all [x gen/int]
    (is (= x (+ x 0)))))
This test encodes the invariant (zero is the identity element) very well and I can’t think of anything to optimize here. It would run reasonably fast anyway. We are using it as an example because it has the major characteristic every other generative test has: non-determinism. You didn’t see that coming when I said random, did you? :)
That number (1000) at the beginning of the definition is the sample size of the test. Each time this test is run, 1000 random integers will be generated and the body will be executed with each of them bound to x. The upside is that the generative version will most likely cover all the cases in the original version in its first run, and then many more. The downside is that we might never get to test all possible inputs.
We can increase the sample size to get better coverage at the cost of a slower test suite. For this particular example, one integer is as good as another, so this is our only choice. For other tests we can choose more relevant inputs and make each sample count. Well, “count” is perhaps somewhat subjective here.
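If you want to experiment with the sample size before committing to a number in the defspec, you can run the same property from the REPL. A minimal sketch, assuming the standard test.check namespaces:

(require '[clojure.test.check :as tc]
         '[clojure.test.check.properties :as prop]
         '[clojure.test.check.generators :as gen])

;; Run the identity property with 10,000 samples instead of 1,000.
(tc/quick-check 10000
  (prop/for-all [x gen/int]
    (= x (+ x 0))))
;; => {:result true, :num-tests 10000, :seed ...} on success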
Caching
If the variance of some of the inputs we are testing with is less critical than that of others, we can cache a certain number of them and reuse those. Consider this test for assoc:
;; gen-map & gen-value are defined elsewhere.
(defspec assoc-with-keywords
  1000
  (prop/for-all [[m k v] (gen/tuple gen-map gen/keyword gen-value)]
    (= v
       (get (assoc m k v) k))))
In this example gen/keyword[2] would need to generate 1000 different keywords. But suppose our domain expertise (!) tells us we can trade variance in keys for more variance in maps and values. gen/elements is a lot faster than gen/keyword[3]:
(def gen-key
  (let [ks (gen/sample gen/keyword 30)]
    (gen/elements ks)))
We would be reusing these 30 keywords for each sample, but note that every time we compile we would be using a different set of keywords. As far as I understand this breaks replayability of tests, since we are calling gen/sample instead of gen/bind. I haven’t ever needed to replay a test; a shrunk input has been enough to figure out what’s wrong.
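For completeness, here is roughly what the gen/bind version mentioned above could look like. It keeps replayability because the key pool is generated inside the generator itself, but it also gives up the speed-up, since a fresh pool is produced for every sample (gen-key-pool and gen-key-from-pool are made-up names):

;; A pool of keywords generated as part of the generator itself.
(def gen-key-pool
  (gen/not-empty (gen/vector gen/keyword)))

;; For each sample, pick one key from the freshly generated pool.
(def gen-key-from-pool
  (gen/bind gen-key-pool gen/elements))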
Limiting Collection Sizes
If I were generating a list of parameter names for functions, I would cap that collection at a maximum size of 100. Even a number as low as 20 doesn’t make much sense in practice[4]. If someone creates a function with 1000 parameters and some bug is surfaced, I would like them to come find me. We have things to talk about.
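As a minimal sketch, such a cap could look like this, with gen/symbol standing in for parameter names (gen-param-names is a made-up name and the ceiling is arbitrary):

;; Parameter name lists with at most 20 elements.
(def gen-param-names
  (gen/vector gen/symbol 0 20))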
Sometimes it is not so easy to determine a ceiling, but certain ranges are known to be more frequent. For instance, we can say a mesh with 50-200 vertices is a low-poly model and one with 1000-5000 vertices is a high-poly[5] model:
(def gen-mesh
  (gen/one-of [(gen/vector gen-vertex 50 200)
               (gen/vector gen-vertex 1000 5000)]))
This chooses either the low-poly generator or the high-poly generator, but it would never generate a mesh with, say, 350 vertices. Also, if we are more likely to work with high-poly meshes, this generator doesn’t suit our use case. The version below is more complicated, but it generates better test data:
(def gen-mesh
  (gen/frequency [[2 (gen/vector gen-vertex 1 49)]
                  [2 (gen/vector gen-vertex 50 199)]
                  [5 (gen/vector gen-vertex 200 999)]
                  [10 (gen/vector gen-vertex 1000 4999)]
                  [1 (gen/vector gen-vertex 5000 10000)]]))
Using this technique doesn’t break replayability like caching does. Unfortunately, it also cannot offer the same speed-up as caching for the same number of samples. Instead it helps choose samples that better represent real-world data.
I think there are two takeaways here:
- The obvious one: some generators are faster than others. A gen/keyword sample takes longer to compute than a gen/int one.
- The less obvious one: it’s OK to write your own generators, by which I mean combining and constraining existing generators, as in the sketch below.
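A small illustration of what that can mean in practice; both names are made up, and each definition only composes built-in generators:

;; Combine: positive integers built from gen/nat.
(def gen-positive-int
  (gen/fmap inc gen/nat))

;; Constrain: keywords with short names, discarding anything longer
;; (giving up after 50 tries per sample).
(def gen-short-keyword
  (gen/such-that #(< (count (name %)) 10) gen/keyword 50))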
Let me know if you have good tips for generative testing!
[1] There are other implementations, and this post should still be useful if you are using one of them.
[2] gen in this post is an alias for clojure.test.check.generators.
[3] Sorry, I don’t have any benchmarks for you. It would be cool if I did.
[4] Let’s assume our functions only have positional, required parameters.
[5] 1000 vertices is high-poly, but 5000 is in no way a meaningful ceiling for real-life models. Neither is 10k.
If you have any questions, suggestions or corrections feel free to drop me a line.