muhuk's blog

Nature, to Be Commanded, Must Be Obeyed

April 18, 2016

Performance Comparison of Annotate, Herbert & Schema

Since last month I have made some improvements to the validation-benchmark code: I cleaned it up and added some more benchmarks. Steve Miner was kind enough to fix my Herbert benchmark implementation.

UPDATE: Jason Wolfe was kind enough to fix the Schema implementation. The results below and the comments about Schema being slow no longer hold. See this pull request for the results Jason got with the quick mode benchmark.

The benchmarks still lack complex schemas, but I think they cover the basics, and complex schemas are compound versions of basic schemas. So I ran them once more:

$ lein run
...

+-------------------------------------+-----------+------------+
| Test name                           | Library   | Mean (ns)  |
+-------------------------------------+-----------+------------+
| [:associative-collections :valid]   | :annotate |  25108.219 |
| [:associative-collections :valid]   | :herbert  |  19976.914 |
| [:associative-collections :valid]   | :schema   |  36178.701 |
| [:associative-collections :invalid] | :annotate |  36786.849 |
| [:associative-collections :invalid] | :herbert  |  16434.656 |
| [:associative-collections :invalid] | :schema   | 148398.967 |
| [:atomic-values :valid]             | :annotate |   1172.294 |
| [:atomic-values :valid]             | :herbert  |   6169.575 |
| [:atomic-values :valid]             | :schema   |   2098.525 |
| [:atomic-values :invalid]           | :annotate |   9078.756 |
| [:atomic-values :invalid]           | :herbert  |   6920.640 |
| [:atomic-values :invalid]           | :schema   | 104190.919 |
| [:custom-predicate :valid]          | :annotate |   9961.961 |
| [:custom-predicate :valid]          | :herbert  |  30388.447 |
| [:custom-predicate :valid]          | :schema   | 540614.396 |
| [:custom-predicate :invalid]        | :annotate |  10794.146 |
| [:custom-predicate :invalid]        | :herbert  |  10503.089 |
| [:custom-predicate :invalid]        | :schema   | 607110.002 |
| [:nil-allowed :valid]               | :annotate |   7118.476 |
| [:nil-allowed :valid]               | :herbert  |  10706.036 |
| [:nil-allowed :valid]               | :schema   |  22234.187 |
| [:nil-allowed :invalid]             | :annotate |  45962.879 |
| [:nil-allowed :invalid]             | :herbert  |  32337.527 |
| [:nil-allowed :invalid]             | :schema   | 333949.845 |
| [:sequential-collections :valid]    | :annotate |  23630.737 |
| [:sequential-collections :valid]    | :herbert  |  35883.094 |
| [:sequential-collections :valid]    | :schema   |  51283.213 |
| [:sequential-collections :invalid]  | :annotate |  35591.975 |
| [:sequential-collections :invalid]  | :herbert  |  24918.544 |
| [:sequential-collections :invalid]  | :schema   | 393485.225 |
+-------------------------------------+-----------+------------+

The raw data is here. A graphical representation of the results is below:

Benchmark results chart.

Lastly, the relative performance chart. In every row, 1.0 is the fastest timing and the other values are relative to it:

Comparison.
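
For readers who want to reproduce the relative numbers from the table, here is a minimal sketch of the normalization. It is written in Python rather than taken from the benchmark code, and the dict layout and the function name relative are assumptions made for illustration. The timings are the [:atomic-values :valid] row from the table above.

```python
# Mean timings in nanoseconds for the [:atomic-values :valid] row of the
# table above. The dict layout is an assumption of this sketch, not the
# format of the raw data.
timings = {"annotate": 1172.294, "herbert": 6169.575, "schema": 2098.525}


def relative(timings):
    """Scale every timing so that the fastest library scores 1.0."""
    fastest = min(timings.values())
    return {lib: mean / fastest for lib, mean in timings.items()}


rel = relative(timings)

# Print the libraries from fastest to slowest with their relative timing.
for lib, ratio in sorted(rel.items(), key=lambda kv: kv[1]):
    print(f"{lib:10s} {ratio:.2f}")
```

Because each row is scaled by its own fastest timing, the ratios are comparable within a row but not across rows.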

I did not add totals for all benchmarks, or for valids/invalids, to the table above. Adding the timings of two benchmarks would produce a meaningless result, because each use case has a different frequency of use in the real world. I would rather not declare an overall winner than set arbitrary standards for performance.

Conclusion

Performance-wise, Annotate and Herbert seem to be on par. The reason Schema is behind, especially when dealing with invalid values, is probably that it throws an exception when validation fails. I don’t think throwing an exception is inappropriate or unfunctional in the second use case[1]. However, when dealing with external input you will probably need to catch that exception near the function that calls for the validation, rather than delegating exception handling to somewhere further down the stack. This defeats the purpose of using an exception in the first place. A value would work just as well, without the performance hit of throwing an exception.
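
The argument above can be sketched in a few lines. This is a library-neutral illustration in Python, not the API of Annotate, Herbert or Schema: the names check, validate and ValidationError are hypothetical, and the "non-negative integer" rule is only a stand-in for a real schema.

```python
class ValidationError(Exception):
    """Raised by the exception-style validator (hypothetical name)."""


def check(value):
    """Value style: return None when valid, else a description of the problem."""
    if isinstance(value, int) and value >= 0:
        return None
    return f"expected a non-negative integer, got {value!r}"


def validate(value):
    """Exception style: return the value when valid, else raise."""
    problem = check(value)
    if problem is not None:
        raise ValidationError(problem)
    return value


def handle_with_exception(raw):
    # External input: the caller catches right next to the validation call,
    # so the exception never travels any distance up the stack.
    try:
        return ("ok", validate(raw))
    except ValidationError as e:
        return ("error", str(e))


def handle_with_value(raw):
    # Same outcome, but the failure is an ordinary return value.
    problem = check(raw)
    return ("error", problem) if problem is not None else ("ok", raw)
```

Both handlers return the same result for the same input. When the exception is caught immediately like this, it carries no more information than the returned value does, which is the point made above.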

When validation fails, Herbert only tells you that it has failed, whereas Schema and Annotate explain why. A notable detail is that they both mention the original input in the explanation.

Having worked with these three libraries for a short while, I think each of their schema definition DSLs is reasonably well designed. Their capabilities seem to differ in minor ways, but I can’t think of a validation use case that is not supported. Herbert’s DSL differs from the other two by using only data structures from the core language. Both Annotate and Schema make use of records/types they define.

My personal choice, after all this, is Herbert. For now. It does not support function or value annotations, and I see this as a feature. Annotations are fantastic for static checkers (like core.typed) because they can be isolated from the rest of the code. But for runtime validation, annotations mean either custom def forms or replacing vars. In either case it is too invasive for my taste, and I would like to avoid it. On the other hand, calling a function to validate some value is idiomatic. Aside from this, Herbert is fast and I have found its codebase easy to understand. This is not a recommendation yet, since the benchmark covers only a few of the existing validation libraries.


[1] See the previous post on the validation benchmark.

If you have any questions, suggestions or corrections feel free to drop me a line.