muhuk's blog

January 29, 2017

Domain Specific Languages in Clojure

Lisp is an excellent choice for implementing Domain Specific Languages (DSLs for short). This post does not focus on why (or how) Lisp is a good choice for DSLs. Instead I will try to give a few interesting examples of DSLs written in Clojure.

The concept of a DSL is rather broad and many things can be said on the subject. Some of them would, arguably, be true as well. In this post I divide DSLs into two categories: embedded and external. An embedded DSL runs within the host environment. As a result the host language is fully (or mostly) available. An external DSL runs above the host environment. Therefore the host language is not directly available, and all interaction must be performed through declared interfaces. This opacity also means the host environment can be replaced without any changes to the external DSL.
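To make the distinction concrete, here is a minimal sketch of the same rule written both ways. Everything in it (adult?, operators, compile-rule and the tiny "field operator number" syntax) is made up for illustration and uses only clojure.core and clojure.string:

```clojure
(require '[clojure.string :as str])

;; Embedded: the rule is ordinary Clojure, so any host function can be used.
(def adult? (fn [person] (>= (:age person) 18)))

;; External: the rule is opaque text. The host only sees it through a small,
;; declared interface (here: three comparison operators on a single field).
(def operators {">=" >=, "<=" <=, "=" =})

(defn compile-rule
  "Turns text such as \"age >= 18\" into a predicate on maps."
  [text]
  (let [[field op value] (str/split (str/trim text) #"\s+")
        compare-fn       (or (operators op)
                             (throw (ex-info "Unknown operator" {:op op})))
        threshold        (Long/parseLong value)]
    (fn [record] (compare-fn (get record (keyword field)) threshold))))

(def adult-rule (compile-rule "age >= 18"))

(adult? {:age 21})      ;; => true
(adult-rule {:age 21})  ;; => true
(adult-rule {:age 12})  ;; => false
```

The text "age >= 18" knows nothing about Clojure, so in principle it could be handed to a different engine unchanged. The embedded version cannot, but it gets the whole language for free.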

Embedded DSLs

Propaganda

Propaganda is an implementation of The Art of the Propagator paper. If you are not familiar with propagators, I suggest you start with this video. For now suffice it to say it is a DSL for constraint programming.

Here is an example from the project’s README:

(def squarer
  (function->propagator-constructor
   (fn [val] (* val val))))

(def sqrter
  (function->propagator-constructor
   (fn [val] (Math/sqrt val))))

(defn quadratic
  [x x-squared]
  (squarer x x-squared)
  (sqrter  x-squared x))

(let [x (make-cell)
      x-squared (make-cell)]
  (binding [*merge* my-merge]
    (quadratic x x-squared)
    (add-content x 10.0)
    (get-content x-squared)))
;; => 100.0

The code above defines a relationship between x and x-squared. The last block sets the value of x and queries x-squared. It could have been run in the other direction to figure out x from the value of x-squared.

squarer and sqrter are built from plain old Clojure functions. But quadratic, and also the body of the binding in the last form, are a departure from everyday Clojure code. They are a high-level expression of the solution in the terms of the domain.
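If the idea of running a relationship in either direction feels abstract, the following toy network may help. It is not Propaganda's implementation or API; new-cell, content, add-content! and attach! are names I made up, and unlike real propagators this sketch simply overwrites a cell's content instead of merging information:

```clojure
(defn new-cell []
  (atom {:content nil, :neighbours []}))

(defn content [cell]
  (:content @cell))

(defn add-content!
  "Stores value in cell and re-runs the attached propagators. Does nothing
  when the cell already holds that value, which stops the network looping."
  [cell value]
  (when (not= value (content cell))
    (swap! cell assoc :content value)
    (doseq [propagate (:neighbours @cell)]
      (propagate))))

(defn attach!
  "Whenever input has content, adds (f content) to output."
  [f input output]
  (let [propagate (fn []
                    (when-some [v (content input)]
                      (add-content! output (f v))))]
    (swap! input update :neighbours conj propagate)
    (propagate)))

(defn quadratic [x x-squared]
  (attach! (fn [v] (* v v)) x x-squared)
  (attach! (fn [v] (Math/sqrt v)) x-squared x))

;; Forward: set x, read x-squared.
(let [x (new-cell), x-squared (new-cell)]
  (quadratic x x-squared)
  (add-content! x 10.0)
  (content x-squared))
;; => 100.0

;; Backward: set x-squared, read x.
(let [x (new-cell), x-squared (new-cell)]
  (quadratic x x-squared)
  (add-content! x-squared 49.0)
  (content x))
;; => 7.0
```

I used perfect squares on purpose. With other inputs floating point rounding can make the two directions disagree slightly, which is one reason a real propagator library needs a proper merge step.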

Quil

Quil is succinctly explained with this little pseudo code: (mix Processing Clojure). Processing here is the excellent interactive visualization library. Here is an example of defining a sketch, from its README:

(q/defsketch example                  ;; Define a new sketch named example
  :title "Oh so many grey circles"    ;; Set the title of the sketch
  :settings #(q/smooth 2)             ;; Turn on anti-aliasing
  :setup setup                        ;; Specify the setup fn
  :draw draw                          ;; Specify the draw fn
  :size [323 200])                    ;; You struggle to beat the golden ratio

I didn’t put the whole thing here. draw and setup are plain Clojure functions that call various other functions Quil provides. Nothing unlibrarylike. However, defsketch is designed as a mini embedded DSL for defining main loops.

If you are a Clojure programmer, none of this should be unfamiliar to you. You are probably not calling these DSLs when you encounter them. But look again and contrast the defsketch call above with the (slightly edited) definition of draw below:

(defn draw []
  (quil/stroke (quil/random 255))          ;; Random grey stroke colour
  (quil/stroke-weight (quil/random 10))    ;; Random stroke thickness
  (quil/fill (quil/random 255)))           ;; Random grey fill colour

It is also no coincidence that DSL code is more declarative than imperative. Declarative style is generally better suited to higher-level expressions. By declarative I mean specifically the absence of control flow. In draw above, we know the quil/stroke call will be executed before quil/fill. And it matters. But in the defsketch call there is no such ordering. Under the hood there might be, but it does not matter.
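The ordering point can be shown with a small sketch. defscreen below is a hypothetical macro in the spirit of defsketch, not how Quil implements it; stroke! and fill! are stand-ins that only record the calls they receive:

```clojure
;; Declarative: options are collected into a map, so their order is irrelevant.
(defmacro defscreen
  "Binds sym to a map of the given keyword options, with a default size."
  [sym & options]
  `(def ~sym (merge {:size [100 100]} (hash-map ~@options))))

(defscreen sketch-a :title "Oh so many grey circles" :size [323 200])
(defscreen sketch-b :size [323 200] :title "Oh so many grey circles")

(= sketch-a sketch-b)
;; => true

;; Imperative: each call changes shared state, so order is observable.
(def calls (atom []))
(defn stroke! [grey] (swap! calls conj [:stroke grey]))
(defn fill!   [grey] (swap! calls conj [:fill grey]))

(do (stroke! 255) (fill! 0))
@calls
;; => [[:stroke 255] [:fill 0]]
```

Swapping the two options of defscreen yields the same value. Swapping stroke! and fill! yields a different history.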

LaTTe

LaTTe is a proof assistant library. I must admit I have not used it myself. But it is such a good example of an embedded DSL that I could not leave it out of this post. Running the example in its README produces the following result:

(defthm impl-refl
  "Implication is reflexive."
  [[A :type]]
  (==> A A))
;; => #latte.kernel.defenv.Theorem{:name impl-refl,
;;                                 :params [[A ✳]],
;;                                 :arity 1,
;;                                 :type (Π [⇧ A] A),
;;                                 :proof false}

(proof impl-refl
       :script
       (assume [x A]
         (have concl A :by x)
         (qed concl)))
;; => [:qed impl-refl]

;; Now the value of impl-refl has changed:
;; => #latte.kernel.defenv.Theorem{:name impl-refl,
;;                                 :params [[A ✳]],
;;                                 :arity 1,
;;                                 :type (Π [⇧ A] A),
;;                                 :proof (λ [x A] x)}

Notice that in the code above (and also in the earlier examples) accidental complexity is pushed down into the implementation and the problem is defined in the domain’s essential terms. Contrast this with core.async’s go macro and channel operations: they add a new way of doing concurrency to the Clojure language, but it is still an accidental part of your code.

External DSLs

YeSPARQL

YeSPARQL is a Clojure library for running SPARQL (SQL for RDF) queries. As you can see from the example in its README, the query itself is written in SPARQL and read from a file:

(require '[yesparql.core :refer [defquery]])

(defquery select-intellectuals "some/where/query.sparql"
  {:connection "http://dbpedia.org/sparql"})

(with-open [result (select-intellectuals)]
  (do-something-with-result! result))

As I mentioned above, the host language is not available to external DSLs. Technically it would be possible to evaluate query.sparql in the active Clojure context; the reason this is not done is not technical. SPARQL is preferred over some embedded DSL because it is already a familiar and mature DSL. Once the SPARQL query does its job, the Clojure function do-something-with-result! takes the stage. Of course you can also take the same query and run it using another engine.

Instaparse

The final library we will look at is one of my favorites. Instaparse is a parser generator. It builds parsers from a BNF description. For example, YeSPARQL uses Instaparse to parse its query files:

queries = <blank-line*> query*
query = name docstring? statement

docstring = comment+

statement = line (line | <comment>)*

name = <whitespace? COMMENT_MARKER whitespace? NAME_TAG whitespace?> non-whitespace <whitespace? newline>
comment = <whitespace? COMMENT_MARKER whitespace?> !NAME_TAG (non-whitespace whitespace?)* newline
line = whitespace? !COMMENT_MARKER (non-whitespace whitespace?)* newline

COMMENT_MARKER = '--'
NAME_TAG = "name:"

blank-line = whitespace* newline
any = (whitespace | non-whitespace)+
newline = '\n' | '\r\n'
whitespace = (' ' | '\t')+
non-whitespace = #'\S+'

Unless your language calls for full host support, such as the ability to define functions within it, I would suggest bootstrapping your DSL using Instaparse. Just write some small snippets and see what their ASTs look like. Can you work with that? Do this before building your own parser bottom up or smashing regexes into one another. If you have existing functionality that you want access to, perhaps going the embedded route is better. You can still use Instaparse to play with the language.
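As a rough idea of what that experiment looks like, here is a small sketch. It assumes Instaparse is on your classpath; the settings grammar and the settings->map helper are invented for this post and have nothing to do with YeSPARQL:

```clojure
(require '[instaparse.core :as insta])

;; A toy language: comma separated key=number pairs.
;; Angle brackets hide the separators from the resulting tree.
(def parse-settings
  (insta/parser
   "settings = pair (<','> pair)*
    pair = key <'='> value
    key = #'[a-z]+'
    value = #'[0-9]+'"))

(parse-settings "width=323,height=200")
;; => [:settings [:pair [:key "width"] [:value "323"]]
;;               [:pair [:key "height"] [:value "200"]]]

(defn settings->map
  "Parses text and folds the tree into a Clojure map, bottom up."
  [text]
  (let [tree (parse-settings text)]
    (if (insta/failure? tree)
      (throw (ex-info "Parse error" {:failure (insta/get-failure tree)}))
      (insta/transform
       {:key      keyword
        :value    #(Long/parseLong %)
        :pair     vector
        :settings (fn [& pairs] (into {} pairs))}
       tree))))

(settings->map "width=323,height=200")
;; => {:width 323, :height 200}
```

The tree is plain vectors and strings, so you can poke at it in the REPL and decide whether it is something you can work with before committing to the grammar.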

If you have any questions, suggestions or corrections, feel free to drop me a line.