Matching in Clojure

I was trying to write a PDDL parser in Clojure, and realized it could be handy to have something that does the opposite of `(~x ~@y). In other words, it takes a two versions of a syntax-tree, one with variables and one without, and returns the unique binding for the variables that will produce the given expansion (or throws an error otherwise). Here’s the code I came up with:

(defn match-vars   "Return a seq of the variables mentioned in the tree."
  [var-tree]
  (cond (not (coll? var-tree)) nil
	(= (first var-tree) 'unquote) [(second var-tree)]
	(and (coll? (first var-tree)) (= (ffirst var-tree) 'unquote-seq)) [(second (first var-tree))]
	:else (concat (match-vars (first var-tree)) (match-vars (rest var-tree)))))
 
(defn match-mapping "Return a map of variable bindings for this matching, or 
                     throw an error if a matching is not possible."
  [var-tree match-tree]
  (cond (not (coll? var-tree))
	  (when (not= var-tree match-tree)
	    (throw (Exception. (str "Bad Match: " var-tree " " match-tree))))
	(= (first var-tree) 'unquote)
	  {(second var-tree) match-tree}
	(and (coll? (first var-tree)) (= (ffirst var-tree) 'unquote-seq))
	  {(second (first var-tree)) match-tree}
	(not (coll? match-tree))
	  (throw (Exception. (str "Bad Match: " var-tree " " match-tree)))
	:else 
	  (merge (match-mapping (first var-tree) (first match-tree))
		 (match-mapping (rest var-tree) (rest match-tree)))))	
 
(defmacro match "Take a var-tree with (unquote x) and (unquote-seq y) expressions
                 and match it with match-tree, binding these variables, and
                 throwing an exception if a valid matching cannot be found."
  [[var-tree match-tree] & body]
  (let [vars (match-vars var-tree) g (gensym)]
    `(let [~g (match-mapping '~var-tree ~match-tree)]
       (let ~(apply vector (mapcat #(vector % `(safe-get ~g '~%))  vars))
	 ~@body))))
 
user> (match [[a [b [unquote-seq x]] [unquote y]] 
	     '[a [b c d] [c d e f]]] 
	[x y])
[(c d) [c d e f]]

In the future I think I’ll need to extend this to support unordered sets of subexpressions and optional subexpressions; if and when I write that code, I’ll post it here.

Update: Per this post on the group, Clojure now supports unquoting of unquoted expressions, so I’ll also be updating the match code to take advantage of this at some point.

4 Comments

Randall Schulz

Jason,

This sounds a bit like unification, a well-known algorithm used in automatic theorem provers. Unification does two-way matching in the sense that variables on either side are bound consistently. It is symmetric w.r.t. argument order, so there is not a pattern vs. target distinction in unification. The result of unification, a “substitution” that assigns values to all the variables in either input, will if applied to each of the inputs yield the same formula (tree) from each of them (that’s its definition).

So to the extent this is an accurate characterization of your problem, you can avoid re-inventing it and just find one of the well-known unification algorithms and implement that in Clojure.

Randall Schulz

January 3, 2009 Reply
- admin
  
  Randall,
  
  Good point … a unification algorithm would be a useful thing to have, especially since I’ll probably end up implementing some first-order logic stuff at some point. If I do implement that, I’ll make sure to post it. What I had in mind for match was a bit different though: more of an easy way to parse structured documents, including some features that have no correlate in unification (i.e., a set in the match template should match any permutation of the items in the target, optional sections, etc.). I’m sure there are more powerful and general tools for doing this sort of thing too, but anyway I did this as much as an interesting exercise as anything else.
  
  Thanks,
  Jason
  
  January 3, 2009 Reply
Robert Goldman

Well, actually, pattern-matching is a widely recognized simplification of unification in which variables can appear in only one of the two sides of the operation. See, for example, Norvig’s Paradigms of AI Programming. Pattern matching is often useful in expert systems (e.g., for forward-chaining).

One thing I was wondering was why there are these (to me) odd “unquote” things floating around in here. In standard lisp versions of this, one would see something like

(match ‘(a (b . ?x) ?y) ‘(a (b c d) (c d e f)))

To me “unquote” doesn’t say “[logical] variable.” Is there something about Clojure that makes this idiomatic?

This is, btw, a place where Clojure’s decision not to permit user-defined reader macros seems problematic. Implementing ? as a reader macro that makes a variable type object would be a nice thing to do in a CL pattern-matcher.

January 9, 2009 Reply
- Jason Wolfe
  
  Thanks for your post.
  
  I used [unquote x] and [unquote-seq x] as placeholders for ~ and ~@ (i.e., lisp , and ,@), which the next version of Clojure will support outside of a syntax-quote (i.e., lisp back-quote), producing some sort of special “Unquote” or “UnquoteSplicing” object. When that comes out, I’ll replace [unquote x] and [unquote-seq x] with ~x and ~@x. This will have the nice symmetry that (= (match [var-expr match-expr] `var-expr) match-expr), so that match will be the dual of syntax-quote. So, nothing idiomatic about this yet, although when the update comes out ~ and ~@ may become idiomatic for this purpose. (And yes, in Lisp I could define my own reader macro here; this is probably the only place so far where I’ve missed reader macros a little bit.)
  
  You’re right that as above, match is just a shoddy implementation of unification. I’ve already implemented and am using an improved version, however, which goes beyond unification towards tree-regular-expressions, allowing matching of unordered set expressions, and optional or repeated subexpressions. I’ve found this to be exactly what I need to do simple parsing of many structures, while maintaining a nice and easy-to-read syntax. I’ll post this soon, probably after the above Clojure update happens.
  
  January 9, 2009 Reply

w01fe

Matching in Clojure

4 Comments

Randall Schulz

admin

Robert Goldman

Jason Wolfe

Leave a Reply