So far in this Clojure tutorial/NLP tutorial, we’ve mainly been looking at Clojure’s functions, but the last posting actually included a good chunk of code for the Porter Stemmer. Today, we’ll review functions. Some of this we’ve seen before, but some of it will be new.
As with any functional language, functions are the building blocks of Clojure. I’m going to briefly summarize what we’ve learned about functions so far, and then we’ll explore one important topic—recursion—in more detail.
Generally, functions are defined using
defn, followed by the name of the
function, a vector listing the parameters it takes, and the expressions in the
body of the function. You can also include a documentation string between the
function name and the parameter vector:
user=> (defn greetings "Say hello." [name] (str "Hello, " name)) #'user/greetings user=> (greetings 'Eric) "Hello, Eric"
str function here just creates strings out of all its arguments and
concatenates them together.)
Higher-order functions are functions that take other functions as values.
These may use the other function in a calculation, or they may create a new
function from the existing one. For example,
map calls a function on each
element in a sequence:
user=> (map greetings '(Eric Elsa)) ("Hello, Eric" "Hello, Elsa")
complement, however, creates a new function that is equivalent to
user=> (def not-zero? (complement zero?)) #'user/not-zero? user=> (not-zero? 0) false user=> (not-zero? 1) true
Sometimes, particularly when you’re using a higher-order function, you may
want to create a short function just for that one place, and giving it a name
would clutter up your program. Or you may want to define a function inside
another function, in a
let expression, to using only within that function.
For either of these, use a function literal. A function literal looks like a
regular function definition, except instead of
fn; the name is
optional; and generally it cannot have a documentation string. For example,
greeting, which we defined above, doesn’t exist, and we want to
greet a list of people. We could do this by creating a function literal and
passing it to
user=> (map (fn [n] (str "Hello, " n)) '(Eric Elsa)) ("Hello, Eric" "Hello, Elsa")
Or, we could temporarily define
let, and pass that to
user=> (let [greet (fn [n] (str "Hello, " n))] (map greet '(Eric Elsa))) ("Hello, Eric" "Hello, Elsa")
I’ve already mentioned that Clojure allows you to provide different versions of a function for different argument lists. Do this by grouping each set of parameter vector and expressions in its own list. The documentation string, if there is one, comes before all of the groups:
user=> (defn count-parameters "This returns the number of parameters passed to the function." ( 0) ([a] 1) ([a b] 2) ([a b c] 3) ([a b c d] 4) ([a b c d e] 5)) #'user/count-parameters user=> (count-parameters 'p0 'p1 'p2) 3 user=> (count-parameters) 0
This also works in function literals. This creates a shortened version of
cp and calls it twice within a vector. The two
calls are collected in a vector because
let only returns one value, and we
want to see the value of both calls to
user=> (let [cp (fn ( 0) ([a] 1) ([a b] 2) ([a b c] 3))] [(cp 'p0 'p1) (cp)]) [2 0]
Many problems can be broken into smaller versions of the same problem.
For example, you can test whether a list contains a particular item by looking at the first element of the list. At each place in the list, ask yourself: is the list empty? If it is, the item is not in the list. However, if the first element is what you’re looking for, good; if it’s not, strip the first element off the list and start over again.
This way of attacking problems—by calling the solution within itself—is called recursion. Recursive problems are fundamental to functional programming, so let’s look at it in more detail.
The main pitfall in creating a recursive problem is to make sure that it will end eventually. To do this, you need to make sure that you have an end condition. In the list-membership problem I described above, the end conditions are the empty list and the first element being the item you are looking for.
Let’s look at what this would look like in Clojure.
user=> (defn member? [sequence item] (if (not (seq sequence)) nil (if (= (peek sequence) item) sequence (member? (pop sequence) item)))) #'user/member?
In this, the first
if tests whether the sequence is empty (that is, if
cannot create a sequence out of it; if it is, it returns
false. The second
if tests whether the first element in the
sequence is what we’re looking for; if it is, this returns the sequence at
that point. This is a common idiom in Clojure. Since the sequence has at least
one item, it will evaluate as true, fulfilling its contract as a test, but it
returns more information than that. This function is useful for more than
just determining whether an item is in a sequence: it also finds the item and
returns it. Finally, if the first element is not the item being sought, this
member? again with everything except the first item of the list*.
This is the recursive part of the function. Let’s test it.
user=> (member? '(1 2 3 4) 2) (2 3 4) user=> (member? '(1 2 3 4) 5) nil
Sometimes it’s helpful to map this out. In the diagram below, a call within another call is indented under it. A function call returning is indicated by a “=>” and a value at the same level of indentation as the original call. Here’s the sequence of calls in the first example above.
(member? '(1 2 3 4) 2) (member? '(2 3 4) 2) => '(2 3 4) => '(2 3 4)
The second example call to
member? would graph out like this:
(member? '(1 2 3 4) 5) (member? '(2 3 4) 5) (member? '(3 4) 5) (member? '(4) 5) (member? '() 5) => nil => nil => nil => nil => nil
In this case, we’re just returning the results of the membership test back unchanged, but we could modify the results as they’re being returned. For example, if we wanted to count how many times as item occurs in a sequence, we could do this:
user=> (defn count-item [sequence item] (if (not (seq sequence)) 0 (if (= (peek sequence) item) (inc (count-item (pop sequence) item)) (count-item (pop sequence) item)))) #'user/count-item
In many ways, this is very similar to
member?. It first tests whether the
sequence is empty, and if it is, it returns zero. If the first element is the
item, this calls itself (it recurses) and increments the result by one, to
count the current item. Otherwise, it recurses, but it does not increment the
result. Let’s see this in action.
user=> (count-item '(1 2 3 2 1 2 3 4 3 2 1) 2) 4 user=> (count-item '(1 2 3 2 1 2 3 4 3 2 1) 3) 3 user=> (count-item '(1 2 3 2 1 2 3 4 3 2 1) 5) 0
Here, the first example (in shortened form) graphs out like so:
(count-item '(1 2 3 2 1) 2) (count-item '(2 3 2 1) 2) (count-item '(3 2 1) 2) (count-item '(2 1) 2) (count-item '(1) 2) (count-item '() 2) => 0 => 0 => 1 => 1 => 2 => 2
You see that the results gets incremented as the computer leaves each function call where 2 is the first element in the input list.
While these two examples of recursion are superficially similar, on a deeper
level they are very different. In
member?, once you’ve computed the result,
you can return it immediately back to the function that originally called
member?. If Clojure had some way to jump completely out of a function from
any level, you could do this. On the other hand, when
it is not finished with its calculation yet. It has to wait for itself to
return and possibly add one to the result. (Sentences like that remind me why
recursion can be confusing. Don’t worry. Eventually your brain will get used
to being twisted into a pretzel.)
There’s a term to describe functions like
member? that are finished with
their calculations when they recurse. It’s called tail-call recursion or
tail-recursive. This is a very good quality. The computer
can optimize these calls to make them very fast and very efficient.
But the Java Virtual Machine doesn’t recognize tail recursion on its own, so
Clojure needs a little help to make these optimizations. You signal a
tail-recursive function call by using the
recur built-in instead of the
function name when you recurse. Thus, we could re-write
member? to be
tail-recursive like this.
user=> (defn member? [sequence item] (if (not (seq sequence)) nil (if (= (peek sequence) item) sequence (recur (pop sequence) item)))) #'user/member?
recur in the last line? That’s all that has changed.
This should work exactly the same (and it does—try it), but for long lists, it should be much more efficient.
In fact, it’s often worth putting in a little extra work to make
non-tail-recursive functions tail-recursive. There’s a straightforward
transformation you can use to make almost any function tail recursive. Just
add an extra parameter and use it to accumulate the results before you make
the recursive function call. The first time you call the function, you need to
pass the base value into the function for that parameter. For example,
count-item with tail recursion would look like this:
user=> (defn count-item [sequence item accum] (if (not (seq sequence)) accum (if (= (peek sequence) item) (recur (pop sequence) item (inc accum)) (recur (pop sequence) item accum)))) #'user/count-item
The difference here is the
accum parameter. When
item equals the first
element of the sequence,
accum is incremented before the recursive
functional call is made. When the function starts again on the shorter list,
accum is incremented. Finally, when the end of the list is reached,
is returned. It contains the counts accumulated as
count-item walked down
Of course, now we have to call
count-item differently also:
user=> (count-item '(1 2 3 2 3 4 3 2 1) 2 0) 3 user=> (count-item '(1 2 3 2 3 4 3 2 1) 1 0) 2 user=> (count-item '(1 2 3 2 3 4 3 2 1) 5 0) 0
Whenever we call
count-item, we have to include a superfluous zero that we
really don’t care about. That seems messy and error-prone. How can we get rid
Essentially, we want to hide this version of
count-item and replace it
with a new version that handles the extra zero for us. Then, we call the new
version and forget about this one. There are several ways to actually do this:
Have the public function named
count-itemand create a private version named something like
letto define the private function inside
loop? Yes, this is new.
loop is a cross between a function call and
It looks a lot like
let because it allows you to define variables. But it
also acts as a target for
recur. How would
count-item look with
(defn count-item [sequence item] (loop [sq sequence, accum 0] (if (not (seq sq)) accum (if (= (peek sq) item) (recur (pop sq) (inc accum)) (recur (pop sq) accum))))) #'user/count-item
Notice that the first line of
loop looks a lot like the first line of
Both declare and initialize a series of variables. Just to keep things clear,
sq within the
item isn’t included
in the list of variables that
loop declares, since it doesn’t change.
Finally, at the end of the
loop are two
recur statements. When they are
evaluated, they cause the program to jump back to the
loop statement, but
this time, instead of the original values used to initialize the
variables, the values in the
recur call are used.
Now we can again call
count-item without the extra parameter:
user=> (count-item '(1 2 3 2 3 4 3 2 1) 2) 3 user=> (count-item '(1 2 3 2 3 4 3 2 1) 1) 2 user=> (count-item '(1 2 3 2 3 4 3 2 1) 5) 0
With recursion, we can define another utility to use on the
structures. This function will take a predicate and a stemmer. For each step,
it will test the stemmer with the predicate, and if true, it will pop one
character from the word and recurse. If there are no letters left in the word
or if the predicate evaluates to false, it will return the stemmer the way it
(defn pop-stemmer-on "This is an amalgam of a number of different functions: pop (it walks through the :word sequence using pop); drop-while (it drops items off while testing the sequence against drop-while); and maplist from Common Lisp (the predicate is tested against the entire current stemmer, not just the first element)." [predicate stemmer] (if (and (seq (:word stemmer)) (predicate stemmer)) (recur predicate (pop-word stemmer)) stemmer))
I’m ready for a break. Next time, We’ll look at some more of the functions that Clojure provides, and we’ll define a few predicates of our own.