One of the many advantages of programming in a functional style (by this, I mean manipulating your data through the operations map, fold, and filter) is that your program winds up being made up of a bunch of tiny, composable pieces. Since each piece is so small, usually only a few lines, it becomes trivial to unit test the entire program. Additionally, it is easy to express new features as just the composition of several existing functions. One disadvantage of programming through map and friends is that there is a fairly large time penalty for allocating the intermediate results. For example, every time filter is called on a list, a new list needs to be allocated. These costs add up pretty quickly and can make a functional program much slower than its imperative equivalent.
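To make the allocation cost concrete, here is a minimal sketch in plain Common Lisp, without any libraries. The function names are my own, chosen only for illustration. Each stage of the pipeline builds a list that exists only to be handed to the next stage.

```lisp
;; Illustrative names only; this uses nothing beyond standard Common Lisp.
(defun iota-list (n)
  "Returns a fresh list of the integers 1..N."
  (loop for i from 1 to n collect i))

(defun sum-even-squares-naive (n)
  "Sums the even numbers among the first N squares, one list per stage."
  (let* ((numbers (iota-list n))                          ; allocates N conses
         (squares (mapcar (lambda (x) (* x x)) numbers))  ; allocates N more conses
         (evens   (remove-if-not #'evenp squares)))       ; may allocate up to N more
    (reduce #'+ evens)))                                  ; only now do we fold

(sum-even-squares-naive 10) ; => 220
```

Only the final number is wanted, yet three lists are built along the way and immediately become garbage.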
One solution to this problem is laziness. Instead of allocating a new list every time an operation is performed on a list, you keep track of all of the transformations made on the list. Then, when you fold over the list, you perform all of the transformations as you go. By doing this, you don’t need to allocate intermediate lists. Although laziness doesn’t allocate any intermediate lists, there is still a small cost for keeping track of the pending transformations. An alternative solution that makes functional programming just as fast as imperative programming is provided by the Series library.[1] Series lets you write your program in a functional style without any runtime penalty at all!
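As a rough sketch of the lazy approach, assume a lazy sequence is represented as a closure that returns its next element each time it is called. This representation and the names below are my own simplification, not how any particular lazy library works. No intermediate list is built, but every element pays for one extra function call per transformation.

```lisp
;; A "lazy sequence" here is just a closure that yields the next element per call.
(defun lazy-integers ()
  "Returns a generator closure yielding 1, 2, 3, ..."
  (let ((i 0))
    (lambda () (incf i))))

(defun lazy-map (fn gen)
  "Returns a generator that applies FN to each element pulled from GEN."
  (lambda () (funcall fn (funcall gen))))

(defun lazy-sum-first (n gen)
  "Folds the first N elements pulled from GEN with +."
  (loop repeat n sum (funcall gen)))

(defun sum-squares-lazy (n)
  "Sums the first N squares without building any intermediate list."
  (lazy-sum-first n (lazy-map (lambda (x) (* x x)) (lazy-integers))))

(sum-squares-lazy 10) ; => 385
```

The bookkeeping cost mentioned above shows up here as the closures themselves and the indirect `funcall` at every stage.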
Personally, the Series library is my favorite example of the magic that can be pulled off with macros. In short, Series works by taking your functional code and compiling it down into a single loop. In this loop, there is one step per transformation performed on the original list. The loop iterates over the values of the original sequence one at a time. On each iteration, the loop takes a single element, applies every transformation in the pipeline to that element, and then accumulates the result according to the folding operation. This loop requires no additional memory allocation at runtime, and there is no time penalty either! As an example, here is a program that sums the first N squares, written using Series:
(defun integers ()
  "Returns a 'series' of all of the natural numbers."
  (declare (optimizable-series-function))
  (scan-range :from 1))

(defun squares ()
  "Returns a 'series' of all of the square numbers."
  (declare (optimizable-series-function))
  (map-fn t (lambda (x) (* x x)) (integers)))

(defun sum-squares (n)
  "Returns the sum of the first N square numbers."
  (collect-sum (subseries (squares) 0 n)))

(sum-squares 10) => 385
The above code certainly looks functional: there are no side effects in sight. Now let’s look at the code generated by Series. Here is what the macroexpansion of collect-sum looks like:
(common-lisp:let* ((#:out-969 n))
  (common-lisp:let ((#:numbers-966 (coerce-maybe-fold (- 1 1) 'number))
                    #:items-967
                    (#:index-965 -1)
                    (#:sum-959 0))
    (declare (type number #:numbers-966)
             (type (integer -1) #:index-965)
             (type number #:sum-959))
    (tagbody
     #:ll-970
      (setq #:numbers-966
              (+ #:numbers-966 (coerce-maybe-fold 1 'number)))
      (setq #:items-967 ((lambda (x) (* x x)) #:numbers-966))
      (incf #:index-965)
      (locally
       (declare (type nonnegative-integer #:index-965))
       (if (>= #:index-965 #:out-969)
           (go end))
       (if (< #:index-965 0)
           (go #:ll-970)))
      (setq #:sum-959 (+ #:sum-959 #:items-967))
      (go #:ll-970)
     end)
    #:sum-959))
What Series does is look at the entire lifetime of the sequence, from its creation until it is folded. It uses this information to build the above loop, which simultaneously generates the original sequence, maps over it, filters elements out of it, and folds it into the final result. Here is the breakdown of the expansion. The let* and let forms, together with the declare, are just initialization. They define all of the variables the loop will be using and set them to their starting values. The important variables to keep track of are #:NUMBERS-966, #:ITEMS-967, and #:SUM-959. As the code iterates over the original sequence, #:NUMBERS-966 is the current value of the original sequence, #:ITEMS-967 is the square of that value, and #:SUM-959 is the sum of the squares so far. The remaining variable, #:INDEX-965, counts how many elements have been taken. The tagbody that follows is the actual loop.
The loop first takes #:NUMBERS-966, the previous value of the sequence, and increments it in order to set it to the current value of the sequence (since the sequence is the range from 1 to infinity). Next, the loop takes the square of #:NUMBERS-966 to get the ith square number and stores that in #:ITEMS-967. Then the loop checks whether it has already taken N elements out of the sequence, and if so, terminates. Finally, the loop takes the value in #:ITEMS-967 and accumulates it into #:SUM-959.
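For comparison, the same steps can be written by hand. This is my own readable rendering of the walkthrough, not output from Series; the variable names are made up, with comments noting which generated variable each one corresponds to.

```lisp
(defun sum-squares-loop (n)
  "Hand-written counterpart of the expansion: sums the first N squares in one loop."
  (let ((number 0)    ; plays the role of #:NUMBERS-966
        (item nil)    ; plays the role of #:ITEMS-967
        (index -1)    ; plays the role of #:INDEX-965
        (sum 0))      ; plays the role of #:SUM-959
    (loop
      (incf number)                     ; generate the next integer
      (setf item (* number number))     ; map: square it
      (incf index)                      ; count the elements seen so far
      (when (>= index n) (return sum))  ; subseries: stop once N elements are taken
      (incf sum item))))                ; fold: accumulate into the sum

(sum-squares-loop 10) ; => 385
```

Like the generated code, this version squares one element more than it needs before the termination check fires, and it never allocates a list.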
Although this imperative loop is equivalent to the original functional code, it is much faster than that code would be if it allocated intermediate results or relied on laziness. This idea of turning transformations on a list into a loop doesn’t just work for this simple example; it also works for much more complicated programs. I just find it incredible that Series is able to take such pretty code and compile it into code that is extremely fast.
[1] The technique used by Series is sometimes referred to as stream fusion.