08 July 2013

Clojure Data Analysis Cookbook - a Book Review

Like yogthos, I was recently asked to review Clojure Data Analysis Cookbook.  Thanks to Incanter, data analysis has been one of the "selling points" of Clojure as a practical language: a practical Lisp for practical data analysis.

(Edit 2016: a second edition is available!)

The book is very example-oriented: it is basically a collection of code recipes for accomplishing common data analysis tasks.  It gives you recipes to go from taking raw data in the form of CSV, JSON, or whatever, to making an Incanter dataset, to doing analysis on those datasets (e.g. clustering the data by using a self-organizing map), to saving, viewing, or charting the resulting data.  Each recipe is accompanied by a brief explanation and cross-references to related recipes in the book.

Each recipe is more or less self-contained, without much building on top of previous recipes.  That makes the book more "random access".  It's less a book to read cover to cover, and more a handy reference to use by full-text searching for key terms, clicking on the relevant topic in the table of contents, or looking up terms in the index.  It's definitely a book I'd rather have as a PDF ebook, so that I can access it anywhere in the world and run full-text searches on it.

Having said that, you can tell whether a book was made to be seriously used as a reference by looking at its index.  Here the index runs to 10 pages, equivalent to about 3.2% of the number of pages preceding it.  That counts as a book meant to be seriously used as a reference.

As a reference book, it's great for people who are already familiar with Clojure (and better yet, Incanter) in general.  If you don't know Clojure, this book won't teach it to you.  If you don't know Incanter, you can pick it up from this book if you're a fast learner (don't expect a lot of hand-holding in learning Incanter, though).

Similarly, I'd say you had best be familiar with data analysis as a discipline in itself.  If you don't know whether to do clustering or regression, or whether to use a SOM or k-means, this book won't teach you that.

Also, as a reference book, it is not comprehensive.  For example, as far as neural networks go, it only includes self-organizing maps.  No other kinds are mentioned.  If you want another kind of neural network, you had best know where to look for another Java or Clojure library.

Even with all those caveats, I'd still say this is a pretty decent book.  Why?  Because if you have some familiarity with Clojure, have played around with Incanter for a bit to learn that library, have taken a class or two in data analysis at university, and aren't expecting a lot of hand-holding from the book, then this book is a great guide to start you off on the road to doing data analysis with Clojure, Incanter, Weka, OpenCL, Cascalog, etc.
