31 August 2014

Haskell Data Analysis Cookbook - a Book Review

As with my previous post, Clojure Data Analysis Cookbook - a Book Review, this time I was offered the chance to review Haskell Data Analysis Cookbook by Nishant Shukla.  First impressions: these are two very similar and related books with some overlapping ideas, but not only are the programming languages totally different in "genre", the content itself also covers somewhat different data analysis ground, so the two could be treated as complementary books.


The book itself is very example oriented (much like the Clojure Data Analysis Cookbook), basically being a collection of code recipes for accomplishing various common tasks for data analysis.  It does give you some quick explanations of why and what else to "see also".

It gives you recipes to take in raw data in the form of CSV, JSON, XML, or whatever, including data that lives on web servers (via HTTP GET or POST requests).  Then there are recipes to build up datasets in MongoDB or SQLite databases.  There are also recipes to clean up that data, do analysis (e.g. clustering with k-means), and visualize, present, and export that analysis.

Each recipe is more or less self-contained, without much building on top of previous recipes.  That makes the book more "random access".  It's less a book to read through cover to cover, and more of a handy reference to use by full-text searching for key terms, clicking on the relevant topic in the table of contents, or looking up terms in the index.  It's definitely a book I'd rather have as a PDF ebook so that I can access it anywhere in the world, and so I can do full-text search in it.  It does come in Mobi as well as ePub formats, and code samples are provided in a separate zipped download as well.

Having said that, you can tell whether a book was made to be seriously used as a reference or not by looking at its index.  The index runs to 9 pages, equivalent to about 2.9% of the number of pages preceding it.  This book can certainly be used as a reference.

As a reference book, it's great for people who already have a familiarity with Haskell in general.  If you don't know Haskell, this book won't teach it to you.  That is, unfortunately, possibly a missed marketing opportunity, as those who don't know Haskell (but have knowledge of another programming language) really only need a small bit of help to understand enough of how functions are written in Haskell to pick up what's going on in the book.  This means if you know another programming language and know a bit about data analysis, you could use this book to learn some Haskell so long as you pick up the basic syntax with another tutorial in hand (so it's really not a show stopper to using this book).

Similarly, I'd say you had best be familiar with how to do data analysis as a discipline in itself.  If you don't know whether to do clustering or regression, or whether to use a K-NN or K-means, this book won't teach it to you.

Much of that is, of course, echoing the Clojure Data Analysis Cookbook.  Where the Haskell Data Analysis Cookbook differs is what makes the two books complementary.  Whereas both books talk about concurrency and parallelism, the Clojure DAC goes into those topics (including distributed computing) in much more detail.

On the other hand, whereas both books talk about preparing and processing data (prior to performing statistics or machine learning on it), the Haskell DAC goes into much more detail on topics like processing strings with more advanced algorithms (as in computing the Jaro-Winkler distance between strings, not like doing substring/concat operations), computing hashes and using bloom filters, and working with trees and graphs (as in node-and-link graph theory graphs, not grade-school bar graphs).

So in some sense, the Haskell Data Analysis Cookbook has more theory heavy topics (graphs and trees!), whilst the Clojure Data Analysis Cookbook has more "engineering" topics (concurrency, parallelism, and distributed computing).

Neither book is a comprehensive treatise on the topic, but someone who needs a practical refresher on working with graphs and trees may find Haskell Data Analysis Cookbook to be quite useful.

All in all, I'd say this is a decent book.  If you have some familiarity with Haskell, have some familiarity with some of the basic technologies like JSON, MongoDB, or SQLite, have taken a class or two in data analysis or machine learning in university (or a MOOC?), and aren't expecting a lot of hand holding from the book, then this book is a great guide to start you off doing some data analysis with Haskell.

15 August 2014

Java has deep expression problem for beginning students

There are many problems with Java as the first programming language to teach students if we wish to provide the most effective learning experience.  I've even written on this in the past, in Learn Python instead of Java as your first language.  So what now?

Newbie, meet the Expression Problem

Stuart Sierra provides a very lucid explanation of the Expression Problem, a classic problem in software programming, in Solving the Expression Problem with Clojure 1.2.  Needless to say, Clojure provides a very clean solution.

Java, however, is a quagmire and requires some heavy OOP software engineering concepts to solve the Expression Problem.  One wouldn't ordinarily think this has anything to do with beginning students just learning to program though, but it does, and here's how.

Imagine our beginning student, "Sam", starts to learn Java and eventually starts to write a classic game of asteroids.  Sam plugs away and gets a decent game working, with a single player ship shooting lasers at one kind of asteroid.  Not bad!  But Sam wants to do more.  Sam doesn't want just one kind of (big) asteroid; he also wants small asteroids to shoot at.

Alright, so Sam begins to modify the BigAsteroids class to also be able to represent a smaller kind of asteroid.  The teacher catches wind of this and tells Sam, "No, that's not good", and that Sam needs to use OOP principles and write a different class for SmallAsteroids.

Now most students would say, "Why, Mr. Teach?  My way works."  But Sam is a good student and does as he's told.

So Sam goes and creates a second class for SmallAsteroids.  Except his program was built presuming that the only things to draw, to shoot lasers at, and to move around were BigAsteroids.  None of the methods he wrote to draw, to shoot lasers at, and to move around BigAsteroids work for SmallAsteroids.  Hmm...  Welcome to the Expression Problem, Sam.
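One common way out of Sam's bind, sketched below in Java, is to write the game code against a shared interface instead of against BigAsteroids directly.  This is only a minimal sketch and not Sam's actual program: the Asteroid interface, the BaseAsteroid helper, the singular class names, the radius values, and the hitBy and moveAll methods are all hypothetical names invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical interface: everything the game code needs from any asteroid.
interface Asteroid {
    double radius();
    double x();
    double y();
    void move(double dx, double dy);
}

// Shared position bookkeeping, so each concrete class only states its size.
abstract class BaseAsteroid implements Asteroid {
    private double x, y;
    BaseAsteroid(double x, double y) { this.x = x; this.y = y; }
    public double x() { return x; }
    public double y() { return y; }
    public void move(double dx, double dy) { x += dx; y += dy; }
}

class BigAsteroid extends BaseAsteroid {
    BigAsteroid(double x, double y) { super(x, y); }
    public double radius() { return 40.0; } // illustrative size
}

class SmallAsteroid extends BaseAsteroid {
    SmallAsteroid(double x, double y) { super(x, y); }
    public double radius() { return 10.0; } // illustrative size
}

class AsteroidGame {
    // Written against the interface, so it works for big and small alike.
    static boolean hitBy(Asteroid a, double laserX, double laserY) {
        double dx = a.x() - laserX;
        double dy = a.y() - laserY;
        return dx * dx + dy * dy <= a.radius() * a.radius();
    }

    static void moveAll(List<Asteroid> asteroids, double dx, double dy) {
        for (Asteroid a : asteroids) {
            a.move(dx, dy);
        }
    }

    public static void main(String[] args) {
        List<Asteroid> field = new ArrayList<>();
        field.add(new BigAsteroid(0, 0));
        field.add(new SmallAsteroid(100, 0));
        moveAll(field, 5, 0);
        for (Asteroid a : field) {
            System.out.println(a.getClass().getSimpleName()
                + " hit by laser at (30, 0): " + hitBy(a, 30, 0));
        }
    }
}
```

With this shape, adding a medium asteroid means one new class and no changes to hitBy or moveAll.  The catch, and the other half of the Expression Problem, is that adding a new operation to the interface means going back and editing every existing asteroid class.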

22 August 2013

Photo album sync fail on iPad with Mac

I'm not usually one to blog complaints about products, but this seems outrageous.  Using the built-in Photo app on an iPad, I've got a bunch of photos organized into albums.  You can download the photos to a Mac in a mass download using the Mac's Image Capture program, or using the Mac's iPhoto.  You can back up the iPad's photos and albums using iTunes, but the backups are inaccessible on the Mac as albums, photos, or files; they are just a pure backup to be reloaded to an iPad in case of "emergency", I guess.  But you cannot download the photo albums to the Mac from the iPad for further use or organization.

Apparently, you cannot download photo albums created on any iOS device to a Mac for further use or organization at all, and it's been this way since forever.  There is a third-party app called "Phone View" that reportedly lets you do this.

But really?  A simple feature like syncing albums in an Apple built-in app requires a third-party solution?  And it's not obvious when using the beautifully created Apple Photo app on the iPad that album sync is not possible, which lures unsuspecting users into creating albums that cannot be synced.

This is an especially sad situation for users of iPads or other iOS devices on which they aren't allowed to install apps.  Where does that ever happen?  Well, corporate and school based usage comes to mind.

20 August 2013

Where are the Canadian STEM students?

A while back I wrote in some detail about Canadian educational attainment in Inflated expectations: Are students living in a dream? I've since become interested in the numbers for the so-called STEM fields (Science, Technology, Engineering, and Math) and thought I'd write a supplement to the old post.

Let's start with a highlight that stood out from the old post. Based on the 2006 Canadian census, and focusing on the 20 to 24 year old cohort: roughly 78% of high school students (based on statistics from ACT in Spring 2004, regarding the USA) expected to get a college or higher degree, but only 35.9% of the above cohort actually got anything of the sort.

I think the 2011 census is now available, but to keep the comparison focused on 2006 (I don't want to redo the old post), I will continue to cite numbers from the 2006 Census [1].

Looking at just the Canadian population between the ages of 20 to 24 years, by highest certificate, diploma or degree attained:

type of highest certificate, diploma or degree attained | total [2] | % of cohort size
20-24 cohort, total size | 2,071,895 | 100%
only a high school certificate or equivalent | 889,275 | 42.9%
no certificate, diploma or degree at all | 286,050 | 13.8%
some kind of post-secondary qualification (trades certificate, diploma below bachelor level, PhD, etc.) | 896,575 | 43.3%
some kind of post-secondary qualification (trades certificate, diploma below bachelor level, PhD, etc.) in STEM fields | 258,305 | 12.5%
some kind of university certificate, diploma or degree | 344,795 | 16.6%
some kind of university certificate, diploma or degree in STEM fields | 87,000 | 4.2%

Look at that: unemployment notwithstanding, the education "system" converted only 4.2% of the 20 to 24 cohort into STEM university credentialed workers after a long, arduous process for the students involved.

Sure, the STEM university credentialed group represents 25.2% of all those who attained any university credentials at all, but it turns out that in the context of the entire cohort, it's just a drop in the bucket. That is seriously concerning, especially from an economic policy standpoint.
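For anyone who wants to check the arithmetic, here is a small Java sketch that recomputes the two headline percentages from the counts in the table above. The class name StemShare, the constant names, and the percent helper are my own hypothetical names; the counts themselves are copied from the table.

```java
// Counts copied from the 2006 Census table in this post.
class StemShare {
    static final long COHORT = 2_071_895L;
    static final long HIGH_SCHOOL_ONLY = 889_275L;
    static final long NO_CERTIFICATE = 286_050L;
    static final long POST_SECONDARY = 896_575L;
    static final long UNIVERSITY = 344_795L;
    static final long UNIVERSITY_STEM = 87_000L;

    // Percentage of part in whole, rounded to one decimal place.
    static double percent(long part, long whole) {
        return Math.round(1000.0 * part / whole) / 10.0;
    }

    // How far the three top-level rows are from the stated cohort total.
    static long topLevelDiscrepancy() {
        return HIGH_SCHOOL_ONLY + NO_CERTIFICATE + POST_SECONDARY - COHORT;
    }

    public static void main(String[] args) {
        // University STEM as a share of the whole cohort: prints 4.2
        System.out.println(percent(UNIVERSITY_STEM, COHORT));
        // University STEM as a share of all university credentials: prints 25.2
        System.out.println(percent(UNIVERSITY_STEM, UNIVERSITY));
        // Gap between the top-level rows and the stated total: prints 5
        System.out.println(topLevelDiscrepancy());
    }
}
```

The same percent helper reproduces the other rows of the table as well, and the last line is the off-by-five discrepancy mentioned in note [2].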

According to University Completion, "It has long been argued...university graduates, as a group, earn more, on average, than college graduates do", which might imply we ought to "sell" more university education, any university education. "However, recent research suggests the field of study may be more important...One study, for example, found that males with university degrees in academic disciplines—such as the humanities, education, biology, and agriculture science—earned less than half of that earned by males with university degrees in vocational and applied disciplines—such as commerce, medicine, and engineering" (ibid.).

So, not surprisingly, what field you study makes a difference! What's more, "Canadians with scientific degrees tend to earn more. Five years after graduation, engineers earn about $10,000 more annually than fine arts and humanities graduates, and upwards of $5,000 more than social science graduates. These earnings are in line with computer and physical sciences" (Percentage of Graduates in Science, Math, Computer Science, and Engineering).

What the research suggests, then, is that we ought to be "selling" more university education specifically in STEM and commerce fields. It's not just about helping students earn more money after graduation, but as I've noted in the past [3], the very fact the labour market provides greater incentive for a given occupation is evidence there aren't enough people entering that career path in the economy.

So let's get out there and sell more university STEM education!

Easier said than done, of course. Certainly for STEM fields, but to some extent for commerce fields as well (especially economics), a strong basis in math is a pre-requisite to success. That means to get more students into STEM and commerce fields, we may also have to get more students to learn more math and to learn it better as a pre-requisite — and that means high school math.

Therein, I'd argue, lies a big part of the problem with getting more students into STEM fields. It turns out high school math is hard, and getting more students to attain a high level of math skills is also hard. In fact, "The proportion of Canadian [15 year old] students with high-level mathematics skills dropped slightly between 2003 and 2009" (Students With High-Level Math Skills); that's six years of stagnation, if not decline.

So where are the Canadian STEM students? Don't be surprised you can't find them: literally over 95% of the 20 to 24 cohort just aren't coming out with university STEM credentials.

[1] The numbers in the table were pulled from two Statistics Canada sources: Population 15 years and over by highest certificate, diploma or degree, by age groups (2006 Census), and Major Field of Study - Classification of Instructional Programs, 2000 (13), Highest Postsecondary Certificate, Diploma or Degree (12), Age Groups (10A) and Sex (3) for the Population 15 Years and Over With Postsecondary Studies of Canada, Provinces, Territories, Census Metropolitan Areas and Census Agglomerations, 2006 Census - 20% Sample Data.

[2] I should come clean and say that the numbers are off by five (5): the three top-level rows (high school only, no certificate at all, and post-secondary qualification) add up to 2,071,900, not the stated total of 2,071,895. I don't really know why, seeing as the numbers are pulled straight from Statistics Canada. I assume it has something to do with the normalization adjustments they do to different tables, or some small error in counting. When the numbers are in the millions, I doubt being off by five is a big deal in this particular case.

[3] See Why push math education onto students?


15 August 2013

2010 MacBook Pro black screen graphics switching bug workaround

Apple 15 inch MacBook Pros from mid 2010 were manufactured with a hardware defect that causes them to intermittently freeze or stop displaying video on the built-in or external display. Apple has been replacing affected machines' motherboards, but the problem seems to persist even afterwards, though less frequently.

Any program that makes the system switch graphics processor can trigger the defect.  Typically, for example, the system starts up using the Intel integrated graphics, then you open Chrome, which kicks it over to the Nvidia discrete graphics.  Later, you quit Chrome and when it switches back to the Intel integrated graphics, the screen may now flicker and then go black.

Strangely enough, a workaround exists.  Sleep the computer at least once after starting up.  Once it's gone to sleep at least once, the defect seems to no longer be triggered upon graphics switching.

Resources:

  1. MacBook Pro (15-inch, Mid 2010): Intermittent black screen or loss of video
  2. Apple acknowledges 2010 MacBook Pro black-screen bug
  3. Lion randomly crashes - black screen
  4. Black screen with flickering on top on MacBook Pro 6,2