14 May 2015

Getting sidelined: Computing Science vs Engineering degrees

There are those down in Silicon Valley who are telling teens and youngsters to go into “Real” Engineering™ instead of Computing Science in university, claiming that otherwise they will get sidelined or “passed over” as their careers develop.  They surely mean well in giving that advice, but...

Engineers: please stop telling people what to do with their own lives.

This speaks to the Computer, Electrical, or so-called Software Engineers out there, but really, it applies to those in any engineering field, Petroleum Engineers included.

Was there a course in Engineering School that taught you to propagate the Cult of the Engineer?  Did you Silicon Valley types ever stop to think that telling people what to do by appealing to your having a job in “Silicon Valley” is just an appeal to authority (by association with a geographic region)?  It’s no more convincing than appealing to having a job on Wall Street, or to being an American (in a more international context).

Who do these engineers think they work for, anyway?  Well, why don’t we have a look at who many of these Engineers™ work for and what their educational backgrounds are (according to Wikipedia anyway, for what that’s worth...):

27 March 2015

Lynda.com for learning computing science as high school students: a review

Recently, I had a chance to spend some time reviewing Lynda.com's offerings on various computing science topics.  The review here is geared towards using Lynda.com with high school students as the target audience.  I thought I'd share a few observations.

Lynda.com is OK for what it is: a quick introduction to topics for what seems to be its intended audience of busy professionals working in the field.  At least I think that's their target audience.

For high school students, however, especially beginner, intermediate, or even middle-to-lower-level advanced students, Lynda.com's offerings on computing science topics are simply inadequate: they miss curricular connections and use pedagogy ill-fitted to students in the high school age range.  That's not really a knock against Lynda.com, as I don't think high school students are their target audience at all.

15 October 2014

The last TrueCrypt 7.1a: cross checking hashes

TrueCrypt has had a good run, and the latest version was intentionally crippled so that users can read from, but not otherwise use, TrueCrypt volumes anymore.  Internet drama aside, some people have hosted the last full-featured version, TrueCrypt 7.1a.

When downloading security software like this, you should always check the provenance of what you download, and that is especially true in the aftermath of this particular incident.  So I list below some sources for cross-checking the last version of TrueCrypt.

The Open Crypto Audit Project started as a crowd-funded way to get a full audit done on TrueCrypt.  They have posted a link to a verified source and binary repository on GitHub under the AuditProject account.  OCAP is explained a bit in this Ars Technica article, which notes that Thomas Ptacek is running Phase II, and Ptacek describes a bit more in this HN thread.

Of course, if you check the files from that repository against the hashes hosted on that repository, you'd expect them to match, even if it was maliciously set up.  So let's cross check with other, hopefully independent and trusted, sources.

Gibson Research Corp. has hosted a TrueCrypt Final Release Repository as well.  Gibson notes the same issue mentioned above: you cannot check files against hashes hosted at the same location.  He references a PGP-signed file of hashes hosted at Defuse Security.

As discussed on another HN thread, TCnext is hosting yet another TrueCrypt repository.  TCnext refers to a set of "independent" hashes hosted by the German IT news site Golem.de.

At the time of writing, cross-referencing all the hashes mentioned above, the SHA256 hashes were all identical.  Further, the source and binaries hosted by AuditProject on GitHub match those SHA256 hashes.  You should check for yourself when you download them, of course.

11 September 2014

Installing Lubuntu 14.04 LTS with Full Disk Encryption

I'm going to walk through, complete with screenshots, my installation of Lubuntu 14.04 LTS, a variant of Ubuntu Linux that uses the lightweight LXDE desktop environment and the Openbox window manager.

The last time I installed a fresh copy of Ubuntu was probably when I wrote up Installing Windows 7 & Ubuntu UNR side-by-side on Dell Mini more than four years ago.  Before that, I installed Ubuntu on a desktop, which I upgraded to Lubuntu by installing the required packages but without uninstalling any of the Unity shell items from Ubuntu.

With the new LTS release of Lubuntu, I felt it was ready for conservative users like myself to install.  LTS means three years of long-term support, which means I won't have to do any major upgrades for at least that long (of course, the normal minor updates from week to week are still necessary).

A fresh install gives us a chance to clear out the cobwebs: idle packages that were installed but are no longer needed, and so on.  It also gives us a chance to install with full disk encryption (FDE), which was available before but didn't seem quite ready for prime time for conservative users.

Let's begin!

31 August 2014

Haskell Data Analysis Cookbook - a Book Review

As with my previous post, Clojure Data Analysis Cookbook - a Book Review, this time I was offered the chance to review Haskell Data Analysis Cookbook by Nishant Shukla.  First impressions: these are two very similar and related books with some overlapping ideas, but not only are the programming languages used totally different in "genre", the content also covers different data analysis ground, so the two could be treated as complementary in that way.

The book itself is very example-oriented (much like the Clojure Data Analysis Cookbook), basically a collection of code recipes for accomplishing various common data analysis tasks.  It does give you some quick explanations of why, plus pointers to what else to "see also".

It gives you recipes to take in raw data in the form of CSV, JSON, XML, or whatever, including data that lives on web servers (via HTTP GET or POST requests).  Then there are recipes to build up datasets in MongoDB or SQLite databases; to clean up that data; to do analysis (e.g. clustering with k-means); and to visualize, present, and export that analysis.

Each recipe is more or less self-contained, without much building on top of previous recipes.  That makes the book more "random access".  It's less a book to read through cover to cover, and more a handy reference to use by full-text searching for key terms, clicking on the relevant topic in the table of contents, or looking up terms in the index.  It's definitely a book I'd rather have as a PDF ebook, so that I can access it anywhere in the world and do full-text searches in it.  It does come in Mobi and ePub formats as well, and code samples are provided as a separate zipped download.

Having said that, you can tell whether a book was made to be seriously used as a reference by looking at its index.  There are 9 pages of index, equivalent to about 2.9% of the pages preceding it.  This book can certainly be used as a reference.

As a reference book, it's great for people who already have some familiarity with Haskell in general.  If you don't know Haskell, this book won't teach it to you.  That is, unfortunately, possibly a missed marketing opportunity: those who don't know Haskell (but know another programming language) really only need a small bit of it to understand how functions are written in Haskell and pick up what's going on in the book.  This means that if you know another programming language and a bit about data analysis, you could use this book to learn some Haskell, so long as you pick up the basic syntax with another tutorial in hand (so it's really not a show stopper to using this book).

Similarly, I'd say you had best be familiar with how to do data analysis as a discipline in itself.  If you don't know whether to do clustering or regression, or whether to use a K-NN or K-means, this book won't teach it to you.

Much of that, of course, echoes the Clojure Data Analysis Cookbook.  Where the Haskell Data Analysis Cookbook differs is what makes the two books complementary.  Whereas both books talk about concurrency and parallelism, the Clojure DAC goes into those topics (including distributed computing) in much more detail.

On the other hand, whereas both books talk about preparing and processing data (prior to performing statistics or machine learning on it), the Haskell DAC goes into much more detail on topics like processing strings with more advanced algorithms (as in computing the Jaro-Winkler distance between strings, not just doing substring/concat operations), computing hashes and using Bloom filters, and working with trees and graphs (as in node-and-link graph theory graphs, not grade-school bar graphs).
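To give a flavour of the kind of string algorithm the book covers, here is a rough sketch of Jaro-Winkler similarity.  It's in Python rather than Haskell purely for brevity; the book's own recipes are in Haskell, and this is just an illustration of the idea, not anything taken from the book.

```python
def jaro(s1, s2):
    """Jaro similarity: matching characters within a sliding window,
    penalized by transpositions among the matches."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    window = max(max(len1, len2) // 2 - 1, 0)
    match1, match2 = [False] * len1, [False] * len2
    matches = 0
    for i, c in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not match2[j] and s2[j] == c:
                match1[i] = match2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that are out of order; each swap is
    # two out-of-order positions, hence the halving.
    t, k = 0, 0
    for i in range(len1):
        if match1[i]:
            while not match2[k]:
                k += 1
            if s1[i] != s2[k]:
                t += 1
            k += 1
    t //= 2
    return (matches / len1 + matches / len2 + (matches - t) / matches) / 3

def jaro_winkler(s1, s2, p=0.1):
    """Jaro-Winkler: boost the Jaro score for a shared prefix (max 4)."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == 4:
            break
        prefix += 1
    return j + prefix * p * (1 - j)
```

The classic textbook pair "MARTHA"/"MARHTA" scores about 0.96, close to 1 because only one transposition separates the two strings, which hints at why this metric shows up in data-cleaning recipes for fuzzy record matching.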

So in some sense, the Haskell Data Analysis Cookbook has more theory heavy topics (graphs and trees!), whilst the Clojure Data Analysis Cookbook has more "engineering" topics (concurrency, parallelism, and distributed computing).

Neither book is a comprehensive treatise on its topic, but someone who needs a practical refresher on working with graphs and trees may find the Haskell Data Analysis Cookbook quite useful.

All in all, I'd say this is a decent book: if you have some familiarity with Haskell, some familiarity with basic technologies like JSON, MongoDB, or SQLite, have taken a class or two on data analysis or machine learning in university (or a MOOC?), and aren't expecting a lot of hand-holding, then this book is a great guide to start you off doing some data analysis with Haskell.