For example:
(let [data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}])]
($where {:a {:$gt 2}} data))
That will select all rows where given a row, its data under column
:a
is greater than 2.But what if you want to filter the dataset to get rid of certain columns? Say you only want column 0. This can do that:
(let [data (to-dataset [{:a 1 :b 2} {:a 3 :b 4}])]
($ :all 0 data))
What if you want columns 0 and 2? This can do that:
(let [data (to-dataset [{:a 1 :b 2 :c 3} {:a 3 :b 4 :c 5}])]
($ :all [0 2] data))
What if you want to select by column name, e.g. only selecting columns with name
:a
and :b
? This can do that:
(let [data (to-dataset [{:a 1 :b 2 :c 3} {:a 3 :b 4 :c 5}])]
($ :all [:a :b] data))
Here's the tricky one. What if you want to select by column name, but you want to match the column name against a regular expression (say only names containing a vowel)? The
$where
method only lets you select rows of data based on a query — it's a row-wise filtering operation. We want column-wise filtering on column-names.My first attempt involved taking the dataset, turning it into a clojure map, then filtering on the map's keys, and finally turning it back into an Incanter dataset. But doing so ran into a problem in Incanter:
to-dataset
and to-map
are not inverse functions of each other.That is to say, there exists a file of data
*filepath*
that can be read into Incanter with read-dataset
such that this does not work: (to-dataset (to-map (read-dataset *filepath*)))
. For example, if the file is a CSV spreadsheet file with missing data in some cells (not nil, not 0, just no data).Turns out the solution is much easier:
(let [data (to-dataset [{:a 1 :b 2 :c 3} {:a 3 :b 4 :c 5}])
columns (filter #(re-find #"aeiou" (str %)) (:column-names data))]
($ columns data))
The idea is to create a list of column names of the columns you want, then use that list of column names with
$
to select them out of the dataset.Column-wise selection isn't as convenient as selecting row-wise, but this is one way that works.
No comments:
Post a Comment