The command line is a powerful tool for data transformations. We've already discussed some CLI tools, such as grep, that can be used to transform data. Let's look at a few more.
csvlook is part of csvkit, which we installed earlier. It lets you "pretty print" a CSV file as a readable table in the terminal.
Here is a real example, using the data behind an old FiveThirtyEight article on alcohol consumption:
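curl -s 'https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv' | csvlook
For very long or very wide CSV files, you can pipe the output of csvlook into less (with less -S, wide rows stay on one line and you can scroll sideways instead of having them wrap).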
curl -s 'https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv' | csvlook | less
Note that plain grep treats a CSV as ordinary lines of text: it leaves out the header row (unless the header happens to match the pattern) and matches the pattern in any column. csvkit includes a CSV-aware version of grep, csvgrep, which lets you 1) search the contents of a single column and 2) keep the header in the output.
curl -s 'https://raw.githubusercontent.com/fivethirtyeight/data/master/alcohol-consumption/drinks.csv' | csvgrep -c 'country' -m France
in2csv is a data conversion tool built into csvkit. It converts other formats, such as JSON and Excel files, into CSV.
https://csvkit.readthedocs.io/en/1.0.2/scripts/in2csv.html
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | in2csv -f json -k members
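Note that you have to specify a top-level key (here, members) with -k: in superheroes.json the list of records is nested under that key rather than being the top-level JSON value itself.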
jq is a command-line JSON processor. Here are a few examples using the same superheroes.json dataset.
If you don't have jq installed, run brew install jq on macOS or sudo apt-get install jq on Ubuntu.
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq '.members'
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq -r '.members[].name'
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq -r '.members[].powers[]'
curl -s "https://mdn.github.io/learning-area/javascript/oojs/json/superheroes.json" | jq -r '.members[] | [.name, .secretIdentity, .age] | @csv'
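The @csv output above has no header row. One way to add one, sketched here on a small made-up JSON document shaped like superheroes.json (the names and ages are invented), is to emit a literal header array before the records; the comma operator produces both, and everything then flows through @csv. The second command shows select(), which filters records before they are printed.

```shell
# Made-up JSON shaped like the superheroes example: records under "members"
json='{"members": [{"name": "Ada", "age": 36}, {"name": "Grace", "age": 85}]}'

# Emit a header array first, then one array per member; @csv formats each
# array as a CSV line (strings are quoted, numbers are not)
printf '%s\n' "$json" | jq -r '["name", "age"], (.members[] | [.name, .age]) | @csv'

# select() keeps only the members whose age is greater than 50
printf '%s\n' "$json" | jq -r '.members[] | select(.age > 50) | .name'
```

Since the first command prints ordinary CSV with a header, it can be piped straight into csvlook or csvgrep like any other CSV.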