|
2 | 2 | datatools |
3 | 3 | ========= |
4 | 4 |
|
5 | | -_datatools_ provides a variety of command line programs for working with |
6 | | -data in different formats as well as to ease Posix shell scripting |
7 | | -(e.g. writing scripts that run under Bash). The tools are group as data, |
8 | | -strings and scripting. |
9 | | - |
10 | | -For data |
11 | | --------- |
12 | | - |
13 | | -Command line utilities for simplifying work with CSV, JSON, TOML, YAML, |
14 | | -Excel Workbooks and plain text files or content. |
15 | | - |
16 | | -+ [csv2json](docs/csv2json/) - a tool to take a CSV file and convert it into a JSON array or a list of JSON blobs one per line |
17 | | -+ [csv2mdtable](docs/csv2mdtable/) - a tool to render CSV as a Github Flavored Markdown table |
18 | | -+ [csv2tab](docs/csv2tab/) - a tool to take a CSV file and convert to tab separated values |
19 | | -+ [csv2xlsx](docs/csv2xlsx/) - a tool to take a CSV file and add it as a sheet to a Excel Workbook |
20 | | -+ [csvcleaner](docs/csvcleaner/) - normalize a CSV file by column and row including trimming spaces and removing comments |
21 | | -+ [csvcols](docs/csvcols/) - a tool for formatting command line arguments into CSV row of columns or filtering CSV rows for specific columns |
22 | | -+ [csvfind](docs/csvfind/) - a tool for filtering a CSV file rows by column |
23 | | -+ [csvjoin](docs/csvjoin/) - a tool to join two CSV files on common values in designated columns, writes combined CSV rows |
24 | | -+ [csvrows](docs/csvrows/) - a tool for formatting command line arguments into CSV columns of rows or filtering CSV for specific rows |
25 | | -+ [json2toml](docs/json2toml/) - a tool for converting JSON to TOML |
26 | | -+ [json2yaml](docs/json2yaml/) - a tool for converting JSON to YAML |
27 | | -+ [jsoncols](docs/jsoncols/) - a tool for exploring and extracting JSON values into columns |
28 | | -+ [jsonjoin](docs/jsonjoin/) - a tool for joining JSON object documents |
29 | | -+ [jsonmunge](docs/jsonmunge/) - a tool to transform JSON documents into something else |
30 | | -+ [jsonrange](docs/jsonrange/) - a tool for iterating over JSON objects and arrays (return keys or values) |
31 | | -+ [tab2csv](docs/tab2csv/) - a tool to convert from tab separated values to comma separated values |
32 | | -+ [toml2json](docs/toml2json/) - a tool for converting TOML to JSON |
33 | | -+ [xlsx2csv](docs/xlsx2csv/) - a tool for converting Excel Workbooks sheets to CSV files |
34 | | -+ [xlsx2json](docs/xlsx2json/) - a tool for converting Excel Workbooks to JSON files |
35 | | -+ [yaml2json](docs/yaml2json/) - a tool for converting YAML files to JSON |
36 | | -+ [codemeta2cff](codemeta2cff.1.html) - a tool to convert a codemeta.json file into a CITATION.cff file. |
37 | | -+ [sql2csv](sql2csv.1.html) - a tool to execute a SQL query in MySQL or SQLIte3 and render the results in CSV encoding |
38 | | - |
39 | | - |
40 | | -Compiled versions are provided for Linux (amd64), Mac OS X (amd64), |
41 | | -Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/datatools/releases. |
| 5 | +_datatools_ is a rich collection of command line programs targeting |
| 6 | +data conversion, cleanup and analysis directly from your favorite |
| 7 | +POSIX shell. It has proven useful for data collaborations where |
| 8 | +individual members of a project may prefer different toolsets in their |
| 9 | +analysis (e.g. Julia, R, Python) but want to work from a common baseline. |
| 10 | +It has also been used intensively for internal reporting from various |
| 11 | +Caltech Library metadata sources. |
| 12 | + |
| 13 | +The tools fall into three broad categories: |
| 14 | + |
| 15 | +- data transformation and conversion |
| 16 | +- shell scripting helpers |
| 17 | +- "string", a tool providing the common string operations missing from shell |
| 18 | + |
| 19 | +See the [user manual](user-manual.md) for a complete list of the command line |
| 20 | +programs. The data transformation tools include support for formats such as |
| 21 | +Excel XML, CSV, tab delimited files, JSON, YAML and TOML. |
| 22 | + |
| 23 | +Compiled versions of the datatools collection are provided for Linux |
| 24 | +(amd64), Mac OS X (amd64), Windows 10 (amd64) and Raspbian (ARM7). |
| 25 | +See https://github.com/caltechlibrary/datatools/releases. |
42 | 26 |
|
43 | 27 | Use the "-help" option for a full list of options for each utility (e.g. `csv2json -help`). |
44 | 28 |
|
| 29 | +Data transformation |
| 30 | +------------------- |
| 31 | + |
| 32 | +The transformation tooling includes data conversion utilities. These |
| 33 | +work with CSV, tab delimited, JSON, TOML, YAML |
| 34 | +and Excel XML. |
| 35 | + |
| 36 | +There is also tooling to change data shapes using JSON as the |
| 37 | +intermediate data format. |
| 38 | + |
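As a rough illustration of the kind of conversion described above, the following Python sketch turns CSV text into either a JSON array or one JSON object per line, the two output shapes mentioned in the csv2json description. It is not the datatools implementation; the function names are hypothetical and exist only for this example, and it assumes the first CSV row is a header.

```python
import csv
import io
import json


def csv_to_records(text):
    """Parse CSV text into a list of dicts, using the first row as the header."""
    return list(csv.DictReader(io.StringIO(text)))


def to_json_array(records):
    """Render all records as a single JSON array."""
    return json.dumps(records)


def to_json_lines(records):
    """Render each record as its own JSON object, one per line."""
    return "\n".join(json.dumps(record) for record in records)


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = "name,year\nAda,1843\nGrace,1952\n"
    records = csv_to_records(sample)
    print(to_json_array(records))
    print(to_json_lines(records))
```

Note that every value stays a string here, since plain CSV carries no type information; a real converter may handle typing differently.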
| 39 | +For the shell |
| 40 | +------------- |
| 41 | + |
| 42 | +Various utilities for simplifying work on the command line. |
| 43 | + |
| 44 | ++ [findfile](docs/findfile/) - find files based on prefix, suffix or contained string |
| 45 | ++ [finddir](docs/finddir/) - find directories based on prefix, suffix or contained string |
| 46 | ++ [mergepath](docs/mergepath/) - prefix, append, clip path variables |
| 47 | ++ [range](docs/range/) - emit a range of integers (useful for numbered loops in Bash) |
| 48 | ++ [reldate](docs/reldate/) - display a relative date in YYYY-MM-DD format |
| 49 | ++ [reltime](docs/reltime/) - display a relative time in 24 hour notation, HH:MM:SS format |
| 50 | ++ [timefmt](docs/timefmt/) - format a time value based on Golang's time format language |
| 51 | ++ [urlparse](docs/urlparse/) - split a URL into parts |
| 52 | + |
45 | 53 | For strings |
46 | 54 | ----------- |
47 | 55 |
|
@@ -71,26 +79,6 @@ Some of the features included |
71 | 79 |
|
72 | 80 | See [string](docs/string/) for full details |
73 | 81 |
|
74 | | -For scripting |
75 | | -------------- |
76 | | - |
77 | | -Various utilities for simplifying work on the command line. |
78 | | - |
79 | | -+ [findfile](docs/findfile/) - find files based on prefix, suffix or contained string |
80 | | -+ [finddir](docs/finddir/) - find directories based on prefix, suffix or contained string |
81 | | -+ [mergepath](docs/mergepath/) - prefix, append, clip path variables |
82 | | -+ [range](docs/range/) - emit a range of integers (useful for numbered loops in Bash) |
83 | | -+ [reldate](docs/reldate/) - display a relative date in YYYY-MM-DD format |
84 | | -+ [reltime](docs/reltime/) - display a relative time in 24 hour notation, HH:MM:SS format |
85 | | -+ [timefmt](docs/timefmt/) - format a time value based on Golang's time format language |
86 | | -+ [urlparse](docs/urlparse/) - split a URL into parts |
87 | | - |
88 | | -Compiled versions are provided for Linux (amd64), Mac OS X (amd64), |
89 | | -Windows 10 (amd64) and Raspbian (ARM7). See https://github.com/caltechlibrary/datatools/releases. |
90 | | - |
91 | | -Use the utilities try "-help" option for a full list of options. |
92 | | - |
93 | | - |
94 | 82 | Installation |
95 | 83 | ------------ |
96 | 84 |
|
|