|
| 1 | +<!DOCTYPE html> |
| 2 | +<html> |
| 3 | +<head> |
| 4 | + <title>Caltech Library's Digital Library Development Sandbox</title> |
| 5 | + <link href='https://fonts.googleapis.com/css?family=Open+Sans' rel='stylesheet' type='text/css'> |
| 6 | + <link rel="stylesheet" href="/css/site.css"> |
| 7 | +</head> |
| 8 | +<body> |
| 9 | +<header> |
| 10 | +<a href="http://library.caltech.edu"><img src="/assets/liblogo.gif" alt="Caltech Library logo"></a> |
| 11 | +</header> |
| 12 | +<nav> |
| 13 | +<ul> |
| 14 | +<li><a href="/">Home</a></li> |
| 15 | +<li><a href="./">README</a></li> |
| 16 | +<li><a href="license.html">LICENSE</a></li> |
| 17 | +<li><a href="install.html">INSTALL</a></li> |
| 18 | +<li><a href="docs/">Documentation</a></li> |
| 19 | +<li><a href="how-to/">How To</a></li> |
| 20 | +<li><a href="https://github.com/caltechlibrary/datatools">GitHub</a></li> |
| 21 | +</ul> |
| 22 | + |
| 23 | +</nav> |
| 24 | + |
| 25 | +<section> |
| 26 | +<h1>Action Items</h1> |
| 27 | + |
| 28 | +<h2>Bug</h2> |
| 29 | + |
| 30 | +<h2>Next</h2> |
| 31 | + |
| 32 | +<ul> |
| 33 | +<li>[ ] csvrows would output a range of rows (e.g. [2:] would be all rows but the first row)</li> |
| 34 | +<li>[ ] csv utilities should support integer range notation for column and row references, e.g. “1,3:4,7,10:” or “all”</li> |
| 35 | +</ul> |
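<p>The range notation in the items above could be expanded roughly as follows. This is a minimal sketch, not code from datatools: the function name <code>parse_range</code> and its <code>total</code> parameter are hypothetical, and it assumes 1-based indexes, an open-ended “N:” meaning “N through the last row or column”, and “all” meaning every index.</p>

```python
def parse_range(spec, total):
    """Expand a spec such as "1,3:4,7,10:" into a list of 1-based indexes.

    `total` is the number of rows or columns available (an assumption of
    this sketch; the original notes do not say how open ranges are bounded).
    """
    spec = spec.strip()
    if spec.lower() == "all":
        return list(range(1, total + 1))
    indexes = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if ":" in part:
            start_text, end_text = part.split(":", 1)
            # An empty side of the colon means "from the first" / "to the last".
            start = int(start_text) if start_text.strip() else 1
            end = int(end_text) if end_text.strip() else total
        else:
            start = end = int(part)
        if start < 1 or end < start:
            raise ValueError(f"invalid range: {part!r}")
        # Clip the upper bound to what is actually available.
        indexes.extend(range(start, min(end, total) + 1))
    return indexes


if __name__ == "__main__":
    print(parse_range("1,3:4,7,10:", 12))
    print(parse_range("2:", 4))
```

<p>Under these assumptions, “2:” on a four-row table selects every row but the first, matching the csvrows example above.</p>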
| 36 | + |
| 37 | +<h2>Someday, Maybe</h2> |
| 38 | + |
| 39 | +<ul> |
| 40 | +<li>[ ] finddir should have an option to exclude directories (e.g. exclude .git directories from a listing)</li> |
| 41 | +<li>[ ] textscraper - a tool for selecting text and storing it as a JSON field value, a sort of grep plus sed cleanup for semi-structured text (e.g. a webpage) |
| 42 | + |
| 43 | +<ul> |
| 44 | +<li>look at how cut, sed, grep are commonly used in my scripts and merge that functionality into a single tool</li> |
| 45 | +</ul></li> |
| 46 | +<li>[ ] csvcols, csvrows should have a length option to give you a number of columns or rows respectively</li> |
| 47 | +<li>[ ] csvcols, csvrows should have a filter option to support filtering output conditionally</li> |
| 48 | +<li>[ ] csvsort should allow a multi-column sort respecting column headings |
| 49 | + |
| 50 | +<ul> |
| 51 | +<li>plus column number would sort ascending by that column</li> |
| 52 | +<li>minus column number would sort descending by that column</li> |
| 53 | +<li>sort keys would be read from left to right</li> |
| 54 | +<li>it would be good to include support for column names and not just column numbers to describe the sort</li> |
| 55 | +</ul></li> |
| 56 | +<li>[ ] jsonmodify takes a JSON document, a dotpath and value then creates/updates the dotpath in the JSON document with the new value |
| 57 | + |
| 58 | +<ul> |
| 59 | +<li>“(delete DOTPATH)” would remove the property described by the dotpath</li> |
| 60 | +<li>“(update DOTPATH NEW_VALUE)” would replace the property described by the dotpath with a new value (value can be a string, number, or JSON)</li> |
| 61 | +<li>“(create DOTPATH NEW_VALUE)” would add a new property at the described dotpath with a new value (value can be a string, number, or JSON)</li> |
| 62 | +<li>“(join DOTPATH SEP)” combines JSON array elements into a single string using the separator</li> |
| 63 | +<li>“(concat DOTPATH1 DOTPATH2… SEP)” takes one or more dotpaths (values must be strings or numbers) and returns the values joined by the separator as one string, e.g. (concat .last_name .first_name “, ”) would return a “last name, first name” string</li> |
| 64 | +<li>“(split DOTPATH SEP)” turns a string into an array of strings using the separator</li> |
| 65 | +</ul></li> |
| 66 | +<li>[ ] csvcols, csvrows should provide a mechanism to filter by column or row |
| 67 | + |
| 68 | +<ul> |
| 69 | +<li>using a prefix notation (e.g. ‘(and (eq (join (cols (colNo “Last Name”) (colNo “First Name”)) “, ”) “Doiel, R. S.”) (gt (cols 4) “2017-06-12”))’)</li> |
| 70 | +</ul></li> |
| 71 | +<li>[ ] csvfind, csvjoin should have an inverted match operation</li> |
| 72 | +<li>[ ] a range should accept the word “all” as well as a comma-delimited list of rows and ranges</li> |
| 73 | +<li>[ ] Add -uuid and -skip-header-row options consistently to all csv tools |
| 74 | + |
| 75 | +<ul> |
| 76 | +<li>[ ] csvcols</li> |
| 77 | +</ul></li> |
| 78 | +<li>[ ] unify the options vocabulary so it works the same way across every cli |
| 79 | + |
| 80 | +<ul> |
| 81 | +<li>Need a common approach to column ranges in csvcols, csvfind, csvjoin</li> |
| 82 | +<li>csv2json, csv2mdtable, csv2xlsx should accept a column and row range option for output</li> |
| 83 | +</ul></li> |
| 84 | +<li>[ ] csvfind should add a filter by row number (helpful when combined with csvcols for snapshotting the middle of a table)</li> |
| 85 | +<li>[ ] csv2json should have an option that will include a row number in JSON blob output</li> |
| 86 | +<li>[ ] csv2json should have options to normalize property names in JSON objects |
| 87 | + |
| 88 | +<ul> |
| 89 | +<li>camel case</li> |
| 90 | +<li>snake case</li> |
| 91 | +<li>lower case/upper case</li> |
| 92 | +<li>space to underscores</li> |
| 93 | +<li>strip punctuation</li> |
| 94 | +<li>rename keys</li> |
| 95 | +</ul></li> |
| 96 | +<li>[ ] csvrotate would take a CSV file as input and output columns as rows</li> |
| 97 | +<li>[ ] smartcat would function like cat but with support for ranges of lines (e.g. show the last 20 lines: smartcat -start=0 -end=“-20” file.txt; cat starting with the 10th line: smartcat -start=10 file.txt) |
| 98 | + |
| 99 | +<ul> |
| 100 | +<li>[ ] allow prefixing each line with its line number and a specific delimiter (e.g. a comma would let you cat a CSV file, adding row numbers as the first column)</li> |
| 101 | +<li>[ ] show lines with a given prefix, suffix, substring or regexp</li> |
| 102 | +<li>[ ] show lines without a given prefix, suffix, substring or regexp</li> |
| 103 | +</ul></li> |
| 104 | +</ul> |
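<p>The multi-column csvsort item in the list above (signed column numbers, read left to right, header row kept in place) could be sketched like this. It is only an illustration under stated assumptions: <code>sort_rows</code>, <code>sort_keys</code> and <code>has_header</code> are hypothetical names, cells are compared as plain strings, and column-name support is left out.</p>

```python
def sort_rows(rows, sort_keys, has_header=True):
    """Sort CSV rows by several columns.

    `sort_keys` is a list of signed 1-based column numbers: a positive
    number sorts ascending, a negative number sorts descending, and keys
    are read from left to right (the first key is the most significant).
    """
    header = rows[:1] if has_header else []
    body = list(rows[1:] if has_header else rows)
    # Python's sort is stable, so applying the keys from last to first
    # leaves the first key as the primary sort order.
    for key in reversed(sort_keys):
        col = abs(key) - 1
        body.sort(key=lambda row: row[col], reverse=(key < 0))
    return header + body


if __name__ == "__main__":
    table = [
        ["name", "year"],
        ["b", "2017"],
        ["a", "2017"],
        ["c", "2016"],
    ]
    # Year descending, then name ascending.
    for row in sort_rows(table, [-2, 1]):
        print(",".join(row))
```

<p>Comparing cells as strings keeps the sketch short; numeric or date columns would need a conversion step before sorting.</p>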
| 105 | + |
| 106 | +<h2>Completed</h2> |
| 107 | + |
| 108 | +<ul> |
| 109 | +<li>[x] csvcols -col option should not be a boolean, it should take a range like other csv cli</li> |
| 110 | +<li>[x] utilities should use starting index of 1 instead of zero as humans refer to column 1 when intending to work on the first column</li> |
| 111 | +<li>[x] for all cli the -delimiter option should support special characters like \t, \n, \r</li> |
| 112 | +<li>[x] csvfind would accept CSV input from stdin and output rows with matching column values |
| 113 | + |
| 114 | +<ul> |
| 115 | +<li>E.g. <code>cat file1.csv | csvfind -levenshtein -stop-words="the:a:of" -col=1 "This Red Book of West March"</code></li> |
| 116 | +<li>E.g. <code>cat file1.csv | csvfind -inverted -levenshtein -stop-words="the:a:of" -col=1 "This Red Book of West March"</code></li> |
| 117 | +<li>E.g. <code>cat file1.csv | csvfind -contains -col=1 "Red Book"</code></li> |
| 118 | +</ul></li> |
| 119 | +<li>[x] csvjoin should have option for fuzzy match on columns (e.g. comparing titles)</li> |
| 120 | +</ul> |
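<p>For readers unfamiliar with the idea behind the csvfind examples above, the following sketch shows one way a Levenshtein match with colon-delimited stop words and an inverted mode might work. It is not csvfind's implementation: <code>find_rows</code>, <code>strip_stop_words</code> and the <code>max_distance</code> threshold are assumptions made for illustration.</p>

```python
def levenshtein(a, b):
    """Return the edit distance between strings a and b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]


def strip_stop_words(text, stop_words):
    """Drop words listed in a colon-delimited string such as "the:a:of"."""
    stops = {w.lower() for w in stop_words.split(":") if w}
    return " ".join(w for w in text.split() if w.lower() not in stops)


def find_rows(rows, col, target, stop_words="", max_distance=2, inverted=False):
    """Return rows whose 1-based column `col` is within `max_distance` edits
    of `target`; with inverted=True, return the rows that do not match."""
    wanted = strip_stop_words(target, stop_words).lower()
    matches = []
    for row in rows:
        cell = strip_stop_words(row[col - 1], stop_words).lower()
        is_match = levenshtein(cell, wanted) <= max_distance
        if is_match != inverted:
            matches.append(row)
    return matches


if __name__ == "__main__":
    rows = [["The Red Book of Westmarch"], ["Blue Book"]]
    print(find_rows(rows, 1, "Red Book of West March", "the:a:of"))
```

<p>Removing stop words before measuring distance keeps small words such as “the” or “of” from inflating the edit count when comparing titles.</p>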
| 121 | + |
| 122 | +</section> |
| 123 | + |
| 124 | +<footer> |
| 125 | +<span><h1><a href="http://caltech.edu">Caltech</a></h1></span> |
| 126 | +<span>© 2017 <a href="https://www.library.caltech.edu/copyright">Caltech Library</a></span> |
| 127 | +<address>1200 E California Blvd, Mail Code 1-32, Pasadena, CA 91125-3200</address> |
| 128 | +<span>Phone: <a href="tel:+1-626-395-3405">(626)395-3405</a></span> |
| 129 | +<span><a href=" mailto:[email protected]" >Email Us </a></span> |
| 130 | +<a class="cl-hide" href="sitemap.xml">Site Map</a> |
| 131 | +</footer> |
| 132 | +</body> |
| 133 | +</html> |