Terminal transcript #3

@Nemi92

alone. (The ASCII tab character should also be included for good
measure in a production script.)

At this point, we have data consisting of words separated by blank
space. The words only contain alphanumeric characters (and the
underscore). The next step is to break the data apart so that we have one
word per line. This makes the counting operation much easier, as we
will see shortly.

 $ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \n' |
 > tr -s ' ' '\n' | ...

This command turns blanks into newlines. The ‘-s’ option squeezes
multiple newline characters in the output into just one, removing
blank lines. (The ‘>’ is the shell’s “secondary prompt.” This is what
the shell prints when it notices you haven’t finished typing in all of
a command.)
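
To see the squeezing at work, here is a minimal sketch on made-up
input. The sample text and the function name ‘split_words’ are
assumptions for illustration only; they are not part of the original
pipeline.

```shell
# Hypothetical helper: the third 'tr' stage on its own.
# Blanks become newlines, then runs of newlines are squeezed to one.
split_words() {
    tr -s ' ' '\n'
}

# Made-up input with repeated blanks and an empty line.
printf 'one  two   three\n\nfour\n' | split_words
```

Each word should come out on its own line, with no empty lines left
between them, because ‘-s’ squeezes after the translation has been
done.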

We now have data consisting of one word per line, no punctuation, all
one case. We’re ready to count each word:

 $ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \n' |
 > tr -s ' ' '\n' | sort | uniq -c | ...

At this point, the data might look something like this:

      60 a
       2 able
       6 about
       1 above
       2 accomplish
       1 acquire
       1 actually
       2 additional
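
The ‘sort’ before ‘uniq -c’ matters, because ‘uniq’ only collapses
duplicate lines that are adjacent. A small sketch on made-up words
shows the counting step in isolation; the function name ‘count_words’
and the sample words are assumptions, not taken from ‘whats.gnu’.

```shell
# Hypothetical helper: group identical lines together, then count them.
count_words() {
    sort | uniq -c
}

# Made-up word list, one word per line, deliberately unsorted.
# 'the' appears three times; 'a' and 'gnu' appear once each.
printf 'the\na\nthe\ngnu\nthe\n' | count_words
```

Each output line is a count followed by the word, in the same layout
as the sample output above.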

The output is sorted by word, not by count! What we want is the most
frequently used words first. Fortunately, this is easy to accomplish,
with the help of two more ‘sort’ options:

‘-n’
do a numeric sort, not a textual one

‘-r’
reverse the order of the sort
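
As a quick sketch of why both options are needed, consider made-up
count lines where a textual sort would place ‘10’ before ‘9’. The
function name ‘by_count_desc’ and the data are assumptions for
illustration.

```shell
# Hypothetical helper: largest leading number first.
by_count_desc() {
    sort -n -r
}

# Made-up "count word" lines.  Compared as text, "10" sorts before
# "2" and "9"; compared as numbers and reversed, the order is 10, 9, 2.
printf '9 x\n10 y\n2 z\n' | by_count_desc
```

The same options work on the output of ‘uniq -c’, since a numeric
sort skips the leading blanks in front of each count.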

The final pipeline looks like this:

 $ tr '[:upper:]' '[:lower:]' < whats.gnu | tr -cd '[:alnum:]_ \n' |
 > tr -s ' ' '\n' | sort | uniq -c | sort -n -r
 ⊣    156 the
 ⊣     60 a
 ⊣     58 to
 ⊣     51 of
 ⊣     51 and
 ...

Whew! That’s a lot to digest. Yet, the same principles apply. With
six commands, on two lines (really one long one split for convenience),
we’ve created a program that does something interesting and useful, in
much less time than we could have written a C program to do the same
thing.
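
For repeated use, the pipeline could be wrapped in a shell function.
This is only a sketch: the name ‘word_freq’ and the choice to take
the file name as the first argument are assumptions, not something
the original text prescribes.

```shell
# Hypothetical wrapper around the six-command pipeline shown above.
# Usage: word_freq FILE
word_freq() {
    tr '[:upper:]' '[:lower:]' < "$1" |   # fold everything to lower case
        tr -cd '[:alnum:]_ \n' |          # drop punctuation
        tr -s ' ' '\n' |                  # one word per line
        sort | uniq -c |                  # count each distinct word
        sort -n -r                        # most frequent words first
}
```

Running ‘word_freq whats.gnu | head’ would then show only the most
frequent words, assuming ‘whats.gnu’ is in the current directory.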

A minor modification to the above pipeline can give us a simple
spelling checker! To determine if you’ve spelled a word correctly, all
you have to do is look it up in a dictionary. If it is not there, then
chances are that your spelling is incorrect. So, we need a dictionary.
The conventional location for a dictionary is ‘/usr/share/dict/words’.

Now, how to compare our file with the dictionary? As before, we
generate a sorted list of words, one per line:

-----Info: (coreutils)Putting the tools together, 317 lines --60%------------------
