38 | 38 | Array.from(document.querySelectorAll('#sidebar a')).forEach(function(link) { |
39 | 39 | link.setAttribute('tabIndex', sidebar === 'visible' ? 0 : -1); |
40 | 40 | }); |
41 | | - </script><div id=content class=content><main><div class=sidetoc><nav class=pagetoc></nav></div><h1 id=uniq><a class=header href=#uniq>uniq</a></h1><p>The <code>uniq</code> command identifies similar lines that are adjacent to each other. There are various options to help you filter unique or duplicate lines, count them, group them, etc.<h2 id=retain-single-copy-of-duplicates><a class=header href=#retain-single-copy-of-duplicates>Retain single copy of duplicates</a></h2><p>This is the default behavior of the <code>uniq</code> command. If adjacent lines are the same, only the first copy will be displayed in the output. Unlike <code>sort</code>, the <code>uniq</code> command doesn't have to read the entire input since it compares only the lines that are next to each other.<pre><code class=language-bash># uniq will add a newline even if not present for the last input line |
| 41 | + </script><div id=content class=content><main><div class=sidetoc><nav class=pagetoc></nav></div><h1 id=uniq><a class=header href=#uniq>uniq</a></h1><p>The <code>uniq</code> command identifies similar lines that are adjacent to each other. There are various options to help you filter unique or duplicate lines, count them, group them, etc.<h2 id=retain-single-copy-of-duplicates><a class=header href=#retain-single-copy-of-duplicates>Retain single copy of duplicates</a></h2><p>This is the default behavior of the <code>uniq</code> command. If adjacent lines are the same, only the first copy will be displayed in the output.<pre><code class=language-bash># only the adjacent lines are compared to determine duplicates |
| 42 | +# which is why you get 'red' twice in the output for this input |
42 | 43 | $ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq |
43 | 44 | red |
44 | 45 | green |
45 | 46 | red |
46 | 47 | blue |
47 | | -</code></pre><p>If you want to retain only a single copy based on the entire input contents, one option is to sort the input before applying <code>uniq</code>. Or, use <code>sort -u</code> if applicable.<pre><code class=language-bash># same as sort -u for this case |
| 48 | +</code></pre><p>You'll need sorted input to make sure all the input lines are considered to determine duplicates. For some cases, <code>sort -u</code> is enough, like the example shown below:<pre><code class=language-bash># same as sort -u for this case |
48 | 49 | $ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | sort | uniq |
49 | 50 | blue |
50 | 51 | green |
51 | 52 | red |
52 | | -</code></pre><p>Sometimes though, you want to sort based on some specific criteria but identify duplicates based on the entire line contents. <code>uniq</code> will help in such cases.<pre><code class=language-bash># can't use sort -n -u here |
| 53 | +</code></pre><p>Sometimes though, you may need to sort based on some specific criteria and then identify duplicates based on the entire line contents. Here's an example:<pre><code class=language-bash># can't use sort -n -u here |
53 | 54 | $ printf '2 balls\n13 pens\n2 pins\n13 pens\n' | sort -n | uniq |
54 | 55 | 2 balls |
55 | 56 | 2 pins |
56 | 57 | 13 pens |
57 | | -</code></pre><p>If you need to preserve input order, use alternatives like <code>awk</code>, <code>perl</code> and <code>huniq</code>.<pre><code class=language-bash># retain single copy of duplicates, maintain input order |
| 58 | +</code></pre><blockquote><p><img src=./images/info.svg alt=info> <code>sort+uniq</code> won't be suitable if you need to preserve the input order as well. You can use alternatives like <code>awk</code>, <code>perl</code> and <a href=https://github.com/koraa/huniq>huniq</a> for such cases.</blockquote><pre><code class=language-bash># retain single copy of duplicates, maintain input order |
58 | 59 | $ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | awk '!seen[$0]++' |
59 | 60 | red |
60 | 61 | green |
|
83 | 84 | toothpaste |
84 | 85 | washing powder |
85 | 86 |
|
| 87 | +# just a reminder that uniq works based on adjacent lines only |
86 | 88 | $ printf 'red\nred\nred\ngreen\nred\nblue\nblue' | uniq -u |
87 | 89 | green |
88 | 90 | red |
|
131 | 133 | 1 toothpaste |
132 | 134 | 1 soap |
133 | 135 | </code></pre><h2 id=ignoring-case><a class=header href=#ignoring-case>Ignoring case</a></h2><p>Use the <code>-i</code> option to ignore case while determining duplicates.<pre><code class=language-bash># depending on your locale, sort and sort -f can give the same results |
134 | | -$ printf 'cat\nbat\nCAT\ncar\nbat\n' | sort -f | uniq -iD |
| 136 | +$ printf 'cat\nbat\nCAT\ncar\nbat\nmat\nmoat' | sort -f | uniq -iD |
135 | 137 | bat |
136 | 138 | bat |
137 | 139 | cat |
138 | 140 | CAT |
139 | | -</code></pre><h2 id=partial-match><a class=header href=#partial-match>Partial match</a></h2><p><code>uniq</code> has three options to change the matching criteria to partial parts of the input line. These aren't as powerful as the <code>sort -k</code> option, but they do come in handy for some use cases.<p>The <code>-f</code> option allows you to skip first <code>N</code> fields. Field separation is based on one or more space/tab characters only. Note that these separators will still be part of the field contents, so this will not work with variable number of blanks.<pre><code class=language-bash># skip first field |
| 141 | +</code></pre><h2 id=partial-match><a class=header href=#partial-match>Partial match</a></h2><p><code>uniq</code> has three options to change the matching criteria to partial parts of the input line. These aren't as powerful as the <code>sort -k</code> option, but they do come in handy for some use cases.<p>The <code>-f</code> option allows you to skip first <code>N</code> fields. Field separation is based on one or more space/tab characters only. Note that these separators will still be part of the field contents, so this will not work with variable number of blanks.<pre><code class=language-bash># skip first field, works as expected since no. of blanks is consistent |
140 | 142 | $ printf '2 cars\n5 cars\n10 jeeps\n5 jeeps\n3 trucks\n' | uniq -f1 --group |
141 | 143 | 2 cars |
142 | 144 | 5 cars |
|
147 | 149 | 3 trucks |
148 | 150 |
|
149 | 151 | # example with variable number of blanks |
150 | | -$ printf '2 cars\n5 cars\n10 jeeps\n5 jeeps\n3 trucks\n' | uniq -f1 |
| 152 | +# 'cars' entries were identified as duplicates, but not 'jeeps' |
| 153 | +$ printf '2 cars\n5 cars\n1 jeeps\n5 jeeps\n3 trucks\n' | uniq -f1 |
151 | 154 | 2 cars |
152 | | -10 jeeps |
| 155 | +1 jeeps |
153 | 156 | 5 jeeps |
154 | 157 | 3 trucks |
155 | 158 | </code></pre><p>The <code>-s</code> option allows you to skip first <code>N</code> characters (calculated as bytes).<pre><code class=language-bash># skip first character |
156 | 159 | $ printf '* red\n* green\n- green\n* blue\n= blue' | uniq -s1 |
157 | 160 | * red |
158 | 161 | * green |
159 | 162 | * blue |
160 | | -</code></pre><p>The <code>-w</code> option allows you to specify a maximum of <code>N</code> characters to be used for comparison (calculated as bytes).<pre><code class=language-bash># compare only first 2 characters |
| 163 | +</code></pre><p>The <code>-w</code> option restricts the comparison to the first <code>N</code> characters (calculated as bytes).<pre><code class=language-bash># compare only first 2 characters |
161 | 164 | $ printf '1) apple\n1) almond\n2) banana\n3) cherry' | uniq -w2 |
162 | 165 | 1) apple |
163 | 166 | 2) banana |
164 | 167 | 3) cherry |
165 | | -</code></pre><p>When these options are used simultaneously, the priority is <code>-f</code> first, then <code>-s</code> and then <code>-w</code> option. Remember that blanks are part of the field content.<pre><code class=language-bash># skip first field |
| 168 | +</code></pre><p>When these options are used simultaneously, the priority is <code>-f</code> first, then <code>-s</code> and finally <code>-w</code> option. Remember that blanks are part of the field content.<pre><code class=language-bash># skip first field |
166 | 169 | # then skip first two characters (including the blank character) |
167 | 170 | # use next two characters for comparison ('bl' and 'ch' in this example) |
168 | 171 | $ printf '2 @blue\n10 :black\n5 :cherry\n3 @chalk' | uniq -f1 -s2 -w2 |
|
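The remark that blanks stay part of the field content when using `-f` can be tried out with a small sketch. The sample data below is hypothetical (note the two spaces in `5  jeeps`), and squeezing spaces with `tr -s` is just one assumed workaround, not something the commit itself proposes.

```shell
# Hypothetical sample: the second 'jeeps' line has two spaces after the number
sample() { printf '2 cars\n5 cars\n1 jeeps\n5  jeeps\n3 trucks\n'; }

# Blanks before a field are compared too, so ' jeeps' and '  jeeps' differ
plain=$(sample | uniq -f1)

# Assumed workaround: squeeze runs of spaces so the separators are consistent
squeezed=$(sample | tr -s ' ' | uniq -f1)

printf '%s\n' '--- uniq -f1' "$plain" '--- tr -s then uniq -f1' "$squeezed"
```

With the plain `uniq -f1`, both `jeeps` lines are retained, whereas the squeezed version keeps only `1 jeeps`. Keep in mind that `tr -s ' '` handles spaces only (not tabs) and changes the blanks in the lines that get printed.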