{"id":1527,"date":"2016-07-07T19:16:16","date_gmt":"2016-07-08T00:16:16","guid":{"rendered":"http:\/\/osric.com\/chris\/accidental-developer\/?p=1527"},"modified":"2016-07-08T09:48:08","modified_gmt":"2016-07-08T14:48:08","slug":"removing-exceptions-from-a-list-using-bash-with-sed-and-awk","status":"publish","type":"post","link":"https:\/\/osric.com\/chris\/accidental-developer\/2016\/07\/removing-exceptions-from-a-list-using-bash-with-sed-and-awk\/","title":{"rendered":"Removing exceptions from a list using Bash (with sed and awk)"},"content":{"rendered":"<ul>\n<li>I have a CSV file, a list of 1000+ users and user properties.<\/li>\n<li>I have a list of exceptions (users to be excluded from processing), one user per line, about 50 total.<\/li>\n<\/ul>\n<p>How can I remove the exceptions from the list?<\/p>\n<p><code># make a copy of the original list<br \/>\ncp list-of-1000.csv list-of-1000-less-exceptions.csv<br \/>\n# loop through each line in exceptions.txt and remove matching lines from the copy<br \/>\nwhile read line; do sed -i \"\/${line}\/d\" list-of-1000-less-exceptions.csv; done &lt; exceptions.txt<\/code><\/p>\n<p>This is a little simplistic: the pattern matches anywhere on a line, so it could be a problem if any username is a substring of another username or of any other field. (For example, if user &#8216;bob&#8217; is on the list of exceptions, but the list of users also contains &#8216;bobb&#8217;, both would be deleted.)<\/p>\n<p>In the particular instance I am dealing with, the username is conveniently the first field in the CSV file. 
This allows me to match the start of the line and the comma following the username:<\/p>\n<p><code>while read line; do sed -i \"\/^${line},\/d\" list-of-1000-less-exceptions.csv; done &lt; exceptions.txt<\/code><\/p>\n<p>What if the username were the third field in the CSV instead of the first?<\/p>\n<p>Use <code>awk<\/code> to prepend a copy of the third field to each line of the original list:<br \/>\n<code>awk -F, -vOFS=, '{print $3,$0}' list-of-1000.csv &gt; copy-of-1000-less-exceptions.csv<\/code><\/p>\n<ul>\n<li><code>-F,<\/code> sets the field separator to a comma (defaults to whitespace)<\/li>\n<li><code>-vOFS=,<\/code> sets the Output Field Separator (OFS) to a comma (defaults to a space)<\/li>\n<li><code>$3<\/code> prints the third field<\/li>\n<li><code>$0<\/code> prints the entire original line, unchanged<\/li>\n<\/ul>\n<p><code>while read line; do sed -i \"\/^${line},\/d\" copy-of-1000-less-exceptions.csv; done &lt; exceptions.txt<\/code><\/p>\n<p>Now there&#8217;s still an extra username in this file. Maybe that doesn&#8217;t matter, but maybe it does. There are several ways to remove it; here&#8217;s one:<\/p>\n<p><code>awk -F, -vOFS=, '{$1=\"\"; print $0}' copy-of-1000-less-exceptions.csv | sed 's\/^,\/\/' &gt; list-of-1000-less-exceptions.csv<\/code><\/p>\n<ul>\n<li><code>-F,<\/code> sets the field separator to a comma (defaults to whitespace)<\/li>\n<li><code>-vOFS=,<\/code> sets the Output Field Separator (OFS) to a comma (defaults to a space)<\/li>\n<li><code>$1=\"\"<\/code> sets the first field to an empty string<\/li>\n<li><code>print $0<\/code> prints all the fields, rejoined with the OFS<\/li>\n<\/ul>\n<p>The result of the <code>awk<\/code> command has an initial comma on each line. The first field is still there; it&#8217;s just set to an empty string. 
I used <code>sed<\/code> to remove it.<\/p>\n<p>You could also use sed alone to remove the extra username field:<br \/>\n<code>sed -i 's\/^[^,]*,\/\/' copy-of-1000-less-exceptions.csv<\/code><\/p>\n","protected":false},"excerpt":{"rendered":"<p>I have a CSV file, a list of 1000+ users and user properties. I have a list of exceptions (users to be excluded from processing), one user per line, about 50 total. How can I remove the exceptions from the list? # make a copy of the original list cp list-of-1000.csv list-of-1000-less-exceptions.csv # loop through &hellip; <a href=\"https:\/\/osric.com\/chris\/accidental-developer\/2016\/07\/removing-exceptions-from-a-list-using-bash-with-sed-and-awk\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Removing exceptions from a list using Bash (with sed and awk)<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[232],"tags":[406,197,196,293,297],"class_list":["post-1527","post","type-post","status-publish","format-standard","hentry","category-tips-tricks","tag-awk","tag-bash","tag-linux","tag-sed","tag-shell"],"_links":{"self":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/1527","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/comments?post=1527"}],"version-history":[{"count":8,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/1527\/revisions"}],"predecessor-version":[{"id":1536,"href":"h
ttps:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/1527\/revisions\/1536"}],"wp:attachment":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/media?parent=1527"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/categories?post=1527"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/tags?post=1527"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}