awk – The Accidental Developer

I have a CSV file, a list of 1000+ users and user properties.
I have a list of exceptions (users to be excluded from processing), one user per line, about 50 total.

How can I remove the exceptions from the list?

# make a copy of the original list cp list-of-1000.csv list-of-1000-less-exceptions.csv # loop through each line in exceptions.txt and remove matching lines from the copy while read line; do sed -i "/${line}/d" list-of-1000-less-exceptions.csv; done < exceptions.txt

This is a little simplistic and could be a problem if any usernames are subsets of other usernames. (For example, if user ‘bob’ is on the list of exceptions, but the list of users also contains ‘bobb’, both would be deleted.)

In the particular instance I am dealing with, the username is conveniently the first field in the CSV file. This allows me to match the start of the line and the comma following the username:

while read line; do sed -i "/^${line},/d" list-of-1000-less-exceptions.csv; done < exceptions.txt

What if the username was the third field in the CSV instead of the first?

Use awk:
awk -F, -vOFS=, '{print $3,$0}' list-of-exceptions.csv > copy-of-list-of-exceptions.csv

-F, sets the field separator to a comma (defaults to whitespace)
-vOFS=, sets the Output Field Separator (OFS) to a comma (defaults to a space)
$3 prints the third field
$0 prints all the fields, with the specified field separator between them

while read line; do sed -i "/^${line},/d" copy-of-1000-less-exceptions.csv; done < exceptions.txt

Now there’s still an extra username in this file. Maybe that doesn’t matter, but maybe it does. There are several ways to remove it–here’s one:

awk -F, -vOFS=, '$1=""; print $0' copy-of-1000-less-exceptions.csv | sed 's/^,//' > list-of-1000-less-exceptions.csv

-F, sets the field separator to a comma (defaults to whitespace)
-vOFS=, sets the Output Field Separator (OFS) to a comma (defaults to a space)
$1="" sets the first field to an empty string
print $0 prints all the fields

The result of the awk command has an initial comma on each line. The first field is still there, it’s just set to an empty string. I used sed to remove it.

You could also use sed alone to remove the extra username field:
sed -i 's/^[^,]*,//' copy-of-1000-less-exceptions.csv

Tag: awk

3 ways to remove blank lines from a file

Removing exceptions from a list using Bash (with sed and awk)