- I have a CSV file, a list of 1000+ users and user properties.
- I have a list of exceptions (users to be excluded from processing), one user per line, about 50 total.
How can I remove the exceptions from the list?
# make a copy of the original list
cp list-of-1000.csv list-of-1000-less-exceptions.csv
# loop through each line in exceptions.txt and remove matching lines from the copy
while IFS= read -r line; do sed -i "/${line}/d" list-of-1000-less-exceptions.csv; done < exceptions.txt
This is a little simplistic and could be a problem if any username is a substring of another username, or of anything else on a line. (For example, if user ‘bob’ is on the list of exceptions, but the list of users also contains ‘bobb’, both would be deleted.)
In the particular instance I am dealing with, the username is conveniently the first field in the CSV file. This allows me to match the start of the line and the comma following the username:
while IFS= read -r line; do sed -i "/^${line},/d" list-of-1000-less-exceptions.csv; done < exceptions.txt
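To see the difference between the two patterns, here is a small sketch you can run in a scratch directory. The file names and the sample users (alice, bob, bobb) are made up for illustration, and it assumes GNU sed for the -i option.

```shell
# Hypothetical sample data; none of these names come from the real files.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'alice,Alice A,sales' 'bob,Bob B,ops' 'bobb,Bobb C,ops' > users.csv
printf '%s\n' 'bob' > exceptions.txt

# Unanchored pattern: deletes every line containing "bob", including bobb's.
cp users.csv loose.csv
while IFS= read -r line; do sed -i "/${line}/d" loose.csv; done < exceptions.txt

# Anchored pattern: deletes only lines whose first field is exactly "bob".
cp users.csv strict.csv
while IFS= read -r line; do sed -i "/^${line},/d" strict.csv; done < exceptions.txt

loose=$(cat loose.csv)
strict=$(cat strict.csv)
printf 'loose:\n%s\n\nstrict:\n%s\n' "$loose" "$strict"
```

With this data, loose.csv keeps only alice, while strict.csv keeps both alice and bobb. Note that each username is still used as a regular expression, so a name containing a character such as a dot could match more than intended.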
What if the username was the third field in the CSV instead of the first?
Use awk to copy the third field to the front of each line:
awk -F, -vOFS=, '{print $3,$0}' list-of-1000.csv > copy-of-1000-less-exceptions.csv
- -F, sets the field separator to a comma (defaults to whitespace)
- -vOFS=, sets the Output Field Separator (OFS) to a comma (defaults to a space)
- $3 prints the third field
- $0 prints the entire line (all the fields, still separated by commas)
while IFS= read -r line; do sed -i "/^${line},/d" copy-of-1000-less-exceptions.csv; done < exceptions.txt
Now there’s still an extra username at the start of each line in this file. Maybe that doesn’t matter, but maybe it does. There are several ways to remove it; here’s one:
awk -F, -vOFS=, '{$1=""; print $0}' copy-of-1000-less-exceptions.csv | sed 's/^,//' > list-of-1000-less-exceptions.csv
- -F, sets the field separator to a comma (defaults to whitespace)
- -vOFS=, sets the Output Field Separator (OFS) to a comma (defaults to a space)
- $1="" sets the first field to an empty string
- print $0 prints all the fields
The result of the awk command has an initial comma on each line. The first field is still there, it’s just set to an empty string. I used sed to remove it.
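Here is the whole round trip (prepend the username, delete the exceptions, strip the prepended field) as one runnable sketch. The file names and sample rows are hypothetical, with the username in the third field, and GNU sed is assumed for -i.

```shell
# Hypothetical sample data with the username in the third field.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Alice A,sales,alice' 'Bob B,ops,bob' 'Bobb C,ops,bobb' > users.csv
printf '%s\n' 'bob' > exceptions.txt

# 1. Prepend the third field so the username starts each line.
awk -F, -v OFS=, '{print $3,$0}' users.csv > keyed.csv

# 2. Delete lines whose prepended first field is on the exceptions list.
while IFS= read -r line; do sed -i "/^${line},/d" keyed.csv; done < exceptions.txt

# 3. Blank the prepended field, then strip the leading comma it leaves behind.
awk -F, -v OFS=, '{$1=""; print $0}' keyed.csv | sed 's/^,//' > result.csv

roundtrip=$(cat result.csv)
printf '%s\n' "$roundtrip"
```

With this data, result.csv should contain the alice and bobb rows in their original form, with the bob row gone.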
You could also use sed alone to remove the extra username field:
sed -i 's/^[^,]*,//' copy-of-1000-less-exceptions.csv
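As an alternative to the approach above (not a replacement for it), awk can do the filtering on its own in a single pass, without prepending and stripping a field. This is a minimal sketch with hypothetical file names and sample rows; it compares the third field to each exception as an exact string, so substrings and regular-expression characters in usernames are not an issue.

```shell
# Hypothetical sample data with the username in the third field.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '%s\n' 'Alice A,sales,alice' 'Bob B,ops,bob' 'Bobb C,ops,bobb' > users.csv
printf '%s\n' 'bob' > exceptions.txt

# While reading the first file (NR==FNR), store each exception as an array key.
# While reading the second file, print only lines whose third field is not a key.
awk -F, 'NR==FNR { skip[$0]; next } !($3 in skip)' exceptions.txt users.csv > filtered.csv

filtered=$(cat filtered.csv)
printf '%s\n' "$filtered"
```

Change $3 to whichever field holds the username. One caveat: the NR==FNR test assumes exceptions.txt is not empty; with an empty exceptions file, awk would treat the user list as the first file and print nothing.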
You should supply example files, so readers can test your solutions, and/or come up with their own.