Recently a handful of e-mail messages went undelivered due to a miscommunication between two servers.
One server had a record of all the addresses it thought it had sent to over the period in question, and the other server had a record of all the addresses to which it had actually delivered (including messages from several other servers).
I had both lists, but what I really wanted was just the set difference: only the elements of the first list that did not appear in the second. (In other words, a list of the recipients whose messages were never delivered.)
I had two files:
- possibly-delivered.txt
- definitely-delivered.txt
First, the possibly-delivered.txt file had a bunch of extraneous lines, all of which contained the same term: “undelivered”. Since that term did not exist in any of the lines I was looking for, I removed all the lines using sed (stream editor):
sed '/undelivered/d' possibly-delivered.txt > possibly-delivered-edited.txt
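To see what that sed expression does, here is a minimal sketch on made-up data. The file names and addresses are hypothetical stand-ins, not the real lists. It also runs grep -v, which selects the non-matching lines and should give the same result:

```shell
# Hypothetical sample data standing in for possibly-delivered.txt.
tmpdir=$(mktemp -d)
printf '%s\n' \
  'alice@example.com' \
  'status: undelivered batch 7' \
  'bob@example.com' \
  'undelivered' > "$tmpdir/sample.txt"

# Delete every line containing "undelivered"; other lines pass through unchanged.
sed '/undelivered/d' "$tmpdir/sample.txt" > "$tmpdir/sample-edited.txt"

# grep -v keeps only the lines that do NOT match the pattern.
grep -v 'undelivered' "$tmpdir/sample.txt" > "$tmpdir/sample-grep.txt"

cat "$tmpdir/sample-edited.txt"
```

Only the two address lines should remain in both output files.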
I already knew (from prior investigations) that there should be 204 addresses in that list, so I performed a check to make sure there were 204 lines in the file using wc (word count):
wc -l possibly-delivered-edited.txt
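A count check like this can also be scripted so that a mismatch is flagged rather than read by eye. The sketch below uses a hypothetical three-line file and an expected count of 3 in place of the real 204. Note that wc -l counts newline characters, so a final line with no trailing newline is not counted.

```shell
# Hypothetical sample file; the real check would use the edited address list.
tmpdir=$(mktemp -d)
printf '%s\n' 'alice@example.com' 'bob@example.com' 'carol@example.com' > "$tmpdir/addresses.txt"

expected=3
# Reading from stdin makes wc print only the number, without the file name.
# tr strips the padding that some wc implementations add.
actual=$(wc -l < "$tmpdir/addresses.txt" | tr -d ' ')

if [ "$actual" -eq "$expected" ]; then
  result="ok: $actual lines"
else
  result="mismatch: expected $expected lines, found $actual"
fi
echo "$result"
```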
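wc reported 204 lines. Great! Now, how could I compare the two files to get only the results I wanted?
With a little help from "Set Operations in the Unix Shell", I found what I needed: comm, which compares two sorted files line by line. The -23 option suppresses the lines unique to the second file (column 2) and the lines common to both files (column 3), leaving only the lines unique to the first file: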
comm -23 possibly-delivered-edited.txt definitely-delivered.txt
However, comm warned me that the two files were not in sorted order, so I had to sort them first:
sort possibly-delivered-edited.txt > possibly-delivered-edited-sorted.txt
sort definitely-delivered.txt > definitely-delivered-sorted.txt
Again:
comm -23 possibly-delivered-edited-sorted.txt definitely-delivered-sorted.txt
This returned zero results. That was not possible (or at least highly improbable), so I checked the files. It looked like the sed command had converted my Windows line breaks to Unix line breaks, so I ran a command to put them back:
unix2dos possibly-delivered-edited-sorted.txt
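Line endings matter here because comm compares lines byte for byte, so a trailing carriage return makes two otherwise identical lines differ. The sketch below, on hypothetical one-line files, shows one way to check for carriage returns and an alternative to unix2dos: stripping them with tr so that both files use Unix line breaks. This is not the approach used above, just another way to make the two files agree.

```shell
# Two hypothetical files holding the same address, one CRLF and one LF.
tmpdir=$(mktemp -d)
printf 'alice@example.com\r\n' > "$tmpdir/crlf.txt"
printf 'alice@example.com\n'   > "$tmpdir/lf.txt"

# Count the lines that end in a carriage return in each file.
# grep -c exits non-zero when the count is 0, hence the "|| true".
cr=$(printf '\r')
crlf_lines=$(grep -c "${cr}\$" "$tmpdir/crlf.txt" || true)
lf_lines=$(grep -c "${cr}\$" "$tmpdir/lf.txt" || true)
echo "lines ending in CR: crlf.txt=$crlf_lines lf.txt=$lf_lines"

# With mismatched line endings, the address is reported as unique to the first file.
before=$(comm -23 "$tmpdir/crlf.txt" "$tmpdir/lf.txt" | wc -l | tr -d ' ')

# Strip carriage returns so both files use LF, then compare again.
tr -d '\r' < "$tmpdir/crlf.txt" > "$tmpdir/crlf-normalized.txt"
after=$(comm -23 "$tmpdir/crlf-normalized.txt" "$tmpdir/lf.txt" | wc -l | tr -d ' ')
echo "unmatched before: $before, after: $after"
```

If line endings are changed on real data, it is safer to do so before sorting, since the extra character can in rare cases affect sort order.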
Again:
comm -23 possibly-delivered-edited-sorted.txt definitely-delivered-sorted.txt
That returned my list of addresses from the first list that did not appear in the second list. (Quickly, accurately, and without tedium.)
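For reference, the sort-then-compare steps can be sketched end to end on a small, made-up data set. The file names and addresses below are hypothetical. Setting LC_ALL=C is my own addition rather than part of the steps above; it makes sort and comm agree on the same byte-wise ordering.

```shell
# Hypothetical lists: addresses thought sent, and addresses actually delivered.
tmpdir=$(mktemp -d)
printf '%s\n' 'carol@example.com' 'alice@example.com' 'bob@example.com' > "$tmpdir/sent.txt"
printf '%s\n' 'bob@example.com' 'dave@example.com' 'alice@example.com' > "$tmpdir/delivered.txt"

# comm requires sorted input, and both files must be sorted under the same collation.
LC_ALL=C sort "$tmpdir/sent.txt" > "$tmpdir/sent-sorted.txt"
LC_ALL=C sort "$tmpdir/delivered.txt" > "$tmpdir/delivered-sorted.txt"

# -23 suppresses columns 2 and 3, leaving only the lines unique to the first file.
missing=$(LC_ALL=C comm -23 "$tmpdir/sent-sorted.txt" "$tmpdir/delivered-sorted.txt")
printf '%s\n' "$missing"
```

Here the only address in the first list that is absent from the second is carol@example.com, so that is the single line printed.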