Size of data in bytes

This was prompted by an error I ran into with the AWS S3 service: I needed to tell the transfer utility the size of the data, in bytes, when transferring large files.

In this case I'm looking at text files. Some of these methods work equally well for binary files; others don't. In the following examples, I'll use the full text of Moby-Dick from Project Gutenberg, 2701-0.txt, as the target file. I retrieved the file with the following command:

curl -O http://www.gutenberg.org/files/2701/2701-0.txt

A few commands to get the size in bytes immediately came to mind: ls, stat, and wc.

$ ls -l 2701-0.txt | cut -d' ' -f5
1276201

$ stat --format %s 2701-0.txt 
1276201

$ wc -c 2701-0.txt | cut -d' ' -f1
1276201
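
A quick aside: --format is the GNU stat syntax, which is what I'm using here. On macOS and the BSDs the flags differ; as far as I know, the equivalent there is:

$ stat -f %z 2701-0.txt
1276201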

All of those options work. But what if the input isn't a file on disk but an input stream? The examples below count the bytes in a stream coming from any source, so forgive the “useless use of cat”:

$ cat 2701-0.txt | wc -c
1276201

$ cat 2701-0.txt | cksum | cut -d' ' -f2
1276201

$ cat 2701-0.txt | dd of=/dev/null
2492+1 records in
2492+1 records out
1276201 bytes (1.3 MB, 1.2 MiB) copied, 0.00997434 s, 128 MB/s

The output from dd above is not the simplest thing to parse: it’s multi-line and written to stderr. So I redirected stderr to stdout and grepped for “bytes”:

$ cat 2701-0.txt | dd of=/dev/null 2>&1 | grep 'bytes' | cut -d' ' -f1
1276201
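
Back to the S3 error that prompted all this: in my case the size went to an SDK transfer utility, but as a rough sketch of the same idea with the AWS CLI (the bucket name here is hypothetical), aws s3 cp accepts an --expected-size hint, in bytes, when the source is a stream rather than a file:

$ size=$(wc -c < 2701-0.txt)
$ cat 2701-0.txt | aws s3 cp - s3://my-bucket/2701-0.txt --expected-size "$size"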

So there are at least five ways to find the size of a file (or a stream) using common command-line tools:

  • ls
  • stat
  • wc
  • cksum
  • dd

Know of others? Leave a comment below.
