This was prompted by an error I was running into with the AWS S3 service: I needed to tell the transfer utility the size of the data, in bytes, when transferring large files.
In this case I am looking at text files. Some of these methods work equally well for binary files; others don't. In the following examples, I'll use the full text of Moby-Dick from Project Gutenberg, 2701-0.txt, as the target file. I retrieved the file with the following command:
curl -O http://www.gutenberg.org/files/2701/2701-0.txt
A few commands for getting the size in bytes immediately came to mind:
$ ls -l 2701-0.txt | cut -d' ' -f5
1276201
$ stat --format %s 2701-0.txt
1276201
$ wc -c 2701-0.txt | cut -d' ' -f1
1276201
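One portability note worth knowing: the --format flag belongs to GNU coreutils stat; the BSD/macOS stat uses a different flag for the same field. A minimal sketch, using a hypothetical sample.txt so it doesn't depend on the Gutenberg download:

# Create a small sample file so the example is self-contained
printf 'Call me Ishmael.' > sample.txt

# GNU coreutils stat (Linux) prints the size in bytes:
stat --format %s sample.txt    # prints 16

# On BSD/macOS the equivalent is:
#   stat -f %z sample.txt

Of the three, stat is the most direct choice in a script, since it queries the filesystem metadata rather than parsing listing output.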
All of those options work. But what if the input isn't a file on disk but an input stream? To demonstrate counting the bytes in a character stream coming from any source, forgive the "useless use of cat":
$ cat 2701-0.txt | wc -c
1276201
$ cat 2701-0.txt | cksum | cut -d' ' -f2
1276201
$ cat 2701-0.txt | dd of=/dev/null
2492+1 records in
2492+1 records out
1276201 bytes (1.3 MB, 1.2 MiB) copied, 0.00997434 s, 128 MB/s
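The same stream-based counting works with any producer on the left of the pipe, not just cat. A quick sketch using printf as a stand-in data source (the 16-character string is an arbitrary example):

# wc -c reads stdin to exhaustion and prints the byte count:
printf 'Call me Ishmael.' | wc -c                    # prints 16

# cksum on stdin prints "CRC SIZE"; field 2 is the byte count:
printf 'Call me Ishmael.' | cksum | cut -d' ' -f2    # prints 16

cksum is handy when you want a checksum anyway; you get the byte count for free from the same pass over the data.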
The output from dd above is not the simplest thing to parse. It's multi-line and sent to stderr, so I redirected it to stdout and grepped for "bytes":
$ cat 2701-0.txt | dd of=/dev/null 2>&1 | grep 'bytes' | cut -d' ' -f1
1276201
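One caveat if you script this: GNU dd's summary message is translated, so grepping for the English word "bytes" can fail under a non-English locale. Pinning the locale with LC_ALL=C makes the parse stable; a sketch using printf as the example stream:

# Force dd's messages into the C locale so 'bytes' is always present
printf 'Call me Ishmael.' | LC_ALL=C dd of=/dev/null 2>&1 \
  | grep 'bytes' | cut -d' ' -f1    # prints 16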
So there are at least 5 methods to find the size of a file using common command-line tools: ls, stat, wc, cksum, and dd.
Know of others? Leave a comment below.