This was prompted by an error I was running into with the AWS S3 service: I needed to tell the transfer utility the size of the data, in bytes, when transferring large files.
In this case I am looking at files of characters. Some of these methods work equally well for binary files; others don’t. In the following examples, I’ll use the full text of Moby-Dick from Project Gutenberg, 2701-0.txt, as the target file. I retrieved the file using the following command:
curl -O http://www.gutenberg.org/files/2701/2701-0.txt
A few commands to get the size in bytes immediately came to mind: ls, stat, and wc.
$ ls -l 2701-0.txt | cut -d' ' -f5
1276201
$ stat --format %s 2701-0.txt
1276201
$ wc -c 2701-0.txt | cut -d' ' -f1
1276201
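Two variants worth noting: redirecting the file into wc prints just the number, with no filename to trim off with cut, and --format %s is specific to GNU stat; on BSD or macOS the equivalent flag is -f %z. A quick sketch, assuming a GNU and a BSD userland respectively:
$ wc -c < 2701-0.txt
1276201
$ stat -f %z 2701-0.txt
1276201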
All those options work. But what if the input isn’t a file on disk but an input stream? The following examples count the bytes in a character stream coming from any source, so forgive the “useless use of cat”:
$ cat 2701-0.txt | wc -c
1276201
$ cat 2701-0.txt | cksum | cut -d' ' -f2
1276201
$ cat 2701-0.txt | dd of=/dev/null
2492+1 records in
2492+1 records out
1276201 bytes (1.3 MB, 1.2 MiB) copied, 0.00997434 s, 128 MB/s
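Those record counts check out: assuming dd’s default block size of 512 bytes, 2492 full records plus one partial record of 297 bytes is 2492 × 512 + 297 = 1276201 bytes.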
The output from dd above is not the simplest thing to parse. It’s multi-line and sent to stderr, so I redirected it to stdout and grepped for “bytes”:
$ cat 2701-0.txt | dd of=/dev/null 2>&1 | grep 'bytes' | cut -d' ' -f1
1276201
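For the original use case of handing the byte count to another tool, such as the S3 transfer utility, any of these can be captured in a shell variable. A minimal sketch, using the wc variant and assuming a GNU userland:
$ SIZE=$(wc -c < 2701-0.txt)
$ echo "$SIZE"
1276201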
There are at least 5 methods to find the size of a file using common command-line tools:
- ls
- stat
- wc
- cksum
- dd
Know of others? Leave a comment below.