{"id":3424,"date":"2021-02-24T18:11:08","date_gmt":"2021-02-24T23:11:08","guid":{"rendered":"http:\/\/osric.com\/chris\/accidental-developer\/?p=3424"},"modified":"2021-02-24T18:11:08","modified_gmt":"2021-02-24T23:11:08","slug":"size-of-data-in-bytes","status":"publish","type":"post","link":"https:\/\/osric.com\/chris\/accidental-developer\/2021\/02\/size-of-data-in-bytes\/","title":{"rendered":"Size of data in bytes"},"content":{"rendered":"<p>This was prompted by an error I was running into with the AWS S3 service: I needed to tell the transfer utility the size of the data, in bytes, when transferring large files.<\/p>\n<p>In this case I am looking at files of <em>characters<\/em>. Some of these methods should work equally well for binary files, and others won&#8217;t. In the following examples, I&#8217;ll use the full text of <em>Moby-Dick<\/em> from Project Gutenberg, <code>2701-0.txt<\/code>, as the target file. I retrieved the file using the following command:<\/p>\n<pre><code>curl -O http:\/\/www.gutenberg.org\/files\/2701\/2701-0.txt<\/code><\/pre>\n<p>A few commands to get size in bytes immediately came to mind: <code>ls<\/code>, <code>stat<\/code>, and <code>wc<\/code>. <\/p>\n<pre><code>$ ls -l 2701-0.txt | cut -d' ' -f5\r\n1276201\r\n\r\n$ stat --format %s 2701-0.txt \r\n1276201\r\n\r\n$ wc -c 2701-0.txt | cut -d' ' -f1\r\n1276201<\/code><\/pre>\n<p>All those options work. But what if the input isn&#8217;t a file on disk, and instead is an input stream? This is to demonstrate counting the bytes in a character stream coming from any source, so forgive the &#8220;useless use of cat&#8221;:<\/p>\n<pre><code>$ cat 2701-0.txt | wc -c\r\n1276201\r\n\r\n$ cat 2701-0.txt | cksum | cut -d' ' -f2\r\n1276201\r\n\r\n$ cat 2701-0.txt | dd of=\/dev\/null\r\n2492+1 records in\r\n2492+1 records out\r\n1276201 bytes (1.3 MB, 1.2 MiB) copied, 0.00997434 s, 128 MB\/s<\/code><\/pre>\n<p>The output from <code>dd<\/code> above is not the simplest thing to parse. 
It&#8217;s multi-line and sent to <code>stderr<\/code>, so I redirected it to <code>stdout<\/code> and grepped for &#8220;bytes&#8221;:<\/p>\n<pre><code>$ cat 2701-0.txt | dd of=\/dev\/null 2&gt;&amp;1 | grep 'bytes' | cut -d' ' -f1\r\n1276201<\/code><\/pre>\n<p>There are at least 5 methods to find the size of a file using common command-line tools:<\/p>\n<ul>\n<li>ls<\/li>\n<li>stat<\/li>\n<li>wc<\/li>\n<li>cksum<\/li>\n<li>dd<\/li>\n<\/ul>\n<p>Know of others? Leave a comment below.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This was prompted by an error I was running into with the AWS s3 service: I needed to tell the transfer utility the size of the data, in bytes, when transferring large files. In this case I am looking at files of characters. Some of these methods should work equally well for binary files, and &hellip; <a href=\"https:\/\/osric.com\/chris\/accidental-developer\/2021\/02\/size-of-data-in-bytes\/\" class=\"more-link\">Continue reading <span class=\"screen-reader-text\">Size of data in 
bytes<\/span><\/a><\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[232],"tags":[],"class_list":["post-3424","post","type-post","status-publish","format-standard","hentry","category-tips-tricks"],"_links":{"self":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/3424","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/comments?post=3424"}],"version-history":[{"count":16,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/3424\/revisions"}],"predecessor-version":[{"id":3443,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/3424\/revisions\/3443"}],"wp:attachment":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/media?parent=3424"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/categories?post=3424"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/tags?post=3424"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}