{"id":3733,"date":"2024-01-16T18:49:32","date_gmt":"2024-01-16T23:49:32","guid":{"rendered":"https:\/\/osric.com\/chris\/accidental-developer\/?p=3733"},"modified":"2024-01-16T18:49:32","modified_gmt":"2024-01-16T23:49:32","slug":"renaming-multiple-files-replacing-or-truncating-varied-file-extensions","status":"publish","type":"post","link":"https:\/\/osric.com\/chris\/accidental-developer\/2024\/01\/renaming-multiple-files-replacing-or-truncating-varied-file-extensions\/","title":{"rendered":"Renaming multiple files: replacing or truncating varied file extensions"},"content":{"rendered":"<p>In the previous post, I ran into an issue where Wget saved files to disk verbatim, including query strings\/parameters. The files on disk ended up looking like this:<\/p>\n<ul>\n<li><code>wp-includes\/js\/comment-reply.min.js?ver=6.4.2<\/code><\/li>\n<li><code>wp-includes\/js\/jquery\/jquery-migrate.min.js?ver=3.4.1<\/code><\/li>\n<li><code>wp-includes\/js\/jquery\/jquery.min.js?ver=3.7.1<\/code><\/li>\n<li><code>wp-includes\/css\/dist\/block-library\/style.min.css?ver=6.4.2<\/code><\/li>\n<\/ul>\n<p>I wanted to find a way to rename all these files, truncating each filename at the question mark (dropping the question mark and everything after it). For example, <code>jquery.min.js?ver=3.7.1<\/code> should become <code>jquery.min.js<\/code>.<\/p>\n<p><!--more--><\/p>\n<p>I ended up creating a setup script to help test this because I was trying several different methods. 
The following just creates a directory structure with some realistic examples several directories deep in the hierarchy:<\/p>\n<p><strong>setup.sh<\/strong><\/p>\n<pre><code>#!\/bin\/sh\r\n\r\nmkdir -p demo\/wp-includes\/js\/jquery\r\nmkdir -p demo\/wp-includes\/css\/dist\/block-library\r\ntouch demo\/wp-includes\/js\/comment-reply.min.js?ver=6.4.2\r\ntouch demo\/wp-includes\/js\/jquery\/jquery-migrate.min.js?ver=3.4.1\r\ntouch demo\/wp-includes\/js\/jquery\/jquery.min.js?ver=3.7.1\r\ntouch demo\/wp-includes\/css\/dist\/block-library\/style.min.css?ver=6.4.2<\/code><\/pre>\n<p>I used the <code>find<\/code> command to find all the files in the <code>demo<\/code> directory (and subdirectories below) that include a question mark in the filename:<\/p>\n<pre><code>find .\/demo -type f -name '*\\?*'<\/code><\/pre>\n<p>In case the <code>find<\/code> command is not something you use every day:<\/p>\n<ul>\n<li><code>-type f<\/code> indicates it will look only for regular files, not directories or symlinks<\/li>\n<li><code>-name '*\\?*'<\/code> will look for any files with a question mark in the name (preceded or followed by zero or more characters)<\/li>\n<\/ul>\n<p><strong>Method 1: short shell script<\/strong><br \/>\nOne way to update the filenames is by creating a quick shell script, <code>rename.sh<\/code>:<\/p>\n<p><strong>rename.sh:<\/strong><\/p>\n<pre><code>#!\/bin\/sh\r\n\r\nNEWFILE=$(echo \"$1\" | cut -f1 -d?)\r\nmv \"$1\" \"$NEWFILE\"\r\n\r\nexit 0<\/code><\/pre>\n<p>Then I can call the <code>rename.sh<\/code> shell script from the <code>find<\/code> command using the <code>-exec<\/code> or <code>-execdir<\/code> options:<\/p>\n<pre><code>find .\/demo -type f -name '*\\?*' -execdir ~\/rename.sh '{}' \\;<\/code><\/pre>\n<p>After that, I ran find again to confirm the filenames were changed:<\/p>\n<pre><code>find .\/demo -type 
f\r\n.\/demo\/wp-includes\/js\/jquery\/jquery.min.js\r\n.\/demo\/wp-includes\/js\/jquery\/jquery-migrate.min.js\r\n.\/demo\/wp-includes\/js\/comment-reply.min.js\r\n.\/demo\/wp-includes\/css\/dist\/block-library\/style.min.css<\/code><\/pre>\n<p>That works, but I felt like this is something that could be achieved with a one-liner.<\/p>\n<p><strong>Method 2: while loop<\/strong><br \/>\nIn this case I passed the output of the find command to a while loop that manipulates the filename and moves the file from the old name to the new name. The variables are quoted, and <code>read<\/code> is given <code>IFS=<\/code> and <code>-r<\/code>, so that paths containing spaces or backslashes are handled correctly:<\/p>\n<pre><code>find .\/demo -type f -name '*\\?*' | while IFS= read -r OLDFILE; do NEWFILE=$(echo \"$OLDFILE\" | cut -f1 -d'?'); mv \"$OLDFILE\" \"$NEWFILE\"; done<\/code><\/pre>\n<p>While that&#8217;s technically a one-liner, I found it clunky. It&#8217;s hard to read. I don&#8217;t want to say to someone, &#8220;Just do this,&#8221; and give them about 140 characters.<\/p>\n<p><strong>Method 3: Perl<\/strong><br \/>\nI was previously unaware of this, but ran into Perl&#8217;s <code>rename<\/code> function somewhere on a Stack Exchange site:<\/p>\n<pre><code>find .\/demo -type f -name '*\\?*' | perl -lne '($old=$_) &amp;&amp; s\/\\?ver=.+$\/\/ &amp;&amp; rename($old,$_)'<\/code><\/pre>\n<p>That takes some deciphering:<\/p>\n<ul>\n<li>The <code>-l<\/code> option tells Perl to chomp each input line (strip the trailing newline) and to append a newline to anything it prints<\/li>\n<li>The <code>-n<\/code> option tells Perl to loop over the input one line at a time<\/li>\n<li>The <code>-e<\/code> option tells Perl to execute or evaluate the given command string (rather than loading and executing Perl code from a file)<\/li>\n<li>Recall that in Perl the default variable is <code>$_<\/code>: if no variable is specified for an operation, it operates on <code>$_<\/code>. Here <code>$old<\/code> saves the original name before the substitution modifies <code>$_<\/code>, and <code>rename<\/code> is called only if the substitution matched.<\/li>\n<\/ul>\n<p>Perl seems to have fallen out of favor in the past 10-20 years, but it is still very good at some operations and is preinstalled on most Linux distributions. 
Still, that&#8217;s 93 characters and it&#8217;s not immediately evident what the code does.<\/p>\n<p><strong>Method 4: the rename command<\/strong><br \/>\nThere are two (or maybe more) <code>rename<\/code> commands. One is part of the <code>util-linux<\/code> package from the Linux Kernel Organization. It only does simple string substitution, although that can be useful. Another is a Perl-based <code>rename<\/code> command, which can use regular expressions. You can&#8217;t have two commands with the same name, so on some systems the Perl-based rename command is <code>file-rename<\/code>, on others it is <code>prename<\/code> (usually read as &#8220;Perl rename&#8221;).<\/p>\n<p>It&#8217;s kind of a mess. For more details and discussion, check out:<\/p>\n<ul>\n<li><a href=\"https:\/\/unix.stackexchange.com\/questions\/730894\/what-are-the-different-versions-of-the-rename-command-how-do-i-use-the-perl-ver\">What are the different versions of the rename command? How do I use the Perl version?<\/a><\/li>\n<li><a href=\"https:\/\/unix.stackexchange.com\/questions\/229230\/whats-with-all-the-renames-prename-rename-file-rename\">What&#8217;s with all the renames: prename, rename, file-rename?<\/a><\/li>\n<\/ul>\n<p>I find the Perl-based <code>rename<\/code> command the simplest solution to the problem. It may not be installed by default; on Debian or Ubuntu you can install it with:<\/p>\n<pre><code>sudo apt install rename<\/code><\/pre>\n<p>Now the command looks like this:<\/p>\n<pre><code>find .\/demo -type f -name '*\\?*' -execdir file-rename 's\/\\?.+$\/\/' '{}' \\;<\/code><\/pre>\n<p>74 characters! That feels more like a one-liner.<\/p>\n<p>On <a href=\"https:\/\/unix.stackexchange.com\/questions\/19654\/how-do-i-change-the-extension-of-multiple-files\/185262#185262\">How do I change the extension of multiple files?<\/a>, a comment suggests using <code>+<\/code> instead of <code>;<\/code> to terminate the find command. 
Since <code>file-rename<\/code> can take multiple files, <code>+<\/code> would pass as many files as possible to each invocation instead of running <code>file-rename<\/code> once per file:<\/p>\n<pre><code>find .\/demo -type f -name '*\\?*' -execdir file-rename 's\/\\?.+$\/\/' '{}' +<\/code><\/pre>\n<p>In my tests that worked perfectly. The link above points out that for an exceedingly long list of files you may encounter the error message <code>Argument list too long<\/code>. I think you are unlikely to run into that here, because <code>find<\/code> with <code>+<\/code> splits the files into batches that stay within the system&#8217;s argument-length limit.<\/p>\n<p>If you have other favorite methods or improvements upon the above, let me know in the comments.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>After using Wget to mirror a WordPress site, I ended up with a bunch of files on disk that included question marks (&#8220;?&#8221;) and various querystring parameters (&#8220;ver=3.7.1&#8221;). I examine several different ways to rename the files and drop the question mark and following text from each filename.<\/p>\n","protected":false},"author":2,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[232],"tags":[],"class_list":["post-3733","post","type-post","status-publish","format-standard","hentry","category-tips-tricks"],"_links":{"self":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/3733","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/users\/2"}],"replies":[{"embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/comments?post=3733"}],"version-history":[{"count":12,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/37
33\/revisions"}],"predecessor-version":[{"id":3745,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/posts\/3733\/revisions\/3745"}],"wp:attachment":[{"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/media?parent=3733"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/categories?post=3733"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/osric.com\/chris\/accidental-developer\/wp-json\/wp\/v2\/tags?post=3733"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}