The Accidental Developer – Page 2 – What if Gregor Samsa awoke a computer programmer?

3 ways to remove blank lines from a file

There are certainly more than 3 ways to do this. I’ve always used sed for it, but here’s my sed method along with two others using tr and awk:

sed:

sed '/^$/d' file_with_blank_lines

tr:

tr -s '\n' <file_with_blank_lines

awk:

awk '{ if (length($0)) print $0 }' file_with_blank_lines
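
The three commands are not quite equivalent, which matters if the file contains whitespace-only lines or starts with a blank line. The sketch below builds a throwaway sample file (its contents and the variable names are made up for illustration) and counts the lines each approach leaves behind. The awk 'NF' and grep variants are additional options, not part of the three above:

```shell
# Sample file: a leading blank line, an empty line, and a whitespace-only line.
sample=$(mktemp)
printf '\nfirst\n\n   \nlast\n' > "$sample"

# sed deletes only truly empty lines, so the whitespace-only line survives.
sed_lines=$(sed '/^$/d' "$sample" | awk 'END { print NR }')

# tr squeezes runs of newlines into one, so a leading blank line survives.
tr_lines=$(tr -s '\n' < "$sample" | awk 'END { print NR }')

# awk 'NF' prints lines with at least one field, dropping whitespace-only lines too.
awk_lines=$(awk 'NF' "$sample" | awk 'END { print NR }')

# grep -c counts lines containing at least one non-whitespace character.
grep_lines=$(grep -c '[^[:space:]]' "$sample")

echo "sed: $sed_lines  tr: $tr_lines  awk NF: $awk_lines  grep: $grep_lines"
rm -f "$sample"
```

For this sample, sed leaves 3 lines, tr leaves 4 (including the leading blank line), and the awk 'NF' and grep variants leave 2. Which behavior is right depends on whether whitespace-only lines count as blank for your purposes.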

If you have other favorite ways, leave a note in the comments!

Migrating database servers

As I’m migrating websites and applications from one server to another, I’m also migrating databases from one server to another.

Even though I’ve done this dozens, if not hundreds, of times, I always find myself looking up how to do it. I’m migrating from one MySQL (MariaDB) server to another, so it’s relatively straightforward, but there is still some command syntax I don’t remember off the top of my head.

First, export the old database to a file:

DBHOST=old-db-host.osric.com
DBUSER=dbusername
DBNAME=dbname
mysqldump --add-drop-table -h $DBHOST -u $DBUSER -p $DBNAME >$DBNAME.07-NOV-2023.bak.sql

This mysqldump command produces output that will re-create the necessary tables and insert the data.
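
Before relying on a dump file, it can be worth checking that mysqldump ran to completion. By default mysqldump ends its output with a comment line beginning "-- Dump completed"; that line is absent if comments were disabled (for example with --skip-comments), so treat this as a sanity check rather than proof. The function name dump_looks_complete and the sample files are made up for this sketch:

```shell
# Hypothetical helper: succeed if the last line of a dump is mysqldump's completion comment.
dump_looks_complete() {
    # $1 = path to an uncompressed .sql dump
    tail -n 1 "$1" | grep -q '^-- Dump completed'
}

# Demonstration with two throwaway files standing in for real dumps.
good=$(mktemp)
bad=$(mktemp)
printf 'CREATE TABLE t (id INT);\n-- Dump completed on 2023-11-07 12:00:00\n' > "$good"
printf 'CREATE TABLE t (id INT);\nINSERT INTO t VALUES (1' > "$bad"

for f in "$good" "$bad"; do
    if dump_looks_complete "$f"; then
        echo "$f: looks complete"
    else
        echo "$f: truncated or written without comments"
    fi
done
rm -f "$good" "$bad"
```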

In this case I’m not compressing the output, but it would be trivial to pipe the output of mysqldump to a compression utility such as xz, bzip2, or gzip. For my data, which is entirely text-based, any of these utilities performs well, although xz achieves the best compression:

mysqldump --add-drop-table -h $DBHOST -u $DBUSER -p $DBNAME | xz -c >$DBNAME.07-NOV-2023.bak.sql.xz
mysqldump --add-drop-table -h $DBHOST -u $DBUSER -p $DBNAME | bzip2 -c >$DBNAME.07-NOV-2023.bak.sql.bz2
mysqldump --add-drop-table -h $DBHOST -u $DBUSER -p $DBNAME | gzip -c >$DBNAME.07-NOV-2023.bak.sql.gz
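
Rather than typing the date by hand, the filename can be generated. This is a minimal sketch that assumes the same naming pattern as above; dump_filename is a made-up helper name, and the commented lines show how it might be combined with the commands in this post:

```shell
# Hypothetical helper: build a name like dbname.07-NOV-2023.bak.sql plus an optional suffix.
dump_filename() {
    # $1 = database name, $2 = optional suffix such as ".xz"
    # LC_ALL=C keeps the month abbreviation in English regardless of locale.
    stamp=$(LC_ALL=C date +%d-%b-%Y | tr '[:lower:]' '[:upper:]')
    printf '%s.%s.bak.sql%s\n' "$1" "$stamp" "${2:-}"
}

dump_filename dbname
dump_filename dbname .xz

# Possible use with the commands above (not run here):
#   mysqldump --add-drop-table -h "$DBHOST" -u "$DBUSER" -p "$DBNAME" | xz -c > "$(dump_filename "$DBNAME" .xz)"
# A compressed dump can be decompressed to stdout and piped straight into the import:
#   xz -dc dbname.07-NOV-2023.bak.sql.xz | sudo mysql --user=root --host=localhost dbname
```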

Next, create the new database and a database user account. This assumes there is a database server running:

sudo mysql -u root
CREATE DATABASE dbname;
CREATE USER 'dbuser'@'localhost' IDENTIFIED BY 'your-t0p-s3cr3t-pa55w0rd';
GRANT ALL PRIVILEGES ON dbname.* TO 'dbuser'@'localhost';

Note that the CREATE USER and GRANT commands will result in a “0 rows affected” message, which is normal:

Query OK, 0 rows affected (0.002 sec)

There are other ways to create the database; see 7.4.2 Reloading SQL-Format Backups in the MySQL documentation.

Next, import the database from the file. This example uses the root user because I did not grant the dbuser PROCESS privileges (which are not table-level privileges):

sudo mysql --user=root --host=localhost dbname <dbname.07-NOV-2023.bak.sql

WordPress 6.3 is incompatible with older versions of PHP

After installing WordPress 6.3, this site was broken because the new version of WordPress isn’t compatible with PHP 5.x.

I know WordPress has been complaining about this for a while, but PHP 5.x is the default version on CentOS 7, which is still supported until June 30, 2024.

I would expect that WordPress would, instead of encouraging the users on systems with old versions of PHP to apply the update, warn that applying the update will absolutely break the target website.

I’m exceedingly annoyed at WordPress. An absolutely terrible experience.

I currently have the site running on a temporary server that is a little fragile. It remains to be seen how stable it will be over the coming days.

Running Splunk in AWS

I don’t like using Google Analytics. The data is useful and well-presented, but I really just want basic web stats without sending all my web stats (along with data from my users) to Google. I’ve considered a number of other options, including Matomo. But I already use Splunk at work, so why not run Splunk at home too?

Splunk Enterprise offers a 60-day trial license. After that, there’s a free license. It’s not really clear that the free license covers what I’m trying to do here. The free license info includes:

If you want to run Splunk Enterprise to practice searches, data ingestion, and other tasks without worrying about a license, Splunk Free is the tool for you.

I think this scenario qualifies. This is a hobby server. I use Splunk at my day job, so this is in some sense Splunk practice. I’ll give it more thought over the next 60 days. Your situation may vary!

I launched an EC2 instance in AWS (Amazon Web Services). I picked a t2.micro instance. That instance size might be too small, but I’m not planning to send much data there. I picked Amazon Linux, which uses yum and RPMs for package management, familiar from the RHEL, CentOS, and now Rocky Linux servers I use frequently. (One thing to note, the default user for Amazon Linux is ec2-user. I always have to look that up.)

For purposes of this post, I’ll use 203.0.113.18 as the EC2 instance’s public IP address. (203.0.113.0/24 is an address block reserved for documentation, see RFC 5737.)

I transferred the RPM to the new server. I’m using Splunk 9.0.3, the current version as of this writing. I installed it:

sudo yum install splunk-9.0.3-dd0128b1f8cd-linux-2.6-x86_64.rpm

Yum reported the installed size as 1.4 GB. Important to note, since I used an 8 GB HD, the default volume size when I launched the EC2 instance.

I added an inbound rule to the security group associated with the EC2 instance to allow 8000/tcp traffic from my home IPv4 address.

The installation works! I was able to connect to 203.0.113.18:8000 in a web browser. My connection to 203.0.113.18:8000 was not encrypted, but one thing at a time, right?

Disk space, as I suspected, might be an issue. This warning appeared in Splunk’s health status:

MinFreeSpace=5000. The diskspace remaining=3962 is less than 1 x minFreeSpace on /opt/splunk/var/lib/splunk/audit/db
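
The same comparison can be made from the shell before Splunk complains. This is a rough sketch, assuming the 5000 MB threshold from the warning above; free_mb and check_free_space are made-up names, and the awk column lookup assumes the standard POSIX df layout:

```shell
# Free space, in MB, on the filesystem that holds the given path.
free_mb() {
    # -P forces one line per filesystem; -m reports sizes in 1 MB blocks.
    df -Pm "$1" | awk 'NR == 2 { print $4 }'
}

# Compare free space with a minimum and report the result.
check_free_space() {
    # $1 = path, $2 = minimum free MB
    avail=$(free_mb "$1")
    if [ "$avail" -lt "$2" ]; then
        echo "LOW: ${avail} MB free on $1 (minimum $2 MB)"
        return 1
    fi
    echo "OK: ${avail} MB free on $1 (minimum $2 MB)"
}

# On the Splunk server the path would be /opt/splunk; / is used here so the sketch runs anywhere.
check_free_space / 5000 || echo "Consider growing the volume."
```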

Next question: how do I get data into Splunk? The Splunk Enterprise download page helpfully includes a link to a “Getting Data In — Linux” video, although the video focused on ingesting local logs. I’m more interested in setting up the Splunk Universal Forwarder on a different server and ingesting logs from the osric.com web server. I installed the Splunk forwarder on the target web server.

I enabled a receiver via Splunk web (see Enable a receiver for Splunk Enterprise for information). I used the suggested port, 9997/tcp.

I also allowed this traffic from the web server’s IPv4 address via the AWS security group associated with the EC2 instance.

I configured the forwarder on the target web server (see Configure the universal forwarder using configuration files for more details):

$ ./bin/splunk add forward-server 203.0.113.18:9997
Warning: Attempting to revert the SPLUNK_HOME ownership
Warning: Executing "chown -R splunk /opt/splunkforwarder"
WARNING: Server Certificate Hostname Validation is disabled. Please see server.conf/[sslConfig]/cliVerifyServerName for details.
Splunk username: admin
Password:
Added forwarding to: 203.0.113.18:9997.

I tried running a search, but the disk space limitations finally became apparent:

Search not executed: The minimum free disk space (5000MB) reached for /opt/splunk/var/run/splunk/dispatch. user=admin., concurrency_category="historical", concurrency_context="user_instance-wide", current_concurrency=0, concurrency_limit=5000

I increased disk to 16 GB. (I’d never done that before for an EC2 instance, but it was surprisingly easy.)

I needed to add something to monitor. On the target web server host I ran the following:

$ sudo -u splunk /opt/splunkforwarder/bin/splunk add monitor /var/www/chris/data/logs

The resulting output included the following message:

Checking: /opt/splunkforwarder/etc/system/default/alert_actions.conf
                Invalid key in stanza [webhook] in /opt/splunkforwarder/etc/system/default/alert_actions.conf, line 229: enable_allowlist (value: false).

It’s not clear if that’s actually a problem, and a few search results suggested it wasn’t worth worrying about.

Everything was configured to forward data from the web server to Splunk. How could I find the data? I tried running a simple Splunk search:

index=main

0 events returned. I also checked the indices at http://203.0.113.18:8000/en-US/manager/search/data/indexes, which showed there were 0 events in the main index.

I ran tcpdump on the target web server and confirmed there were successful connections to 203.0.113.18 on port 9997/tcp:

sudo tcpdump -i eth0 -nn port 9997

I tried another search on the Splunk web interface, this time querying some of Splunk’s internal indexes:

index=_* osric

Several results were present. Clearly communication was happening. But where were the web logs?

The splunk user on the target web server doesn’t have permissions to read the web logs! I ran the following:

chown apache:splunk /var/www/chris/data/logs/osric*

After that change, the Indexes page in the Splunk web interface still showed 0 events in the main index.

I followed the advice on What are the basic troubleshooting steps in case of universal forwarder and heavy forwarder not forwarding data to Splunk?, but still wasn’t seeing any issues. I took a close look again at the advice to check permissions. Tailing a specific log file worked fine, but getting a directory listing as the splunk user failed:

$ sudo -u splunk ls logs
ls: cannot open directory logs: Permission denied

Of course! The splunk user had access to the logs themselves, but not to the directory containing them. It couldn’t enumerate the log files. I ran the following:

$ sudo chgrp splunk logs

That did it! Logs were flowing! Search queries like the following produced results on the Splunk web interface:

index=main
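
In hindsight, listing the permissions of every directory along the path would have exposed the problem sooner. A minimal sketch follows; show_path_permissions is a made-up function name:

```shell
# Print owner, group, and mode for a path and each of its parent directories.
show_path_permissions() {
    p=$1
    while :; do
        ls -ld "$p"
        # Stop at the filesystem root, or at "." for relative paths.
        case $p in
            /|.) break ;;
        esac
        p=$(dirname "$p")
    done
}

# Example: walk up from the current directory.
show_path_permissions "$PWD"

# To test access as the forwarder's user (not run here):
#   sudo -u splunk test -r /var/www/chris/data/logs -a -x /var/www/chris/data/logs && echo readable
```

On Linux systems with util-linux installed, namei -l gives a similar listing in a single command.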

The search was slow, and there were warnings present when searching:

Configuration initialization for /opt/splunk/etc took longer than expected (1964ms) when dispatching a search with search ID 1676220274.309. This usually indicates problems with underlying storage performance.

It looks like t2.micro is much too small and under-powered for Splunk, even for an instance with very little data (only 3 MB of data and 20,000 log events in the main index).

Despite these drawbacks, the data was searchable. How did Splunk compare as a solution?

Dashboards
I’ll need to create dashboards from scratch. I’ll want to know top pages, top URIs resulting in 404 errors, top user agents, etc. All of those will need to be built. It’s possible there’s a good Splunk app available that includes a lot of common dashboards for the Apache web server, but I haven’t really explored that.

Google Analytics can’t report on 404 errors, but otherwise it provides a lot of comprehensive dashboards and data visualizations. Even if all you want are basic web stats, an application tailored to web analytics will include a lot of ready-made functionality.

Robots, Spiders, and Crawlers (and More)
It turns out, a large percentage of requests to my web server are not from human beings. Many of the requests are coming from robots. At least 31% of requests in the past day were coming from these 9 bots:

8LEGS
Sogou
PetalBot
AhrefsBot
SEOkicks
zoominfobot
SemrushBot
BingBot
DotBot
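
A similar figure can be estimated straight from the access log, without Splunk. This sketch assumes Apache's combined log format, where the user agent is the sixth field when a line is split on double quotes; requests containing escaped quotes would throw that off. The function name bot_share and the sample log lines are made up:

```shell
# Share of requests whose user agent matches a pattern of bot names.
bot_share() {
    # $1 = access log in combined format, $2 = regex of bot names (matched case-insensitively)
    awk -F'"' -v pat="$2" '
        { total++ }
        tolower($6) ~ tolower(pat) { bots++ }
        END { if (total) printf "%d of %d requests (%.0f%%)\n", bots, total, 100 * bots / total }
    ' "$1"
}

# Demonstration with a tiny made-up log.
log=$(mktemp)
cat > "$log" <<'EOF'
198.51.100.7 - - [12/Feb/2023:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0 (X11; Linux x86_64)"
203.0.113.9 - - [12/Feb/2023:10:00:01 +0000] "GET /robots.txt HTTP/1.1" 200 64 "-" "Mozilla/5.0 (compatible; SemrushBot/7~bl)"
203.0.113.10 - - [12/Feb/2023:10:00:02 +0000] "GET /a HTTP/1.1" 200 99 "-" "Mozilla/5.0 (compatible; AhrefsBot/7.0)"
198.51.100.8 - - [12/Feb/2023:10:00:03 +0000] "GET /b HTTP/1.1" 404 12 "-" "curl/7.79.1"
EOF
bot_share "$log" 'semrushbot|ahrefsbot|dotbot'
rm -f "$log"
```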

Google Analytics (and presumably other web analytics tools) does a great job of filtering these out. It’s good to know which bots are visiting, but that doesn’t tell me anything about which content is most popular with users.

Security Insights
Related to the above, the stats from the web logs do a much better job of showing suspicious activity than Google Analytics does. It’s much easier to see which IP addresses are requesting files that don’t exist, or are repeatedly trying and failing to log in to WordPress (19% of all requests are for wp-login.php). This is useful information that I can use to help protect the server: I’ve previously written about how to block WordPress scanners using fail2ban. A tool dedicated to web analytics likely won’t provide this kind of detail, and may in fact hide it from site administrators if they aren’t also reviewing their logs.
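
The addresses behind those login attempts can also be pulled from the raw log with standard tools. This is a sketch assuming the combined log format, where the client address is the first whitespace-separated field and the request path is the seventh; top_login_ips is a made-up name:

```shell
# List the client addresses that request wp-login.php most often.
top_login_ips() {
    # $1 = access log in combined format, $2 = how many addresses to show (default 10)
    awk '$7 ~ /wp-login\.php/ { print $1 }' "$1" | sort | uniq -c | sort -rn | head -n "${2:-10}"
}

# Demonstration with a tiny made-up log.
log=$(mktemp)
cat > "$log" <<'EOF'
203.0.113.50 - - [12/Feb/2023:10:00:00 +0000] "POST /wp-login.php HTTP/1.1" 200 512 "-" "Mozilla/5.0"
203.0.113.50 - - [12/Feb/2023:10:00:01 +0000] "POST /wp-login.php HTTP/1.1" 200 512 "-" "Mozilla/5.0"
198.51.100.7 - - [12/Feb/2023:10:00:02 +0000] "GET / HTTP/1.1" 200 2048 "-" "Mozilla/5.0"
198.51.100.7 - - [12/Feb/2023:10:00:03 +0000] "GET /wp-login.php HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
top_login_ips "$log" 5
rm -f "$log"
```

Each output line is a request count followed by an address, with the busiest address first.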

Costs
The t2.micro instance will cost me approximately 8 USD per month. The t2.micro instance clearly isn’t powerful enough to run Splunk at any reasonable level of performance, even for a single-user system with a fairly small number of log events.

What is the right size instance? I don’t have enough experience running Splunk as an administrator to make a guess, or even to determine if the bottleneck is CPU (likely) or RAM. But I decided to at least try upgrading the instance to t2.medium to see if that made a difference, since that includes 2 virtual CPUs (twice that of the t2.micro) and 4 GB RAM (four times that of t2.micro).

It did make a difference! The Splunk web interface is much faster now, but will cost roughly 33 USD per month. That’s getting close to the amount I pay to run the web server itself. I think setting up Splunk to collect web stats was a useful exercise, but I’m going to look at some other alternatives as Google Analytics replacements.

DIY Gist Chatbots

[This was originally posted at the now-defunct impractical.bot on 23 Feb 2019]

I created a tool that will allow anyone to experiment with NLTK (Natural Language Toolkit) chatbots without writing any Python code. The repository for the backend code is available on GitHub: Docker NLTK chatbot.

I plan to expand on this idea, but it is usable now. In order to create your own bot:

Create a GitHub account
Create a “gist” or fork my demo gist: Greetings Bot Source
Customize the name, match, and replies elements
Note your username and the unique ID of your gist (a hash value, a 32-character string of letters and numbers)
Visit http://osric.com/chat/user/hash, replacing user with your GitHub username and hash with the unique ID of your gist. For an example, see Greetings Bot.

You can now interact with your custom bot, or share the link with your friends!

One more thing: if you update your gist, you’ll need to let the site know to update the code. Just click the “Reload Source” link on the chat page.

IPBan’s ProcessToRunOnBan functionality

The IPBan config file contains 2 interesting items that can trigger actions when IP addresses are banned or unbanned: ProcessToRunOnBan and ProcessToRunOnUnban.

Here’s the default config entry for ProcessToRunOnBan:


<add key="ProcessToRunOnBan" value=""/>

I decided I wanted to make the list of banned IP addresses public by writing to a web-accessible file. I tried adding the following values:


<add key="ProcessToRunOnBan" value="%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe C:\add-bannedip.ps1|###IPADDRESS###"/>

<add key="ProcessToRunOnUnban" value="%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe C:\remove-unbannedip.ps1|###IPADDRESS###"/>

The PowerShell (.ps1) scripts were really simple. The first adds banned IPs to a text file within the web root:

# Add an IP address to the list of banned IPs
param ($BannedIP)

$BannedPath = 'C:\inetpub\wwwroot\banned.txt'
Add-Content -Path $BannedPath -Value $BannedIP

The next removes unbanned IPs from the same text file:

# Remove an IP address from the list of banned IPs
param ($UnbannedIP)

$BannedPath = 'C:\inetpub\wwwroot\banned.txt'
Set-Content -Path $BannedPath (Get-Content $BannedPath | Select-String -NotMatch $UnbannedIP)

There are some flaws and a lack of error-checking in the above. The un-ban script could match IP addresses that are not identical, for example: 192.0.2.1 would match 192.0.2.10 and 192.0.2.100. Additionally, I would want to confirm that the parameter value was a valid IP address, but this was just a quick proof-of-concept.
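
The substring problem is not specific to PowerShell. As an illustration only (this is a shell analogue with a made-up file, not the script used on the server), compare a loose pattern match with an exact whole-line match:

```shell
# A made-up list of banned addresses.
banned=$(mktemp)
printf '192.0.2.1\n192.0.2.10\n192.0.2.100\n' > "$banned"

# Loose match: the pattern is treated as a regex and matches anywhere in the line,
# so removing 192.0.2.1 also removes 192.0.2.10 and 192.0.2.100.
loose=$(grep -v '192.0.2.1' "$banned" || true)

# Exact match: -F treats the pattern as a fixed string and -x requires the whole line to match.
exact=$(grep -vxF '192.0.2.1' "$banned" || true)

echo "loose match leaves: ${loose:-<nothing>}"
echo "exact match leaves:"
printf '%s\n' "$exact"
rm -f "$banned"
```

The loose version empties the list, while the exact version removes only the intended address.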

However, I encountered an error when the next IP address was banned:

2022-01-03 00:51:35.8763|ERROR|DigitalRuby.IPBanCore.Logger|Failed to execute process C:\Program Files\IPBan\%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe C:\add-bannedip.ps1 192.0.2.14: System.ComponentModel.Win32Exception (2): An error occurred trying to start process 'C:\Program Files\IPBan\%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe C:\add-bannedip.ps1' with working directory 'C:\Program Files\IPBan\%SystemRoot%\system32\WindowsPowerShell\v1.0\powershell.exe C:'. The system cannot find the file specified.
   at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo )
   at System.Diagnostics.Process.StartCore(ProcessStartInfo )
   at System.Diagnostics.Process.Start()
   at System.Diagnostics.Process.Start(ProcessStartInfo )
   at DigitalRuby.IPBanCore.IPBanService.<>c__DisplayClass191_0.<ExecuteExternalProcessForIPAddresses>b__0() in C:\Users\Jeff\Documents\GitHub\DigitalRuby\IPBan\IPBanCore\Core\IPBan\IPBanService_Private.cs:line 591

It apparently failed to expand %SystemRoot%, so I replaced it with C:\Windows.

As mentioned in my previous post on IPBan (IPBan: fail2ban for Windows), I am using a remote config file hosted on an HTTPS server. Shortly after I made the change on the remote server, I noticed this in the logs (logfile.txt):

2022-01-03 01:36:43.3348|INFO|DigitalRuby.IPBanCore.Logger|Config file changed

It looks like IPBan automatically checks the GetUrlConfig value for updates. I confirmed that the file at C:\Program Files\IPBan\ipban.config was updated at the same time. This is excellent; previously I thought I might need to restart the IPBan service any time the configuration changed.

Unfortunately, my change still didn’t work. I encountered the following error:

2022-01-03 14:22:21.7154|ERROR|DigitalRuby.IPBanCore.Logger|Failed to execute process C:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe C:\add-bannedip.ps1 192.0.2.208: System.ComponentModel.Win32Exception (2): An error occurred trying to start process 'C:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe C:\add-bannedip.ps1' with working directory 'C:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe C:'. The system cannot find the file specified.
   at System.Diagnostics.Process.StartWithCreateProcess(ProcessStartInfo )
   at System.Diagnostics.Process.Start(ProcessStartInfo )
   at DigitalRuby.IPBanCore.IPBanService.<>c__DisplayClass191_0.<ExecuteExternalProcessForIPAddresses>b__0() in C:\Users\Jeff\Documents\GitHub\DigitalRuby\IPBan\IPBanCore\Core\IPBan\IPBanService_Private.cs:line 591

I decided to take a closer look at line 591:

    ProcessStartInfo psi = new()
    {
        FileName = programFullPath,
        WorkingDirectory = Path.GetDirectoryName(programFullPath),
        Arguments = replacedArgs
    };
    using Process p = Process.Start(psi);

It looked to me like it was taking the path for the PowerShell executable and the PowerShell script both as the full path. I changed the config to pass in the path to the PowerShell script as part of the arguments, which made sense:

<add key="ProcessToRunOnBan" value="C:\Windows\system32\WindowsPowerShell\v1.0\powershell.exe|C:\add-bannedip.ps1 ###IPADDRESS###"/>

That worked! The next time an IP address was banned, the .ps1 script was run and added the IP address to a web-accessible file.

IPBan: fail2ban for Windows

I was looking for a tool to block IP addresses after a certain number of failed RDP login attempts, something like fail2ban but for Windows. I came across IPBan. Calling IPBan a “fail2ban for Windows” unfairly minimizes what it can do, but it can handle that task quite nicely.

As a test I installed it on a Windows Server 2019 instance running in Azure, using the install instructions provided in the README:

[Net.ServicePointManager]::SecurityProtocol = [Net.SecurityProtocolType]::Tls12; iex ((New-Object System.Net.WebClient).DownloadString('https://raw.githubusercontent.com/DigitalRuby/IPBan/master/IPBanCore/Windows/Scripts/install_latest.ps1'))

You don’t want to blindly run a script that downloads and installs something on your machine without looking at it first, do you? Of course not. You take a look at the code first:

https://github.com/DigitalRuby/IPBan/blob/master/IPBanCore/Windows/Scripts/install_latest.ps1

Essentially, the script does the following:

Downloads a zip file
Extracts the files
Sets up a Windows service
Suggests that you might want to edit the config by opening the config file in Notepad

I took a look at the config file and made only one change to get started: I added a CenturyLink subnet to the allow list. CenturyLink is my ISP, and I didn’t want to accidentally lock myself out. Normally you wouldn’t want to add an entire /12 of your residential ISP, but this was just a test on a temporary and disposable server:

<add key="Whitelist" value="75.160.0.0/12"/>

I noted a few other interesting things in the config file:

It can also examine Linux logs (IPBan can run on Windows or Linux)
It blocks failed RDP logins, but also blocks failed logins for other Windows services, such as MSSQL and Exchange
It defaults to banning an IP address after 5 failed attempts, but the number of failed attempts can be configured
The default ban duration is 1 day, but this can be configured
The default ban duration can take into account recidivism, so that the second or third time an address is banned it can use a longer duration
It can also use a username-based allow list (whitelist), so that attempted logins from usernames not on the list will ban the IP address immediately
It can add blocks based on external threat intelligence feeds, and uses the Emerging Threats blocklist by default
It can pull the config file from a URL, allowing you to manage the configuration of many servers from one location.

After I reviewed the config file (C:\Program Files\IPBan\ipban.config) I restarted the IPBAN service via the Services GUI. You could also restart it via PowerShell:

sc.exe stop IPBAN
sc.exe start IPBAN

Now that it was up and running, what changes did it make? I opened up Windows Defender Firewall and opened the Advanced Settings. Looking through the Inbound rules I found 4 rules prefixed with IPBan_, three Deny rules and one Allow rule:

[Deny] IPBan_Block_0
[Deny] IPBan_EmergingThreats_0
[Deny] IPBan_EmergingThreats_1000
[Allow] IPBan_GlobalWhitelist_0

I looked at the properties for IPBan_Block_0. Already, one IP address had been banned! (I checked several hours later and only 2 additional IP addresses had been banned, so I may have gotten lucky to see one within minutes.)

It took me a while to figure out the difference between IPBan_EmergingThreats_0 and IPBan_EmergingThreats_1000. IPBan is creating a new firewall rule after 999 entries, so the 1000th IP address or subnet from the Emerging Threats feed was added to IPBan_EmergingThreats_1000. I’ve seen some arguments online about whether or not there is a limit to the number of addresses or subnets that can be included in the scope for a Windows Defender Firewall rule, and some sources indicate there is a limit of 1000 (and I’m fairly certain the author of IPBan is one of the people arguing there is a limit).

The IPBan_GlobalWhitelist_0 contained the CenturyLink subnet that I explicitly allowed.

I was excited about pulling the configuration from a URL, so I added my configuration to https://osric.com/chris/ipban/ipban.example.config, initially with just one tweak:

<add key="UserNameWhitelist" value="chris"/>

This would, in theory, ban any IP address immediately if it attempted a login with a username like admin or guest. It uses a configurable Levenshtein distance to try to avoid banning an IP address based on a typo (for example, chros instead of chris), which is a clever approach.
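
To make the edit-distance idea concrete, here is a small Levenshtein distance calculation using awk from the shell. It only illustrates the metric; IPBan's own implementation and its configured threshold are not shown here, and the function name levenshtein is made up for this sketch:

```shell
# Levenshtein distance: the minimum number of single-character insertions,
# deletions, and substitutions needed to turn one string into another.
levenshtein() {
    # $1 and $2 = the two strings to compare
    awk -v a="$1" -v b="$2" 'BEGIN {
        la = length(a); lb = length(b)
        for (i = 0; i <= la; i++) d[i, 0] = i
        for (j = 0; j <= lb; j++) d[0, j] = j
        for (i = 1; i <= la; i++) {
            for (j = 1; j <= lb; j++) {
                cost = (substr(a, i, 1) == substr(b, j, 1)) ? 0 : 1
                m = d[i - 1, j] + 1                                        # deletion
                if (d[i, j - 1] + 1 < m) m = d[i, j - 1] + 1               # insertion
                if (d[i - 1, j - 1] + cost < m) m = d[i - 1, j - 1] + cost # substitution
                d[i, j] = m
            }
        }
        print d[la, lb]
    }'
}

echo "chris vs chros: $(levenshtein chris chros)"
echo "chris vs admin: $(levenshtein chris admin)"
```

A typo like chros is one edit away from chris, while admin is several edits away, so a small distance threshold can separate typos from unrelated usernames.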

I then added the URL to the config file on the server itself:

<add key="GetUrlConfig" value="https://osric.com/chris/ipban/ipban.example.config"/>

After another service restart, I wanted to test to see if a login attempt with a bad username would cause an immediate block. I opened an RDP connection using the credentials admin:admin and from a VPN IP address that would be outside of the allowed CenturyLink subnet.

I did not immediately see the VPN IP address in the IPBan_Block_0 deny rule. I checked the logs (C:\Program Files\IPBan\logfile.txt):

2021-12-31 04:35:45.3336|INFO|DigitalRuby.IPBanCore.Logger|Firewall entries updated: 
2021-12-31 04:37:45.5791|WARN|DigitalRuby.IPBanCore.Logger|Login failure: 198.51.100.101, admin, RDP, 1
2021-12-31 04:37:45.5791|INFO|DigitalRuby.IPBanCore.Logger|IP blacklisted: False, user name blacklisted: False, fails user name white list regex: False, user name edit distance blacklisted: True
2021-12-31 04:37:45.5791|WARN|DigitalRuby.IPBanCore.Logger|Banning ip address: 198.51.100.101, user name: admin, config black listed: True, count: 1, extra info: , duration: 1.00:00:00
2021-12-31 04:37:45.6024|WARN|DigitalRuby.IPBanCore.Logger|Updating firewall with 1 entries...
2021-12-31 04:37:45.6024|INFO|DigitalRuby.IPBanCore.Logger|Firewall entries updated: 198.51.100.101

There was just a delay in adding it. I checked the IPBan_Block_0 deny rule again in Windows Defender Firewall and the IP address was there. I must have been exceedingly quick, as the cycle time defaults to 15 seconds:

<add key="CycleTime" value="00:00:00:15"/>

One other change I made to my config: I set this option to false so that any future tests would not send my VPN IP to the “global ipban database”:

<add key="UseDefaultBannedIPAddressHandler" value="false" />

Normally leaving that set to true should be fine, but during testing it would be good to avoid adding your own IP address to a block list. I have not yet discovered if the global IPBan database is publicly available.

A couple other things I noted:

The ipban.config file was completely overwritten by the version at the GetUrlConfig URL, including the GetUrlConfig value! The next time I restarted the IPBAN service, it did not pick up the config from that URL, as the GetUrlConfig option was now blank. I updated the GetUrlConfig value on ipban.config locally and on the remotely hosted ipban.example.config to include the URL to itself, so that it would persist after a restart.

The FirewallRules config option has a somewhat confusing syntax, at least for Deny rules. I added the following rule, which then appeared in my Windows Defender Firewall’s block rules as IPBan_EXTRA_GoogleDNS_0:

<add key="FirewallRules" value="
    GoogleDNS;block;8.8.4.4;53;.
"/>

This rule allows inbound port 53/tcp traffic from 8.8.4.4 and blocks it from all other ports (0-52/tcp and 54-65535/tcp). I’m still not sure how to specify that I want to block all ports, or block both TCP and UDP traffic using this config option.

IPBan has a lot of other configuration options that I’m excited to test. For me, this tool fills a major gap for Windows servers!

Size of data in bytes

This was prompted by an error I was running into with the AWS S3 service: I needed to tell the transfer utility the size of the data, in bytes, when transferring large files.

In this case I am looking at files of characters. Some of these methods should work equally well for binary files, and others don’t. In the following examples, I’ll use the full text of Moby-Dick from Project Gutenberg, 2701-0.txt, as the target file. I retrieved the file using the following command:

curl -O http://www.gutenberg.org/files/2701/2701-0.txt

A few commands to get the size in bytes immediately came to mind: ls, stat, and wc.

$ ls -l 2701-0.txt | cut -d' ' -f5
1276201

$ stat --format %s 2701-0.txt 
1276201

$ wc -c 2701-0.txt | cut -d' ' -f1
1276201

All those options work. But what if the input isn’t a file on disk, and instead is an input stream? This is to demonstrate counting the bytes in a character stream coming from any source, so forgive the “useless use of cat”:

$ cat 2701-0.txt | wc -c
1276201

$ cat 2701-0.txt | cksum | cut -d' ' -f2
1276201

$ cat 2701-0.txt | dd of=/dev/null
2492+1 records in
2492+1 records out
1276201 bytes (1.3 MB, 1.2 MiB) copied, 0.00997434 s, 128 MB/s

The output from dd above is not the simplest thing to parse. It’s multi-line and sent to stderr, so I redirected it to stdout and grepped for “bytes”:

$ cat 2701-0.txt | dd of=/dev/null 2>&1 | grep 'bytes' | cut -d' ' -f1
1276201
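
For scripting, the file and stream cases can be wrapped in one helper. This is a minimal sketch built on wc -c; size_in_bytes is a made-up name, and the tr call strips the padding that some wc implementations add:

```shell
# Print the size in bytes of a file (if a filename is given) or of standard input.
size_in_bytes() {
    if [ "$#" -gt 0 ] && [ -f "$1" ]; then
        wc -c < "$1" | tr -d ' '
    else
        wc -c | tr -d ' '
    fi
}

# wc -c counts bytes, not characters: the last character below is a 2-byte UTF-8 sequence.
printf 'caf\303\251' | size_in_bytes

# With a file argument:
tmp=$(mktemp)
printf 'Call me Ishmael.\n' > "$tmp"
size_in_bytes "$tmp"
rm -f "$tmp"
```

The first call prints 5 (four characters, five bytes) and the second prints 17.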

There are at least 5 methods to find the size of a file using common command-line tools:

ls
stat
wc
cksum
dd

Know of others? Leave a comment below.

nmap scans the top 1000 ports by default, but which 1000?

From man nmap:

The simple command nmap target scans 1,000 TCP ports on the host target.

You might reasonably ask: which 1,000 ports? Is the particular port I am interested in included?

Fortunately, nmap has a list of ports/services that includes how frequently they are used. From this we can get the top 1000:

grep -v '^#' /usr/share/nmap/nmap-services | sort -rk3 | head -n1000

The initial grep is to filter out the comments (lines that begin with the hash mark).
The sort command sorts in descending order, by the 3rd column (the frequency).
The final head command displays only the top 1000 results.
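
One caveat: nmap-services lists UDP (and SCTP) entries alongside TCP, so a strict top-1000 TCP list needs a protocol filter, and a numeric sort is safer than a plain text sort. The sketch below is one way to do that; top_tcp_ports and port_in_top are made-up function names, and the sample file stands in for /usr/share/nmap/nmap-services:

```shell
# Print the N most frequently open TCP ports from an nmap-services file.
top_tcp_ports() {
    # $1 = path to nmap-services, $2 = how many ports (default 1000)
    awk '!/^#/ && $2 ~ /\/tcp$/' "$1" | LC_ALL=C sort -k3,3nr | head -n "${2:-1000}"
}

# Succeed if the given TCP port is among the top N.
port_in_top() {
    # $1 = port number, $2 = path to nmap-services, $3 = how many ports (default 1000)
    top_tcp_ports "$2" "${3:-1000}" |
        awk -v p="$1/tcp" '$2 == p { found = 1 } END { exit !found }'
}

# Demonstration with a tiny made-up services file.
services=$(mktemp)
printf '# comment line\n' > "$services"
printf 'http\t80/tcp\t0.484143\t# World Wide Web HTTP\n' >> "$services"
printf 'domain\t53/udp\t0.213496\n' >> "$services"
printf 'ssh\t22/tcp\t0.182286\n' >> "$services"
printf 'radmin\t4899/tcp\t0.003337\n' >> "$services"

if port_in_top 4899 "$services" 3; then
    echo "4899/tcp is in the top 3"
fi
rm -f "$services"
```

On a real system the call would be port_in_top 4899 /usr/share/nmap/nmap-services. nmap can also report the list itself: running nmap -v -oG - --top-ports 1000 should include a "Ports scanned" comment line in its grepable output, though that is worth verifying against your nmap version.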

In my case, I wondered if the radmin port, 4899/tcp, was included in an nmap scan. I piped the above command to grep to find out:

grep -v '^#' /usr/share/nmap/nmap-services | sort -rk3 | head -n1000 | grep 4899
radmin  4899/tcp        0.003337        # Radmin (www.radmin.com) remote PC control software

It is included in a default nmap scan.

Is there an easier way to do this? Drop me a line in the comments!

Running VMs? Delete wireless packages!

A best practice for system configuration is to remove any unneeded software. It’s sometimes difficult to know exactly what is needed and what isn’t, but CentOS 7 minimal and CentOS 8 minimal both install a number of packages related to wireless networking. If you’re running a server or a VM there’s almost never a need for these to be present.

To identify packages, I used yum search (substitute dnf for yum on CentOS 8):

yum search wireless

I used the same command and redirected the output to a file:

yum search wireless >wireless_packages

To get just the package names and convert it to a space-separated list, I used grep, cut, and paste:

grep -v Summary wireless_packages | cut -d. -f1 | paste -d' ' -s
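
Cutting at the first dot works for these package names but would truncate a name that itself contains a dot. A slightly more defensive sketch is below. It assumes yum's "name.arch : summary" result lines; package_names is a made-up function name, and the sample text only approximates real yum search output:

```shell
# Extract package names from saved "yum search" output as a space-separated list.
package_names() {
    # $1 = file containing yum search output
    # Keep only "name.arch : summary" lines and strip the trailing ".arch".
    awk '$2 == ":" && $1 ~ /\./ { sub(/\.[^.]+$/, "", $1); print $1 }' "$1" |
        LC_ALL=C sort -u | paste -s -d' ' -
}

# Demonstration with made-up search output.
results=$(mktemp)
cat > "$results" <<'EOF'
============ N/S matched: wireless ============
iw.x86_64 : A nl80211 based wireless configuration tool
iwl100-firmware.noarch : Firmware for Intel(R) Wireless WiFi Link 100 Series Adapters
crda.x86_64 : Regulatory compliance daemon for 802.11 wireless networking
EOF
package_names "$results"
rm -f "$results"
```

For the sample above this prints crda iw iwl100-firmware, which can be reviewed and then passed to yum remove.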

You can remove them with the following command:

sudo yum remove iw iwl6000-firmware crda iwl100-firmware iwl1000-firmware iwl3945-firmware iwl4965-firmware iwl5000-firmware iwl5150-firmware iwl105-firmware iwl135-firmware iwl3160-firmware iwl6000g2a-firmware iwl6000g2b-firmware iwl6050-firmware iwl2000-firmware iwl2030-firmware iwl7260-firmware

iw and crda were not installed, so were ignored. The rest were removed.

This may seem trivial, but it frees up some disk space (~100MB) and it means that these packages won’t need to be updated in the future. Getting notifications from your monitoring systems or vulnerability management systems about updates or security updates to unused and unnecessary packages should be avoided.