Converting a WordPress site to a static site using Wget

I recently made a YouTube tutorial on converting a WordPress site to a static HTML site. This blog post is a companion to the video.

First of all, why convert a WordPress site to a static HTML site? There are a number of reasons, but my primary concern is to reduce update fatigue. WordPress itself, along with WordPress themes and plugins, has frequent security updates. Many sites have stable content after an initial editing phase, and applying never-ending security updates to a site that doesn’t change doesn’t make sense.

The example site I used in the tutorial is www.stress2012.com, a site for an academic conference/workshop that was held in 2012. It’s 2024: the site content is not going to change.

To mirror the site, I used Wget.
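
The exact command is demonstrated in the video; a typical Wget invocation for mirroring a small site like this one looks something like the following (these flags are my assumption, not necessarily the ones from the tutorial):

$ wget --mirror --convert-links --adjust-extension --page-requisites --no-parent https://www.stress2012.com/

The --mirror option enables recursive downloading, and --convert-links rewrites internal links so the downloaded copy works when served as plain files.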


Using nc (netcat) to make an HTTP request

I must have had some reason for wanting to do this, although I can’t think of why right now. curl is an excellent tool for ad hoc HTTP requests.

On a server running Apache 2.4.6, first I tried:

# nc 127.0.0.1 80
GET / HTTP/1.1

That returned an HTTP/1.1 400 Bad Request error.

Next I tried:

# printf "GET /index.html HTTP/1.1\r\n\r\n" | nc 127.0.0.1 80

That also returned an HTTP/1.1 400 Bad Request error.

I decided to take a look at what curl was sending, since that was working:

# curl -v http://127.0.0.1
* About to connect() to 127.0.0.1 port 80 (#0)
* Trying 127.0.0.1...
* Connected to 127.0.0.1 (127.0.0.1) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 127.0.0.1
> Accept: */*
...

I put the same headers (with a modified User-Agent) into my printf statement:

# printf "GET /index.html HTTP/1.1\r\nUser-Agent: nc/0.0.1\r\nHost: 127.0.0.1\r\nAccept: */*\r\n\r\n" | nc 127.0.0.1 80
HTTP/1.1 200 OK
Date: Sun, 28 Jan 2018 23:11:04 GMT
Server: Apache/2.4.6 (CentOS) PHP/5.4.16
Last-Modified: Sun, 28 Jan 2018 20:10:37 GMT
ETag: "78-563dbb912bfe0"
Accept-Ranges: bytes
Content-Length: 120
Content-Type: text/html; charset=UTF-8

<!DOCTYPE html>
<html>
<head>
<title>well that worked</title>
</head>
<body>
<h1>apache is running</h1>
</body>
</html>

That worked!

I eliminated the User-Agent and Accept headers and it still worked, so the missing Host header was the cause of my problems. I swear I’ve done this before without a Host header, though.
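
So the minimal working request is just the request line plus a Host header:

# printf "GET /index.html HTTP/1.1\r\nHost: 127.0.0.1\r\n\r\n" | nc 127.0.0.1 80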

I looked up the HTTP specification (RFC 2616), which describes the following rules in section 5.2:

1. If Request-URI is an absoluteURI, the host is part of the Request-URI. Any Host header field value in the request MUST be ignored.

2. If the Request-URI is not an absoluteURI, and the request includes a Host header field, the host is determined by the Host header field value.

3. If the host as determined by rule 1 or 2 is not a valid host on the server, the response MUST be a 400 (Bad Request) error message.

Recipients of an HTTP/1.0 request that lacks a Host header field MAY attempt to use heuristics (e.g., examination of the URI path for something unique to a particular host) in order to determine what exact resource is being requested.

I could not get it to work with an absoluteURI, even using the example from the RFC. A request with an absoluteURI would look something like this (my reconstruction; not necessarily the exact request I tried):
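
# printf "GET http://127.0.0.1/index.html HTTP/1.1\r\n\r\n" | nc 127.0.0.1 80

However, I did find that I could omit the Host header entirely if I specified HTTP/1.0: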

# printf "GET / HTTP/1.0\r\n\r\n" | nc 127.0.0.1 80

I also found that Apache didn’t care what the Host header was when using HTTP/1.1, just so long as something was there:

# printf "GET / HTTP/1.1\r\nHost: z\r\n\r\n" | nc 127.0.0.1 80

That’s a little odd. I had not specified a ServerName in my Apache config, but even after I added ServerName 127.0.0.1:80 to /etc/httpd/conf/httpd.conf and restarted Apache, it still required a Host header, and it still didn’t care what the header’s value was (so long as it was not empty).

Let’s Encrypt: certbot error “No vhost exists with servername or alias of”

It’s about time–or rather, years past time–I enabled HTTPS for this site. I decided to try Let’s Encrypt. It wasn’t as turnkey as I expected, so I’ve included some notes here in case anyone else has similar issues.

The Let’s Encrypt site suggested installing Certbot and included specific instructions for using Certbot with Apache on CentOS 7. It suggested that a single command might do the trick:

$ sudo certbot --apache

Unfortunately, I received a couple of error messages: certbot was ultimately able to create the certificate for me, but unable to update my Apache configuration. An excerpt of the output of the certbot command is below:

Saving debug log to /var/log/letsencrypt/letsencrypt.log
No names were found in your configuration files. Please enter in your domain
name(s) (comma and/or space separated) (Enter 'c' to cancel):osric.com,www.osric.com
...
No vhost exists with servername or alias of: osric.com (or it's in a file with multiple vhosts, which Certbot can't parse yet). No vhost was selected. Please specify ServerName or ServerAlias in the Apache config, or split vhosts into separate files.
Falling back to default vhost *:443...
No vhost exists with servername or alias of: www.osric.com (or it's in a file with multiple vhosts, which Certbot can't parse yet). No vhost was selected. Please specify ServerName or ServerAlias in the Apache config, or split vhosts into separate files.
Falling back to default vhost *:443...
...
No vhost selected

IMPORTANT NOTES:
- Unable to install the certificate
...

I’m guessing it’s because my Apache virtual host configuration is in /etc/httpd/conf/vhosts/chris/osric.com instead of the expected location.

I looked at the certbot documentation hoping to find a way to pass the certbot command the path to my virtual host configuration file, but I did not find an option to do that. The logs at /var/log/letsencrypt/letsencrypt.log are fairly verbose, but they still do not indicate which files or directories certbot examined when trying to locate my Apache configuration.
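
Since the certificate itself was created successfully, note that a certificate can also be obtained without the installation step by using the certonly subcommand (standard Certbot usage, though not the command I ran at the time):

$ sudo certbot certonly --apache -d osric.com -d www.osric.com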

I noted that /etc/letsencrypt/options-ssl-apache.conf contains Apache directives. I thought maybe I could just include it in my config file using Apache’s Include directive, e.g.:

Include /etc/letsencrypt/options-ssl-apache.conf

I restarted Apache using systemctl (I know, I should be using apachectl restart instead):

$ sudo systemctl restart httpd
Job for httpd.service failed because the control process exited with error code. See "systemctl status httpd.service" and "journalctl -xe" for details.

Two problems there. First, options-ssl-apache.conf appears to be a generic file with no data specific to the host or certificate. Second, I had just added it to a VirtualHost directive listening on port 80.
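
In hindsight, running apachectl configtest (or httpd -t) is a faster way to surface configuration errors like these than restarting and digging through journalctl:

$ sudo apachectl configtest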

I duplicated the VirtualHost directive in my config file at /etc/httpd/conf/vhosts/chris/osric.com and made a few modifications and additions:

<IfModule mod_ssl.c>
<VirtualHost 216.154.220.53:443>
...all the directives from the port 80 VirtualHost...
SSLEngine on
SSLCertificateFile /etc/letsencrypt/live/osric.com/cert.pem
SSLCertificateKeyFile /etc/letsencrypt/live/osric.com/privkey.pem
SSLCertificateChainFile /etc/letsencrypt/live/osric.com/chain.pem
</VirtualHost>
</IfModule>

I restarted Apache:

$ sudo apachectl restart

The server restarted, but still did not respond to HTTPS requests. It didn’t appear to be listening on 443:

$ curl https://www.osric.com
curl: (7) Failed connect to www.osric.com:443; Connection refused
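
A quick way to check which ports httpd is actually bound to (a diagnostic step, not from my original notes):

$ sudo ss -tlnp | grep httpd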

As a sanity check, I confirmed that mod_ssl was indeed installed:

$ yum list mod_ssl
Installed Packages
mod_ssl.x86_64 1:2.4.6-45.el7.centos @base

And I checked to confirm that Apache was loading mod_ssl:

$ cat /etc/httpd/conf.modules.d/00-ssl.conf
LoadModule ssl_module modules/mod_ssl.so

I looked at some other Apache configurations where I knew SSL was working and I noted the Listen directive:

Listen 443

I added that line to the top of my configuration file at /etc/httpd/conf/vhosts/chris/osric.com, above the VirtualHost directive. I restarted Apache and it worked!
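
Repeating the earlier curl test confirmed it; the request now returned a response along these lines instead of a connection error:

$ curl -I https://www.osric.com
HTTP/1.1 200 OK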

check_http returns 403 Forbidden on fresh Nagios installation

I recently installed a Nagios server on a new CentOS 7 virtual machine (on VirtualBox).

One of the default checks included upon installation is a check on localhost to confirm that the HTTP server is responding. (First I had to install the check_http plugin; see my previous post.) The Nagios web interface reports a warning for this check:

HTTP WARNING: HTTP/1.1 403 Forbidden - 5261 bytes in 0.001 second response time

This is unexpected, since requesting the same page in a browser returns the Apache Welcome page.

When I run the check manually I get the same result, as expected:

# /usr/lib64/nagios/plugins/check_http -H localhost
HTTP WARNING: HTTP/1.1 403 Forbidden - 5261 bytes in 0.001 second response time |time=0.000907s;;;0.000000 size=5261B;;;0

I checked with curl:

# curl http://localhost

This returns the HTML source of the Apache Welcome page. It looks like it is working, right? But requesting only the headers shows that the Apache server is also returning a 403 Forbidden status:

# curl -I http://localhost
HTTP/1.1 403 Forbidden
...

The Apache Welcome page gives some hints about this behavior:

Are you the Administrator?

You should add your website content to the directory /var/www/html/.

To prevent this page from ever being used, follow the instructions in the file /etc/httpd/conf.d/welcome.conf.

The /etc/httpd/conf.d/welcome.conf file begins with the following comments and directive:

#
# This configuration file enables the default "Welcome" page if there
# is no default index page present for the root URL.  To disable the
# Welcome page, comment out all the lines below.
#
# NOTE: if this file is removed, it will be restored on upgrades.
#
<LocationMatch "^/+$">
    Options -Indexes
    ErrorDocument 403 /.noindex.html
</LocationMatch>

The Apache config specifies that if there is no index page in the document root, the Welcome page is returned as an error document with a 403 HTTP status code.
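
The fix is simply to provide an index page. Even a minimal placeholder will do (the contents here are just an example):

# echo '<!DOCTYPE html><html><body><h1>Hello</h1></body></html>' > /var/www/html/index.html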

Once I added a basic HTML file at /var/www/html/index.html, Nagios returned a success message:

HTTP OK: HTTP/1.1 200 OK - 549 bytes in 0.001 second response time

Applying per directory X-Frame-Options headers in Apache

To help protect against clickjacking, I had applied the following to my Apache 2.2 configuration, based on the suggestions described in OWASP’s Clickjacking Defense Cheat Sheet and the Mozilla Developer Network’s The X-Frame-Options response header:

Header always append X-Frame-Options SAMEORIGIN

However, my site has certain pages that are included in an iframe on another site, for the purpose of displaying content on digital signage devices. After I added that header, those pages would no longer load in an iframe on the digital signage devices’ browsers.

I thought I might be able to change SAMEORIGIN to ALLOW-FROM and list both the URI of my site and the URI of the digital signage page. However, the HTTP Header Field X-Frame-Options RFC (RFC 7034) indicates:

Wildcards or lists to declare multiple domains in one ALLOW-FROM statement are not permitted

The pages I wanted to exempt from the X-Frame-Options restriction exist in their own directory, /digitalsignage, so I tried to override the X-Frame-Options header in a .htaccess file:

Header always append X-Frame-Options ALLOW-FROM http://example.com

That caused a 500 Server Error. This message appeared in the error logs:

.htaccess: error: envclause should be in the form env=envar

The Header directive must be malformed, but I’m not sure how. I did not determine how to format the statement so as not to produce that error, although several sites point out that some browsers (Chrome, Safari) do not support ALLOW-FROM at all.
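
In hindsight, the envclause error suggests that mod_headers parsed the unquoted http://example.com as the Header directive’s optional condition argument. Quoting the entire header value should avoid that parse, though I have not verified this:

Header always append X-Frame-Options "ALLOW-FROM http://example.com"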

I changed the .htaccess file back to SAMEORIGIN, to match what was in the main site configuration:

Header always append X-Frame-Options SAMEORIGIN

I then noted that the response header sent by the server included SAMEORIGIN twice:

X-Frame-Options: SAMEORIGIN, SAMEORIGIN

That’s the expected behavior when using append. It appeared only once after I changed append to set:

Header always set X-Frame-Options SAMEORIGIN

I tried using set instead of append with ALLOW-FROM:

Header always set X-Frame-Options ALLOW-FROM http://example.com

But it still produced the same 500 Server Error.

After reading the documentation for Apache’s mod_headers, I realized that unset would allow me to remove the X-Frame-Options header from the /digitalsignage directory:

Header always unset X-Frame-Options

That worked, and the pages were successfully included as iframes in a page on the digital signage company’s site.

Monitoring web server status with a shell script

Recently, my VPS (Virtual Private Server) ran into some issues where it exceeded the maximum amount of RAM allotted under my subscription. When this happens, the web server software shuts down and does not restart until I manually restart it.

This is bad. I’m not always visiting my own web site, so it could be down for days without me knowing. Although I really need to identify what is using all the RAM, in the meantime I’ll settle for a monitoring system that will notify me when the server is down.

#!/bin/bash
# Fetch only the response headers (-s silences progress output) and
# look for a 200 status line.
if curl -s --head http://osric.com/ | grep "200 OK" > /dev/null
  then
    # Discard the success message; cron only sends mail when a job produces output.
    echo "The HTTP server on osric.com is up!" > /dev/null
  else
    echo "The HTTP server on osric.com is down!"
fi

cURL will let you retrieve a URL via the command line, and it provides more options than Wget for a single URL. In this case, I used the silent switch (-s) to eliminate the status/progress output, and the head switch (--head) to retrieve only the document headers. The headers are then piped to Grep, which searches for the string “200 OK” (the HTTP status message for a successful request).

I send the result of that to /dev/null so that the output doesn’t appear on the screen.

If grep does find 200 OK, then I send a success message to /dev/null. This is largely unnecessary, but it is nice to leave in for testing the script in the successful case–just remove the > /dev/null. If grep doesn’t find 200 OK, then there is a problem. It might not necessarily mean that the web server is down, but it definitely indicates a problem that needs to be identified.

I added a call to this script to a crontab to run every 5 minutes. If there is no output, nothing happens. If there is output, the output is sent to me via e-mail, which, assuming I am checking my e-mail religiously, should reduce server downtime.
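
The crontab entry looks something like this (the path to the script is just an example):

*/5 * * * * /home/chris/bin/check-httpd.sh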

Apache Install and Ambiguous Errors

I installed Apache 2.2.11 on the Windows XP portion of my desktop workstation for development purposes, but I got a lot of ambiguous errors when starting it from the Apache Service Monitor or the Windows Start menu.

Finally, when I started Apache from the command line, I got a more informative error:

(OS 10048) Only one usage of each socket address (protocol/network address/port) is normally permitted. : make_sock: could not bind to address 127.0.0.1:80 no listening sockets available, shutting down

It turns out I had Skype running, which by default binds to ports 5520, 80, and 443. There are several solutions.
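
Common fixes include quitting Skype before starting Apache, unchecking Skype’s option to use ports 80 and 443 as alternatives for incoming connections, or changing Apache’s Listen directive to an unused port. To see which process is bound to a port on Windows:

netstat -ano | findstr :80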