Enabling website compression to reduce bandwidth use

After years of steady growth, markwilson.it has seen a small drop in the number of blog subscribers in recent months.  To me this means one of two things:

  • Perhaps RSS is no longer the most useful way to consume blog content (for example, I rarely read RSS feeds these days and rely instead on what friends, peers and industry contacts “talk” about on Twitter to understand what’s worth reading.  I know that many of my readers follow me on Twitter @markwilsonit too).
  • People don’t like my posts.

If I’m brutally honest with myself, it’s most likely to be the latter – after years of blogging furiously, I have seriously scaled back – partly (mostly) due to a lack of time – and so I guess some people have assumed that I’m no longer blogging (or at least not blogging enough that’s interesting enough to keep in their feed reader).  That’s a shame – and it’s not really something that I want to see becoming a continuing trend.

[Image: Graph showing bandwidth growing and exceeding quota]

Strangely though, as my blog output has dropped significantly, and my subscribers have dropped (ever so slightly), my bandwidth usage has continued to rise – to the point that my hosting provider actually dropped the site returning a “bandwidth exceeded” message to readers recently (thankfully, this was resolved within a very short time of me noticing and bringing it to their attention).  When I started to look into this, I found that the biggest jump in bandwidth usage related to my upgrade from WordPress 2.2 to 2.9 in January.  I couldn’t understand why the same database, same theme, etc. running on a new version of WordPress would result in a significant increase in bandwidth usage until I saw that WordPress no longer contains an option to compress content for clients that support it, or, in WordPress parlance: “WordPress now leaves compression as a decision for the server”.

There are loads of plugins out there to enable GZIP compression on WordPress 2.5 and later; however I found that the WordPress guys are right – the simplest way is often to let the web server handle the compression – after all, I’d like to compress content (and save bandwidth) for all of my content, regardless of whether it’s served from WordPress.

I found the answer in Ryan Williams’ comment on a post at Il Filosofo – by adding a few lines to my .htaccess file (after checking that my host has the deflate module enabled in Apache), I saw a 72% reduction in the bandwidth required to serve my home page.  This is the code I added:

AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml application/x-javascript application/json application/x-httpd-php application/x-httpd-fastphp application/rss+xml application/atom+xml application/x-httpd-eruby
Header append Vary Accept-Encoding

In a nutshell, this tells the server to deflate various text-based document types.  Job done.  There are various tools available on the ‘net to see if a site has compression enabled (such as GIDZipTest) but my favourite is Is My Blog Working because it also tells me about some other items that I might like to look into to potentially improve the efficiency of the site.  Hopefully now I’ll see my bandwidth fall back within my quota – which should also please my hosting provider.
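
A more defensive variant of the snippet above might look like the following sketch. Wrapping the directives in <IfModule> blocks means that a host without mod_deflate or mod_headers loaded simply skips them rather than failing with a server error. The MIME type list is shortened here for readability, and application/javascript is an addition that was not in the original list, so adjust it to match the content actually served.

```apache
# Only compress if mod_deflate is loaded; otherwise these lines are skipped
<IfModule mod_deflate.c>
    AddOutputFilterByType DEFLATE text/html text/css text/plain text/xml
    AddOutputFilterByType DEFLATE application/javascript application/x-javascript application/json
    AddOutputFilterByType DEFLATE application/rss+xml application/atom+xml
</IfModule>

# Tell caches that the response varies with the client's Accept-Encoding header
<IfModule mod_headers.c>
    Header append Vary Accept-Encoding
</IfModule>
```

Whether compression is actually being applied can be confirmed by requesting a page with an Accept-Encoding: gzip request header and looking for Content-Encoding: gzip in the response headers.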

Installing WordPress on a Mac

The software platform which markwilson.it runs on is in desperate need of an update but there is only me to make it happen (supported by ascomi) and if I make a mistake then it may take some time for me to get the site back online (time which I don’t have!). As a result, I really needed a development version of the site to work with.

I thought that it would also be handy if that development version of the site would run offline – i.e. if it were served from a web server on one of my computers. I could run Windows, IIS (or Apache), MySQL and PHP but as the live site runs on CentOS, Apache, MySQL and PHP it makes sense to at least use something similar and my Mac fits the bill nicely, as a default installation of OS X already includes Apache and PHP.

I should note that there are alternative stacks available for running a web server on a Mac (MAMP and XAMPP are examples); however my machine is not a full web server serving hundreds of users, it’s a development workstation serving one user, so the built-in tools should be fine. The rest of this post explains what I did to get WordPress 2.7 up and running on OS X 10.5.5.

  1. Open the System Preferences and select the Sharing pane, then enable Web Sharing.

    [Image: Web Sharing in OS X]

  2. Test access by browsing to the default Apache website at http://computername/ and a personal site at http://computername/~username/.
  3. Download the latest version of MySQL Community Server (I used mysql-5.1.31-osx10.5-x86_64) and run the corresponding packaged installer (for me that was mysql-5.1.31-osx10.5-x86_64.pkg).
  4. After the MySQL installation is completed, copy MySQL.PreferencePane to /Library/PreferencePanes and verify that it is visible in System Preferences (in the Other group).

    [Image: MySQL Preferences in OS X]

  5. Launch the MySQL preference pane and start MySQL Server (if prompted by the firewall to allow mysqld to accept incoming connections, allow this). Optionally, select automatic startup for MySQL.

    [Image: MySQL running in OS X]

  6. Optionally, add /usr/local/mysql/bin to the path (I didn’t do this, as creating a .profile file containing export PATH="$PATH:/usr/local/mysql/bin" seemed to mess up my path somehow – it just means that I need to specify the full path when running mysql commands) and test access to MySQL by running /usr/local/mysql/bin/mysql.
  7. Enable PHP by editing /etc/apache2/httpd.conf (e.g. by running sudo nano /etc/apache2/httpd.conf) to remove the # in front of LoadModule php5_module libexec/apache2/libphp5.so.
  8. Test the PHP configuration by creating a text file named phpinfo.php containing <?php phpinfo(); ?> in ~/Sites/ and browsing to http://localhost/~username/phpinfo.php. If the page source is displayed instead of the PHP information page, restart Apache (sudo apachectl restart) so that the PHP module is loaded.
  9. With Mac OS X, Apache, MySQL and PHP enabled, start to work on the configuration by running /usr/local/mysql/bin/mysql and entering the following commands to secure MySQL:
    drop database test;
    delete from mysql.user where user = '';
    flush privileges;
    set password for root@localhost = password('{newrootpassword}');
    set password for root@127.0.0.1 = password('{newrootpassword}');
    set password for 'root'@'{hostname}.local' = password('{newrootpassword}');
    quit
  10. Test access to MySQL using the new password with /usr/local/mysql/bin/mysql -u root -p and entering {newrootpassword} when prompted.
  11. Whilst still logged in to MySQL, enter the following commands to create a database for WordPress and grant permissions (I’m not convinced that all of these commands are required and I do not know what foo is!):
    create database wpdatabasename;
    grant all privileges on wpdatabasename.* to wpuser@localhost identified by 'foo';
    set password for wpuser@localhost = old_password('wppassword');
    quit
  12. Download the latest version of WordPress and extract it to ~username/Sites/ (I chose to put my copy in a subfolder called blog, as it is on the live site).
  13. Configure WordPress to use the database created earlier by copying wordpressdirectory/wp-config-sample.php to wordpressdirectory/wp-config.php and editing the following lines:
    define('DB_NAME', 'wpdatabasename');
    define('DB_USER', 'wpuser');
    define('DB_PASSWORD', 'wppassword');
    define('DB_HOST', 'localhost:/tmp/mysql.sock');
  14. Restart Apache using sudo apachectl restart.
  15. If WordPress is running in its own subdirectory, copy wordpressdirectory/index.php and wordpressdirectory/.htaccess to ~/Sites/ and then edit index.php so that WordPress can locate its environment and templates (require('./wordpressdirectory/wp-blog-header.php');).
  16. Browse to http://localhost/~username/wordpressdirectory/wp-admin/install.php and follow the “five minute WordPress installation process”.

    [Image: WordPress installation]

  17. After installation, the dashboard for the new WordPress site should be available at http://localhost/~username/wordpressdirectory/wp-admin/.

    [Image: WordPress fresh out of the box (dashboard)]

  18. The site may be accessed at http://localhost/~username/wordpressdirectory/.

    [Image: WordPress fresh out of the box]

Apache HTTP server on Windows Server 2008 Server Core

Microsoft’s James O’Neill wrote about how:

“Some bright spark tried running Apache on [Windows Server 2008 Server] Core and having no special Windows dependencies it works.”

I couldn’t find any references to this elsewhere on the ‘net so I had to give it a go – it’s actually really easy:

  1. Install Windows Server 2008 Server Core
  2. Map a network drive, insert a CD or some other media and copy over the Apache HTTP server installer MSI.
  3. Issue the command, msiexec /i apache_2.2.4-win32-x86-no_ssl.msi.

    Not surprisingly, the installer is unable to create application shortcuts:

    Apache HTTP Server 2.2 Installer Information

    Warning 1909. Could not create shortcut Apache Online Documentation.lnk. Verify that the destination folder exists and that you can access it.

    Apache HTTP Server 2.2 Installer Information

    Warning 1909. Could not create shortcut Help, I’m stuck!.lnk. Verify that the destination folder exists and that you can access it.

    Presumably, that’s what causes an error dialog with no message and an OK button at the end of the install.

  4. Open up the firewall with netsh firewall set portopening TCP 80 "Apache Web Server".
  5. Point a browser at the server’s IP address and the words “It works!” should be displayed.

OK, so Apache running on Windows is no big deal but if this one cross-platform application runs on Server Core with no modifications, think what else this stripped out version of Windows can be used for.

Using .htaccess to improve the user experience for a website running on an Apache server

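A few weeks back, I updated two websites (which run on my ISPs’ Apache servers) to use various features which improve the experience for users of the site. These features include:

  • Server side includes (SSI).
  • Custom error pages.
  • Changing the default documents.
  • Redirecting clients when content moves.
  • Preventing directory listings.
  • Denying access to certain files.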

All of these features (and more) may be controlled on an Apache server using a file called .htaccess, which is intended for users who do not have access to the server configuration to make configuration changes on a per-directory basis.

In general, where access to the server configuration is available, then changes should be made at the server level; however in a hosted environment, .htaccess allows content providers to make their own configuration without affecting other users of the server.

Administrators should be made aware that enabling .htaccess on a server does incur a performance hit as Apache will look in every directory on the path for an .htaccess file, and will load the file, whether or not the directives contained within .htaccess are relevant to the HTTP request. For this reason, some ISPs may prohibit the use of .htaccess.
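
Where the main server configuration is available, the same directives can live in a <Directory> block, with .htaccess processing switched off to avoid the per-request lookups. A minimal sketch, assuming a hypothetical document root of /var/www/example and using DirectoryIndex purely as a stand-in for whatever would otherwise sit in .htaccess:

```apache
# httpd.conf (server level): applies to /var/www/example and everything below it.
# The path is a placeholder for illustration only.
<Directory "/var/www/example">
    # Do not look for, or read, .htaccess files in this tree
    AllowOverride None

    # Directives that would otherwise sit in .htaccess go here instead
    DirectoryIndex index.shtml index.html index.htm
</Directory>
```

The trade-off is that changes made at the server level only take effect after Apache is restarted or reloaded, whereas .htaccess changes apply from the next request.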

Microsoft Internet Information Server (IIS) does not have an equivalent to .htaccess and all configuration must be carried out using the various IIS administration tools (along with an appropriate organisational security model).

Links
Apache Tutorial: .htaccess files
Comprehensive guide to .htaccess

Denying access to certain files on an Apache web server

Under certain circumstances, it may be necessary to deny users access to various files on a web server.

For example, some directives in an Apache .htaccess file may be considered a security risk and so access to the file may be prevented using the following directives:

<files .htaccess>
order deny,allow
deny from all
</files>

The first line limits the directive to the .htaccess file (simply change the filename to limit access to other files), whilst the remaining code tells Apache to evaluate deny rules before allow rules, denies access from all users and then terminates the directive.
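
The same idea extends to a pattern. As a sketch, the block below uses <FilesMatch> with a regular expression to cover every file whose name begins with .ht (for example .htaccess and .htpasswd). It assumes the Order/Deny syntax used above; Apache 2.4 and later express the same rule as Require all denied.

```apache
# Refuse HTTP requests for any file whose name starts with ".ht"
<FilesMatch "^\.ht">
    # Apache 2.2-style access control, matching the example above
    Order allow,deny
    Deny from all
    # On Apache 2.4 and later, the equivalent (without mod_access_compat) is:
    # Require all denied
</FilesMatch>
```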

Preventing listing the contents of a directory on an Apache web server

When no default document is found on a web server, depending on the server configuration, users may be able to list the files in a given directory. For Apache servers, this may be prevented on a per-directory basis by adding an IndexIgnore directive to an .htaccess file.

The syntax is:

IndexIgnore file [file] ...

For example, IndexIgnore * will prevent listing of all files, or alternatively, individual files may be specified.

Full details may be found in the Apache HTTP Server documentation.
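
Two variations, as a sketch. The first hides only selected files (the patterns are illustrative) while still allowing a listing. The second is a different technique from IndexIgnore: Options -Indexes turns automatic listings off altogether, so Apache returns a 403 Forbidden response instead of an empty index. It only works in .htaccess if the host permits Options overrides.

```apache
# Variation 1: keep listings, but hide backup files, logs and .htaccess itself
IndexIgnore *.bak *.log .htaccess

# Variation 2 (use instead of, not alongside, variation 1):
# disable automatically generated directory listings entirely
# Options -Indexes
```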

Redirecting clients when websites change

Whilst looking at the website statistics for my main website, I realised that many users were attempting to access pages that no longer exist on the server. Some may argue that old content should be left in place, but others will disagree and my preferred approach is to redirect requests to the new locations, or at least to provide a polite message that the document has been removed and a link to the home page! Fortunately on an Apache server, this may easily be achieved using an .htaccess directive.

Various types of redirect are available through .htaccess, using the syntax:

Redirect [status] URL-path URL

The status argument can be used to return a number of HTTP status codes:

  • permanent returns a permanent redirect status (301) indicating that the resource has moved permanently.
  • temp returns a temporary redirect status (302). This is the default and is assumed if no status argument is given, indicating to the client that the resource has moved temporarily.
  • seeother returns a “See Other” status (303) indicating that the resource has been replaced.
  • gone returns a “Gone” status (410) indicating that the resource has been permanently removed. When this status is used the URL argument should be omitted.

Other status codes can be returned by giving the numeric status code as the value of status. If the status is between 300 and 399, the URL argument must be present, otherwise it must be omitted.

For example, a temporary redirection from old file or directory to new:

Redirect /olddirectory/oldfile.html http://yoursite.com/newdirectory/newfile.html
Redirect /olddirectory http://yoursite.com/newdirectory/

or a permanent redirect:

Redirect permanent /olddirectory http://www.yoursite.com/

or redirect with error 410:

Redirect gone /oldfile.html
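
The numeric form of the status argument, plus a pattern-based variant, can be sketched as follows. RedirectMatch is provided by the same Apache module (mod_alias) as Redirect; the URL pattern and the .htm to .shtml mapping are hypothetical examples rather than rules taken from a real site.

```apache
# Numeric equivalent of "Redirect permanent": 301 is within 300-399, so a target URL is required
Redirect 301 /olddirectory http://www.yoursite.com/newdirectory/

# Numeric equivalent of "Redirect gone": 410 is outside 300-399, so no target URL is given
Redirect 410 /oldfile.html

# Pattern-based redirect: send any request for a .htm page to the same name ending in .shtml
RedirectMatch permanent ^/(.*)\.htm$ http://www.yoursite.com/$1.shtml
```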

Full details for Apache users may be found in the Apache HTTP Server documentation.

Microsoft Internet Information Server (IIS) users can find information on redirecting requests to files, directories or programs in the IIS 6.0 Operations Guide.

RFC 2616 details all HTTP status (including error) codes.

Changing the default documents for a website

My ISPs’ Apache servers are configured for index.html and index.htm to be the default documents; however since implementing server side includes in my websites I need index.shtml to be recognised as the default document.

Fortunately, this can be achieved using the following directive in the corresponding .htaccess file:

DirectoryIndex index.shtml index.html index.htm

Microsoft Internet Information Server (IIS) users can find information on setting up default documents in the IIS 6.0 Operations Guide.

Implementing custom error pages for a website

One of the features used in my website is custom error pages, which allow errors to be handled using a format that matches other documents on the site.

Apache users can configure custom error messages using .htaccess. Once pages have been created for an error message, include a directive in the .htaccess file as follows:

ErrorDocument error-code document

For example, ErrorDocument 404 /errors/404-notfound.shtml will cause any page not found (HTTP error 404) error to display the /errors/404-notfound.shtml document.
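
Several error codes can be handled in the same way, one directive per code. A sketch, in which the 403 and 500 file names are hypothetical and simply follow the naming of the 404 example:

```apache
# Each path is a local URL path, relative to the site root
ErrorDocument 403 /errors/403-forbidden.shtml
ErrorDocument 404 /errors/404-notfound.shtml
ErrorDocument 500 /errors/500-servererror.shtml
```

Using a local path (beginning with /) rather than a full http:// URL matters here: with a full URL, Apache sends the browser a redirect to that address, so the client never receives the original error status.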

Full details for Apache users may be found in the Apache core features documentation.

Microsoft Internet Information Server (IIS) users can find information on configuring custom error messages in the IIS 6.0 Operations Guide.

RFC 2616 details all HTTP status (including error) codes.

Using server side includes in web pages

One of the features used in my website is server side includes (SSI). The SSI code allows my sites to include dynamic information which would otherwise require scripting that may not function correctly with certain browsers.

SSI is pretty simple. Apache users need to edit the .htaccess file in their web root directory to allow SSI, adding the following lines:

Options Includes
AddType text/html .shtml
AddHandler server-parsed .shtml

Some of these may not be necessary if they have been set at a higher level in the Apache configuration by the ISP or server administrator – for full details, see the Apache Tutorial: Introduction to Server Side Includes.
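
A variant of the lines above, as a sketch for Apache 2.x: Options +Includes adds SSI to the options already in force instead of replacing them, and AddOutputFilter INCLUDES is the form shown in the Apache SSI tutorial in place of the older server-parsed handler. Whether a given host allows these overrides in .htaccess is an assumption to check.

```apache
# Add SSI to the existing options rather than replacing the whole option set
Options +Includes

# Serve .shtml files as HTML and pass them through the SSI filter (mod_include)
AddType text/html .shtml
AddOutputFilter INCLUDES .shtml
```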

Microsoft Internet Information Server (IIS) users can find information on using server side include directives in the IIS 6.0 Operations Guide.

Once enabled, pages which call the server-based code should be named .shtml (or whatever file extension is defined in the configuration). Because my ISP has configured its servers for the default web page to be called index.htm or index.html, it was also necessary to change the default documents for the website.

One use of SSI is to reuse common HTML code (e.g. headers, menus, etc.), but another useful application is to report document information (e.g. date last modified). There are many references on the Internet for SSI options, but one of the most useful is Craig McFetridge’s SSI page on the Carleton University website, with another being the one found on the ThinkQuest Amazing HTML website.