Bye bye Blogger – Hello WordPress!

Regular visitors to this site may have noticed that over the last 24 hours, the site has developed a totally different look and feel.

I will start posting content that isn’t about the redevelopment of this site again soon but the last couple of weeks have been pretty tough on the self-hosted IT front. First I started to have problems with e-mail delivery to certain hosts, then I accidentally dropped my domain off the Internet and at the same time, I’ve been busy moving this website to a new content management system and hosting provider.

For some time now, I’ve been working on rewriting the site using (semantically correct) XHTML and CSS but my lack of design skills (combined with a lack of spare time) was holding the project back. Ironically, it was my decision to dump Blogger as a content management system (a not insubstantial project in its own right) that has pulled everything together.

I’ve heard a lot of good things about WordPress, which is available as a hosted service or as software to run on a server under your own control, and I’ve chosen the latter option. In fact, over the last couple of weeks, the whole site has been migrated to a WordPress installation on ascomi’s webspace.

It’s quite strange – most of the technology on which I’ve built my career is from Microsoft – yet I’m writing this post on a Mac and publishing it on a site which uses the Linux-Apache-MySQL-PHP (LAMP) software stack (actually, the server is running FreeBSD, so it’s really FAMP but that’s just being pedantic).

I had originally planned to run the old and new sites in parallel until all the issues were ironed out, but in practice it’s not been that straightforward as I tried to maintain the URL structure. Late last night I cut everything across to the new site but like so much on the ‘net today, Mark’s (we)Blog 2.0 is in beta!

So, why’s it been so complex? Well, so far, this is what I’ve done:

  1. Order new hosting space and upload the content from the old website.
  2. Transfer/register all domain names and direct them to the new hosting provider’s name servers.
  3. Edit .htaccess to rewrite requests from secondary domain names (or without the www. prefix) to http://www.markwilson.co.uk/.
  4. Install and configure WordPress – pretty straightforward with a Fantastico scripted installation.
  5. Customise WordPress – pick a template (Andreas Viklund’s WP-Andreas01), install and activate plug-ins (WP Suicide, New Blogger Import).
  6. Commit WordPress Suicide, in the process wiping out default posts etc. but leaving behind users, user metadata and options.
  7. Migrate Blogger content to WordPress, maintaining the existing URL structure – this was the bit that scared me most and actually it was really simple (hosted WordPress users can also directly import from Blogger). First of all I needed to switch Blogger over to host my blog at Google (BlogSpot) – as all the previously-published content was still available on my server then users would not have seen any change. Next, I used the New Blogger Import plugin to suck over 700 posts and 600 comments out of BlogSpot and into WordPress. I had an issue with the formatting of the URLs but Ady Romantika very kindly updated his script for me and the updated version ran very smoothly (a couple of posts were missed but I found them from an XML sitemap generator broken links report and migrated them manually). It’s worth noting that Ady’s script also leaves the Blogger post ID as a comment in each migrated post. Once migrated, I switched Blogger back to FTP publishing and ran the old and new sites in parallel for a short time but found that to be too much work and have since removed the Blogger site from the server (an archived version of the old site will remain in place for a few weeks at least).
  8. Install and activate the Category Tagging plugin. Start to assign categories to posts and create a new post, which removed the PHP error messages that originally appeared (Warning: array_keys(): The first argument should be an array in /usr/home/username/public_html/blog/wp-content/plugins/category-tagging.php on line 95 and Warning: Invalid argument supplied for foreach() in /usr/home/username/public_html/blog/wp-content/plugins/category-tagging.php on line 96).
  9. Make more template formatting changes; deactivate WP Suicide and New Blogger Import; install and activate Fancy Archives and AdSense Deluxe; register for a WordPress API key and activate Akismet.
  10. Create new pages to replace the non-blog content from the old site (and redirect requests using .htaccess).
  11. Remove the old content and generate a new XML sitemap.
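
As a rough illustration of step 3 (not the actual .htaccess rules, which aren't reproduced here), the host-canonicalisation logic can be sketched in Python. The function name and the example.org domain are hypothetical; only the target host comes from the list above.

```python
from urllib.parse import urlsplit, urlunsplit

CANONICAL_HOST = "www.markwilson.co.uk"  # target host named in step 3


def canonical_redirect(url):
    """Return the URL to redirect to, or None if the host is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == CANONICAL_HOST:
        return None
    # Keep the path and query string; only the scheme and host are rewritten.
    return urlunsplit(("http", CANONICAL_HOST, parts.path or "/", parts.query, ""))


# A request without the www. prefix is sent to the canonical host.
print(canonical_redirect("http://markwilson.co.uk/blog/"))
# A request for a (hypothetical) secondary domain keeps its path and query.
print(canonical_redirect("http://example.org/a?b=1"))
```

In practice the same decision would be made by the web server before WordPress ever sees the request, which is why it belongs in .htaccess rather than in application code.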

Looking back, it’s odd that one of the things holding back the redevelopment of the original site was the lack of a good design – as it happens the WordPress template that I chose is also available as a standard website template and there are loads of good-looking templates at freecsstemplates.org and at Open Source Web Design.

At the moment I’m still adding categories and tweaking the formatting (there are some CSS glitches to iron out – hence the beta tag) but I’m hoping that within a few weeks the site will be pretty much there. I also plan to go back through the template code and implement some of the CSS tips that I’ve been picking up from the old .net magazines that Alex gave me, as well as from two excellent books.

If all goes to plan, subscribers shouldn’t have to change any settings, the URLs for the content should be preserved, the quality of the content should improve and my search engine placement should be maintained.

Embedding video content in (X)HTML

Yesterday’s Mac vs. PC post should have been straightforward, except that it contained three video clips, each of which I wanted to embed in a standards-compliant way whilst maintaining maximum browser compatibility (i.e. ignoring the official advice from Adobe on embedding Flash content and Apple’s advice for embedding QuickTime content by avoiding the non-standard <embed> element and just using the <object> and <param> elements)… what a task that turned out to be.

To be honest, a lot of the problems probably came down to me not thinking my code was working because the preview function in my content management system (Blogger) failed to display the videos in one browser or another so, after another late night, I decided to publish and be damned. The resulting code seems to work for the Flash content on most of the browser/operating system combinations I have tried (Mozilla 1.7.13 and Firefox 1.5.06 on Linux; Internet Explorer 7.0.5730.11 on Windows XP; Safari 2.0.4 and Firefox 2.0.0.1 on Mac OS X – Intel), although I was using Adobe Flash Player 9 (I’m not sure which version is needed for the clips I used so I didn’t update the codebase attribute to reflect it – older player versions will not automatically update until I fix this) and I’m aware that there may still be some issues with the QuickTime clip (it does seem to be working on Firefox and IE though).

So, how should this be done?

Firstly, the valid Flash, video, and audio embed (object) markup post at the Web Standards Project links to some great articles which should be read.

These give the background to why the <embed> element shouldn’t be used, as well as demonstrating the use of conditional comments to force certain browsers into compliance. I actually used another variation on this theme – David Grudl’s how to correctly insert Flash into XHTML – ironically this uses a negated version of Internet Explorer-specific conditional comments to force IE into ignoring code intended for other browsers!

Then, there is the issue of the changes made to the behaviour of ActiveX content in Internet Explorer, following the Eolas patent suit, as described by Robert Nyman. In my case, it doesn’t really matter if you need to activate a control to view a video clip on my blog; however there are some workarounds. Most use JavaScript (indeed Adobe recommends a JavaScript-based workaround to the changes made in Internet Explorer) and one popular alternative is to use document.write in an external JavaScript function to dynamically re-write the object embedding code. Alternatives include Geoff Stearns’ SWFObject (formerly known as FlashObject) and Bobby van der Sluis’ unobtrusive flash objects (UFO). I plumped for a version I found in a comment by Karl Rudd on the Robert Nyman post that I linked earlier (Fix It uses a similar concept, also advocated by David Grudl in his post on how to avoid activation of ActiveX in IE).

After spending most of yesterday working on the object embedding, and a good part of this morning writing about it here, I think I’ll leave that one alone now, unless anyone has any better ideas to fix my code (note that the <br /> tags scattered through it were added by Blogger – not by me).

Running multiple versions of Internet Explorer side-by-side

I’ve written previously about using user agent spoofing to make Microsoft Internet Explorer (IE) 7 and Mozilla Firefox behave like legacy versions of IE but I just stumbled across this nifty method of running multiple versions of IE side-by-side. I haven’t tried it out yet and it’s unsupported by Microsoft but it sounds like an interesting idea for next time I’m doing some website development work.

Why webstats are so interesting

I’ve been writing this blog for a couple of years now. With over 500 posts, it’s consumed a scary amount of my time, but at least it’s finally something useful to do with the markwilson.co.uk domain that I first registered back in the late ’90s when I was thinking of leaving my job and working as a freelance IT contractor!

Over time I’ve tried to move towards a standards-compliant website, with lots of information that people find useful. I’ve still got some way to go – not being a developer, my code is not as standards-compliant as I’d like it to be (although the website that I have been working on recently with my buddy Alex should soon be pretty much there from a CSS and XHTML standpoint) and the usefulness of the content is totally subjective (but the blog started out as a dumping ground for my notes and that’s still its primary purpose – if others find it useful then that’s great and the trickle of Google AdSense/PayPal revenue is always welcome).

From time to time I look at the website statistics (webstats) for the site and always find them an interesting read. I can’t claim to be an expert in search engine optimisation (nor do I want to be) but the Webalizer webstats that my ISP provides are great because they let me see:

  • How many hits I’m getting (not surprisingly I get more hits after I post new articles and fewer when I’m busy with work or other projects) on a monthly, daily or hourly basis.
  • The HTTP response codes that Apache dishes out (200s are good, 404s are bad).
  • The top 30 URLs that are being hit (not surprisingly /blog is number 1, but it also helps to see pages that account for lots of bandwidth but not much traffic – the ones where maybe I should be looking at optimising the code).
  • Entry and exit pages (there’s a big correlation between these two, so obviously I’m not encouraging enough browsing of the site).
  • Where people visit from (mostly crawlers, although unfortunately I can see how the stats are skewed by my own broadband connection at number 18 because I use the site so much to look things up for myself).
  • Who is referring visitors to me.
  • What people are looking for when they get referred here.
  • What browser people are using.
  • Where people are visiting from.

This information lets me understand which pages are most popular as well as highlighting technical issues with the site but it doesn’t always go far enough.

Some time ago, I applied for a Google Analytics (formerly Urchin) account and last week I finally set it up. Whilst the Webalizer stats are still useful in many ways for me as a website administrator, the Google Analytics information is much richer. For example, I no longer need my ClustrMaps map because I can see a geomap along with my pages per visit ratio, how many visitors return and who sends them here. For marketeers there are tools to track campaigns and see how they are progressing, and I can also find a whole load of technical information about my visitors (e.g. connection speed used, browser, platform, Java and Flash capabilities, language, screen colours and resolution – all of which can help in decisions as to what features should be incorporated in future). There’s also information about how long visitors spent viewing a particular page (in fact there are so many reports that I can’t list them all here).

So, what have I learned from all of this – well, from Google Analytics I can see that most of you have a broadband connection, are using Windows (94%), IE (65%, vs. 29% for Firefox), view the site in 32-bit colour and have a screen resolution of 1024×768. That means that most of you should be able to see the content as I intended. I also know that people tend to visit a single page and then leave the site and that Google is my main referrer. Webalizer tells me that Apache gave a strange error 405 to someone this month (somebody obviously tried to do something they shouldn’t be trying to do) but also some 404s (so maybe I have some broken links to investigate). I can also tell that (for the IP addresses that could be resolved) most of my visitors were from Western Europe or the United States but hello to everyone who has visited this month from Australia, China, India, Japan, Malaysia, New Zealand, Pakistan, Saudi Arabia, Singapore, South Africa, South Korea, Thailand, and the United Arab Emirates.

I hope this has illustrated how website statistics can be useful, even for small-time website operators like me and I encourage you to check out Webalizer (which reads Apache web server log files) and Google Analytics (which needs some JavaScript to be added to the website code). Alternatives (e.g. for IIS users) include AWstats and Christopher Heng also has a list of free web statistics and web log analysers on his site.
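
To give a feel for what a log analyser such as Webalizer does under the hood, here is a minimal Python sketch that tallies response codes and URLs from Apache access-log lines. It assumes the Common Log Format, and the sample lines are made up for illustration (they are not taken from this site's logs); Webalizer itself does far more than this.

```python
import re
from collections import Counter

# Matches the request and status fields of a Common Log Format line, e.g.
# 192.0.2.1 - - [01/Nov/2006:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 5120
LOG_LINE = re.compile(r'"(?P<method>\S+) (?P<url>\S+)[^"]*" (?P<status>\d{3}) ')


def summarise(lines):
    """Count responses by status code and hits by URL."""
    statuses, urls = Counter(), Counter()
    for line in lines:
        match = LOG_LINE.search(line)
        if match is None:
            continue  # skip lines that don't look like access-log entries
        statuses[match["status"]] += 1
        urls[match["url"]] += 1
    return statuses, urls


# Hypothetical sample data, using documentation-only IP addresses.
sample = [
    '192.0.2.1 - - [01/Nov/2006:10:00:00 +0000] "GET /blog/ HTTP/1.1" 200 5120',
    '192.0.2.2 - - [01/Nov/2006:10:00:05 +0000] "GET /missing.html HTTP/1.1" 404 312',
    '192.0.2.1 - - [01/Nov/2006:10:01:00 +0000] "GET /blog/ HTTP/1.1" 200 5120',
]

statuses, urls = summarise(sample)
print(statuses)             # how many 200s, 404s and so on
print(urls.most_common(1))  # the most requested URL
```

The 404 count is the quickest pointer to broken links, and the URL tally is a cut-down version of the "top URLs" report described above.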

Why have some of my PageRanks dropped?

It’s well known that the Google index is based on the PageRank system, which can be viewed using the Google Toolbar.


But something strange has happened on this blog – the main blog entry page has a PageRank of 5, the parent website has a PageRank of 4, but the PageRanks for most of the child pages have dropped to zero.

Now I know that posts have been a bit thin on the ground this month (I’ve been busy at work, as well as working on another website), but I can’t understand why the rankings have dropped. I found this when I was using the site search feature to find something that I knew I’d written, but it didn’t come up. Entering site:markwilson.co.uk as a Google search brings back 258 results, but this blog has nearly 500 entries, plus archive pages and the parent website – where have all the others gone? Some recent entries, like the one on Tesco’s VoIP Service, have a PageRank of zero but still come back on a search (at the time of writing, searching for Tesco VOIP brings back my blog as the third listed entry). Others just don’t appear in search results at all. Meanwhile some old posts have PageRanks of 2 or 3.

I know (from my website statistics) that Googlebot is still dropping by every now and again. So far this month it accounts for 3319 hits from at least 207 visits – I just can’t figure out why so many pages have a PageRank of zero (which seems to be a penalty rank, rather than a “not ranked yet” marking).

I don’t deliberately try to manipulate my search rankings, but steady posting of content has seen my PageRank rise to a reasonable level. I just don’t understand why my second-level pages are not appearing in the index. The only thing I can think of is that it’s something to do with my new markwilson.it domain, which is linked from this blog, and which redirects back to a page at markwilson.co.uk (but that page has no link to the blog at the time of writing).

I’ve just checked the syntax of my robots.txt file (and corrected some errors, but they’ve been there for months if not years). I’ve also added rel="nofollow" to any links to the markwilson.it domain. Now, I guess I’ll just have to resubmit my URL to Google and see what happens…
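
One way to sanity-check the effect of a robots.txt file is with Python's standard urllib.robotparser module. This is only a sketch: the rules below are hypothetical (not this site's real file), and because the parser quietly ignores lines it doesn't understand, it shows what a crawler would be allowed to fetch rather than validating the syntax.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; not the site's real robots.txt.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
"""


def allowed(robots_txt, user_agent, url):
    """Report whether the given crawler may fetch the URL under these rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Blog pages should be crawlable; the (hypothetical) CGI directory should not.
print(allowed(ROBOTS_TXT, "Googlebot", "http://www.markwilson.co.uk/blog/"))
print(allowed(ROBOTS_TXT, "Googlebot", "http://www.markwilson.co.uk/cgi-bin/test"))
```

If a page that ought to be indexed comes back as not allowed, the robots.txt file is the first place to look.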

Geotagging websites

A couple of weeks back, a little GeoURL icon was added to the side panel of this blog (underneath the feedmap). GeoURLs are a way of encoding location information within a website.

For example, whilst I try to make the information on markwilson.co.uk applicable to a wider audience, inevitably some of it is UK-specific. Geolocation by IP address can help to match users with localised content, but it does have some issues. A DNS lookup on markwilson.co.uk tells me that it is an alias for hp.force9.net (212.159.8.1). Using the CAIDA Internet geographic database (NetGeo) to look up 212.159.8.1 tells me that this address is actually allocated to Force 9 Internet in Sheffield, UK (latitude 53.38, longitude -1.50) but that’s not much help for localising services as that’s where my ISP is registered (it may not even be the location of their servers) and I’m nowhere near there. The CAIDA database is also no longer maintained, so other tools may be more appropriate, but of far more interest is the location to which the site’s information applies.

For locating an Internet site or service (such as a location-specific web page or RSS feed), geolocation using geotags is probably more applicable. For markwilson.co.uk, the actual code which identifies the geoURL (the geo-structure tag or geotag) is found in the HTML head and reads:

<meta name="geo.position" content="52.1542;-0.7122" />
<meta name="geo.region" content="GB" />
<meta name="geo.placename" content="Olney" />

These geotags can be generated using the geotag generator (I found out the latitude and longitude using multimap). It’s also possible to use an ICBM tag such as <meta name="ICBM" content="52.1542, -0.7122" /> but geo-structure tags are newer and also include region (using the ISO-3166-1 country names and region names specifications) and placename information.
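
For anyone who would rather script this than use the online generator, the tags are simple enough to build by hand. The following Python sketch is my own illustration (the function name is hypothetical and it has nothing to do with how the geotag generator is implemented); it reproduces the geo-structure tags shown above plus the equivalent ICBM tag.

```python
from html import escape


def geo_meta_tags(lat, lon, region, placename):
    """Build the geo-structure and ICBM meta tags for one location."""
    position = f"{lat:.4f};{lon:.4f}"  # geo.position uses a semicolon
    icbm = f"{lat:.4f}, {lon:.4f}"     # ICBM uses a comma and a space
    return [
        f'<meta name="geo.position" content="{position}" />',
        f'<meta name="geo.region" content="{escape(region)}" />',
        f'<meta name="geo.placename" content="{escape(placename)}" />',
        f'<meta name="ICBM" content="{icbm}" />',
    ]


tags = geo_meta_tags(52.1542, -0.7122, "GB", "Olney")
print("\n".join(tags))
```

Coordinates are formatted to four decimal places here, which matches the precision used in the tags above; that choice is an assumption rather than a requirement.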

GeoURL is a location-to-URL reverse directory (although at the time of writing it only lists 211,991 sites). A GeoURL lookup on markwilson.co.uk returns a list of sites located nearby and although it’s of limited use at the moment, as more and more sites are geotagged, information like this will become more and more relevant, particularly when combined with services such as Google Maps.

Finding that elusive Microsoft support site

The Microsoft website is not always the easiest place to find things – especially when in the middle of a crisis.

Blake Hall has published a comprehensive list of Microsoft support resources on the Industry Insiders blog.

Well worth checking out next time you’re researching an issue, if only for the advice on how best to search the knowledge base.

Practical advice for webmasters

Last year I blogged some advice on spam-proofing a website from Thomas Brunt’s Outfront site. Although Outfront is billed as a “FrontPage learning community” (urgh!), it also includes some practical advice for web authoring techniques such as looking at writing pages which are compliant with web standards, advanced use of mailto links and preparing photos for publication on a website, as well as some topics I have covered in this blog like custom error pages and the use of .htaccess files.

The site also features a good description of RSS (although most people familiar with blogs will also know RSS!) and design tips for producing a good website.

Using server side includes in web pages served from IIS

Last year I blogged about using server side includes in web pages. My SSI code has all been working well on my ISP’s Apache servers, but my development server runs under IIS 5 on Windows 2000. Even with the default document list set to include index.shtml, I was getting HTTP 404 errors for pages that I knew existed. I checked that I had application mappings in place for .shtml files, but what none of the documentation told me was that I needed to change the executable path for .shtml from %systemroot%\System32\inetsrv\404.dll to %systemroot%\System32\inetsrv\ssinc.dll. Once I had made that change, everything jumped into life and my dynamic pages were served as expected.

Denying access to certain files on an Apache web server

Under certain circumstances, it may be necessary to deny users access to various files on a web server.

For example, some directives in an Apache .htaccess file may be considered a security risk and so access to the file may be prevented using the following directives:

<files .htaccess>
order deny,allow
deny from all
</files>

The first line limits the directive to the .htaccess file (simply change the filename to limit access to other files). The order line tells Apache to evaluate deny rules before allow rules (so an allow rule, if there were one, could override a deny), deny from all refuses access to every client, and the final line closes the section.
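
The order directive is easy to misread, so here is a small Python model of how the two common orderings are evaluated for a single request. It is a sketch based on my reading of the Apache 2.2 documentation, not Apache's own code; the function and parameter names are hypothetical and the less common mutual-failure ordering is left out.

```python
def is_allowed(order, deny_matches, allow_matches):
    """Model Apache 2.2's order directive for a single request.

    deny_matches / allow_matches say whether any deny / allow rule
    matched the request.
    """
    if order == "deny,allow":
        # Deny rules are checked first, a matching allow overrides them,
        # and a request matching neither is permitted.
        return allow_matches or not deny_matches
    if order == "allow,deny":
        # An allow rule must match, and any matching deny rejects the request.
        return allow_matches and not deny_matches
    raise ValueError(f"unknown order: {order!r}")


# The .htaccess example: "deny from all" matches and there is no allow rule.
print(is_allowed("deny,allow", deny_matches=True, allow_matches=False))
```

With order deny,allow and only deny from all in place, every request is refused, which is exactly what is wanted for the .htaccess file.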