Tag: Website Development

  • Website development tips and tricks

    About a year ago, I started the redevelopment of this site to use WordPress as my CMS, in the process aiming to make the site XHTML and CSS standards-compliant. It was a big job and, as this blog is really just a hobby that I put most of my spare time into, it took some time. For the last year I’ve had a draft post part-written about some of the things I found along the way and now, as I’m about to embark on another facelift (let’s call it markwilson.it v2.5), I thought it was about time I finished that post – or at least published the collection of notes I made during the development of markwilson.it v2.0. I hope some of the information here is useful to others setting off on the same route and, of course, if anyone knows better, feel free to leave a comment.

First of all, sizing my text. No pixel sizing here – I use font-size:medium; for the body and then percentages to resize text elsewhere (ems would achieve the same result as percentages but this method was recommended and avoids potential issues with Internet Explorer). Then, in order to keep the text size pretty much consistent across browsers, I set font-size:76%; on the #wrap element. Using a font-size of medium as a start point is probably not essential – it’s the default for most browsers anyway – but it does at least give me a known baseline. I could go further and implement the A List Apart hack for Internet Explorer (IE) 5 on Windows but, as IE5 users accounted for just 0.13% of my visitors in the last month, it really doesn’t justify the effort (I’m waiting for the day I can say the same about IE6 but that’s some time off yet).
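
    In CSS terms, the approach looks something like this (#wrap is the id of my page wrapper; the h2 rule and its 140% value are just an illustrative example of resizing with percentages):

    body {
        font-size: medium;  /* a known starting point - the default in most browsers anyway */
    }

    #wrap {
        font-size: 76%;     /* evens out the rendered size across browsers */
    }

    h2 {
        font-size: 140%;    /* everything else is then resized with percentages, not pixels */
    }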

I do most of my development using Mozilla Firefox (on a Mac), then test on Safari (MacOS), IE6/7 (Windows) and Mozilla (Linux). I’m afraid that I don’t bother about other browsers (except some compatibility testing for mobile browsers) but that covers the vast majority of visitors. I also test the site with the page style disabled (to check that the document flow is correct and that the site is still usable), then repeat with the style applied to see the effect of the site without images, and again without scripting. Basically, if it all degrades well, then I’m happy. The only downside (for me) of browsing without JavaScript support is the absence of Google AdSense and Analytics.

When trying to work out what is working, and what is not, the accessibility-checking favelets can be useful but I actually prefer Chris Pederick’s Web Developer extension for Firefox (when I first came across this handy extension, I thought Chris’ name was familiar – from his resume I see that we were both working at ICL at the same time in the late 1990s, so maybe that’s it). Incidentally, the Web Accessibility Tools Consortium (WAT-C) has produced toolbars for IE and Opera.

    Validating XHTML and CSS using the W3C tools is good for checking out the quality of your code (but you do need to keep checking back – I’ve just noticed that some of the third-party code I’ve added has broken my XHTML). For blogrolls, an OPML validator is available.

    Having moved to a hosting provider where I have access to the server logs, the first thing I noticed was the volume of errors like this one:

    [timestamp] [error] [client ipaddress] File does not exist: /usr/home/username/public_html/favicon.ico

I added a favourites icon file to my web server’s root folder and instantly saw a drop in the number of errors. There are plenty of online generators available but I used Favicon Maker – largely because the site looked good (I find it remarkable how many people offering web design tips don’t appear to have looked at their own site recently… although I do realise that there is a difference between design and code, and that I’m leaving myself open to criticism here too). I also added this line of code, because not all browsers will automatically look for the presence of the favicon.ico file:

<link rel="shortcut icon" href="http://example.com/favicon.ico" type="image/vnd.microsoft.icon" />

    Incidentally, Information Gift has a useful summary of how various browsers treat the favourites icon.

    On a similar note, I recently added a 57×57 apple-touch-icon.png file to the site to support webclips on the iPhone. It may be a minority platform but it’s one I use!

    A few more resources that I’ve found useful whilst developing the site include:

    Last, but by no means least, I’d like to mention my buddy Alex, who provides the hosting service for the site and is also my first port of call for any WordPress/web development advice.

  • Backing up and restoring a WordPress database

Last year, Alex and I redeveloped the website for a campaign group close to the town where I live. Even though the design was pretty plain (neither of us are designers), from a web development standpoint the site was pretty good – written in semantically correct XHTML and CSS, using PHP for server-side scripting. Alex even wrote a neat navigation bar system and, all in all, it worked well.

    Unfortunately, I ran out of time for updating the site content and handing it over to a non-technical person to produce new content was never going to be straightforward. I needed something with a user-friendly content management system and so I rewrote the site to use WordPress with pages for the static content and blog posts for the front page news.

Moving all of the content to WordPress didn’t take too long – I still need to sort out a few dead links and develop a decent template, but one of the beauties of WordPress is the ability to customise a site on-the-fly, so I can keep working on those after making the site live – the important thing for me was to let other people create new content without needing to touch any code.

    Even so, when the time came to launch the new site, I did need to move my WordPress database from the /dev subdirectory to the root (it is possible to install WordPress in a subdirectory and still let it be accessed from the site root; however I chose not to take that path).

    Although WordPress includes an export/import function that would let me export all of the posts via an XML file, then import them to a new WordPress installation, it doesn’t handle all of the database changes (new users, configuration, etc.) and it seems that the best way is to back up the database and then restore it to a new location. Whilst the WordPress Codex provides various methods for backing up a database, the clearest instructions are actually found in a link at the bottom of the Codex article to the Clearpoint Systems blog post on how to backup and restore a WordPress database (using phpMyAdmin). Then, because the database tables will refer to the old location, it is necessary to update the siteurl and home entries in the wp_options table.
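
    For reference, the change to the wp_options table boils down to two updates, which can be run from the SQL tab in phpMyAdmin (the wp_ table prefix and the URL shown here will obviously vary between installations):

    -- point the restored WordPress installation at its new location
    UPDATE wp_options SET option_value = 'http://www.example.com' WHERE option_name = 'siteurl';
    UPDATE wp_options SET option_value = 'http://www.example.com' WHERE option_name = 'home';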

It took just a few seconds to back up the database, restore it to a new WordPress installation, and make the changes necessary to make the site accessible again. Finally, all that was required was to upload any edited theme files and plugins to the appropriate locations in the WordPress folder structure.

  • Attempting to reduce my website’s bandwidth usage

    This website is in a spot of trouble. Over the last few months, I’ve seen the bandwidth usage grow dramatically although it seems to have grown faster than the number of subscribers/readers. We’re not talking vast volumes here and my hosting provider has been very understanding, but even so it’s time to do something about it.

So I had a think, and came up with four options:

    1. Don’t write anything. Tried that for half of June (when I was on holiday). No noticeable change in the webstats!
    2. Write rubbish. It’s debatable as to whether that’s a continuation of the status quo.
    3. Drp ll th vwls s f wrtng txt mssgs.
    4. Optimise the site to reduce bandwidth usage, without a major rewrite. Yeah! That sounds like a challenge.

    So, option four sounded like the best course of action. There are two main elements to consider in this:

    1. Site performance (i.e. how fast pages load).
    2. Bandwidth usage.

    As far as I can tell, my site performance is not blindingly fast but it’s OK. I could use something like the WP-Cache plugin to cache content but, although that should reduce the load on the server, it won’t actually decrease my bandwidth usage. In fact it might increase it as I’d need to turn off HTTP compression.

    That led me to concentrate on the bandwidth issues. This is what I tried (based mostly on Jeff Atwood’s experience of reducing his site’s bandwidth usage):

• Shut out the spammers. Akismet had blocked over 7000 spam messages in 15 days and each of these would have loaded pages and leeched some bandwidth in the process. Using the Bad Behaviour plugin started to reduce that, blocking known spammers based on their IP address. Hopefully it hasn’t blocked legitimate users too – please let me know if it has blocked you (assuming you can read this!).
• Compress the content. Check that HTTP compression is enabled for the site (it was – a sketch of the typical Apache configuration follows this list). According to Port 80 Software’s real-time compression check, this reduces my file size by about 77% and, by the tool’s reckoning, improves download times by over 400%. It’s also possible to minify CSS and JavaScript (i.e. remove whitespace and comments) – and there are tools for HTML compression too – but in my opinion the benefits are slim (those files are already covered by HTTP compression) and code readability is more important to me; although, at 12.7KB, my main stylesheet is a little on the bloated side of things – and it is one file that gets loaded frequently by clients.
• Optimise the graphics. I already use Adobe Photoshop/ImageReady to save web-optimised graphics, but I also used a Macintosh utility that Alex pointed me to, called Ping, to optimise the .PNG files that make up about half the graphics on this site (I still need to do something with the .JPGs and .GIFs). That shaved just over 10% off their file size – not a huge reduction, but it should help.
• Outsource. Switching the main RSS feed to FeedBurner made me nervous – I’d rather have all my readers come to my domain than to one over which I have no control – but then again FeedBurner gives me some great analysis tools. Then I found out about FeedBurner’s MyBrand feature (previously a chargeable option but free since FeedBurner was acquired by Google), which lets me use feeds.markwilson.co.uk (i.e. a domain under my control) instead of feeds.feedburner.com. Combined with the FeedSmith plugin, this has let me keep control over all of my feeds. One more option is to use an external image provider (Jeff Atwood recommends Amazon S3 but I haven’t tried that yet).
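
    As mentioned in the list above, HTTP compression was already enabled for this site but, for anyone checking their own setup on an Apache 2.x server, the mod_deflate configuration (in .htaccess or the virtual host) is typically something along these lines – a generic example rather than my host’s exact configuration:

    # compress text-based responses (HTML, CSS, JavaScript, XML)
    <IfModule mod_deflate.c>
        AddOutputFilterByType DEFLATE text/html text/plain text/xml text/css application/x-javascript
    </IfModule>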

    At the moment it’s still early days but I do have a feeling that I’m not eating up my bandwidth quite as quickly as I was. I’ll watch the webstats over the coming days and weeks and hope to see a downward trend.

  • Find out how a site is built

    This may not be news as it’s pretty high on Digg right now but it may be a useful resource to remember. BuiltWith is a web page technology profiler – a site to find out which technologies have been used to create a website. I even learnt a few things about the underlying technologies for my own site!

  • Adding a meaningful description to web pages

One of the things that I noticed whilst reviewing the Google results for this site was how the description for every page was shown using the first text available on the page – mostly the alternative text for the masthead photo (“Winter market scene from the small town of Porjus in northern Sweden – photograph by Andreas Viklund, edited by Alex Coles.”):

    Screenshot showing duplicate descriptions

    Clearly, that’s not very descriptive and so it won’t help much with people finding my site, linking to me, and ultimately improving the search engine placement for my pages, so I need to get a decent description listed for each page.

The WordPress documentation includes a page on meta tags in WordPress, including an explanation as to why they aren’t implemented by default (although my template did include a meta description for each page, made up of the weblog title and tagline). Even though meta tags are no magic solution to search engine placement, I wanted a way to add a meaningful description to each page using <meta name="description" content="descriptionofcontent" /> and also <meta name="keywords" content="pagecontext" /> (although it should be noted that much of the available advice indicates that the major search engines now ignore the keywords tag due to abuse). Fortunately there is a WordPress plugin designed to make those changes – George Notaras’ Add-Meta-Tags. There’s plenty of speculation as to whether or not Google actually uses the description meta tag, but recent advice seems to indicate that it is one of many factors involved in the description shown in search results (although it will not actually affect positioning).
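
    Just to illustrate the general technique (this is a rough sketch of the sort of thing the plugin automates, not the Add-Meta-Tags code itself – the mw_meta_description function name is purely for illustration), a per-page description can be emitted from a theme’s functions.php using the wp_head hook:

    <?php
    // Output a meta description: the post excerpt on single posts/pages, the blog tagline elsewhere
    function mw_meta_description() {
        global $post;
        if ( ( is_single() || is_page() ) && ! empty( $post->post_excerpt ) ) {
            $description = $post->post_excerpt;           // hand-written excerpt, if one exists
        } else {
            $description = get_bloginfo( 'description' ); // fall back to the weblog tagline
        }
        echo '<meta name="description" content="' . htmlspecialchars( strip_tags( $description ) ) . '" />' . "\n";
    }
    add_action( 'wp_head', 'mw_meta_description' );
    ?>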

    I already had meta tags in place for content-type, robots, and geolocation but I added some more that I was previously using HTML comments for:

    <meta http-equiv="content-language" content="en-gb" />
    <meta name="author" content="Mark Wilson" />
    <meta name="generator" content="WordPress" />
    <meta name="publisher" content="markwilson.it" />
    <meta name="contact" content="webmaster@markwilson.co.uk" />
    <meta name="copyright" content="This work is licenced under the Creative Commons Attribution-Non-Commercial-Share Alike 2.0 UK: England & Wales License. To view a copy of this licence, visit http://creativecommons.org/licenses/by-nc-sa/2.0/uk/ or send a letter to Creative Commons, 559 Nathan Abbott Way, Stanford, California 94305, USA" />

    Incidentally, a comprehensive list of meta tags and an associated FAQ is available at Andrew Daviel’s Vancouver webpages.

    After checking back a couple of weeks later, the same search returns something far more useful:

Screenshot showing meaningful page descriptions

Unfortunately my PageRank has dropped too, and it’s possible that the duplicate entries for http://www.markwilson.co.uk/ and https://www.markwilson.co.uk/blog/ are causing the site to be penalised – Google’s webmaster guidelines say “don’t create multiple pages, subdomains, or domains with substantially duplicate content”. The presence of those duplicate entries is actually a little odd, as checking the server headers for http://www.markwilson.co.uk/ reveals an HTTP 301 response (moved permanently), redirecting to https://www.markwilson.co.uk/blog/. Of course, it could be down to something entirely different, as PageRank is updated infrequently (there’s more information and links to some PageRank analysis tools at RSS Pieces but I use Page Rank Checker) and there have been a lot of changes to this site of late… only time (and building the volume of backlinks to https://www.markwilson.co.uk/blog/) will tell.
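
    Incidentally, a quick way to check those server headers for yourself is with PHP’s get_headers() function, for example:

    <?php
    // print the response headers - the first line should show the 301 (moved permanently) redirect
    print_r( get_headers( 'http://www.markwilson.co.uk/' ) );
    ?>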

  • The elements of meaningful XHTML

I’m really trying to use good, semantic XHTML and CSS on this website but sometimes it’s hard work. Even so, the validation tools that I’ve used have helped me to increase my XHTML knowledge and most things can be tweaked – I’m really pleased that this page currently validates as both valid XHTML 1.1 and CSS 2.

    Last night I came across an interesting presentation by Tantek Çelik (of box model hack fame) that dates back to the 2005 South by SouthWest (SxSW) interactive festival and discusses the elements of meaningful XHTML. Even though the slidedeck is no substitute for hearing the original presentation, I think it’s worth a look for a few reasons:

    • It taught me about some XHTML elements that I wasn’t familiar with (e.g. <address>) and others I’m just getting to grips with (e.g. <cite>).
    • It highlighted some techniques which abuse the intended meaning for XHTML elements and how the same result should be achieved using semantically correct XHTML.
    • It introduced me to extending XHTML with microformats for linked licenses, social relationships, people, events, outlines and even presentations (thanks to the links provided by Creative Commons and the XHTML Friends Network, I already use linked licenses and social relationships on this site but now I understand the code a little better).
    • It reinforced that I’m doing the right thing!
  • Modifying wp-mobile to create content that validates as XHTML-MP

    Yesterday, I wrote a post about using Alex King’s WordPress Mobile Edition plugin (wp-mobile) to generate WordPress content formatted for the mobile web. wp-mobile makes the code generation seamless; however I did have a few issues when I came to validating the output at the ready.mobi site. After a few hours (remember, I’m an infrastructure bod and my coding abilities are best described as weak) I managed to tweak the wp-mobile theme to produce code that validates perfectly.

    Screen grab from the ready.mobi report for this website

    The changes that I made to the wp-mobile index.php file can be seen at Paul Dixon’s PHP pastebin but are also detailed below:

1. Add an XHTML Mobile Profile (XHTML-MP) document type declaration: <!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">. Incidentally, I didn’t include an XML declaration (which looks like: <?xml version="1.0" encoding="UTF-8" ?>) as it kept on generating unexpected T_STRING PHP errors (the <? sequence confuses PHP when short open tags are enabled) and it seems that it is not strictly necessary when the UTF-8 character set is in use:

      “An XML declaration is not required in all XML documents; however XHTML document authors are strongly encouraged to use XML declarations in all their documents. Such a declaration is required when the character encoding of the document is other than the default UTF-8 or UTF-16 and no encoding was determined by a higher-level protocol.”

      W3C recommendation for XHTML 1.0

2. Add some caching controls: <?php header("Cache-Control: max-age=10"); ?> (this needs to run before any output is sent). 10 seconds is a little on the low side but it can be changed later and it means that the caching is unlikely to affect testing of subsequent changes.
    3. Remove <meta name="HandheldFriendly" value="true" />: this code doesn’t appear to do anything and is not valid XHTML-MP – media="handheld" can be used instead when linking the stylesheet (see below).
    4. Change the stylesheet link method: although <style type="text/css">@import url("<?php print(get_stylesheet_uri()); ?>"); </style> should work, I found that the validator was only completely satisfied with the form <link href="<?php print(get_stylesheet_uri()); ?>" rel="stylesheet" type="text/css" media="handheld" />.
    5. Provide access keys using accesskey="key" inside the <a> tag for each of the main menu items.
    6. Surround <?php ak_recent_posts(10); ?> with <ul> and </ul> tags – this bug took the most time to track down and was the final change necessary to make the markup validate as XHTML-MP.
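
    Putting those changes together, the top of the modified index.php ends up looking something like the sketch below (simplified – the real file obviously contains the rest of Alex’s template code, and the header() call has to come before any output):

    <?php header("Cache-Control: max-age=10"); // caching controls, sent before any output ?>
    <!DOCTYPE html PUBLIC "-//WAPFORUM//DTD XHTML Mobile 1.0//EN" "http://www.wapforum.org/DTD/xhtml-mobile10.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
    <title><?php bloginfo('name'); ?></title>
    <link href="<?php print(get_stylesheet_uri()); ?>" rel="stylesheet" type="text/css" media="handheld" />
    </head>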

    I also made some minor changes in order to fit my own page design (adding a legal notice, etc.) but in order to get the elusive 100% in the report for this site, there was one minor tweak required to style.css: removal of the height: 1px; rule for <hr>. I understand why it was there but the validator didn’t like it, suggesting that relative units should be used instead (I would argue that 1px is far more logical for a horizontal rule than the use of relative units but this change resulted in another pass on the report).

    Right, enough of these mobile diversions – I’d better focus my development efforts on getting the rest of this site to be fully XHTML compliant…

  • Publishing WordPress content on the mobile web

    A few nights back, I was reading a .net magazine article about developing websites enabled for mobile content.

    As my blog is written primarily for technical people, it seems logical to assume that a reasonable proportion of its readers could make use of access from a mobile device, especially as the magazine article’s author, Brian Fling, believes that:

    “[the mobile web] will revolutionize the way we gather and interact with information in the next three years”

    Web 2.0 Expo: From Desktop to Device: Designing the Ubiquitous Mobile Experience

Basically, the catalyst for this comes down to a combination of increasing network speeds and improving mobile services, together with the falling cost of data provision.

    It seems that there are basically two schools of thought when it comes to designing mobile content for the web: some (most notably the W3C) believe that content should be device agnostic; whilst that approach is perfectly laudable (a mobile browser is, after all, just another form of browser) others believe that the whole point of the mobile web is that device-specific functionality can be used to provide services that wouldn’t otherwise be available (e.g. location-based services).

Brian’s .net magazine article explains that there are four major methods of mobile web publishing:

    1. Small screen rendering
2. Programmatically reformatting content
    3. Handheld style-sheets
    4. Mobile-specific site.

As we work down the list, each of these methods is (potentially) more complex to implement, but also delivers a faster result on the device. Luckily for WordPress users like myself, Alex King has written a WordPress Mobile Edition plugin, which applies a different stylesheet for mobile browsers, publishing a mobile-friendly site. Using the Opera Mini live demo to simulate a mobile browser, this is what it did for my site:

This website, viewed in a simulated mobile phone browser
    The mobile-optimised version of this website, viewed in a simulated mobile phone browser

The first image shows the content as it would be rendered using the default, small screen rendering – not bad, but not exactly ideal on a small screen – whereas the second image is using the WordPress Mobile Edition plugin to display something more suitable for the mobile web. Not only is the display much simpler and easier to navigate on a handset, but the page size has dropped from 28KB to 1KB. Even so, I was a bit alarmed when I used the ready.mobi site to generate a report for this site, as the site only scored 3 out of 5 and was labelled as one that “will possibly display poorly on a mobile phone”. Despite that, the user experience on my relatively basic (by modern standards) Nokia 6021 was actually quite good (especially considering that the device is not a smartphone and it failed the handheld media type test), whereas viewing the normal (non-mobile) version generated a “memory full” error.

    So, it seems that preparing a WordPress site for the mobile web is actually pretty simple. I have a couple of tweaks to make in order to improve the ready.mobi test results (quick fixes ought to include support for access keys and working out why the page heading is being tagged as <h3> when the standard site uses an <h1> tag) but there is certainly no need for me to develop a separate site for mobile devices, which is just as well as it’s taking me ages to finish the redevelopment of the site (and I can save myself a few quid by not registering the markwilson.mobi domain)!

    Links
    The following links may be useful to anyone who is looking at developing content for the mobile web:

    It may also be worth stopping by at Keni Barwick’s blog on all things mobile.

  • Coding horror

    I just stumbled upon Jeff Atwood’s Coding Horror blog and it’s very interesting reading (even for those of us who write very little code). The article that I found was commenting on Jakob Nielsen’s latest tome on web usability. Although Nielsen makes some valid points, the comments are worth a read as they highlight some of the real compromises that website designers and website developers have to make.

I’m sure I could lose many hours reading Jeff’s posts – they all seem well-informed, to the point and interesting… these were just a few of the ones that grabbed my attention this afternoon:

    • When in doubt, make it public looks at how Web 2.0 is really just creating websites out of old Unix commands and that the new business models are really about taking what was once private and making it public!
    • SEOs: the new pornographers of the web looks at how much of the real search engine optimisation is just good web development and that many of the organisations focusing on SEO are all about money and connections – whether or not the assertions that Jeff makes in his post are correct, it’s an interesting view and certainly seems to have a lot of SEOs fighting their corner.
• Why does Vista use all my memory? looks at Windows Vista’s approach to memory management (a feature called SuperFetch) and how grabbing all the available memory to use as a big cache is not necessarily a bad thing.
  • Removing duplicate search engine content using robots.txt

    Here’s something that no webmaster wants to see:

    Screenshot showing that Google cannot access the homepage due to a robots.txt restriction

It’s part of a screenshot from the Google Webmaster Tools that says “[Google] can’t currently access your home page because of a robots.txt restriction”. Arghh!

    This came about because, a couple of nights back, I made some changes to the website in order to remove the duplicate content in Google. Google (and other search engines) don’t like duplicate content, so by removing the archive pages, categories, feeds, etc. from their indexes, I ought to be able to reduce the overall number of pages from this site that are listed and at the same time increase the quality of the results (and hopefully my position in the index). Ideally, I can direct the major search engines to only index the home page and individual item pages.

I based my changes on some information on the web that caused me a few issues, so this is what I did – hopefully, by following these notes, others won’t repeat my mistakes. One caveat: use this advice with care – I’m not responsible for other people’s sites dropping out of the Google index (or other such catastrophes).

Firstly, I made some changes to the <head> section of my WordPress template.
Because WordPress content is generated dynamically, the idea is to tell the search engines which pages should be in, and which should be out, based on the type of page. Basically, if this is a post page, another single page, or the home page then go for it; otherwise, the appropriate rule for Google, MSN or other spiders (Yahoo! and Ask will follow the standard robots directive) tells them not to index or archive the page but to follow any links – and, additionally, tells Google not to use any Open Directory description. This was based on advice from askapache.com, but amended because the default indexing behaviour for spiders is index,follow (or all), so I didn’t need to specify separate rules for Google and MSN in the first branch as in the original example. I did still need something there though – otherwise the logic would read “if the condition is met, do nothing, else do something”, and the “do nothing” could be problematic.
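
    Pulling that together, the logic looks something like this (a sketch using the standard WordPress conditional tags – the exact code in my template may differ slightly):

    <?php if ( is_single() || is_page() || is_home() ) { ?>
    <meta name="robots" content="index,follow" />
    <?php } else { ?>
    <meta name="robots" content="noindex,noarchive,follow" />
    <meta name="googlebot" content="noindex,noarchive,follow,noodp" />
    <meta name="msnbot" content="noindex,noarchive,follow" />
    <?php } ?>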

Next, following fiLi’s advice for using robots.txt to avoid content duplication, I started to edit my robots.txt file. I won’t list the file contents here – suffice to say that the final result is visible on my web server. For those who think that publishing the location of robots.txt is a bad idea (because the contents are effectively a list of places that I don’t want people to go to), think of it this way: robots.txt is a standard file on many web servers which, by necessity, needs to be readable and therefore should not be used for security purposes – that’s what file permissions are for (one useful analogy refers to robots.txt as a “no entry” sign – not a locked door)!

    The main changes that I made were to block certain folders:

    Disallow: /blog/page
    Disallow: /blog/tags
    Disallow: /blog/wp-admin
    Disallow: /blog/wp-content
    Disallow: /blog/wp-includes
    Disallow: /*/feed
    Disallow: /*/trackback

    (the trailing slash is significant – if it is missing then the directory itself is blocked, but if it is present then only the files within the directory are affected, including subdirectories).

    I also blocked certain file extensions:

    Disallow: /*.css$
    Disallow: /*.html$
    Disallow: /*.js$
    Disallow: /*.ico$
    Disallow: /*.opml$
    Disallow: /*.php$
    Disallow: /*.shtml$
    Disallow: /*.xml$

    Then, I blocked URLs that include ? except those that end with ?:

    Allow: /*?$
    Disallow: /*?

    The problem at the head of this post came about because I blocked all .php files using

    Disallow: /*.php$

Because https://www.markwilson.co.uk/blog/ is equivalent to https://www.markwilson.co.uk/blog/index.php, I was effectively stopping spiders from accessing the home page. I’m not sure how to get around that, as both URLs serve the same content but, in a site of about 1500 URLs at the time of writing, I’m not particularly worried about a single duplicate instance (although I would like to know how to work around the issue). I resolved the immediate problem by explicitly allowing access to index.php (and another important file – sitemap.xml) using:

    Allow: /blog/index.php
    Allow: /sitemap.xml

It’s also worth noting that neither wildcards (* and $) nor the Allow directive are part of the original robots.txt standard, so the file will fail validation. After a bit of research I found that the major search engines have each added support for their own enhancements to the robots.txt specification:

    • Google (Googlebot), Yahoo! (Slurp) and Ask (Teoma) support allow directives.
    • Googlebot, MSNbot and Slurp support wildcards.
    • Teoma, MSNbot and Slurp support crawl delays.

    For that reason, I created multiple code blocks – one for each of the major search engines and a catch-all for other spiders, so the basic structure is:

    # Google
    User-agent: Googlebot
    # Add directives below here

    # MSN
    User-agent: msnbot
    # Add directives below here

    # Yahoo!
    User-agent: Slurp
    # Add directives below here

    # Ask
    User-agent: Teoma
    # Add directives below here

    # Catch-all for other agents
    User-agent: *
    # Add directives below here

    Just for good measure, I added a couple more directives for the Alexa archiver (do not archive the site) and Google AdSense (read everything to determine what my site is about and work out which ads to serve).

    # Alexa archiver
    User-agent: ia_archiver
    Disallow: /

    # Google AdSense
    User-agent: Mediapartners-Google*
    Disallow:
    Allow: /*

    Finally, I discovered that Google, Yahoo!, Ask and Microsoft now all support sitemap autodiscovery via robots.txt:

    Sitemap: http://www.markwilson.co.uk/sitemap.xml

    This can be placed anywhere in the file, although Microsoft don’t actually do anything with it yet!

    Having learned from my initial experiences of locking Googlebot out of the site, I checked the file using the Google robots.txt analysis tool and found that Googlebot was ignoring the directives under User-agent: * (no matter whether that section was first or last in the file). Thankfully, posts to the help groups for crawling, indexing and ranking and Google webmaster tools indicated that Googlebot will ignore generic settings if there is a specific section for User-agent: Googlebot. The workaround is to include all of the generic exclusions in each of the agent-specific sections – not exactly elegant but workable.
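
    In practice, that means repeating the same directives under each user-agent, something like this (showing just a couple of the exclusions for brevity):

    # Google
    User-agent: Googlebot
    Disallow: /blog/wp-admin
    Disallow: /*/feed

    # Catch-all for other agents
    User-agent: *
    Disallow: /blog/wp-admin
    Disallow: /*/feed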

    I have to wait now for Google to re-read my robots.txt file, after which it will be able to access the updated sitemap.xml file which reflects the exclusions. Shortly afterwards, I should start to see the relevance of the site:www.markwilson.co.uk results improve and hopefully soon after that my PageRank will reach the elusive 6.

    Links

    Google webmaster help center.
    Yahoo! search resources for webmasters (Yahoo! Slurp).
    About Ask.com: Webmasters.
Windows Live Search site owner help: guidelines for successful indexing and controlling which pages are indexed.
    The web robots pages.