Tag Archives: Blogging

Site notices Technology Waffle and randomness

Fighting back against the spammers: charging for removal of blog spam links…

Right from (almost) the start, this blog has suffered from spam. I guess it just goes with the territory, but I’ve written in the past about people who’ve left spam comments and then found Google’s index quoting them out of context, and about tech companies criticising their competitors “anonymously” in blog comments.

Even when I was helping my then-CTO to raise his social media presence, my employer’s PR agency was encouraging the use of blog comments to generate backlinks. Now the tide is turning as Google cracks down on low-quality backlinks.

As a result, I’m getting an increasing number of emails from digital agencies including phrases like the one below:

“I’m writing to request the removal of a link to my clients’ [sic] site which is located at the following page:”

They’re (or their clients are) wasting my time, so I reserve the right to charge for removing such links.

The irony is that, over the last few years, Google’s index changes have penalised original content creators like myself in favour of corporate websites and this blog has just a fraction of the traffic it once enjoyed (oh, those were the days)…

Would-be blog spammers on this site should check out the Rules for Comments.

Technology Waffle and randomness

Major change to my role at Fujitsu

Over the last few weeks, I’ve dropped a few hints online about a change in my job at Fujitsu. Some eagle-eyed LinkedIn connections saw me update my profile a couple of weeks ago to add a new position – as Fujitsu’s Head of Practice and Lead Architect for Messaging in the UK and Ireland – and today is my first day (although I’ve been picking up parts of the role for a few weeks now).

After almost three years in a strategy role, supporting two Chief Technology Officers with very different areas of focus, it’s time for a new challenge. My new role is a mixture of line management and practising consultant, so I’m actually returning to my technical roots whilst gaining additional experience of directly leading a team and being responsible for growing part of our business (including some challenging financial targets). Added to that, as messaging moves into our Business and Application Services service line, this is an opportunity for me to work in an applications business whilst building on many years of infrastructure experience. There’s also some pretty exciting stuff going on with Microsoft (I’m not sure that’s announced publicly, so I won’t say anything more here) – but it’s a great time for me to be making this move.

Messaging is not entirely new for me – from the mid-1990s through to the mid-2000s, I worked on a number of NT and Microsoft Mail/Exchange migrations/implementations and I was one of the consultants working on ICL’s partner stand at the Microsoft Exchange 4.0 UK launch roadshow. In addition, one of my technical career highlights was the work I did at Polo Ralph Lauren, to design and project-manage a migration from Novell NetWare to Microsoft Windows Server, from Novell GroupWise to Microsoft Exchange Server and to roll out a standard desktop build across Europe, in multiple languages, with just two Windows XP images (one uniprocessor and one ACPI). The success of that project was down to the professionalism and capabilities of the team around me – and it will be just the same in this new role.

As for this blog, well, I’ve been pretty busy for the last few weeks, as I’ve juggled two jobs – and I expect I’ll be just as busy over the coming weeks and months – but I’m still tweeting and I’ll still knock out the odd blog post too. There might be some more Microsoft Exchange and Lync content, but I expect that the usual mix of photography, social media and observations on the state of tech will persist. This blog has been here for 9 years now; the content just shifts slightly as I do different things in my life, and it seems that some people still find it interesting enough to read (or at least to subscribe)!

Site notices Technology

Website moving to a new server…

My hosting provider has told me that they are moving this website to a new server over the weekend.

All being well, the move will be transparent but I will also need to point the domain names at new DNS servers, so, if I disappear offline for a while on Sunday night, please bear with me and I should be back again once the interwebs have updated…

Technology

Reducing website errors with HTTP 301 redirects

A couple of weeks ago, I wrote about a WordPress plugin called Redirection. I mentioned that I’ve been using it to highlight HTTP 404 errors on my site, but I’ve also been using the crawl errors logged by Google’s Webmaster Tools to track down a number of issues resulting from the various changes made to the site over the years, then creating HTTP 301 redirects to patch them.

Redirections as a result of other people’s mistakes

One thing that struck me was how other people’s content can affect my site – for example, many forums seem to abbreviate long URLs with … in the middle. That’s fine until the HTML anchor gets lost (e.g. in a cut/paste operation), at which point I was seeing 404 errors from incomplete URLs like http://www.markwilson.co.uk/blog/2008/12/netboo…-file-systems.htm. These were relatively easy to track down and redirect to the correct target.
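As a rough illustration of that detective work, here is a minimal Python sketch (not how the Redirection plugin works) that splits an abbreviated URL at the ellipsis and looks for known permalinks with the same beginning and end. The KNOWN_PATHS list and its slugs are hypothetical; in practice they might come from the site’s sitemap.

```python
from urllib.parse import unquote

# Hypothetical list of valid permalinks (in practice: a sitemap or database export).
KNOWN_PATHS = [
    "/blog/2008/12/example-long-post-about-file-systems.htm",
    "/blog/2008/12/another-example-post.htm",
]

# A Unicode ellipsis (U+2026) or three full stops.
ELLIPSES = ("\u2026", "...")


def resolve_truncated(path: str, known_paths: list) -> list:
    """Return the known paths consistent with a URL abbreviated by an ellipsis."""
    path = unquote(path)  # turns %E2%80%A6 back into the ellipsis character
    for marker in ELLIPSES:
        if marker in path:
            head, _, tail = path.partition(marker)
            return [
                p for p in known_paths
                if p.startswith(head)
                and p.endswith(tail)
                and len(p) >= len(head) + len(tail)  # head and tail must not overlap
            ]
    return []  # no ellipsis, so not a truncated URL
```

If more than one permalink fits, the redirect still needs a human decision; the sketch only narrows the search.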

Unfortunately, there is still one inbound link that includes an errant apostrophe that I’ve not been able to trap – even using %27 in the redirect rule seems to fail. I guess that one will just have to remain.
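It isn’t clear why the %27 rule failed, but one possibility worth ruling out is that the link arrives in a different form from the one being matched: a straight apostrophe, a typographic ’, or either of them percent-encoded. This minimal Python sketch accepts all four; the rule that simply drops the apostrophe, and the example slug in the tests, are hypothetical.

```python
import re
from typing import Optional

# Accept a straight apostrophe, a typographic one (U+2019), or either form
# percent-encoded. Which form the inbound link really uses is an assumption.
APOSTROPHE = r"(?:'|%27|\u2019|%E2%80%99)"

# Hypothetical rule: redirect to the same path with the apostrophe removed.
# IGNORECASE also lets lower-case hex such as %e2%80%99 match.
RULE = re.compile(r"^(/blog/.*?)" + APOSTROPHE + r"(.*)$", re.IGNORECASE)


def strip_apostrophe(path: str) -> Optional[str]:
    """Return the path without its first apostrophe, or None if it has none."""
    m = RULE.match(path)
    return m.group(1) + m.group(2) if m else None
```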

Locating Post IDs

Some 404s needed a little more detective work – for example http://www.markwilson.co.uk/blog/2012/05/3899.htm is a post where I forgot to add a title before publishing and, even though I updated the WordPress slug afterwards, someone is linking to the old URL. I used phpMyAdmin to search for post ID 3899 in the wp_posts table of the database, from which I could identify the post and create a redirect.
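Without database access, a lighter-weight check is possible: WordPress normally answers a ?p=ID query by redirecting to that post’s current permalink (the database equivalent is looking up the value in the ID column of wp_posts). This minimal Python sketch assumes the blog lives under /blog/ and that the untitled post’s slug was its numeric ID; it pulls the ID out of such a URL and builds the ?p= link.

```python
import re
from typing import Optional

# Assumptions: WordPress is installed under /blog/ on this host, and untitled
# posts were published with the numeric post ID as their slug.
BLOG_ROOT = "http://www.markwilson.co.uk/blog/"
UNTITLED_POST = re.compile(r"^/blog/\d{4}/\d{2}/(\d+)\.htm$")


def id_lookup_url(path: str) -> Optional[str]:
    """Turn /blog/yyyy/mm/<ID>.htm into the ?p=<ID> form, or None if no match."""
    m = UNTITLED_POST.match(path)
    return f"{BLOG_ROOT}?p={m.group(1)}" if m else None
```

Opening the resulting link in a browser shows which post the ID belongs to, and therefore what the redirect target should be.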

Pattern matching with regular expressions

Many of the 404s were being generated based on old URL structures from either the Blogger version of this site (which I left behind several years ago) or changes in the WordPress configuration (mostly after last year’s website crash). For these I needed to do some pattern matching, which meant an encounter with regular expressions: something I find immensely powerful, fascinating and intimidating all at once.

Many of my tag URLs were invalid because, at some point, I had obviously changed the tags from /blog/tags/tagname to /blog/tag/tagname, but I also had a hierarchy of tags in the past (possibly when I was still misusing categories) which was creating some invalid URLs (like http://www.markwilson.co.uk/blog/tag/apple/ipad). The hierarchy had to be dealt with on a case-by-case basis, but the RegEx for dealing with the change in URL for the tags was fairly simple:

  • Source RegEx: (\/tags\/)
  • Target RegEx: (\/tag\/)

Using the Rubular Ruby RegEx Editor (thanks to Kristian Brimble for the suggestion – there were other tools suggested but this was one I could actually understand), I was able to test the RegEx on an example URL and, once I was happy with it, that was another redirection created.  Similarly, I redirected (\/category\/) to (\/topic\/).
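The same two substitutions can also be checked outside Rubular. This is a minimal Python sketch rather than the plugin’s own syntax: Python’s re module doesn’t need the forward slashes escaped, and the replacement is written as plain text. The tag and category names in the tests are hypothetical.

```python
import re

# The /tags/ -> /tag/ and /category/ -> /topic/ rules, applied in turn.
RULES = [
    (re.compile(r"/tags/"), "/tag/"),
    (re.compile(r"/category/"), "/topic/"),
]


def rewrite(url: str) -> str:
    """Apply each rule to the URL; URLs that match neither come back unchanged."""
    for pattern, replacement in RULES:
        url = pattern.sub(replacement, url)
    return url
```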

I also created a redirection for legacy .html extensions, rewriting them to .htm:

  • Source RegEx: (.*).html
  • Target RegEx: $1.htm

Unfortunately, my use of a “greedy” wildcard with no end-of-URL anchor meant this also substituted html in the middle of a URL (e.g. http://www.markwilson.co.uk/blog/2008/09/creating-html-signatures-in-apple-mail.htm became http://www.markwilson.co.uk/blog/2008/09/creating-.htm-signatures-in-apple-mail.htm), so I edited the source RegEx to (.*).html$.
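A quick way to see the difference is to run both versions through Python’s re module (a sketch only, since the plugin itself runs on PHP). The revised pattern here also escapes the dot, which is an addition rather than the rule used on the site: an unescaped . matches any character. That is also why Python’s result for the first pattern differs slightly from the example above, because the dot swallows the hyphen before html.

```python
import re

# First attempt: unanchored, and the unescaped "." matches any character.
UNANCHORED = re.compile(r"(.*).html")
# Revised: a literal dot, and only at the very end of the URL.
ANCHORED = re.compile(r"(.*)\.html$")


def to_htm(url: str, pattern: "re.Pattern[str]") -> str:
    """Rewrite a legacy .html URL to .htm using the given source pattern."""
    return pattern.sub(r"\1.htm", url)


MID_URL = "http://www.markwilson.co.uk/blog/2008/09/creating-html-signatures-in-apple-mail.htm"
# to_htm(MID_URL, UNANCHORED) -> ".../2008/09/creating.htm-signatures-in-apple-mail.htm"
# to_htm(MID_URL, ANCHORED)   -> MID_URL, unchanged
```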

More complex regular expressions

The trickiest pattern I needed to match was for archive pages using the old Blogger structure.  For this, I needed some help, so I reached out to Twitter:

“Any RegEx gurus out there who fancy a challenge, please can you help me convert /blog/archive/yyyy_mm_01_archive.htm to /blog/yyyy/mm ?”
(Mark Wilson, @markwilsonit)

and was very grateful to receive some responses, including one from Dan Delaney that led me to create this rule:

  • Source RegEx: /blog\/([a-zA-Z\/]+)([\d]+)(\D)(\d+)(\w.+)
  • Target RegEx: /blog/$2/$4/

Dan’s example helped me to understand a bit more about how match groups are used (taking the second and fourth matches here to use in the target), and I later found a tutorial that might help others (most RegEx tutorials are quite difficult to follow but this one is very well illustrated).
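To see what each match group actually captures, here is the same source pattern applied with Python’s re module to a hypothetical December 2008 archive URL. Python writes the back-references as \2 and \4 where the Redirection target uses $2 and $4.

```python
import re

# The source pattern exactly as used in the redirect rule.
ARCHIVE = re.compile(r"/blog\/([a-zA-Z\/]+)([\d]+)(\D)(\d+)(\w.+)")


def archive_redirect(path: str) -> str:
    # For /blog/archive/2008_12_01_archive.htm the groups are:
    #   1 = "archive/"  2 = "2008"  3 = "_"  4 = "12"  5 = "_01_archive.htm"
    # Only group 2 (year) and group 4 (month) are used in the target.
    return ARCHIVE.sub(r"/blog/\2/\4/", path)
```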

A never-ending task

It’s an ongoing task – the presence of failing inbound links due to incorrect URLs means that I’ll have to keep an eye on Google’s crawl errors but, over time, I should see the number of 404s drop on my site. That in itself won’t improve my search placement but it will help to signpost users who would otherwise have been turned away – and every little bit of traffic helps.

Technology

Redirection – an essential plug-in for WordPress users

Last year, a combination of a loss of service from my hosting provider and my appalling backups meant that this website was temporarily wiped off the face of the Internet. It’s never recovered – at least not in terms of revenue – and it taught me an important lesson about backups (it’s all too easy to forget the hours of effort that go into a “hobby” site like this one…).

Whilst the blog posts were restored (and I took the opportunity to apply a new theme to the site; it’s probably due another one now…), some of the images had gone AWOL along the way. I’ve been ignoring that (mostly) but decided I really should do something about it when an old post was picked up by a journalist today and I realised it had a missing graphic.

I remembered a WordPress plugin that I used on another site recently, for managing redirects when access to the .htaccess file is not available. The plug-in, written by John Godley, is called Redirection, and one of its modules will report on HTTP 404 errors, like the ones that my missing graphics will create. I know there are other tools that can do this for me (Google’s Webmaster Tools, for example, or trawling through the web logs) but it’s an easy way to see when a 404 has been returned in order to investigate accordingly.  So far this afternoon, I’ve tracked down and replaced around 8 missing graphics and one broken permalink using the logs from Redirection.  I’m now scanning through the rest of John’s plugins to see what else I’m missing and will certainly be donating later…
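For comparison, the “trawling through the web logs” route can be sketched in a few lines of Python. This assumes the server writes the common Apache/Nginx “combined” log format (hosting providers vary) and simply counts how often each missing path was requested; the sample log lines are invented.

```python
import re
from collections import Counter

# Assumed "combined" log format: IP, identity, user, [time], "request", status, ...
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+) [^"]*" (?P<status>\d{3}) '
)


def count_404s(lines) -> Counter:
    """Count how many times each requested path returned HTTP 404."""
    hits = Counter()
    for line in lines:
        m = LOG_LINE.match(line)
        if m and m.group("status") == "404":
            hits[m.group("path")] += 1
    return hits


# Invented sample lines for illustration only.
SAMPLE_404_LOG = [
    '203.0.113.5 - - [10/Oct/2012:13:55:36 +0100] "GET /blog/images/missing.png HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '203.0.113.6 - - [10/Oct/2012:13:56:01 +0100] "GET /blog/images/missing.png HTTP/1.1" 404 512 "-" "Mozilla/5.0"',
    '203.0.113.7 - - [10/Oct/2012:13:57:12 +0100] "GET /blog/ HTTP/1.1" 200 9876 "-" "Mozilla/5.0"',
]
```

Sorting the result with most_common() puts the worst offenders at the top of the to-do list.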

Technology

Disabling comments for all posts on a WordPress blog

Long-time readers of my blog will know that I used to manage the Fujitsu UK and Ireland CTO Blog (which we’ve recently closed, but have left the content in place for posterity) and I’m still getting the comment notifications (mostly spam). Many of the posts have HTTP 301 redirects to either my blog or David Smith’s (I found a great WordPress plugin for that – Redirection) but, for those that remain, I wanted to turn off comments. Doing this individually for each post seemed unnecessarily clunky but there is, apparently, no way to do this from the WordPress user interface (with database access it would have been straightforward but I don’t have that level of access).

There is a plug-in that globally disables all comments (named, rather aptly, Disable Comments) but the blog is part of a multi-site (network) install and I’m not sure what the broader impact would be…

No bother, I found a workaround – simply set all of the posts to close comments after a certain number of days. The theme that someone has applied to the site (since I stopped working with it) doesn’t seem to respect that, and still leaves a comment button visible, but anyone with a well-developed theme should be OK…
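The workaround relies on the “Automatically close comments on articles older than X days” option under Settings > Discussion. A minimal Python model of that rule (an approximation, not WordPress’s own code; the exact boundary is an assumption) shows why every existing post ends up closed once the threshold is shorter than the age of the newest post. Themes are expected to check comments_open() before showing a comment form, which is presumably where the one mentioned above falls short.

```python
from datetime import datetime, timedelta


def comments_open(post_date: datetime, now: datetime,
                  close_old_posts: bool, days_old: int) -> bool:
    """Approximate the 'close comments on articles older than N days' rule.

    Assumption: a post stays open until it is more than days_old days old.
    """
    if not close_old_posts:
        return True  # option switched off: a post's age makes no difference
    return now - post_date <= timedelta(days=days_old)
```

On a blog that has stopped publishing, switching the option on with a threshold of a day or so closes every existing post in one go.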

Technology

Deleting large quantities of Facebook notes

A few years ago, I followed the example of a “social media guru” and set Facebook up to consume my blog’s RSS feed and republish each post as a note.

This was A Bad Thing for a number of reasons, not least:

  1. Copyright – I’m sure that when I upload anything to Facebook, I give them some rights over it (which is why my images are still on Flickr).
  2. Traffic – reproducing content on Facebook might get eyeballs, but it takes that traffic away from your own website and only Facebook gains any revenue. This may be OK if you are selling goods/services that can be monetised via Facebook links but my revenue is from ads: ads on my site = revenue for me; ads on a Facebook copy = revenue for Facebook.
  3. Layout – invariably, despite my best efforts to write good XHTML, the blog posts look better on my site than when scraped into Facebook as notes.

I turned off the feed but deleting the notes was far from trivial. There is no bulk delete option that I can find, and that meant opening each note, scrolling down, clicking delete, etc. In a word, tedious.

I forgot about the notes until last week, when I switched over to timeline view. Arghh. Yes. Must delete those…

…and then I found another method – much quicker – using the iOS Facebook app.

By opening the Notes section of the Facebook app on my iPad, a quick swipe and press was all it took to delete each note. Still tedious, but a lot quicker to get through…

Technology

Adding extra social sharing services to WordPress with JetPack (ShareDaddy)

Last night, as part of the rebuild of this site, I reinstated the social sharing links for each post. In the old site they had been implemented as bespoke code using each social network’s recommended approach (e.g. Twitter or Facebook’s official button codes) but presentation becomes problematic, with each button having a slightly different format and needing some CSS trickery to get it right.

I looked into a variety of plugins but they all had issues – either with formatting or functionality – until I stumbled across reference to WordPress.com’s social sharing capabilities.  If only I could have that functionality on a self-hosted (WordPress.org) site…

…As it happens, I can – WordPress.com’s social sharing is based on the ShareDaddy plugin, which is part of a collection called JetPack. ShareDaddy is also available as a freestanding plugin but now I have JetPack installed I’m finding some of the other functionality it gives me useful (and it’s not possible to activate ShareDaddy if you have JetPack installed).

I need to make some changes (like working out how to hack the code and turn off the count next to my Tweet/Like/+1 buttons – it’s embarrassing when the number is small!) but I’m happy enough with the result for now.  One thing I did need to do though was to add some services that are not yet in the JetPack version of the plugin (one of the major advantages of ShareDaddy is how simple it is to do this).
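Custom services are defined by a sharing URL containing placeholders such as %post_url% and %post_title%, which are filled in for each post. A minimal Python sketch of that substitution follows; the share.example.com endpoint is hypothetical, and URL-encoding the values is an assumption about what a target service would expect rather than a description of JetPack’s internals.

```python
from urllib.parse import quote

# Hypothetical custom service; the share.example.com endpoint is invented.
TEMPLATE = "https://share.example.com/submit?url=%post_url%&title=%post_title%"


def expand_share_url(template: str, post_url: str, post_title: str) -> str:
    """Fill the placeholders with URL-encoded values (the encoding is assumed)."""
    return (template
            .replace("%post_url%", quote(post_url, safe=""))
            .replace("%post_title%", quote(post_title, safe="")))
```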

Site notices Technology

Rebuilding my site: please excuse the appearance

Regular readers may have noticed that this site is looking a little… different… right now.

Unfortunately, my hosting provider told me last night that they had a disk failure on the server. Normally that wouldn’t be a problem (that’s why servers have redundant components, right? Like RAID on the disks?) but it seems this “server” is just a big PC. I can’t get too mad though… the MySQL database backup scripts have been failing for a month and it was my sloppiness that meant I didn’t chase that up, and it was me who hadn’t made sure I had a recent copy of the file system…

So, as things stand:

  • I think I have restored all posts from 2004 until almost the end of August 2011;
  • I need to restore the later posts and comments (using copies from FeedBlitz, Google Reader, etc.);
  • There are no plugins (so things look odd). Update: some of the plugins have been reinstalled (but things may still look odd);
  • There are no graphics (they were hosted outside WordPress). Update: I’ve restored most of the graphics and other external media but there are still some I need to track down;
  • I have not restored the theme (so I’m using the WordPress defaults and there is no mobile theme);
  • The theme I’m using does not specify UTF-8 encoding, so lots of spurious characters appear. Update: there are still some spurious characters appearing on some pages…
  • There are fewer ads (which you might be happy about, but I do still need to pay the bills).
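Spurious characters like these are typical of UTF-8 text being read with a different encoding; browsers often fall back to Windows-1252 when a page declares none. This Python sketch reproduces the effect (errors="replace" covers the few byte values Windows-1252 leaves undefined). WordPress themes usually avoid it with <meta charset="<?php bloginfo( 'charset' ); ?>" /> in the page header.

```python
def misread_as_cp1252(text: str) -> str:
    """Show how UTF-8 text looks when a browser decodes it as Windows-1252.

    errors="replace" substitutes U+FFFD for the byte values that
    Windows-1252 leaves undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    """
    return text.encode("utf-8").decode("cp1252", errors="replace")


# misread_as_cp1252("it\u2019s") returns "itâ€™s"
# misread_as_cp1252("\u00a0") (a non-breaking space) returns "Â" plus a non-breaking space
```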

Please bear with me whilst I get things back… it may take some time as it needs to fit in between other activities but it might also be a good thing (new theme has been long overdue and I might even get smarter about my backups…).

And, if you spot another problem, please let me know.

[Updated at various points as the site has been restored]

Uncategorized

Attempting to track RSS subscribers on a WordPress blog

As well as my own website (which has precious little content these days due to my current workload), I also manage the Fujitsu UK and Ireland CTO Blog. Part of that role includes keeping an eye on a number of metrics to make sure that people are actually interested in what we have to say (thankfully, they seem to be…). Recently though, I realised that, whilst I’m tracking visitors to the blog, I’m missing hits on the RSS feed (because it’s not actually a page with the tracking script included) – and that’s a problem.

There are ways around this (I use Google Feedburner on my own blog, or it’s possible to put a dummy page with a meta refresh in front of the feed to pick up some metrics) but they have their own issues (for example, the meta refresh method breaks autodiscovery for some RSS readers) and will only help with new subscribers going forwards, not with my legacy issue of how many subscribers I have right now.

There is another approach though: using a popular web-based RSS subscription service like Google Reader to see how many subscribers it tracks for our feed (the same metrics are available from Google’s Webmaster Tools).  The trouble is, that’s not all of the subscribers (for example, a good chunk of people use Outlook to manage their feeds, or other third-party RSS readers). If I use my own blog as an example, Google Reader shows that I have 247 subscribers but Feedburner says I have 855.  Those subscribers come from all manner of feed readers and aggregators, email subscription services and web browsers (Firefox accounts for almost 20% of them) so it’s clear that I’m not getting the whole picture from Google’s statistics. 

Google Reader Subscribers

Google Feedburner Subscribers

Does anyone have any better ideas for getting some subscriber stats for RSS feeds on a WordPress blog using Google Analytics? Or maybe from the server logs?
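One possible server-log approach, sketched in Python under several assumptions: the combined log format, a feed under /blog/feed, and aggregators that report a subscriber count in their User-Agent string (Google’s Feedfetcher has included text such as “N subscribers”; other services may not). It takes the highest count each aggregator reports and adds one per distinct IP address for everything else, so it is a rough estimate at best: shared IP addresses undercount, changing ones overcount, and an aggregator polling several feed URLs is lumped together. The sample lines are invented.

```python
import re

# Assumed "combined" log format: IP, identity, user, [time], "request",
# status, size, "referrer", "user agent".
LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "GET (?P<path>\S+) [^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)
# Aggregators that report readers in their User-Agent, e.g. "12 subscribers".
SUBSCRIBERS = re.compile(r"(\d+) subscribers?", re.IGNORECASE)


def estimate_feed_subscribers(lines, feed_prefix="/blog/feed"):
    """Very rough subscriber estimate for one feed from access-log lines."""
    aggregators = {}    # aggregator name -> highest count it reported
    direct_ips = set()  # other readers: one per distinct IP address
    for line in lines:
        m = LINE.match(line)
        if not m or not m.group("path").startswith(feed_prefix):
            continue
        agent = m.group("agent")
        reported = SUBSCRIBERS.search(agent)
        if reported:
            name = re.split(r"[;(]", agent)[0].strip()  # crude aggregator name
            aggregators[name] = max(aggregators.get(name, 0), int(reported.group(1)))
        else:
            direct_ips.add(m.group("ip"))
    return sum(aggregators.values()) + len(direct_ips)


# Invented sample lines for illustration only.
SAMPLE_FEED_LOG = [
    '192.0.2.1 - - [01/Mar/2012:10:00:00 +0000] "GET /blog/feed/ HTTP/1.1" 200 1024 "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 30 subscribers; feed-id=1)"',
    '192.0.2.1 - - [01/Mar/2012:11:00:00 +0000] "GET /blog/feed/ HTTP/1.1" 200 1024 "-" "Feedfetcher-Google; (+http://www.google.com/feedfetcher.html; 28 subscribers; feed-id=1)"',
    '198.51.100.7 - - [01/Mar/2012:10:05:00 +0000] "GET /blog/feed/ HTTP/1.1" 200 1024 "-" "ExampleReader/1.0"',
    '198.51.100.7 - - [01/Mar/2012:12:05:00 +0000] "GET /blog/feed/ HTTP/1.1" 304 0 "-" "ExampleReader/1.0"',
    '203.0.113.9 - - [01/Mar/2012:10:10:00 +0000] "GET /blog/about/ HTTP/1.1" 200 2048 "-" "Mozilla/5.0"',
]
```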
