Archive Google Mail to a Mac using getmail

Late last year I questioned the wisdom of trusting critical data to the cloud and cited Google Mail as an example. Whilst the Google Mail service is generally reliable, there have been some well-publicised instances of failure (including data loss). I shouldn’t be too alarmed by that – for many things in life you get what you pay for and I pay Google precisely nothing (although they do get to build up a pretty good profile of my interests against which to target advertising…). So, dusting off the motto from my Scouting days (“Be Prepared”), I set about creating a regular backup of my Google Apps mail – just in case it ever ceased to exist!

I already use the Apple Mail application (mail.app) for IMAP access but I have some concerns about mail.app – it’s failed to send messages (and not stored a draft either) on at least two occasions and basically I don’t trust it! But using Mac OS X (derived from BSD Unix) means that I also have access to various Unix tools (e.g. getmail) and that means I can take a copy of my Google Mail and store it in maildir or mbox format for later retrieval, on a schedule that I set.

The first step is to install some Unix tools on the Mac. I chose DarwinPorts (also known as MacPorts). After running the 1.7.0 installer, I fired up a terminal and entered the following commands:

su - Administrator
cd /opt/local/bin
sudo ./port -d selfupdate

This told me that my installation of MacPorts was already current, so I set about installing the getmail port:

sudo ./port install getmail

The beauty of this process is that it also installed all the prerequisite packages (expat, gperf, libiconv, ncursesw, ncurses, gettext and python25). Having installed getmail, I followed George Donnelly’s advice to create a hidden folder for getmail scripts and a maildir folder for my GmailArchive – both inside my home directory:

mkdir ~/.getmail
mkdir ~/GmailArchive/ ~/GmailArchive/new ~/GmailArchive/tmp ~/GmailArchive/cur
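
As a minimal sketch (not something from George Donnelly's post), the same maildir layout can be created in a way that is safe to re-run. The helper name make_maildir is my own invention for illustration; mkdir -p simply skips any directory that already exists.

```shell
# make_maildir is a hypothetical helper name, not part of getmail.
# It creates the three subfolders a maildir needs; -p makes it safe
# to run again if some of them already exist.
make_maildir() {
    mkdir -p "$1/new" "$1/tmp" "$1/cur"
}

# Example: make_maildir "$HOME/GmailArchive"
```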

I then created a getmail configuration file (~/.getmail/getmailrc.gmailarchive) and entered the following settings:

[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = googleaccountname
password = googleaccountpassword

[destination]
type = Maildir
path = ~/GmailArchive/

[options]
verbose = 2
received = false
delivered_to = false
message_log = ~/.getmail/gmail.log

I tested this by running:

/opt/local/bin/getmail -ln --rcfile getmailrc.gmailarchive

but was presented with an error message:

Configuration error: SSL not supported by this installation of Python

That was solved by running:

sudo ./port install py25-socket-ssl

(which installed zlib, openssl and py25-socket-ssl), after which I could re-run the getmail command and watch as my terminal session was filled with messages being downloaded (and the folder at ~/GmailArchive/new started to fill up). Then I saw a problem – even though I have a few thousand messages, I noticed that getmail was only ever downloading the contents of my Inbox.

Eventually, I solved this by adding the following line to the [retriever] section of the getmail configuration file:

mailboxes = ("[Google Mail]/All Mail",)

This took a while to work out because many blog posts on the subject suggest that the mailbox name will include [GMail] but I found I needed to use [Google Mail] (I guess that could be the difference between GMail and the Google Mail service provided as part of Google Apps). After making the change I was able to download a few thousand messages, although it took a few tries (the good news is that getmail will skip messages it has already retrieved). Strangely, although the Google Mail web interface says that there are 3268 items in my All Mail folder, getmail finds 5320 (and, thankfully, doesn’t seem to include the spam, which would only account for 1012 of the difference anyway).

In addition, the getmail help text explains that multiple mailboxes may be selected by adding to the tuple of quoted strings but, if there is just a single value, a trailing comma is required.
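
To illustrate the tuple syntax, here is a rough sketch of a [retriever] section that selects two mailboxes. The second folder name, "[Google Mail]/Sent Mail", is an assumption for the sake of the example (check what your own account exposes), and retriever_section is just a hypothetical helper that prints the text so it can be reviewed before pasting into the configuration file.

```shell
# retriever_section is a hypothetical helper; it only prints a config
# fragment. "[Google Mail]/Sent Mail" is an assumed folder name.
# With two or more entries no trailing comma is needed; with a single
# entry it is, e.g. ("[Google Mail]/All Mail",)
retriever_section() {
    cat <<'EOF'
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = googleaccountname
password = googleaccountpassword
mailboxes = ("INBOX", "[Google Mail]/Sent Mail")
EOF
}

# Example: retriever_section
```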

Having tested manual mail retrieval, I set up a cron job to retrieve mail on a schedule. Daily would have been fine for backup purposes but I could also schedule a more frequent job to pull updates every few minutes:

crontab -e

launched vim to edit the cron table and I added the following line:

4,14,24,34,44,54 * * * * /opt/local/bin/getmail -ln --rcfile getmailrc.gmailarchive

I then opened up a terminal window and (because running lots of terminal windows makes me feel like a real geek) ran:

tail -f ~/.getmail/gmail.log

to watch as messages were automatically downloaded every 10 minutes at 4, 14, 24, 34, 44, and 54 minutes past the hour.

This also means that I get 6 messages an hour in my local system mailbox (/var/mail/username) to tell me how the cron job ran, so I chose to disable e-mail alerting for the cron job by appending >/dev/null 2>&1 to the crontab entry.
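
Cron mails whatever a job prints, so the redirection works by making sure nothing is printed at all. The sketch below uses a made-up stand-in (noisy_job) rather than getmail itself, and shows why the order of the two redirections matters. The log file set by message_log is written by getmail directly, so it is unaffected.

```shell
# noisy_job is a stand-in for any chatty command: it writes one line
# to stdout and one to stderr.
noisy_job() {
    echo "delivered 3 messages"
    echo "warning: slow server" >&2
}

# stdout is sent to /dev/null first, then stderr is pointed at the same
# place, so nothing is left for cron to mail.
silenced=$( { noisy_job >/dev/null 2>&1; } 2>&1 )

# Reversed order: stderr is duplicated before stdout is redirected,
# so the warning still escapes.
leaky=$( { noisy_job 2>&1 >/dev/null; } 2>&1 )

# The full crontab entry (a single line) would then read:
# 4,14,24,34,44,54 * * * * /opt/local/bin/getmail -ln --rcfile getmailrc.gmailarchive >/dev/null 2>&1
```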

Many of the posts on this subject suggest using POP to download the mail, but Google limits POP transfers so it will require multiple downloads. Peng.u.i.n writes that IMAP should help to alleviate this (although that wasn’t my experience). He also suggests using several mbox files (instead of a single mbox file or a maildir) to backup mail (e.g. one file per calendar quarter) and Matt Cutts suggests backing up to mbox and maildir formats simultaneously:

[destination]
type = MultiDestination
destinations = ('[mboxrd-destination]', '[maildir-destination]')

[mboxrd-destination]
type = Mboxrd
path = ~/GmailArchive.mbox

[maildir-destination]
type = Maildir
path = ~/GmailArchive/

If you do decide to use a mbox file, then it will need to be created first using:

touch ~/GmailArchive.mbox

In Chris Latko’s post on pulling mail out of Gmail and retaining the labels, he describes some extra steps, notably that the timestamps on mail are replaced with the time it was archived, so he has a PHP script to read each message and restore the original modification time.

Aside from the MacPorts installation, the process is the same on a Unix/Linux machine and, for Windows users, Gina Trapani has written about backing up GMail using fetchmail with Cygwin as the platform.

Sometimes I really do wonder why I bother…

It’s a new day and the sun is shining, I spent some time playing with my kids before starting work – I should be in a good mood.

Except I’m not… I’m actually feeling quite insecure – and one of the reasons is the comments I get about this blog.

Last week I wrote a piece about getting Vodafone Mobile Connect working on a Mac. In that post I linked to someone who had managed to speak to a suitably skilled technician at Vodafone who talked him through the process of installing the application as the root user. Thankfully that person blogged about their experience, I found his post on the ‘net and it helped me, so I did my bit to spread the message. Then somebody (for whom I can apply several four-letter words… but I won’t in public) leaves a comment which says:

“This bears no resemblance to my experiences. I have installed VCM [sic] on about 100 Mac Laptops now and have never, ever had to use the method you describe.

The standard installation works fine and is a hassle free process.

I get the feeling you are making a simple installation process complicated by looking for problems where there are none.

Your advice is incorrect and you really should not attempt to act as a source of knowledge on subjects you know nothing about.”

Well, great, 15 years in IT (not including the time spent in education before that), over 1300 posts on this blog, some of which have apparently been useful to others, and now the insults start to arrive. I responded, then stewed about it for a while, before deciding that I have better things to worry about and to ignore the comments… until I heard that my efforts aren’t necessarily appreciated by technology companies either…

…a few weeks back, I was sent some information from a (very large) technology company in which the e-mail said “please cascade as appropriate”. I thought that the information would be of interest to people reading the blog (even though there’s a lot of stuff I chose not to write about) but it seems that some people in the company thought I had breached an NDA (I did not and would not – indeed I have many blog posts stored up that I can’t publish yet because of such agreements) and that small blogs (written by real people) shouldn’t be reporting things that company blogs (written by marketing departments) should be spinning. I double-checked my source – it definitely said cascade as appropriate, which meant I was in the clear – phew! (Furthermore, thankfully, there are people inside that technology company who have been prepared to defend my position).

Right now it seems that I have a blog which is neither small enough to just take a few hours a week, nor large enough to pay the bills. If I write about real world experiences with technology, I get flamed by fanboys who tell me I don’t know what I’m talking about; meanwhile if I write about technology subjects that are less “hands on”, then the companies those posts relate to get jumpy. It seems I can’t win.

I spend a huge amount of time writing on this blog and if I work out how much it pays me then it’s well below the minimum wage so it’s certainly not worth it from a financial perspective. I used to find the writing therapeutic but now it’s just something else that I don’t have time for in my day. I need a break… especially if all I’m doing is creating Internet noise. It would be a shame to undo 5 years’ work and to pack it in, but sometimes I really do wonder why I bother…

Microsoft PowerShell, VBScript and JScript Bible

At last night’s joint user group meeting for the Windows Server UK User Group and the Active Directory UK User Group, James O’Neill mentioned that the book he has co-authored (Microsoft PowerShell, VBScript and JScript Bible, published by John Wiley & Sons) goes on sale today.

I haven’t had the chance to review it yet but knowing how immersed James is in PowerShell (and that he wrote the PowerShell sections), I would suggest this might be worth considering if you are looking for a good reference book.

A source of great icons for presentation materials

My colleague Alan Dodd (who works as the technology lead for Unix/Linux at the same company where I look after the Microsoft infrastructure) let me in on the source for his super-cool presentation graphics this week… Gnome/KDE icon sets! Sites like GNOME Look and KDE Look are full of fantastic artwork (some of it scalable), much of which is available under various copyleft licensing agreements and is great for illustrating otherwise dull PowerPoint slides that try to explain IT infrastructure concepts.

Getting Vodafone Mobile Connect and Mac OS X to play nicely together

A couple of years back, I wrote about getting Vodafone Mobile Connect (VMC) to work with Windows Vista and today, after spending most of my train journey from Milton Keynes to Crewe trying the same on a Mac (it’s actually a hackintosh… but that’s of little consequence here), it seems I need to repeat the experience for the benefit of Mac users.

It seems that, even with the latest version of Vodafone Mobile Connect (I’m using v2.11.02.00 on OS X 10.5.6) it’s necessary to run the application as the root user (yes, root!) the first time it is used.

Thankfully, Ian Jindal thought to write about his experiences with Vodafone Mobile Connect on the Mac (which he referred to as inept and unnecessary), summarised as:

  1. Install VMC (I did this as a standard user).
  2. Enable the root user account (with a password of root) in Directory Utility.
  3. Log out and log back in as root.
  4. Run VMC (I tested the connection at this stage too).
  5. Disable the root user.
  6. Log out and log back in using a standard user account.

Once you’ve done this, it should be possible to connect to Vodafone as required using the appropriate connection in the network preferences. The software might not be up to scratch, but the network is generally pretty good (in fact, it’s better than the corporate connection I’ve been on all day!).

I’m writing this at the station so let’s see how I get on as I make my way home courtesy of Virgin Trains (who I noticed were advertising an enhanced connection for Orange customers on some of their trains, but there’s no mention of anything for Vodafone).

Windows 7 seems to fix itself!

Microsoft gets some stick for Windows supposedly being less than stable (which, as I’ve written here many times before, is not my experience); however my netbook has had more than its fair share of BSODs this evening… something to do with a file called tdx.sys (the TDI Translation Driver). I should point out that this machine is running a beta operating system (i.e. one where problems should be expected) but the good news is that Windows 7 detected the problem and the Action Center referred me to Microsoft knowledge base article 967891, from where I was able to apply for, download and install the associated hotfix.

Only time will tell if the hotfix solved the problem but the point is that Windows self-diagnosed the issue and I was able to follow the necessary steps to fix it. Some people may argue that if the system can diagnose the issue then it should be able to self-heal and they may have a point but I’d like to know which updates are being applied (and why), so this system works for me.

On a related note, Daniel Terhell at Resplendence Software has written a great utility called WhoCrashed. If you’re trying to diagnose the cause of system crashes, this might just help out (it makes use of Microsoft’s Debugging Tools for Windows) – and it’s free for personal use. If it can’t find any crash dumps on your system (but you can see them in the %windir%\minidump folder) then make sure you are running the program as Administrator (and that’s no excuse for running with Admin privileges all the time).

If you want to see just how stable your system is and you’re looking for the Reliability Monitor, under Windows 7 (at least in the beta – build 7000) it is available in the Action Center.

Installing WordPress on a Mac: the aftermath (phpMyAdmin, databases, themes, plugins and fixing the tags)

Last week I wrote about installing WordPress on a Mac but I wanted to follow that up with a post on what happened next.

Installing phpMyAdmin

Installing WordPress is all very well, but it’s good to be able to manipulate the database. The command line mysql tools would have worked but a graphical interface (even an ugly one with bizarre icons) is often easier, so I installed phpMyAdmin as described by Nino Müller:

  • Download the latest version of phpMyAdmin (I used v3.1.2).
  • Extract the files to ~/Sites/phpmyadmin.
  • Copy config.sample.inc.php to config.inc.php and edit the Blowfish secret (the line which reads $cfg['blowfish_secret'] = '';).
  • Navigate to http://localhost/~username/phpmyadmin and log in.

Unfortunately, after attempting to log on, I was presented with an error message:

#2002 – The server is not responding (or the local MySQL server’s socket is not correctly configured)

Following Vince Wadhwani’s advice at his Hackido site I typed mysql_config --socket to verify the socket in use for MySQL (e.g. /tmp/mysql.sock) but I couldn’t find a config.default.php file (or the equivalent entry in my config.inc.php file) to adjust the socket. A post at Friends of ED suggested creating a symbolic link for the socket and it seemed to work for me:

sudo mkdir /var/mysql
sudo ln -s /tmp/mysql.sock /var/mysql/mysql.sock
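
If you expect to repeat this (after a rebuild, say), a slightly more defensive sketch of the same two steps might look like the following. The helper name link_socket is hypothetical, and it would need to run from a root shell for a path like /var/mysql.

```shell
# link_socket TARGET LINK
# Hypothetical helper: creates LINK as a symbolic link to TARGET,
# making the parent directory first, and does nothing if LINK is
# already there.
link_socket() {
    target=$1
    link=$2
    if [ -e "$link" ] || [ -L "$link" ]; then
        return 0
    fi
    mkdir -p "$(dirname "$link")" && ln -s "$target" "$link"
}

# Example (as root): link_socket /tmp/mysql.sock /var/mysql/mysql.sock
```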

Following this I could log into phpMyAdmin (although I still have a warning message that phpMyAdmin cannot load the mcrypt extension – this doesn’t seem to be causing any problems though).

Importing a WordPress database

Using phpMyAdmin on my web host’s server, I exported the database from the live copy of markwilson.it and attempted to import it on the Mac. Unfortunately this didn’t work as my database was too large for PHP to upload – as confirmed by creating a file named ~/Sites/phpinfo.php containing <?php phpinfo(); ?>, then viewing it in a browser (http://localhost/~username/phpinfo.php) and looking for the upload_max_filesize variable.

Rather than messing around with my PHP configuration, I googled for the necessary commands and typed:

/usr/local/mysql/bin/mysql -u root -p
drop database wordpressdatabasename;
source ./Downloads/wordpressdatabasename.sql
quit

At this point, the local copy of WordPress was running on the live database, but the links were all to the live site, so I used phpMyAdmin to edit the site URL in the wp_options table, changing it from https://www.markwilson.co.uk/blog to http://localhost/~username/blog.
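
The same edit could be made without phpMyAdmin. This is only a sketch: wp_url_sql is a hypothetical helper that prints an UPDATE statement for review, it sets both the siteurl and home options in one go (I changed the second one later from the admin pages), and it assumes the URL contains no single quotes.

```shell
# wp_url_sql NEW_URL
# Hypothetical helper: prints SQL that points a local WordPress copy
# at NEW_URL. It does not run anything; check the output first.
# Assumes NEW_URL contains no single quotes.
wp_url_sql() {
    printf "UPDATE wp_options SET option_value = '%s' WHERE option_name IN ('siteurl', 'home');\n" "$1"
}

# Example:
# wp_url_sql "http://localhost/~username/blog"
```

The printed statement can then be pasted at the mysql prompt after selecting the WordPress database.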

Because the live copy of the site is on an old version of WordPress, browsing to http://localhost/~username/blog/wp-admin prompted me to upgrade the database, after which I could log in and edit the site settings (e.g. the blog address).

Restoring the theme and plugins

At this point, although WordPress was running on a local copy of my live database, the normal theme was missing and the plugins were disabled (as they were not present). I copied them from the live server and, after selecting the theme and enabling the plugins saw something approaching normality, although there were a few plugins that required updating and I still couldn’t get rid of a particularly annoying database error:

WordPress database error: [Table ‘wordpressdatabasename.wp_categories’ doesn’t exist]
SELECT cat_name FROM wp_categories ORDER BY cat_name ASC

By disabling plugins one by one (I could also have grepped ~/Sites/blog/wp-content/plugins for wp_categories), I found that the issue was in the Bad Behavior plugin that I use to ban IP addresses known to send spam.
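
For the record, the grep approach would have looked something like this sketch. The helper name find_table_refs is made up, and the directory is passed in as an argument so it can be pointed at wherever the plugins live.

```shell
# find_table_refs NAME DIR
# Hypothetical helper: lists every file under DIR that mentions NAME.
# -r recurses into subfolders, -l prints only the matching file names.
find_table_refs() {
    grep -rl -- "$1" "$2"
}

# Example: find_table_refs wp_categories "$PLUGIN_DIR"
```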

Moving from categories to tags

When I first moved this site to WordPress, I used Dean Robinson’s Ultimate Category Cloud plugin to provide a tag cloud (at that time WordPress did not support tags). Over time, that became unmanageable and, although I still need to define a decent taxonomy for the site, the categories still have some value if they are converted to tags.

Over some tapas and drinks in the pub, my friend Alex Coles at ascomi and I had a look at the database structure and Alex came up with a quick SQL query to run against my WordPress database:

UPDATE wp_term_taxonomy SET taxonomy='post_tag' WHERE taxonomy='category'

That converted all of my categories to tags. I manually edited a few to return them to categories (General – which was once called Uncategorised – and Site Notices) but, for some reason, all the posts were recorded in a category of Uncategorized. Some late night PHP coding (reminiscent of many nights at Uni’ – as Steve will no doubt remember – although in those days it was Modula-2, C, C++ and COBOL) resulted in a script to run through the database, identify all posts with a category of 17 (which in my database is the default category of “General”), put the post numbers into an array and then explicitly set the category as required, taking a note of the ones which have been changed so that they can be ignored from that point on:

<html>
<head>
</head>

<body>
<?php

// Connect to the WordPress database
$db_hostname = "localhost:/tmp/mysql.sock";
$db_username = "wordpressuser";
$db_password = "wordpresspassword";
$db_connect = mysql_connect($db_hostname, $db_username, $db_password) or die("Unable to connect to server.");
$db = mysql_select_db("wordpressdatabasename",$db_connect);

// Retrieve all objects including a category with the value of 17 (my default category)
$hascat = mysql_query("SELECT object_id FROM wp_term_relationships WHERE term_taxonomy_id = '17' ORDER BY object_id");
echo '<p>'.mysql_num_rows($hascat).' rows found with category</p>';

$correct_ids = array();

// Build a PHP array (not a MySQL array) containing the relevant object IDs for later comparison
while ($row = mysql_fetch_array($hascat))
{
    $correct_ids[] = $row[0];
}
echo '<p>Array built. Length is '.count($correct_ids).'. First ID is '.$correct_ids[0].'.</p>';

// Retrieve every object
$result = mysql_query("SELECT * FROM wp_term_relationships ORDER BY object_id");
echo '<p>'.mysql_num_rows($result).' rows found total</p>';

// The magic bit!
// If the object is not in our previous array (i.e. the category is not 17)
// then add it to category 17 and put it in the array so it won't get added repeatedly
while ($row = mysql_fetch_array($result))
{
    if (!in_array($row['object_id'],$correct_ids))
    {
        // Add to category 17
        mysql_query("INSERT INTO wp_term_relationships (object_id,term_taxonomy_id,term_order) VALUES ('".$row['object_id']."','17','0')");
        echo '<p>Alter database entry for object '.$row['object_id'].'.</p>';
        // Add to the array so it is not flagged again
        $correct_ids[]=$row['object_id'];
    }
    else echo '<p style="color:white; background-color:black">'.$row['object_id'].' ignored.</p>';
}

?>
</body>
</html>

Remaining issues

Permalinks don’t seem to work – it seems that Mac OS X does not support using .htaccess files by default and, whilst it’s possible to enable them for the root folder, it doesn’t seem to work for individual user sites. I’ll do some more digging and see if I can find a fix for that one.

WordPress also features the ability to automatically update plugins (and itself), but my installation is falling back to FTP access when I try to update it and making it work appears to be non-trivial. Again, I’ll take another look when I have more time.

More ineptitude from HM Revenue and Customs

18 months ago, I had a rant about the ineptitude of HM Revenue and Customs in losing the personal details of over 25 million people in the UK. Well, if you don’t want to read another rant, stop now because I’ve just got home and found another shining example of their bureaucratic f***-ups.

In common with several million people, I filed an on-line tax return recently. I used the self assessment website from HM Revenue and Customs, entered the relevant numbers, and let it work out how much tax I owed. And the good news was that it calculated that I was owed several hundred pounds. Bonus! I like that sort of tax return but sadly my joy was not to last long. A few days after I received my refund and the 31 January tax return payment deadline had passed, I received a letter explaining that there was a mistake on my return and that I now owed the Revenue some money as a result of having been over-refunded. I thought it was too good to be true but I’d just put in the numbers that the system asked for – how was I to know that the line on my PAYE coding notice that said tax underpaid from a previous year was being reclaimed in my tax code was a complete fallacy? Hey – I only read it from an official Government document… (you may detect a sense of sarcasm… it may be the lowest form of wit but it’s also entirely deliberate).

So I phoned HM Revenue and Customs to check that I wouldn’t be fined for late payment as it was their mistake, to which the response was “How do I know it is a Revenue and Customs mistake?” and my answer was “Because your website calculated my tax bill!” but knowing that resistance is futile, that the tax man is always the first debt that should be paid (and that a smaller refund was better than nothing) I repaid the balance of my account in order to avoid interest and a fine.

Today, I received a final demand from HM Revenue and Customs. To their credit (excuse the pun!), it was issued on the day I made payment but it was issued on 16 February – just five days (three working days) after they wrote to tell me that I had been overpaid. What’s worse though is that the final demand was for payment by 28 February and was sent by Royal Mail second class post and not delivered until today – 3 March – a whole 16 days (12 working days) later and after the payment deadline! According to Google Maps the postie could have walked the 217 miles from the HMRC offices in Sunderland to my house in a little under 3 days… so what took the Royal Mail so long to deliver it with the aid of some trucks and vans (no wonder they are losing so much business custom)?

I suspect that the final demand, although dated 16 February was not posted until much later, by which time my account had been settled. Which begs the question, why can’t HMRC (or anyone else sending out bills) wait a few days to receive payment before issuing final notices?

Right, time to pack up my soapbox. I found that cathartic and if you’re still reading I hope you did too.

Installing Windows SharePoint Services on Windows 7 (or Vista)

Windows SharePoint Services (WSS) is not supported on a client operating system, but that’s not to say it shouldn’t run – right? After all, Windows client releases include a web server and can run a database service – that should pretty much cover the basics (back in the days of Windows NT it was generally reckoned that the differences between the Workstation and Server releases were just a few registry entries – but even if that was true then, there are a few more differences today)! In response to this, the guys at Bamboo Nation came up with an installer for SharePoint on Vista and, even though it’s been around for a while (thanks to Garry Martin for alerting me to this), last week I finally got around to trying it out on Windows 7.

It seems to work well but, having never installed SQL Server 2008 Express Edition (WSS needs access to a SQL database), I needed to combine two very good resources (Jonas Nilsson’s installation guide for Windows SharePoint Services 3.0 SP1 on Windows Vista and Symantec’s article on installing and configuring SQL Server 2008 Express) – the result is my installation notes (repeated in full here in case either of those articles ever disappears but for screen shots, refer to the originals – or to Jim Parshall’s video tutorial):

  1. Gather together all the resources that will be required. Assuming that Windows is already running, the remaining components are:
  2. Install and configure SQL Server 2008 Express Edition:
    • SQL Express may be downloaded with or without tools – I went for the “without” option but the tools may be useful for troubleshooting purposes. If you’re installing on an older platform, there are some pre-requisites (.NET Framework 3.5 SP1, Windows Installer 4.5 and Windows PowerShell 1.0) but my Windows 7 client already had these (or later versions).
    • Run the SQL Server Express installer and follow the wizard. It’s fairly straightforward but there are a couple of things to watch out for:
      • For the instance configuration, specify MSSQLSERVER as both the named instance and the instance ID.
      • For the server configuration, use NT AUTHORITY\SYSTEM (no password) as the SQL Server database engine account name and set the SQL Server Browser startup type to Automatic.
      • For database engine configuration, either Windows or mixed mode authentication may be used (I stuck with the defaults) but the installer will not continue until users or groups are specified for unrestricted access to the SQL server. SQL DBAs and security guys will probably have lots of best practice advice here for use with production servers but I took the view that it’s probably nothing too much to worry about on a developer workstation and simply gave the necessary rights to the account I was running as.
  3. Install and configure Internet Information Services in Control Panel, Programs and Features by clicking the option to turn Windows features on or off and enabling:
    • Internet Information Services
      • Web Management Tools
        • IIS 6 Management Compatibility
          • IIS 6 Management Console
          • IIS 6 Scripting Tools
          • IIS 6 WMI Compatibility
          • IIS Metabase and IIS 6 configuration compatibility
        • IIS Management Console
      • World Wide Web Services
        • Application Development Features
          • .NET Extensibility
          • ASP.NET
          • ISAPI Extensions
          • ISAPI Filters
        • Common HTTP Features
          • Default Document
          • Directory Browsing
          • HTTP Errors
          • HTTP Redirection
          • Static Content
        • Health and Diagnostics
          • HTTP Logging
          • Request Monitor
        • Performance Features
          • HTTP Compression Dynamic
          • Static Content Compression
        • Security
          • Basic Authentication
          • Request Filtering
          • Windows Authentication
  4. Install the WSS on Vista setup helper application by running wssvista.msi.
  5. Install WSS by:
    • Locating the WSS on Vista helper application files in %programfiles%\WSSonVista\Setup and running setuplauncher.exe.
    • Pointing the setuphelper to the WSS installer (sharepoint.exe)
    • Following the WSS installation wizard, selecting an advanced installation for a web front-end server, creating a new server farm, and supplying the details for the local SQL database (including the account details).
  6. At the end of the WSS installation, take a note of the port number used, and then navigate to http://localhost:portnumber/. If all goes well, then you should see the SharePoint Central Administration site in your browser.

Finally, a couple of additional notes:

  • I ran all of this as a standard user, answering just a few UAC prompts at the appropriate points to elevate my privileges.
  • These instructions will allow access to the SharePoint site from the local machine; however, it will be necessary to create some firewall exceptions if remote client access is required.

Useful Links: February 2009

A list of items I’ve come across recently that I found potentially useful, interesting, or just plain funny: