Archive Google Mail to a Mac using getmail

Written by: Mark Wilson

Late last year I questioned the wisdom of trusting critical data to the cloud and cited Google Mail as an example. Whilst the Google Mail service is generally reliable, there have been some well-publicised instances of failure (including data loss). I shouldn’t be too alarmed by that – for many things in life you get what you pay for and I pay Google precisely nothing (although they do get to build up a pretty good profile of my interests against which to target advertising…). So, dusting off the motto from my Scouting days (“Be Prepared”), I set about creating a regular backup of my Google Apps mail – just in case it ever ceased to exist!

I already use the Apple Mail application (mail.app) for IMAP access but I have some concerns about mail.app – it’s failed to send messages (and not stored a draft either) on at least two occasions and basically I don’t trust it! But using Mac OS X (derived from BSD Unix) means that I also have access to various Unix tools (e.g. getmail) and that means I can take a copy of my Google Mail and store it in maildir or mbox format for later retrieval, on a schedule that I set.

The first step is to install some Unix tools on the Mac. I chose MacPorts (formerly known as DarwinPorts). After running the 1.7.0 installer, I fired up a terminal and entered the following commands:

su - Administrator
cd /opt/local/bin
sudo ./port -d selfupdate

This told me that my installation of MacPorts was already current, so I set about installing the getmail port:

sudo ./port install getmail

The beauty of this process is that it also installed all the prerequisite packages (expat, gperf, libiconv, ncursesw, ncurses, gettext and python25). Having installed getmail, I followed George Donnelly’s advice to create a hidden folder for getmail scripts and a maildir folder for my GmailArchive – both inside my home directory:

mkdir ~/.getmail
mkdir ~/GmailArchive/ ~/GmailArchive/new ~/GmailArchive/tmp ~/GmailArchive/cur
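If you'd rather not type out each subfolder, the two mkdir commands can be wrapped up in a small shell function. This is only a sketch (make_maildir is a hypothetical helper, not part of getmail): mkdir -p doesn't complain if a folder already exists, and chmod 700 keeps the archive readable only by its owner.

```shell
# Hypothetical helper: create a maildir (cur/, new/ and tmp/ under one
# parent) that only the current user can access.
make_maildir() {
    dir=$1
    for sub in cur new tmp; do
        mkdir -p "$dir/$sub" || return 1   # -p: no error if it already exists
    done
    # The archive will hold a copy of all my mail, so keep it private
    chmod 700 "$dir" "$dir/cur" "$dir/new" "$dir/tmp"
}
```

For example: make_maildir ~/GmailArchive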

I then created a getmail configuration file at ~/.getmail/getmailrc.gmailarchive and entered the following settings:

[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = googleaccountname
password = googleaccountpassword

[destination]
type = Maildir
path = ~/GmailArchive/

[options]
verbose = 2
received = false
delivered_to = false
message_log = ~/.getmail/gmail.log
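Bear in mind that this file holds the account password in plain text, so it's sensible to make it readable only by its owner. The following sketch (write_getmailrc is a hypothetical helper) writes the same settings under a restrictive umask and then runs chmod 600; googleaccountname and googleaccountpassword are still placeholders to edit afterwards.

```shell
# Hypothetical helper: write the getmail rc file with owner-only permissions.
write_getmailrc() {
    rc=$1
    mkdir -p "$(dirname "$rc")" || return 1
    (
        umask 077   # files created in this subshell get -rw-------
        # Quoted 'EOF' so the shell leaves ~ alone; getmail expands it itself
        cat > "$rc" <<'EOF'
[retriever]
type = SimpleIMAPSSLRetriever
server = imap.gmail.com
username = googleaccountname
password = googleaccountpassword

[destination]
type = Maildir
path = ~/GmailArchive/

[options]
verbose = 2
received = false
delivered_to = false
message_log = ~/.getmail/gmail.log
EOF
    )
    # umask only affects newly created files, so tighten an existing one too
    chmod 600 "$rc"
}
```

For example: write_getmailrc ~/.getmail/getmailrc.gmailarchive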

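I tested this by running the following command (-l tells getmail to leave the messages on the server and -n to retrieve only new ones):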

/opt/local/bin/getmail -ln --rcfile getmailrc.gmailarchive

but was presented with an error message:

Configuration error: SSL not supported by this installation of Python

That was solved by running:

sudo ./port install py25-socket-ssl

(which installed zlib, openssl and py25-socket-ssl), after which I could re-run the getmail command and watch as my terminal session was filled with messages being downloaded (and the folder at ~/GmailArchive/new started to fill up). Then I saw a problem – even though I have a few thousand messages, I noticed that getmail was only ever downloading the contents of my Inbox.

Eventually, I solved this by adding the following line to the [retriever] section of the getmail configuration file:

mailboxes = ("[Google Mail]/All Mail",)

This took a while to work out because many blog posts on the subject suggest that the mailbox name will include [Gmail], but I found I needed to use [Google Mail] (I guess that could be the difference between GMail and the Google Mail service provided as part of Google Apps). After making the change I was able to download a few thousand messages, although it took a few tries (the good news is that getmail will skip messages it has already retrieved). Strangely, although the Google Mail web interface says that there are 3268 items in my All Mail folder, getmail finds 5320 (and, thankfully, doesn’t seem to include the spam, which would only account for 1012 of the difference anyway).

In addition, the getmail help text explains that multiple mailboxes may be selected by adding to the tuple of quoted strings but, if there is just a single value, a trailing comma is required.
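To compare getmail's figure with the one shown in the web interface, it's easy to count what has actually landed in the maildir. A minimal sketch, assuming the ~/GmailArchive layout created earlier (count_maildir is a hypothetical helper): getmail delivers into new/, while a mail client that opens the maildir may move messages into cur/, so both are counted and tmp/ is ignored.

```shell
# Hypothetical helper: count delivered messages in a maildir.
# Maildir file names never contain newlines, so counting lines is safe.
count_maildir() {
    find "$1/new" "$1/cur" -type f | wc -l | tr -d ' '   # tr strips BSD wc padding
}
```

For example: count_maildir ~/GmailArchive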

Having tested manual mail retrieval, I set up a cron job to retrieve mail on a schedule. Daily would have been fine for backup purposes but I could also schedule a more frequent job to pull updates every few minutes:

crontab -e

launched vim to edit the cron table and I added the following line:

4,14,24,34,44,54 * * * * /opt/local/bin/getmail -ln --rcfile getmailrc.gmailarchive

I then opened up a terminal window and (because running lots of terminal windows makes me feel like a real geek) ran:

tail -f ~/.getmail/gmail.log

to watch as messages were automatically downloaded every 10 minutes at 4, 14, 24, 34, 44, and 54 minutes past the hour.
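The minute list in that crontab entry is simply a starting offset (4) plus successive multiples of 10. This sketch (cron_minutes is a hypothetical helper) generates such a list for any offset and interval. Cron implementations derived from Vixie cron should also accept the range/step form 4-59/10 for the same schedule, but check man 5 crontab on your own system before relying on it.

```shell
# Hypothetical helper: build a cron minute field, starting at $1 and adding
# $2 each time until the hour runs out (minutes must stay within 0-59).
cron_minutes() {
    m=$1 step=$2 out=""
    while [ "$m" -lt 60 ]; do
        out="${out:+$out,}$m"   # comma-separate every value after the first
        m=$((m + step))
    done
    printf '%s\n' "$out"
}
```

For example, cron_minutes 4 10 prints 4,14,24,34,44,54, matching the entry above.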

This also means that I get 6 messages an hour in my local system mailbox (/var/mail/username) telling me how the cron job ran, so I chose to disable e-mail alerting for the cron job by appending >/dev/null 2>&1 to the crontab entry.
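With the redirection added, the entry reads:

4,14,24,34,44,54 * * * * /opt/local/bin/getmail -ln --rcfile getmailrc.gmailarchive >/dev/null 2>&1

The order of the two redirections matters because the shell applies them left to right. The sketch below uses a stand-in function (noisy, hypothetical) rather than getmail itself to show the difference; the message_log file configured earlier still records what getmail retrieves, so nothing useful is lost by discarding cron's output.

```shell
# Hypothetical stand-in for getmail: one line to stdout, one to stderr.
noisy() {
    echo "to stdout"
    echo "to stderr" >&2
}

# Left to right: stdout -> /dev/null, then stderr -> wherever stdout now
# points (/dev/null), so nothing is captured and cron has nothing to mail.
quiet=$(noisy >/dev/null 2>&1)

# Reversed: stderr is duplicated onto the original stdout (the capture)
# before stdout is discarded, so the error text still gets through.
leaky=$(noisy 2>&1 >/dev/null)

printf 'quiet=[%s]\nleaky=[%s]\n' "$quiet" "$leaky"
# prints:
# quiet=[]
# leaky=[to stderr]
```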

Many of the posts on this subject suggest using POP to download the mail, but Google limits POP transfers, so it will require multiple downloads. Peng.u.i.n writes that IMAP should help to alleviate this (although that wasn’t my experience). He also suggests using several mbox files (instead of a single mbox file or a maildir) to back up mail (e.g. one file per calendar quarter) and Matt Cutts suggests backing up to mbox and maildir formats simultaneously:

[destination]
type = MultiDestination
destinations = ('[mboxrd-destination]', '[maildir-destination]')

[mboxrd-destination]
type = Mboxrd
path = ~/GmailArchive.mbox

[maildir-destination]
type = Maildir
path = ~/GmailArchive/

If you do decide to use an mbox file, then it will need to be created first using:

touch ~/GmailArchive.mbox
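For Peng.u.i.n's one-file-per-quarter approach, the file name can be worked out from the date. This is a sketch under my own assumptions (the GmailArchive-YYYYQn.mbox naming and both functions are hypothetical); you would still need to point the path setting in [mboxrd-destination] at the current quarter's file.

```shell
# Hypothetical naming scheme: one mbox per calendar quarter,
# e.g. GmailArchive-2009Q1.mbox for January to March 2009.
quarter_mbox() {
    dir=$1 year=$2 month=${3#0}      # strip a leading zero: "03" -> "3"
    q=$(( (month - 1) / 3 + 1 ))     # months 1-3 -> Q1, 4-6 -> Q2, ...
    printf '%s/GmailArchive-%sQ%s.mbox\n' "$dir" "$year" "$q"
}

# getmail won't create the mbox, so create an empty one if it's missing
# (touch leaves an existing file's contents alone).
ensure_mbox() {
    [ -f "$1" ] || touch "$1"
}
```

For example: mbox=$(quarter_mbox "$HOME" "$(date +%Y)" "$(date +%m)") followed by ensure_mbox "$mbox"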

In Chris Latko’s post on pulling mail out of Gmail and retaining the labels, he describes some extra steps, notably to deal with the fact that each message’s timestamp is replaced with the time it was archived; he uses a PHP script to read each message and restore its original modification time.
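Chris Latko's script is written in PHP; the following is a rough shell equivalent of the same idea, not his code. It is a sketch that assumes GNU coreutils touch, whose -d option accepts dates such as "Tue, 17 Mar 2009 22:54:00 +0000" (the BSD touch supplied with Mac OS X has no -d option). It reads the first Date: header in each message's header block and sets the file's modification time to match; folded (multi-line) Date: headers are not handled, and restore_mtime and restore_maildir_mtimes are hypothetical helpers.

```shell
# Hypothetical helper: set a stored message's mtime from its own Date: header.
# Requires GNU touch (-d parses RFC 2822 style dates).
restore_mtime() {
    msg=$1
    # Drop CRs, stop at the blank line that ends the headers, and print the
    # value of any Date: header; keep only the first one.
    date_hdr=$(tr -d '\r' < "$msg" |
        sed -n -e '/^$/q' -e 's/^[Dd]ate:[[:space:]]*//p' | head -n 1)
    [ -n "$date_hdr" ] || return 1
    touch -d "$date_hdr" "$msg"
}

# Apply it to every message in a maildir's cur/ and new/ folders.
restore_maildir_mtimes() {
    for f in "$1"/cur/* "$1"/new/*; do
        [ -f "$f" ] || continue   # skip the literal pattern if a folder is empty
        restore_mtime "$f" || echo "no usable Date: header in $f" >&2
    done
}
```

For example: restore_maildir_mtimes ~/GmailArchive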

Aside from the MacPorts installation, the process is the same on a Unix/Linux machine and, for Windows users, Gina Trapani has written about backing up GMail using fetchmail with Cygwin as the platform.

Comments

1

Comment from Van Hire
Time: Tuesday 17 March 2009, 22:54

Does Google Mail allow you to use SMTP for external programs such as Outlook? If so, then isn't this an easier way to back up your Gmail?

2

Comment from Mark Wilson
Time: Tuesday 17 March 2009, 23:14

Sure, Microsoft Outlook will connect to an IMAP server and store mail in a PST file (unless you’re running an SMTP server locally, SMTP is for outbound mail only). Outlook Express/Windows Mail could also be used to take a copy of the mail – or Mozilla Thunderbird will do the same for an MBOX file. All of those options involve installing yet another bloated app though, and there’s something to be said for a few simple commands running in the background and watching some text scroll by on a Terminal window every 10 minutes.

I know… I’m a geek.

3

Comment from Mark Wilson
Time: Monday 6 April 2009, 21:17

One more reason to use this method… it runs in the background so, even if I forget to launch a client application (e.g. Outlook), my mail is still backed up.

4

Comment from Joe Shaw
Time: Sunday 14 June 2009, 20:48

“Strangely, although the Google Mail web interface says that there are 3268 items in my All Mail folder, getmail finds 5320 (and, thankfully, doesn’t seem to include the spam, which would only account for 1012 of the difference anyway).”

The 3268 items in gmail are the number of conversations. The 5320 are actual individual emails.

5

Comment from Mark Wilson
Time: Monday 15 June 2009, 23:09

@Joe – thank you for clearing that up for me – it should have been obvious really!
