markwilson.it

Souping up SyncToy

Posted on Wednesday 9 November 2011 / Tuesday 8 November 2011 By Mark Wilson

This content is 14 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

I used to back up my work PC to a set of Virtual Hard Disk (.VHD) files until one day I needed to recover from a failure, and I found that the hard drive encryption software we use prevented me from running a restore. That forced me to find another solution and one of my ReadyNAS devices (sadly not the one that recently suffered two disk failures on the same RAID 1 volume, taking with it a big chunk of my data) is now dedicated to backing up my work PC, with a regular file copy taking place.

I have a drive mapped to a share on the NAS and the command line version of Microsoft’s SyncToy tool (synctoycmd.exe) is set to run as a scheduled task every evening at 10pm. Then, at 11pm, the NAS powers down until 8am the next day. The idea is that, as long as my PC is connected to my home network, it backs up all of the important files, at a time by which I should have stopped working.

Unfortunately I’m not convinced that it’s working as it should be – just because the Windows 7 Task Scheduler tells me that the task completed doesn’t mean that SyncToy ran successfully (incidentally, if you are having problems with SyncToy on Windows 7, this thread might help). I was googling for a solution and came across eXDee’s batch files (sometimes the old ways are the best) to check for network connectivity, presence of the appropriate volume and then run synctoycmd.exe, recording a log file on the way. Bingo.

So, here are my versions (only minor updates from eXDee’s originals). They are called each night from Task Scheduler, and a simple check of the lastsync.log file should tell me whether the backup worked or not.

Incidentally, don’t be fooled (as I was) by the synctoycmd.exe output that says it saved time by not copying any files. That’s the output from the preview run and there is a long period after this during which there are no status updates whilst the actual file copies take place.

synctoy.bat

This is the control file, to be called from Task Scheduler or run manually from the command line:
@echo off
title SyncToy run in progress…
echo Attempting file sync. Please wait…
sync.bat >lastsync.log

sync.bat

This is the file that checks for the presence of my NAS and for a mapped drive before it backs up my data. You’ll need to substitute your own IP address but I’m particularly impressed by eXDee’s code to look for a TTL rather than a ping success/failure (smart move). Note I haven’t mapped a drive if the connection is not there, although that is a possible enhancement:
@echo off
echo SyncToy Log starting at
time /T
date /T
echo ##############################################
echo Checking connection to NAS…
echo ##############################################
PING -n 2 -w 10 192.168.1.14 |find "TTL=" && goto NAS
goto PINGFAIL

:NAS
echo ##############################################
echo NAS is online. Checking for share…
if exist "F:\Synced with Company PC\" goto SYNC
goto NASFAIL

:SYNC
echo ##############################################
echo Drive is mapped. Begin syncing files…
echo ##############################################
cd "C:\Program Files\SyncToy 2.1\"
SyncToyCmd.exe -R
if %ERRORLEVEL% == 0 goto SUCCESS
goto SYNCFAIL

:PINGFAIL
echo ##############################################
echo NAS not found. Exiting
goto END

:NASFAIL
echo ##############################################
echo Share not found. Exiting
goto END

:SUCCESS
echo ##############################################
echo Synctoy finished successfully. Exiting
goto END

:SYNCFAIL
echo ##############################################
echo Synctoy Failed. Exiting
goto END

:END
echo ##############################################
echo Synctoy Log ending at
time /T
date /T
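
The TTL test in sync.bat can also be sketched outside batch. The Python below is only an illustration and is not part of eXDee's scripts: the function name host_replied is hypothetical, the second address is made up, and the function inspects captured ping output rather than running ping itself. It shows why matching on TTL= is the smart move: Windows ping can print a "Reply from" line that comes from a router rather than the target, and only a genuine echo reply carries a TTL value.

```python
def host_replied(ping_output):
    """Return True if any line of captured ping output carries a TTL value."""
    return any("TTL=" in line for line in ping_output.splitlines())


# A genuine echo reply, as seen in the sample log
online = "Reply from 192.168.1.14: bytes=32 time=3ms TTL=64"

# A "reply" that does not come from the target host (made-up address)
unreachable = "Reply from 192.168.1.2: Destination host unreachable."

print(host_replied(online))
print(host_replied(unreachable))
```

Like find in the batch file, the match is case-sensitive.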

lastsync.log

An example of a run (the failures were down to file access, rather than any issue with the scripts):

SyncToy Log starting at
21:00
08/11/2011
##############################################
Checking connection to NAS…
##############################################
Reply from 192.168.1.14: bytes=32 time=3ms TTL=64
Reply from 192.168.1.14: bytes=32 time=39ms TTL=64
##############################################
NAS is online. Checking for share…
##############################################
Drive is mapped. Begin syncing files…
##############################################
Preview of Work Folder Backup (C:\Users\markw\Documents\Work\, F:\Synced with company PC\Work\) in time 00:03:08:253.
SyncToy action was ‘Echo’
Found 2 actions to perform.
Found 47,158 files that did not require action.
Analyzed 250.5 files per second.
Avoided copying 135,013,767,205 bytes in 47,158 files.
Saved approximately 03:00:27:00 by not copying any files.

SyncToy run of Work Folder Backup (C:\Users\markw\Documents\Work\, F:\Synced with company PC\Work\) completed at 08/11/2011 21:03:27.
SyncToy action was ‘Echo’.
SyncToy options were:
Active for run all
All files included
No files excluded
Do not check file contents
Include read-only files
Include hidden files
Include system files
Backup older files (send to Recycle Bin)
All subfolders included
SyncToy run took 00:00:00:610.
Copied 5,932,607,488 bytes in 2 files in 00:00:00:610.
Bytes per second 9,725,586,045.9, files per second 3.3.
Avoided copying 135,013,767,205 bytes in 47,158 files that did not require action.
Saved approximately 00:00:13:882 by not copying all files.
Warning: 4 failures occured.
You can retry by selecting “Run” again or select “Preview” to see
the operations that remain to be performed.

The Sync operation completed successfully on folder pair ‘Work Folder Backup’ but some files were skipped. Please look at the logs for more details.
##############################################
Synctoy Failed. Exiting
##############################################
Synctoy Log ending at
21:03
08/11/2011
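
The "simple check" of lastsync.log could itself be automated. The following is a minimal sketch in Python, assuming the log contains the marker lines echoed by sync.bat above. The names MARKERS and summarise_log are hypothetical, and the failure count relies on the wording of the warning line in the sample run.

```python
import re

# Marker lines echoed by sync.bat, mapped to a short status label
MARKERS = {
    "Synctoy finished successfully": "success",
    "Synctoy Failed": "sync failed",
    "NAS not found": "NAS offline",
    "Share not found": "share missing",
}


def summarise_log(text):
    """Return (status, failure_count) for the contents of lastsync.log."""
    status = "unknown"
    for marker, label in MARKERS.items():
        if marker in text:
            status = label
            break
    # The sample log spells it "occured"; accept either spelling
    match = re.search(r"Warning: ([\d,]+) failures? occurr?ed", text)
    failures = int(match.group(1).replace(",", "")) if match else 0
    return status, failures


sample = "Warning: 4 failures occured.\nSynctoy Failed. Exiting\n"
print(summarise_log(sample))
```

To run it against a real log, read the file with open("lastsync.log") and pass its contents to summarise_log. A run like the one above would be reported as failed with four skipped files, which matches what the batch file records when SyncToyCmd.exe returns a non-zero exit code.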

More on NoSQL, Hadoop and Microsoft’s entry to the world of big data

Posted on Tuesday 8 November 2011 / Monday 7 November 2011 By Mark Wilson

Yesterday, my article on Microsoft’s forays into the world of big data went up on Cloud Pro. It’s been fun learning a bit about the subject (far more than is in that article – because big data is a big theme in my work at the moment) and I wanted to share some more info that didn’t fit into my allotted 1000 words.

Microsoft Fellow Dr David DeWitt gave an excellent keynote on Day 3 of the SQL PASS 2011 summit last month and it’s a great overview of how Hadoop works. Of course, he has a bias towards use of RDBMS systems but the video is well worth watching for its introduction to NoSQL, the differences between key value stores and Hadoop-type systems, and the description of the Hadoop components and how they fit together (skip the first 18 minutes and, if the stream doesn’t work, try the download – the deck is available too). Grant Fritchey and Jen McCown have written some great notes to go with Dr DeWitt’s keynote too. For more about when you might use Hadoop, Jeremiah Peschka has a good post.

Microsoft’s SQOOP implementation is not the first – Cloudera have been integrating SQL and Hadoop for a couple of years now. Meanwhile, Buck Woody has a great overview of Microsoft’s efforts in the big data space.

I also mentioned Microsoft StreamInsight (formerly code-named “Austin”) in the post (the Complex Event Processing capability inside SQL Server 2008 R2) and Microsoft’s StreamInsight Team has posted what they call “the basics” of event processing. It seems to require coding, but is probably useful to anyone who is getting started with this stuff. For those of us who are a little less code-oriented, Andrew Fryer’s overview of StreamInsight (together with a more general post on CEP) is worth a read, together with Simon Munro’s post on where StreamInsight fits in.

Shortly after I sent my article to Cloud Pro’s Editor, I saw Mike Walsh’s “Microsoft Loves Your Big Data” post. I like this because it cuts through the press announcements and talks about what is really going on: interoperability; and becoming a player themselves. Critically:

“They aren’t copying, or borrowing or trying to redo… they are embracing”

And that is what I really think makes a refreshing change.

Handy to know about: fuel cover emergency release on an Audi A4

Posted on Monday 7 November 2011 / Tuesday 8 November 2011 By Mark Wilson

A couple of days ago, my wife called me and said the low fuel warning light had come on on my car as she set out to take the kids swimming (a 25 mile round trip). “No worries”, I said, “you’ve got enough to get home – I’ll fill it up later”. Fast forward to today, when I drove to the filling station only to find that the cover on the fuel filler cap (controlled by the central locking) wouldn’t open. Thankfully, I was close to home, so I went back (fuel range now showing as 5 miles!) and called the lease company’s breakdown service, who said I might have to wait up to 90 minutes for a technician. Not great, but acceptable – and at least I was home.

A few minutes later I got a call from Volkswagen/Audi Assistance and 15 minutes after that the technician was on site (the RAC provide the Volkswagen/Audi Assistance service – but with dedicated technicians, so a different queue).

I explained the problem and he tried (and failed) to open the fuel cover the same way that I did… then he popped open the boot, removed a cover and pulled on a wire – which promptly opened the offending fuel door. Result! If only I’d known about it at the petrol station an hour earlier. (For reference, the car is a 2009 Audi A4 Avant – the B8 model – but I wouldn’t be surprised if the A5 has a similar mechanism.)

So full marks to VW/Audi Assistance – both for the rapid response and for following me to the filling station in case I ran out of diesel on the way.

And, for anyone else with a fuel cover that’s linked to the central locking on the car, it might be worth checking if there is an emergency release…

SQL Server and Hadoop – unlikely bedfellows but a powerful combination

Posted on Monday 7 November 2011 / Saturday 14 January 2017 By Mark Wilson

Big Data is hard to avoid – what does Microsoft’s embrace of Hadoop mean for IT Managers?

There are two words that seem particularly difficult to avoid at the moment: big data. Infrastructure guys instinctively shy away from data but such is its prevalence that big data is much more than just the latest IT buzzword and is becoming a major theme in our industry right now.

But what does “big data” actually mean? It’s one of those phrases that, like cloud computing earlier, is being “adopted” by vendors to mean whatever they want it to.

The McKinsey Global Institute describes big data as “the next frontier for innovation, competition and productivity” but, put simply, it’s about analysing masses of unstructured (or semi-structured) data which, until recently, was considered too expensive to do anything with.

That data comes from a variety of sources including sensors, social networks and digital media and it includes text, audio, video, click-streams, log files and more. Cynics who scoff at the description of “big” data (what’s next, “huge” data?) miss the point that it’s not just about the volume of the data (typically many petabytes) but also the variety and frequency of that data. Some even refer to it as “nano data” because what we’re actually looking at is massive sets of very small data.

Processing big data typically involves distributed computer systems and one project that has come to the fore is Apache Hadoop – a framework for development of open-source software for reliable, scalable distributed computing.

Over the last few weeks though, there have been some significant announcements from established IT players, not all of whom are known for embracing open source technology. This indicates a growing acceptance for big data solutions in general and specifically for solutions that include both open- and closed- source elements.

When Microsoft released a SQL Server-Hadoop (SQOOP) Connector, there were questions about what this would mean for CIOs and IT Managers who may previously have viewed technologies like Hadoop as a little esoteric.

The key to understanding what this means is understanding the two main types of data: structured and unstructured. Structured data tends to be stored in a relational database management system (RDBMS), for example Microsoft SQL Server, IBM DB2, Oracle 11g or MySQL.

By structuring the data with a schema, tables, keys and all manner of relationships it’s possible to run queries (with a language like SQL) to analyse the data and techniques have developed over the years to optimise those queries. By contrast, unstructured data has no schema (at least not a formal one) and may be as simple as a set of files. Structured data offers maturity, stability and efficiency but unstructured data offers flexibility.

Secondly, there needs to be an understanding of the term “NoSQL”. Commonly misinterpreted as an instruction (no to SQL), it really means not only SQL – i.e. there are some types of data that are not worth storing in an RDBMS. Rather than following the database model of extract, transform and load (ETL), with a NoSQL system the data arrives and the application knows how to interpret the data, providing a faster time to insight from data acquisition.
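
The idea that "the application knows how to interpret the data" can be illustrated with a small sketch. The Python below is an assumption-laden example rather than any particular NoSQL product's API: the records, field names and the function clicks_per_user are all made up. Records are stored exactly as they arrived, with no common schema, and structure is only applied at the moment a question is asked.

```python
import json

# Raw records stored as they arrived; no schema was enforced on write
raw_records = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "play", "video_id": 42}',
    '{"user": "alice", "action": "click", "page": "/about"}',
]


def clicks_per_user(lines):
    """Interpret records at read time, using only the fields this question needs."""
    counts = {}
    for line in lines:
        record = json.loads(line)
        if record.get("action") == "click":
            counts[record["user"]] = counts.get(record["user"], 0) + 1
    return counts


print(clicks_per_user(raw_records))
```

The record for "bob" has different fields from the others and is simply ignored by this query; nothing had to be transformed or loaded into tables first.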

Just as there are two main types of data, there are two main types of NoSQL system: key/value stores (like MongoDB or Windows Azure Table Storage) can be thought of as NoSQL OLTP; Hadoop is more like NoSQL data warehousing and is particularly suited to storing and analysing massive data sets.

One of the key elements towards understanding Hadoop is understanding how the various Hadoop components work together. There’s a degree of complexity so perhaps it’s best to summarise by saying that the Hadoop stack consists of a highly distributed, fault tolerant, file system (HDFS) and the MapReduce framework for writing and executing distributed, fault tolerant, algorithms. Built on top of that are query languages (like Hive and Pig) and then we have the layer where Microsoft’s SQOOP connector sits, connecting the two worlds of structured and unstructured data.
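
For readers who have not met MapReduce before, the shape of the model can be sketched in a few lines. This is a single-process Python illustration of the map, shuffle and reduce steps using the classic word count example. It is not the Hadoop API, the function names are hypothetical, and it has none of the distribution or fault tolerance that Hadoop provides.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: combine the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of documents."""
    pairs = (pair for doc in documents for pair in map_phase(doc))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


print(word_count(["big data", "big deal"]))
```

In a real cluster, each document (or file block) would be mapped on the node that holds it, and the reduce step for each key could run on a different machine.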

The trouble is that SQOOP is just a bridge, and not a particularly efficient one either: working on SQL data in the unstructured world involves subdivision of the SQL database so that MapReduce can work correctly.

Because most enterprises have both the structured and unstructured data, we really need tools that allow us to analyse and manage data in multiple environments – ideally without having to go back and forth. That’s why there are so many vendors jumping on the big data bandwagon but it seems that a SQOOP connector is not the only work Microsoft is doing in the big data space:

SQL Server 2008 R2 includes a complex event processing (CEP) capability called StreamInsight. The principle is that streams of data can be monitored, managed and mined for particular events (instead of running queries across data, run the data through a set of queries looking for matches) and this can help organisations to respond quickly to new opportunities – maybe even adopting a predictive business model.
The next version of SQL Server will include a new data analysis tool called Power View which will even be supported on competitive mobile operating systems (including iOS and Android).
Windows Azure includes table storage – a key/value pair storage solution with partitioning.
Also on Azure, Microsoft is creating a new Data Explorer tool to create rich data sets that can be published as a service and an iterative MapReduce runtime codenamed “Daytona” for scaling data analytics across hundreds of processing cores.
Microsoft is also creating new implementations of the Hadoop stack for Windows Azure and Windows Server (including a Hive ODBC driver and a Hive Add-in for Excel) but it also has a competing technology called LINQ to HPC (formerly codenamed Dryad) that allows a Windows High Performance Compute (HPC) cluster to not only perform parallel computing but also to integrate with Azure (the theory behind this is that big data jobs are typically I/O-bound, rather than compute-bound).
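
The complex event processing principle mentioned in the list above (run the data through a set of queries, rather than running queries across stored data) can be sketched as follows. This is a plain Python illustration under stated assumptions: it is not the StreamInsight API, and the function run_standing_queries, the query names and the events are all made up.

```python
def run_standing_queries(events, queries):
    """Pass each event through every standing query, yielding matches as they occur."""
    for event in events:
        for name, predicate in queries.items():
            if predicate(event):
                yield name, event


# Hypothetical standing queries, registered before any data arrives
queries = {
    "high_temperature": lambda e: e["sensor"] == "temp" and e["value"] > 30,
    "door_opened": lambda e: e["sensor"] == "door" and e["value"] == 1,
}

# Made-up event stream; in practice this would be an unbounded source
events = [
    {"sensor": "temp", "value": 21},
    {"sensor": "temp", "value": 35},
    {"sensor": "door", "value": 1},
]

matches = list(run_standing_queries(events, queries))
print(matches)
```

Because the function is a generator, each match is available as soon as its event arrives, without the stream ever being stored.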

In our increasingly cloudy world, infrastructure and platforms are rapidly becoming commoditised. We need to focus on software that allows us to derive business value from data. Consider that Microsoft is only one vendor, then think about what Oracle, IBM, Fujitsu and others are doing. If you weren’t convinced before, maybe HP’s Autonomy purchase is starting to make sense now?

Looking specifically at Microsoft’s developments in the big data world, it therefore makes sense to see the company get closer to Hadoop. The world has spoken and the de facto solution for analysing large data sets seems to be HDFS/MapReduce/Hive (or similar).

Maybe Hadoop’s success comes down to HDFS and MapReduce being based on work from Google whilst Hive and Pig are supported by Facebook and Yahoo respectively (i.e. they are all from established Internet businesses). But, by embracing Hadoop (together with porting its tools to competitive platforms), Microsoft is better placed to support the entire enterprise with both their structured and unstructured needs.

[This post was originally written as an article for Cloud Pro.]

Accessing my iCloud photostream from a Windows PC

Posted on Monday 7 November 2011 / Friday 4 November 2011 By Mark Wilson

I use a lot of Apple products and, not surprisingly, when iOS5 was released, I upgraded my iPhone and my iPad. One of the big advancements with iOS5 is the integration with iCloud, Apple’s cloud service for synchronising data between devices so, when I took a look a few days later, I was a bit confused. From a Windows PC I logged in and saw links for Mail, Contacts, Calendar, Find My iPhone and iWork, all with familiar icons, but what I couldn’t fathom was where my photostream was. Certainly not visible in iCloud…

It turns out that there is a separate application needed to sync an iCloud photostream with a Windows PC. I installed it, it crashed (something to do with being behind our proxy servers at work, I think) but after a PC reboot and connection to my home network, photos from my iOS devices started showing up in the %userprofile%\My Pictures\Photo Stream\My Photo Stream folder. (The iCloud Control Panel for Windows also integrates with Safari 5.1.1 or Internet Explorer 8 bookmarks and with Outlook 2007 Contacts and Calendars.)

All I need now is the ability to sync ActiveSync contacts from my iPad (the ones I have in Office 365)… I wish.

SharePoint, Dropbox, and shadow IT

Posted on Friday 4 November 2011 By Mark Wilson

This morning I had a problem with SharePoint. Well, when I say the problem was with SharePoint, it could be considered a “layer 8 problem” (i.e. user error) but it still illustrates a major issue with corporate IT provision – not just in my organisation but in many, many businesses, all over the world.

You see, last night, I uploaded a presentation to our intranet. It was a 20MB file over an ADSL/VPN connection and the browser upload session timed out so I used SharePoint’s Windows Explorer view (which I think is WebDAV). The file was copied, I edited the properties in the browser and all was good, I thought.

Fast forward to this morning and people were telling me the links to the presentation in my team’s newsletter didn’t work. But they did for me… embarrassingly (because the newsletter goes right up the company – to CEO level), I sent an email with the correct link in naked form (horrible long URL, rather than as a hyperlink on some nice text) but people were still getting HTTP 404 responses (file not found).

To cut a long story short, the WebDAV upload had not checked in the file (by design, I now think) and even editing the properties afterwards didn’t. I could see the file, but no-one else could. Once the file was checked in all was well – except for my red face (and my insistence that HTTP 404 isn’t a permissions error – that would be 403).

I lost a good chunk of this morning on this and the related clean-up activities when, essentially, all I wanted to do was share a file with some colleagues – a common business requirement that shouldn’t really be a problem in 2011. So I tweeted:

http://twitter.com/markwilsonit/statuses/132406672181305344

I expected a deluge of people supporting SharePoint and telling me that I’m just a dumb user; what I actually got was RTs (showing this is not just an issue for me) and then a succession of people suggesting various Dropbox-like products that could be used by corporates.

Lots of people are suggesting Box.net and there’s Dropbox for Teams, OxygenCloud and ShareFile too. I suppose, taken at face value this sort of product is exactly what my tweet asked for but it’s not really a corporate version of Dropbox that I need – it’s the simplicity of Dropbox (dump “stuff” in a folder and it’s wherever I need it – in the cloud, on other machines, available to share with others, etc.) – I’m sure there are many solutions that do this, with varying degrees of success (just that Microsoft SharePoint is not one of them…). But technology is only one part of the issue.

My scenario (and the reason I’m writing this) is actually a perfect example of why we have shadow IT in organisations today. End users (consumers) want to do “something”. That “something” is hard to do with their enterprise tools, so they find another way around the problem. Over time that solution becomes embedded – that’s when the problems start for the CIO (or, maybe, for the individual who didn’t follow the stated IT policy…). Those problems generally boil down to one of two things: security and manageability. In this case, the file is already available on SlideShare, but it could have been something confidential – like the business model I was creating yesterday afternoon – and that wouldn’t have been something I wanted floating around on servers that my company doesn’t control.

I’m sure that the multitude of “solutions” to my problem are all great in their own way but if I start to use them, well, all I’ll really be doing is perpetuating the issue of shadow IT.

(Incidentally, I did come across some interesting projects from the responses I received: remember Novell iFolder? It’s still around in open source form from Kablink; and VMware’s Project Octopus could have potential too.)

Movember 2011/Fit at 40 update

Posted on Tuesday 1 November 2011 / Saturday 26 November 2011 By Mark Wilson

Today marks the start of Movember and, although I’d like to support the Mo’ Bro’s and Sisters out there, unfortunately this year I won’t be sporting a ‘tache.

I grew one last year and, aside from the fact that Mrs. W. was less than impressed (quite happy with my usual face fuzz, but not with a dodgy moustache), it didn’t go down too well at work – Movember is just not established enough in the UK for me to meet with potential customers sporting dubious facial hair!

Even those of us who can’t take part in Movember can still support it virtually – all the Mo’ Bro’s are raising money for charities working with Prostate Cancer (so you could donate via the Movember website) or, alternatively, my Fit at 40 Challenge continues and I’m still working hard to raise money for The Prostate Cancer Charity at the same time as losing weight and getting fit.

So where am I at? To be perfectly honest, I’m a little behind where I would like to be but still making progress. Two weeks ago, I ran the Buckingham 10K (my second 10K race) and beat my London time, although I was disappointed that I couldn’t push hard on the downhill stretch at the end because my knee was hurting and I didn’t want to risk injury (stats). Thankfully it seems OK now – I’ve run a couple of 5-milers since without issue. Mixed in with some spinning, the occasional bike ride and some swimming, the exercise is going well and I’m starting to see the results. After months of not losing much weight (but clearly gaining muscle), I’m now noticing the difference on my belt loop, and am tantalisingly close to having shed the second stone.

So, I may not be able to grow a ‘tache for Movember but I can push hard on my Fit at 40 Challenge – if I get below 15st 10lbs (100kg) this month (and I certainly intend to), why not donate to The Prostate Cancer Charity via my JustGiving page?

Useful Links: October 2011

Posted on Monday 31 October 2011 By Mark Wilson

A list of items I’ve come across recently that I found potentially useful, interesting, or just plain funny:

Periodic table of the (HTML) elements – HTML5 elements represented as a periodic table (via Michael Gilletl)
Should I work for free? – Decision process for use when asked to work for free (clean version)
Expand URL – Useful URL Expander (API and web interface) – tested with t.co, bit.ly (and mwil.it)
Akamai real-time web monitor – Visualisation of real-time web stats from Akamai (via Tim Biller)
Floating Sheep – Some great visualisations – mapping trend data to geography
Kindlegraph – Autographed eBooks!

Why should the UK move to GMT+1? And do we really need “daylight saving time”?

Posted on Sunday 30 October 2011 By Mark Wilson

As I wandered around the house this morning resetting a plethora of clocks (two ovens, four heating thermostats, two wall clocks in the children’s bedrooms, fixed-line phone, two mobile phones, two cars – thankfully the computers, PVR, radio, some of the mobile phones and many other devices change themselves), I did have to wonder “what is the point?”. Maybe it’s because my son woke me at 5.30 (because his body clock has not suddenly changed overnight) and consequently I’m grumpy…

At this time of year, the media churns out the same old stories (it must be great for editors – rehash the article from 6 months ago and put it out again) and the gist is always that someone wants to change the time zone that the UK operates on, the Scots don’t want to (because being 500 miles north does make quite a difference), meanwhile someone else thinks it would be good to be on the same time zone as the rest of Western Europe (except Ireland).

But this is England. The Greenwich Meridian runs through London. Universal time is based around Greenwich Mean Time. Why should we suddenly end up on GMT+1 in the winter and GMT+2 in the summer? Why do the clocks need to change at all?

Some will argue road safety concerns in the winter – but we either have to travel to work/school in the dark or travel home in the dark so that doesn’t make much sense to me. I believe that the original intention with daylight saving time was to increase the available hours for farming – except that farmers work all hours, and it doesn’t actually increase the number of hours in the day when we have daylight or darkness! With modern machinery the farmers where I live work through the night at harvest-time, so I don’t really see that being an issue.

Advocates of a switch in time zone are suggesting harmonisation for businesses in the UK and continental Europe to work the same hours. But how many businesses really work from 9-5? There are still local customs that lead to long lunches and late evenings in some parts of Europe; I certainly don’t get to go home because the clock has hit a certain time; and, anyway, business is global these days – we’re all well used to working across timezones.

If we do have to mess with the timezones, why would it even have to apply in Scotland (who are vehemently against any further variation in “daylight saving time”)? Unlike England, Wales and Northern Ireland, Scotland has its own parliament and can elect to do whatever it likes (the Welsh and Northern Irish assemblies have some powers too). Australia and the United States work across multiple timezones (I’m sure some other countries do too) – so could the UK, if that’s what it takes (although we are tiny in comparison)! I don’t know too much about the US (except that the East Coast is 5 hours behind us and the West Coast is 8; and that their clocks change at a different time of year so I always miss at least one conference call!) but some Australian states don’t even observe DST. So why not abolish DST in England and Wales, and let Scotland decide what it wants to do? Northern Ireland could probably abolish DST too, but I can see why they would want to follow Éire’s lead.

It seems to me that daylight saving is an outdated concept, that (in England at least) causes disruption twice a year for no real advantage. So, instead of moving forward yet another hour, I propose we stay on GMT indefinitely!