More on NoSQL, Hadoop and Microsoft’s entry to the world of big data

This content is 12 years old. I don't routinely update old blog posts as they are only intended to represent a view at a particular point in time. Please be warned that the information here may be out of date.

Yesterday, my article on Microsoft’s forays into the world of big data went up on Cloud Pro. It’s been fun learning a bit about the subject (far more than is in that article – because big data is a big theme in my work at the moment) and I wanted to share some more info that didn’t fit into my allotted 1000 words.

Microsoft Fellow Dr David DeWitt gave an excellent keynote on Day 3 of the SQL PASS 2011 summit last month and it’s a great overview of how Hadoop works. Of course, he has a bias towards use of RDBMS systems but the video is well worth watching for it’s introduction to NoSQL, the differences between key value stores and Hadoop-type systems, and the description of the Hadoop components and how they fit together (skip the first 18 minutes and, if the stream doesn’t work, try the download – the deck is available too). Grant Fritchey and Jen McCown have written some great notes to go with Dr DeWitt’s keynote too.  For more about when you might use Hadoop, Jeremiah Peschka has a good post.

Microsoft’s SQOOP implementation is not the first – Cloudera have been integrating SQL and Hadoop for a couple of years now. Meanwhile, Buck Woody has a great overview of Microsoft’s efforts in the big data space.

I also mentioned Microsoft StreamInsight (formerly code-named “Austin”) in the post (the Complex Event Processing capability inside SQL Server 2008 R2) and Microsoft’s StreamInsight Team has posted what they call “the basics” of event processing. It seems to require coding, but is probably useful to anyone who is getting started with this stuff. For those of us who are a little less code-oriented, Andrew Fryer’s overview of StreamInsight (together with a more general post on CEP) is worth a read, together with Simon Munro’s post on where StreamInsight fits in.

Shortly after I sent my article to Cloud Pro’s Editor, I saw Mike Walsh’s “Microsoft Loves Your Big Data” post. I like this because it cuts through the press announcements and talks about what is really going on: interoperability; and becoming a player themselves. Critically:

“They aren’t copying, or borrowing or trying to redo… they are embracing”

And that is what I really think makes a refreshing change.

Leave a Reply

Your email address will not be published. Required fields are marked *

This site uses Akismet to reduce spam. Learn how your comment data is processed.