Category: Technology

  • More on NoSQL, Hadoop and Microsoft’s entry to the world of big data

    Yesterday, my article on Microsoft’s forays into the world of big data went up on Cloud Pro. It’s been fun learning a bit about the subject (far more than is in that article – because big data is a big theme in my work at the moment) and I wanted to share some more info that didn’t fit into my allotted 1000 words.

    Microsoft Fellow Dr David DeWitt gave an excellent keynote on Day 3 of the SQL PASS 2011 summit last month and it’s a great overview of how Hadoop works. Of course, he has a bias towards use of RDBMSs but the video is well worth watching for its introduction to NoSQL, the differences between key/value stores and Hadoop-type systems, and the description of the Hadoop components and how they fit together (skip the first 18 minutes and, if the stream doesn’t work, try the download – the deck is available too). Grant Fritchey and Jen McCown have written some great notes to go with Dr DeWitt’s keynote too. For more about when you might use Hadoop, Jeremiah Peschka has a good post.

    Microsoft’s SQOOP implementation is not the first – Cloudera have been integrating SQL and Hadoop for a couple of years now. Meanwhile, Buck Woody has a great overview of Microsoft’s efforts in the big data space.

    In the article, I also mentioned Microsoft StreamInsight (formerly code-named “Austin”), the Complex Event Processing (CEP) capability inside SQL Server 2008 R2, and Microsoft’s StreamInsight Team has posted what they call “the basics” of event processing. It seems to require coding, but is probably useful to anyone who is getting started with this stuff. For those of us who are a little less code-oriented, Andrew Fryer’s overview of StreamInsight (together with a more general post on CEP) is worth a read, as is Simon Munro’s post on where StreamInsight fits in.

    Shortly after I sent my article to Cloud Pro’s Editor, I saw Mike Walsh’s “Microsoft Loves Your Big Data” post. I like this because it cuts through the press announcements and talks about what is really going on: interoperability; and becoming a player themselves. Critically:

    “They aren’t copying, or borrowing or trying to redo… they are embracing”

    And that is what I really think makes a refreshing change.

  • SQL Server and Hadoop – unlikely bedfellows but a powerful combination

    Big Data is hard to avoid – what does Microsoft’s embrace of Hadoop mean for IT Managers?

    There are two words that seem particularly difficult to avoid at the moment: big data. Infrastructure guys instinctively shy away from data but such is its prevalence that big data is much more than just the latest IT buzzword and is becoming a major theme in our industry right now.

    But what does “big data” actually mean? It’s one of those phrases that, like cloud computing earlier, is being “adopted” by vendors to mean whatever they want it to.

    The McKinsey Global Institute describes big data as “the next frontier for innovation, competition and productivity” but, put simply, it’s about analysing masses of unstructured (or semi-structured) data which, until recently, was considered too expensive to do anything with.

    That data comes from a variety of sources including sensors, social networks and digital media and it includes text, audio, video, click-streams, log files and more. Cynics who scoff at the description of “big” data (what’s next, “huge” data?) miss the point that it’s not just about the volume of the data (typically many petabytes) but also the variety and frequency of that data. Some even refer to it as “nano data” because what we’re actually looking at is massive sets of very small data.

    Processing big data typically involves distributed computer systems and one project that has come to the fore is Apache Hadoop – a framework for development of open-source software for reliable, scalable distributed computing.

    Over the last few weeks though, there have been some significant announcements from established IT players, not all of whom are known for embracing open source technology. This indicates a growing acceptance for big data solutions in general and specifically for solutions that include both open- and closed- source elements.

    When Microsoft released a SQL Server-Hadoop (SQOOP) Connector, there were questions about what this would mean for CIOs and IT Managers who may previously have viewed technologies like Hadoop as a little esoteric.

    The key to understanding what this means is understanding the two main types of data: structured and unstructured. Structured data tends to be stored in a relational database management system (RDBMS), for example Microsoft SQL Server, IBM DB2, Oracle 11g or MySQL.

    By structuring the data with a schema, tables, keys and all manner of relationships it’s possible to run queries (with a language like SQL) to analyse the data and techniques have developed over the years to optimise those queries. By contrast, unstructured data has no schema (at least not a formal one) and may be as simple as a set of files.  Structured data offers maturity, stability and efficiency but unstructured data offers flexibility.

    Secondly, there needs to be an understanding of the term “NoSQL”.  Commonly misinterpreted as an instruction (no to SQL), it really means not only SQL – i.e. there are some types of data that are not worth storing in an RDBMS.  Rather than following the database model of extract, transform and load (ETL), with a NoSQL system the data arrives and the application knows how to interpret the data, providing a faster time to insight from data acquisition.
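    To make the “data arrives and the application knows how to interpret it” idea a little more concrete, here is a minimal sketch in Python. It is only an illustration: the sample records, the field names and the interpret function are all invented for this example and are not taken from any particular NoSQL product.

```python
import json

# Raw records stored exactly as they arrived: no schema was enforced on write.
# (Invented sample data for illustration only.)
raw_events = [
    '{"user": "alice", "action": "click", "page": "/home"}',
    '{"user": "bob", "action": "view", "duration_ms": 1200}',
    'not even JSON',
]


def interpret(line):
    """Apply structure at read time and tolerate records that don't fit."""
    try:
        record = json.loads(line)
    except json.JSONDecodeError:
        return None  # skip anything the application can't make sense of
    if not isinstance(record, dict):
        return None
    # The application decides which fields matter; missing ones become None.
    return {"user": record.get("user"), "action": record.get("action")}


parsed = [r for r in (interpret(line) for line in raw_events) if r is not None]
print(parsed)
```

    The point is that nothing had to be transformed or rejected when the data was stored; the interpretation (and any clean-up) happens when it is read.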

    Just as there are two main types of data, there are two main types of NoSQL system: key/value stores (like MongoDB or Windows Azure Table Storage) can be thought of as NoSQL OLTP; Hadoop is more like NoSQL data warehousing and is particularly suited to storing and analysing massive data sets.

    One of the keys to understanding Hadoop is understanding how its various components work together. There’s a degree of complexity so perhaps it’s best to summarise by saying that the Hadoop stack consists of a highly distributed, fault-tolerant file system (HDFS) and the MapReduce framework for writing and executing distributed, fault-tolerant algorithms. Built on top of that are query languages (like Hive and Pig) and then we have the layer where Microsoft’s SQOOP connector sits, connecting the two worlds of structured and unstructured data.
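    For anyone who hasn’t seen the MapReduce pattern before, the classic word count can be sketched in a few lines of plain Python. This runs in a single process and does not use Hadoop or HDFS at all; the function names are my own, and a real framework would distribute the map and reduce steps across many machines and handle failures along the way.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: collapse the values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle and reduce in sequence over a list of input splits."""
    pairs = [pair for doc in documents for pair in map_phase(doc)]
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


# Each string stands in for one input split held on a different node.
splits = ["big data is big", "data about data"]
counts = word_count(splits)
print(counts)
```

    Because each call to map_phase only needs its own split, and each call to reduce_phase only needs the values for its own key, both steps can be spread over as many nodes as are available.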

    The trouble is that SQOOP is just a bridge – and not a particularly efficient one either – working on SQL data in the unstructured world involves subdivision of the SQL database so that MapReduce can work correctly.
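    The “subdivision” can be pictured as splitting a table into key ranges, with one range handed to each parallel map task. The Python below is a rough sketch of that idea only: I don’t know exactly how Microsoft’s connector does it, and the table name, column name and both functions are hypothetical.

```python
def split_key_range(min_id, max_id, num_mappers):
    """Divide the inclusive range [min_id, max_id] into contiguous chunks."""
    total = max_id - min_id + 1
    base, extra = divmod(total, num_mappers)
    ranges = []
    start = min_id
    for i in range(num_mappers):
        # Spread any remainder over the first few chunks.
        size = base + (1 if i < extra else 0)
        if size == 0:
            continue  # more mappers than rows: nothing left to hand out
        end = start + size - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def queries_for(table, key, ranges):
    """Build one bounded query per chunk (illustrative SQL text only)."""
    return [
        f"SELECT * FROM {table} WHERE {key} BETWEEN {lo} AND {hi}"
        for lo, hi in ranges
    ]


ranges = split_key_range(1, 10, 4)
for query in queries_for("orders", "order_id", ranges):
    print(query)
```

    Each map task then reads only its own slice of the table, which is also why the approach works best when there is a reasonably evenly distributed key to split on.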

    Because most enterprises have both structured and unstructured data, we really need tools that allow us to analyse and manage data in multiple environments – ideally without having to go back and forth. That’s why there are so many vendors jumping on the big data bandwagon but it seems that a SQOOP connector is not the only work Microsoft is doing in the big data space.

    In our increasingly cloudy world, infrastructure and platforms are rapidly becoming commoditised. We need to focus on software that allows us to derive business value from data. Consider that Microsoft is only one vendor, then think about what Oracle, IBM, Fujitsu and others are doing. If you weren’t convinced before, maybe HP’s Autonomy purchase is starting to make sense now?

    Looking specifically at Microsoft’s developments in the big data world, it therefore makes sense to see the company get closer to Hadoop. The world has spoken and the de facto solution for analysing large data sets seems to be HDFS/MapReduce/Hive (or similar).

    Maybe Hadoop’s success comes down to HDFS and MapReduce being based on work from Google whilst Hive and Pig are supported by Facebook and Yahoo respectively (i.e. they are all from established Internet businesses).  But, by embracing Hadoop (together with porting its tools to competitive platforms), Microsoft is better placed to support the entire enterprise with both their structured and unstructured needs.

    [This post was originally written as an article for Cloud Pro.]

  • Accessing my iCloud photostream from a Windows PC

    I use a lot of Apple products and, not surprisingly, when iOS5 was released, I upgraded my iPhone and my iPad. One of the big advancements with iOS5 is the integration with iCloud, Apple’s cloud service for synchronising data between devices so, when I took a look a few days later, I was a bit confused. From a Windows PC I logged in and saw links for Mail, Contacts, Calendar, Find My iPhone and iWork – all with familiar icons – but what I couldn’t fathom is where my photostream is. Certainly not visible in iCloud…

    It turns out that there is a separate application needed to sync an iCloud photostream with a Windows PC. I installed it, it crashed (something to do with being behind our proxy servers at work, I think) but after a PC reboot and connection to my home network, photos from my iOS devices started showing up in the %userprofile%\My Pictures\Photo Stream\My Photo Stream folder. (The iCloud Control Panel for Windows also integrates with Safari 5.1.1 or Internet Explorer 8 bookmarks and with Outlook 2007 Contacts and Calendars.)

    All I need now is the ability to sync ActiveSync contacts from my iPad (the ones I have in Office 365)… I wish.

  • SharePoint, Dropbox, and shadow IT

    This morning I had a problem with SharePoint. Well, when I say the problem was with SharePoint, it could be considered a “layer 8 problem” (i.e. user error) but it still illustrates a major issue  with corporate IT provision – not just in my organisation but in many, many businesses, all over the world.

    You see, last night, I uploaded a presentation to our intranet. It was a 20MB file over an ADSL/VPN connection and the browser upload session timed out so I used SharePoint’s Windows Explorer view (which I think is WebDAV).  The file was copied, I edited the properties in the browser and all was good, I thought.

    Fast forward to this morning and people were telling me the links to the presentation in my team’s newsletter didn’t work. But they did for me… embarrassingly (because the newsletter goes right up the company – to CEO level), I sent an email with the correct link in naked form (horrible long URL, rather than as a hyperlink on some nice text) but people were still getting HTTP 404 responses (file not found).

    To cut a long story short, the WebDAV upload had not checked in the file (by design, I now think) and even editing the properties afterwards didn’t. I could see the file, but no-one else could. Once the file was checked in all was well – except for my red face (and my insistence that HTTP 404 isn’t a permissions error – that would be 403).
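    For the record, the difference between those two status codes can be shown with Python’s standard library. The status_for function below is purely hypothetical (I have no idea how SharePoint actually decides what to return); it just shows how a server might answer 404 for a file that hasn’t been checked in, rather than 403, so as not to reveal that the file exists at all.

```python
from http import HTTPStatus


def describe(code):
    """Return the numeric code together with its standard reason phrase."""
    status = HTTPStatus(code)
    return f"{status.value} {status.phrase}"


def status_for(exists, checked_in, has_permission):
    """Hypothetical policy: hide unpublished files instead of refusing access."""
    if not exists or not checked_in:
        return HTTPStatus.NOT_FOUND
    if not has_permission:
        return HTTPStatus.FORBIDDEN
    return HTTPStatus.OK


print(describe(404))
print(describe(403))
# A colleague asking for a file that was uploaded but never checked in:
print(describe(status_for(exists=True, checked_in=False, has_permission=True)))
```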

    I lost a good chunk of this morning on this and the related clean-up activities when, essentially, all I wanted to do was share a file with some colleagues – a common business requirement that shouldn’t really be a problem in 2011. So I tweeted:

    http://twitter.com/markwilsonit/statuses/132406672181305344

    I expected a deluge of people supporting SharePoint and telling me that I’m just a dumb user; what I actually got was RTs (showing this is not just an issue for me) and then a succession of people suggesting various Dropbox-like products that could be used by corporates.

    Lots of people are suggesting Box.net and there’s Dropbox for Teams, OxygenCloud and ShareFile too. I suppose, taken at face value, this sort of product is exactly what my tweet asked for but it’s not really a corporate version of Dropbox that I need – it’s the simplicity of Dropbox (dump “stuff” in a folder and it’s wherever I need it – in the cloud, on other machines, available to share with others, etc.). I’m sure there are many solutions that do this, with varying degrees of success (just that Microsoft SharePoint is not one of them…). But technology is only one part of the issue.

    My scenario (and the reason I’m writing this) is actually a perfect example of why we have shadow IT in organisations today. End users (consumers) want to do “something”. That “something” is hard to do with their enterprise tools, so they find another way around the problem. Over time that solution becomes embedded – that’s when the problems start for the CIO (or, maybe, for the individual who didn’t follow the stated IT policy…). Those problems generally boil down to one of two things: security and manageability. In this case, the file is already available on SlideShare, but it could have been something confidential – like the business model I was creating yesterday afternoon – and that wouldn’t have been something I wanted floating around on servers that my company doesn’t control.

    I’m sure that the multitude of “solutions” to my problem are all great in their own way but if I start to use them, well, all I’ll really be doing is perpetuating the issue of shadow IT.

    (Incidentally, I did come across some interesting projects from the responses I received: remember Novell iFolder? It’s still around in open source form from Kablink; and VMware’s Project Octopus could have potential too.)

  • Useful Links: October 2011

    A list of items I’ve come across recently that I found potentially useful, interesting, or just plain funny:

  • Virtual Worlds (@stroker at #DigitalSurrey)

    Last night saw another Digital Surrey event which, I’ve come to find, means another great speaker on a topic of interest to the digerati in and around Farnham/Guildford (although I also noticed that I’m not the only “foreigner” at Digital Surrey with two of the attendees travelling from Brighton and Cirencester).

    This time the speaker was Lewis Richards, Technical Portfolio Director in the Office of Innovation at CSC, and the venue was CSC’s European Innovation Centre. Lewis spoke with passion and humour about the development of virtual worlds, from as far back as the 18th century. As a non-gamer (I do have an Xbox, but I’m not heavily into games), I have to admit quite a lot of it was new to me, but fascinating nevertheless, and I’ve drawn out some of the key points in this post (my edits in []) – or you can navigate the Prezi yourself – which Lewis has very kindly made public!

    • The concept of immersion (in a virtual world) has existed for more than 200 years:
    • In 1909, E.M. Forster wrote The Machine Stops, a short story in which everybody is connected, with a universal book for reference purposes, and communications concepts not unlike today’s video conferencing – this was over a hundred years ago!
    • In [1957], Morton Heilig invented the Sensorama machine which allowed viewers to enter a world of virtual reality with a combination of film and mechanics for a 3D, stereo experience with seat vibration, wind in the hair and smell to complete the illusion.
    • The first heads up displays and virtual reality headsets were patented in the 1960s (and are not really much more usable today).
    • In 1969, ARPANET was created – the foundation of today’s Internet and the world wide web [in 1990].
    • In [1974], the roleplay game Dungeons and Dragons was created (in book form), teaching people to empathise with virtual characters (roleplay is central to the concept of virtual worlds); the holodeck (Rec Room) was first referenced in a Star Trek cartoon in 1974; and, back in 1973, Myron Krueger had coined the term artificial reality [Krueger created a number of virtual worlds in his work (glowflow, metaplay, physic space, videoplace)].
    • Lewis also showed a video of a “B-Spline Control” which is not unlike the multitouch [and Kinect tracking] functionality we take for granted today – indeed, pretty much all of the developments from the last 40-50 years have been iterative improvements – we’ve seen no real paradigm shifts.
    • 1980s developments included:
      • William Gibson coined the term cyberspace in his short stories (featured in Omni magazine).
      • Disney’s Tron; a film which still presents a level of immersion to which we aspire today.
      • The Quantum Link network  service, featuring the first multiplayer network game (i.e. not just one player against a computer).
    • In the 1990s, we saw:
      • Sir Tim Berners-Lee‘s World Wide Web [possibly the biggest step forward in online information sharing, bringing to life E.M. Forster’s universal book].
      • The first use of the term avatar for a digital manifestation of oneself (in Neal Stephenson’s Snow Crash).
      • Virtual reality suits
      • Sandboxed virtual worlds (AlphaWorld)
      • Strange Days, with the SQUID (Super-conducting Quantum Interference Device) receptor – still looking for immersion – getting inside the device – and The Matrix was still about “jacking in” to the network.
      • Virtual cocoons (miniaturised, electronic, versions of the Sensorama – but still too intrusive for mass market adoption)
    • The new millennium brought Second Life (where, for a while, almost every large corporation had an island) and World of Warcraft (WoW) – a behemoth in terms of revenue generation – but virtual worlds have not really moved forward. Social networking blinded us and took the mass market along a different path for collaboration; meanwhile kids do still play games and virtual reality is occurring – it’s just not in the mainstream.
    • Lewis highlighted how CSC uses virtual worlds for collaboration; how they can also be used as training aids; how WoW encouraged team working and leadership; and how content with physical value may be created inside virtual worlds (leading to virtual crime).
    • Whilst virtual reality is not really any closer as a consumer concept than in 1956 there are some real-world uses (such as virtual reality immersion being used to take away feelings of pain whilst burns victims receive treatment).
    • Arguably, virtual reality has become, just, “reality” – everywhere we go we can communicate and have access to our “stuff” – we don’t have to go to a virtual world but Lewis asks if we will ever give up the dream of immersion – of “jacking in” [to the matrix].
    • What is happening is augmented reality – using our phone/tablets, etc. to interact between physical and virtual worlds. Lewis also showed some amazing concepts from Microsoft Research, like OmniTouch, using a short-range depth camera and a pico projector to turn everyday objects into a surface to interact with; and Holodesk for direct 3D interactions.
    • Lewis explained that virtual worlds are really a tool – the innovation is in the technology and the practical uses are around virtual prototyping, remote collaboration, etc. [like all innovations, it’s up to us to find a problem, to which we can apply a solution and derive value – perhaps virtual worlds have tended to be a technology looking for a problem?]
    • Lewis showed us CSC’s Teleplace, a virtual world where colleagues can collaborate (e.g. for virtual bid rooms and presentations), saving a small fortune in travel and conference call costs but, just to finish up with a powerful demo, he asked one of the audience for a postcode, took the Google Streetview URL and pasted it into a tool called Blue Mars Lite – at which point his avatar could be seen running around inside Streetview. Wow indeed! That’s one virtual world in which I have to play!

  • Big things are happening

    I saw a great video from Cisco this morning. The fact it’s from Cisco isn’t really relevant (indeed, if I showed it without the last few seconds you wouldn’t know) but it’s a great example of how IT is shaping the world that we live in (or, maybe, how the world we live in is driving technology):

    In case you can’t see the video above, here are some of the key statistics it contains:

    • Humans created more data in 2009 alone than in all previous years combined.
    • Over the last 15 years, network speeds have increased 18 million times.
    • Information is moving to the cloud; 8/10 IT Managers plan to use cloud computing within the next 3 years.
    • By 2015, tools and automation will eliminate 25% of IT labour hours.
    • We’re using multiple devices: by 2015 there will be nearly one mobile-connected device for every person on earth.
    • 2/3 of employees believe they should be able to access information using company-issued devices at any time, at any location.
    • 60% believe they don’t need to be in an office to be productive.
    • This is creating entirely new forms of collaboration.
    • “The real impact of the information revolution isn’t about information management but on relationships; the ability to allow not dozens, or hundreds, but thousands of people to meaningfully interact” [Dr Michael Schrage, MIT].
    • By 2015 companies will generate 50% of web sales via their social presence and mobile applications.
    • Social business software will become a $5bn business by 2013.
    • Who sits at the centre of all this? Who is managing these exponential shifts? The CIO.

    Of course, we might expect to see many of these figures cited by a company selling social collaboration software and networking equipment but they are a good indication of the way things are heading. I would place more emphasis on empowered employees and customers redefining IT provisioning (BYO, for example); on everything as a service (XaaS) changing the IT delivery model; on the need for a new architecture to manage the “app Internet”; and on big data – which will be a key theme for the next few years.

    Whatever the technologies underpinning the solution – the overall direction is for IT to provide business services that add value and enhance business agility rather than simply being part of “the cost of doing business” – maybe we need more videos like this to help us think about the possibilities?

  • Is technology at the heart of business, or is it simply an enabler?

    I saw a video from Cisco this morning, and found it quite inspirational. The fact it’s from Cisco isn’t really relevant (indeed, if I showed it without the last few seconds you wouldn’t know) but it’s a great example of how IT is shaping the world that we live in – or, more precisely, how the world is shaping the direction that IT is taking:

    In case you can’t see the video above, here are some of the key statistics it contains:

    • Humans created more data in 2009 alone than in all previous years combined.
    • Over the last 15 years, network speeds have increased 18 million times.
    • Information is moving to the cloud; 8/10 IT Managers plan to use cloud computing within the next 3 years.
    • By 2015, tools and automation will eliminate 25% of IT labour hours.
    • We’re using multiple devices: by 2015 there will be nearly one mobile-connected device for every person on earth.
    • 2/3 of employees believe they should be able to access information using company-issued devices at any time, at any location.
    • 60% believe they don’t need to be in an office to be productive.
    • This is creating entirely new forms of collaboration.
    • “The real impact of the information revolution isn’t about information management but on relationships; the ability to allow not dozens, or hundreds, but thousands of people to meaningfully interact” [Dr Michael Schrage, MIT].
    • By 2015 companies will generate 50% of web sales via their social presence and mobile applications.
    • Social business software will become a $5bn business by 2013.
    • Who sits at the centre of all this? Who is managing these exponential shifts? The CIO.

    Some impressive numbers here – and we might expect to see many of these figures cited by a company selling social collaboration software and networking equipment but they are a good indication of the way things are heading.  I would place more emphasis on empowered employees and customers redefining IT provisioning (BYO, for example); on everything as a service (XaaS) changing the IT delivery model; on the need for a new architecture to manage the “app Internet”; and on big data – which will be a key theme for the next few years.

    Whatever the technologies underpinning the solution – the overall direction is for IT to provide business services that add value and enhance business agility rather than simply being part of “the cost of doing business”.

    I think Cisco’s video does a rather good job of illustrating the change that is occurring but the real benefits come when we are able to use technology as an enabler for business services that create new opportunities, rather than responding to existing pressures.

    I’d love to hear what our customers, partners and competitors think – is technology at the heart of the digital revolution, or is it simply an enabler for new business services?

    [This post originally appeared on the Fujitsu UK and Ireland CTO Blog and was written with assistance from Ian Mitchell.]

  • Cloudwashing

    Cloud this, cloud that: frankly I’m tired of hearing about “the cloud” and, judging from the debate I’ve had on Twitter this afternoon, I’m not alone.

    The trouble is that the term “cloud” has been abused and has become a buzzword (gamification is another – big data could be next…).

    I don’t doubt the advantages of cloud computing – far from it – it’s a fantastically powerful business model and it’s incredibly disruptive in our industry. And, like so many disruptive innovations, organisations are faced with a choice – to adopt the disruptive technology or to try and move up the value chain. (Although, in this case, why not both? Adopt the disruptive tech and move up the value chain?)

    My problem with cloud marketing is not so much about over-use of the term, it’s about the misuse of it. And that’s confused the marketplace. There is a pretty good definition of cloud from the US National Institute of Standards and Technology (NIST) but it’s missing some key service models (data as a service, business process as a service) so vendors feel the need to define their own “extensions”.

    My point is that cloud is about the business model, about how the service is provided, about some of the essential characteristics that provide flexibility in IT operation. That flexibility allows the business to become more responsive to change and, in turn, the CIO may more quickly deliver the services that the CEO asks of them.

    It’s natural that business to business (B2B) service providers include cloud as a major theme in their marketing (indeed, in their continued existence as a business).  That’s because delivery of business services and the mechanisms used to ensure that the service is responsive to business needs (on demand self-service, broad network access, resource pooling, rapid elasticity, and measured service) are crucial. Unfortunately, “the cloud” has now crossed the divide into the business to consumer (B2C) space and that’s where it all starts to turn bad.

    At the point where “the cloud” is marketed to consumers it is watered down to be meaningless (ignoring the fact that “the cloud” is actually many “clouds”, from multiple providers). So often “the cloud” is really just a service offered via the Internet. Consumers don’t care about “the cloud” – they just want their stuff, when they want it, where they want it, for as little financial outlay as possible. To use an analogy from Joe Baguley, Chief Cloud Technologist, EMEA at VMware – “you don’t market the electricity grid, you market the electricity and the service, not the infra[structure]”.

    I’d like to suggest that marketing cloud to consumers is pointless and, ultimately, it’s diluting the real message: that cloud is a way of doing business, not about a particular technology. What do you think?

  • Rebuilding my site: please excuse the appearance

    Regular readers may have noticed that this site is looking a little… different… right now.

    Unfortunately, my hosting provider told me last night that they had a disk failure on the server. Normally that wouldn’t be a problem (that’s why servers have redundant components, right? Like RAID on the disks?) but it seems this “server” is just a big PC. I can’t get too mad though… the MySQL database backup scripts have been failing for a month and it was my sloppiness that didn’t chase that up, and it was me who hadn’t made sure I had a recent copy of the file system…

    So, as things stand:

    • I think I have restored all posts from 2004 until almost the end of August 2011;
    • I need to restore the later posts and comments (using copies from FeedBlitz, Google Reader, etc.);
    • There are no plugins (so things look odd) [update: some of the plugins have been reinstalled (but things may still look odd)];
    • There are no graphics (they were hosted outside WordPress) [update: I’ve restored most of the graphics and other external media but there are still some I need to track down];
    • I have not restored the theme (so I’m using the WordPress defaults and there is no mobile theme);
    • The theme I’m using does not specify UTF-8 encoding so lots of spurious characters appear [update: still some spurious characters appearing on some pages…];
    • There are no ads [update: there are fewer ads] (which you might be happy about, but I do still need to pay the bills).

    Please bear with me whilst I get things back… it may take some time as it needs to fit in between other activities but it might also be a good thing (new theme has been long overdue and I might even get smarter about my backups…).

    And, if you spot another problem, please let me know.

    [Updated at various points as the site has been restored]