Linked data: connecting and exploiting big data

Earlier this year, I gave a lightning talk on Structuring Big Data at the CloudCamp London Big Data Special – the idea being that, if we’re not careful, big data will provide yet another silo of information to manage and that linked data could be useful to connect the various data sources (transactional databases, data warehouses, and now big data too).

At the time, I mentioned that this was part of a white paper that I was writing with my manager, Ian Mitchell (@IanMitchell2). Our paper on using linked data to connect and exploit big data has now been published on the Fujitsu website.

This week Oracle kicks off its Big Data and Extreme Analytics Summit and Fujitsu are one of the sponsors. An excerpt from the paper is included in the conference brochure and I’ll be at the Manchester event next Tuesday. Do come along and say hello if you’re there and, even if you’re not, please do check out the paper – I’d love to hear your feedback.

Re-architecting for the cloud and lower costs

One of the presentations I saw at the recent London Cloud Camp (and again at Unvirtual) was a lightning talk by Justin McCormack (@justinmccormack) on “re-architecting for the ‘green’ cloud and lower costs” (is re-architecting a word? I don’t think so but re-designing doesn’t mean the same in this context!).

Justin has published his slides; in short, he’s looking at ways to increase the scalability of our existing cloud applications. One idea is to build out parallel computing systems with many power-efficient CPUs (e.g. ARM chips) but Amdahl’s law kicks in: the part of an application that can’t be parallelised limits the overall speed-up, so there is no real performance boost from building out. In fact, the line is almost linear, so there is no compelling argument for that approach.
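To make the Amdahl’s law point a little more concrete, here is a minimal sketch of the standard formula (speed-up = 1 / (serial fraction + parallel fraction / number of workers)). It is not taken from Justin’s slides: the function name and the 90% parallel fraction are my own assumptions, chosen purely for illustration.

```python
def amdahl_speedup(parallel_fraction: float, workers: int) -> float:
    """Theoretical speed-up from Amdahl's law.

    parallel_fraction: share of the workload that can run in parallel (0..1).
    workers: number of CPUs/cores the parallel part is spread across.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if workers < 1:
        raise ValueError("workers must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / workers)


if __name__ == "__main__":
    # Hypothetical workload: 90% of the work can be parallelised.
    for workers in (1, 2, 8, 64, 1024):
        print(workers, round(amdahl_speedup(0.9, workers), 2))
```

With a 90% parallel workload the speed-up can never reach 10x, however many low-power CPUs are added, because the serial 10% always has to be paid for.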

Instead, Justin argues that we currently write cloud applications that use a lot of memory (Facebook is understood to have around 200TB of memory cache). That’s because memory is fast and disk is slow. But with the advent of solid state devices we have something in between (that’s also low-power).

Instead of writing apps to live in huge RAM caches, we can use less memory, and more flash drives. The model is not going to be suitable for all applications but it’s fine for “quite big data” – i.e. normal, medium latency applications. A low-power cloud is potentially a low-cost middle ground with huge cost saving potential, if we can write cloud applications accordingly.

Justin plans to write more on the subject soon – keep up with his thoughts on the Technology of Content blog.

Will commoditisation drive us all to the public cloud (eventually)?

Tomorrow night, it’s CloudCamp London, which has prompted me to write a post based on one of the presentations from the last event in March. I already wrote up Joe Baguley’s talk on why the consumerisation of IT is nothing to do with iPads but I also wanted to mention the introduction to CloudCamp given by Simon Wardley (from the CSC Leading Edge Forum).

As it happens, Simon already wrote a blog post that looks at the topic he covered (private vs. enterprise clouds). The main points from his CloudCamp slides are summarised below:

  • The basic principle is that, eventually, services trend towards utility services/commodities. There are some barriers to overcome along the way but commoditisation will always come.
  • One interesting phenomenon to note is the Jevons Paradox, whereby, as technology progresses and efficiency of resource usage rises, so does the rate of consumption. So, that kills off the theory that the move to cloud will decrease IT budgets!
  • For cloud purists, only a public cloud is really “cloud computing” but Simon talked about a continuum from legacy datacentres to “the cloud”. Hybrid clouds have a place in mitigating transitional risk.
  • Our legacy architectures leave us with a (legacy) problem. First came N+1 resilience but then we got better hardware; then we scaled out and designed for failure (e.g. API calls to rebuild virtual machines) using software and “good enough” components.
  • Using cloud architectures and resilient virtual machines we invented “the enterprise cloud”, sitting somewhere between a traditional datacentre and the public cloud.
  • But we need to achieve greater efficiencies – to do more, faster (even if the overall budget doesn’t increase due to the Jevons Paradox). To drive down the costs of providing each virtual machine (i.e. each unit of scale) we trade disruption and risk against operational efficiency. That drives us towards the public cloud.
  • In summary, Simon suggests that public utility markets are the future, with hybrid environments as a transition strategy. Enterprise clouds should be expected to trend towards niche roles (e.g. to deliver demanding service level agreements or to meet specific security requirements) whilst increasing portability between clouds makes competing public cloud offerings more attractive.
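The Jevons Paradox point in the list above can be illustrated with a little arithmetic. This is only a sketch under a big assumption of my own (not something from Simon’s talk): that demand follows a simple constant-elasticity curve, so consumption is proportional to unit cost raised to the power of minus the elasticity. The function name and the numbers are hypothetical.

```python
def total_spend(unit_cost: float, elasticity: float, scale: float = 1.0) -> float:
    """Total spend under an assumed constant-elasticity demand curve.

    unit_cost: cost of one unit of resource (e.g. one virtual machine hour).
    elasticity: how strongly consumption responds to price (assumed constant).
    scale: arbitrary constant that sets the size of the market.
    """
    if unit_cost <= 0:
        raise ValueError("unit_cost must be positive")
    consumption = scale * unit_cost ** (-elasticity)
    return unit_cost * consumption


if __name__ == "__main__":
    # Hypothetical example: efficiency gains halve the unit cost.
    for elasticity in (0.5, 1.0, 1.5):
        before = total_spend(1.0, elasticity)
        after = total_spend(0.5, elasticity)
        print(elasticity, round(before, 3), round(after, 3))
```

When the elasticity is above 1, halving the unit cost increases total spend, which is the situation the Jevons Paradox describes. Below 1, budgets would fall, so the conclusion depends entirely on how demand actually behaves.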