Tag Archives: NoSQL

Technology

Big data according to the Oracle

After many years of working mostly with Microsoft infrastructure products, the time has come for me to increase my breadth of knowledge and, with that, comes the opportunity to take a look at what some of the other big players in our industry are up to.  Last year, I was invited to attend the Oracle UK User Group Conference, where I had my first experience of the world of Oracle applications; and last week I was at the Oracle Big Data and Extreme Analytics Summit in Manchester, where Fujitsu was one of the sponsors (and an extract from one of my white papers was in the conference programme).

It was a full day of presentations and I’m not sure that reproducing all of the content here makes a lot of sense, so here’s an attempt to summarise it… although even a summary could be a long post…

Big data trends, techniques and opportunities

Tim Jennings (@tjennings) from Ovum set the scene and explained some of the ways in which big data has the potential to change the way in which we work as businesses, citizens and consumers (across a variety of sectors).

Summing up his excellent overview of big data trends, techniques and opportunities, Tim’s key messages were that:

  1. Big data is characterised by volume, variety and velocity [I'd add value to that list].
  2. Big data represents a change in the mentality of analytics, away from precise analysis of well-bound sources to rough-cut exploratory analysis of all the data that’s practical to aggregate.
  3. Enterprises should identify business cases for big data and the techniques and processes required to exploit them.
  4. Enterprises should review existing business intelligence architectures and methods and plan the evolution towards a broader platform capable of handling the big data lifecycle.

And he closed by saying that “If you don’t think that big data is relevant to your organisation, then you are almost certainly missing an opportunity that others will take.”

Some other points I picked up from Tim’s presentation:

  • Big data is not so much unstructured as variably-structured.
  • The mean size of an analytical data set is 3TB (growing but not that huge) – don’t think you need petabytes of data for big data tools and techniques to be relevant.
  • Social network analytics is probably the world’s largest (free) marketing focus group!

Big Data – Are You Ready?

Following the analyst introduction, the event moved on to the vendor pitch.  This was structured around a set of videos which I’ve seen previously, in which a fictitious American organisation grapples with a big data challenge, using an over-sized actor (and an under-sized one) to prove their point. I found these videos a little tedious the first time I saw them, and this was the second viewing for me.  For those who haven’t had the privilege, the videos are on YouTube (you can find the links on Oracle’s Data Warehouse Insider blog post).


The key points I picked up from this session were:

  • Oracle sees big data as a process towards making better decisions based on four stages: decide, acquire, organise and analyse.
  • Oracle considers that there are three core technologies for big data: Oracle NoSQL, Hadoop, and R; brought together by Oracle Engineered Systems (AKA the “buy our stuff” pitch).

Cloudera

Had I been at the London event I would have been extremely privileged to see Doug Cutting, Hadoop creator and now Chief Architect at Cloudera, speak about his work in this field.  Doug wasn’t available to speak at the Manchester event so Oracle showed us a pre-recorded interview.

For those who aren’t familiar with Cloudera (I wasn’t), it’s effectively a packaged open source distribution (based on Hadoop and related technologies) that provides an enterprise big data solution, with support.

The analogy given was that of a “big data operating system” with Cloudera doing for Hadoop what Red Hat does for Linux.

Perhaps the most pertinent of Doug Cutting’s comments was that we are at the beginning of a revolution in data processing where people can afford to save data and use it to learn, to get a “higher resolution picture of what’s going on and use it to make more informed decisions”.

Capturing the asset – acquire and organise

After a short pitch from Infosys (who have a packaged data platform, although personally, I’d be looking to the cloud…) and an especially cringeworthy spoof Lady Gaga video (JavaZone’s Lady Java) we moved on to enterprise NoSQL. In effect, Oracle has created a NoSQL database using the Berkeley DB key-value database and a Java driver (containing much of the logic to avoid single points of failure) that they claim offers a simple data model, scalability, high availability, transparent load balancing and simple administration.
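
To make the idea of a "smart" driver in front of a key-value store a little more concrete, here is a minimal sketch in Python. It is not Oracle's API or implementation: the `Node` and `Driver` classes, the hash-based routing and the in-memory dictionaries are all assumptions of mine, made up purely for illustration. It shows a client-side driver that picks a shard by hashing the key, writes to every available replica in that shard and reads from the first replica that is still up.

```python
import hashlib


class Node:
    """Hypothetical storage node: an in-memory key-value map plus an up/down flag."""

    def __init__(self, name):
        self.name = name
        self.store = {}
        self.up = True


class Driver:
    """Hypothetical client-side driver that routes keys to shards and replicas."""

    def __init__(self, shards):
        # shards is a list of replica groups; each group is a list of Node objects
        self.shards = shards

    def _shard_for(self, key):
        # Hash the key so that keys spread evenly across the replica groups
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return self.shards[int(digest, 16) % len(self.shards)]

    def put(self, key, value):
        # Write to every replica that is currently up; return how many were written
        written = 0
        for node in self._shard_for(key):
            if node.up:
                node.store[key] = value
                written += 1
        if written == 0:
            raise RuntimeError("no replica available for key")
        return written

    def get(self, key):
        # Read from the first replica that is up, so one failed node is not fatal
        for node in self._shard_for(key):
            if node.up:
                return node.store.get(key)
        raise RuntimeError("no replica available for key")


if __name__ == "__main__":
    shards = [[Node("a1"), Node("a2")], [Node("b1"), Node("b2")]]
    driver = Driver(shards)
    driver.put("user:1", "alice")
    shards[0][0].up = False  # simulate losing one node in each shard
    shards[1][0].up = False
    print(driver.get("user:1"))
```

The sketch deliberately ignores the hard parts (a replica that was down during a write will serve stale data when it comes back, and nothing rebalances shards), which is presumably where much of the real driver logic lives.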

Above all, Oracle’s view is that, because it’s provided and maintained by Oracle, there is a “single throat to choke”.  In effect, in the same way that we used to say no-one got fired for buying IBM, they are suggesting no-one gets fired for buying Oracle.

That may be true, but it’s my understanding that big data is fuelled by low-cost commodity hardware (infrastructure as a service) and open source software – and whilst Oracle may have a claim on the open source front, the low-cost commodity hardware angle is not one that sits well in the Oracle stable…

Through partnership with Cloudera (which leaves some wondering if that will last any longer than the Red Hat partnership did?), Oracle is positioning a Hadoop solution for their customer base:

Oracle describe Cloudera as the Redhat for Hadoop, but also say they won't develop their own release; they said that for Linux originally
@debralilley
Debra Lilley

Despite (or maybe because of) the overview of HDFS and MapReduce, I’m still not sure how Cloudera sits alongside Oracle NoSQL but their “big data appliance” includes both options. Now, when I used to install servers, appliances were typically 1U “pizza box” servers. Then they got virtualised – but now it seems they have grown to become whole racks (Oracle) or even whole containers (Microsoft).
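
For anyone who hasn't met MapReduce before, the programming model itself is simple enough to sketch in a few lines of plain Python. This is only an illustration of the map, shuffle and reduce steps using the classic word count example; it runs in a single process, the function names are my own, and it has nothing to do with Hadoop's actual Java API or with how HDFS distributes the data.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Reduce: collapse the list of values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run the three steps in sequence over an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


if __name__ == "__main__":
    print(word_count(["big data", "big deal"]))
```

In a real cluster the map and reduce functions run in parallel on many machines, close to where the data blocks are stored, and the shuffle moves data across the network; the logic per record is no more complicated than this.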

Oracle’s view on big data is that we can:

  1. Acquire data with their Big Data Appliance.
  2. Organise/Analyse aggregated results with Exadata.
  3. Decide at “the speed of thought” with Exalytics.

That’s a lot of Oracle hardware and software…

In an attempt not to position Oracle’s more traditional products as old hat, the next presenter suggested that big data is complementary and not really about old and new but about familiar and unfamiliar. Actually, I think he has a point: sooner or later “big” data just becomes “data” (and gets boring again?). This session gave an overview of an information architecture challenge: new classes of data (videos and images, documents, social data, machine-generated data, etc.) create a divide between transactional data and big data, which is not really unstructured but better described as semi-structured, and which uses sandboxes to analyse and discover new meaning from data.

Oracle has big data connectors to integrate with other (Oracle) solutions including: a HiveQL-based data integrator; a loader to move Hadoop data into Oracle 11g; a SQL-HDFS connector; and an R connector to run scripts with API access to both Hadoop and more traditional Oracle databases. There are also Oracle products such as GoldenGate to replicate data in heterogeneous data environments.

[My view, for what it's worth, is that we shouldn't be moving big data around, duplicating (or triplicating) data - we should be linking and indexing it to bridge the divide between the various silos of "big" data and "traditional" data.]

Finding the value – analyse and decide

Speaking of a race to gain insight, with analytics becoming the CIO’s top priority for 2013 and business intelligence usage doubling by 2014, the next session looked at some business analytics techniques and characteristics, which can be summarised as:

  • I suspect something – a data scientist or analyst needs to find proof and turn it into a predictive model to deploy into a business process (classification).
  • I want to know if that matters – “I wish I knew” (visual exploration and discovery).
  • I want to make the best decision now – decisions at the speed of thought in the context of a business process.

This led on to a presentation about the rise of the data scientist and making maths cool (except it didn’t, especially with a demo of some not-very-attractive visualisations run on an outdated Windows XP platform) and an introduction to the R language for statistical analysis and visualisation.

Following this was a presentation about Oracle’s recently-acquired Endeca technology which actually sounds pretty interesting as it digests a variety of data sources and creates a data model with an information-discovery front-end that promises “the simplicity of search plus the power of BI”.

The last presentation of this segment looked at Oracle’s Exalytics in-memory database servers (a competitor to SAP HANA), which bundle business intelligence software and adaptive in-memory caching (and columnar compression) with information discovery tools.

Wrap-up

I learned a lot about Oracle’s view of big data but that’s exactly what it was – one vendor’s view on this massively hyped and expanding market segment. For me, the most useful session of the day was from Ovum’s Tim Jennings and if that was all I took away, it would have been worthwhile.

In fairness, it was good to learn some more about the Oracle solutions too but I do wish vendors (including my own employer) would sometimes drop the blatant product marketing and consider the value of some vendor-agnostic thought leadership. I truly believe that, by showing customers a genuine understanding of their business, the issues that they face and the directions that business and technology are heading in, the solutions will sell themselves if they truly provide value. On the other hand, by telling me that Oracle has a complete, open and integrated solution for everything and what I really need is to buy more technology from the Oracle stack and… well, I’d better have a good story to convince the CFO that it’s worthwhile…

Slidedecks and other materials from the Oracle Big Data and Extreme Analytics Summit are available on the Oracle website.

More on NoSQL, Hadoop and Microsoft’s entry to the world of big data

Yesterday, my article on Microsoft’s forays into the world of big data went up on Cloud Pro. It’s been fun learning a bit about the subject (far more than is in that article – because big data is a big theme in my work at the moment) and I wanted to share some more info that didn’t fit into my allotted 1000 words.

Microsoft Fellow Dr David DeWitt gave an excellent keynote on Day 3 of the SQL PASS 2011 summit last month and it’s a great overview of how Hadoop works. Of course, he has a bias towards use of RDBMS systems but the video is well worth watching for its introduction to NoSQL, the differences between key-value stores and Hadoop-type systems, and the description of the Hadoop components and how they fit together (skip the first 18 minutes and, if the stream doesn’t work, try the download - the deck is available too). Grant Fritchey and Jen McCown have written some great notes to go with Dr DeWitt’s keynote too.  For more about when you might use Hadoop, Jeremiah Peschka has a good post.

Microsoft’s SQOOP implementation is not the first – Cloudera have been integrating SQL and Hadoop for a couple of years now. Meanwhile, Buck Woody has a great overview of Microsoft’s efforts in the big data space.

I also mentioned Microsoft StreamInsight (formerly code-named “Austin”) in the post (the Complex Event Processing capability inside SQL Server 2008 R2) and Microsoft’s StreamInsight Team has posted what they call “the basics” of event processing. It seems to require coding, but is probably useful to anyone who is getting started with this stuff. For those of us who are a little less code-oriented, Andrew Fryer’s overview of StreamInsight (together with a more general post on CEP) is worth a read, together with Simon Munro’s post on where StreamInsight fits in.

Shortly after I sent my article to Cloud Pro’s Editor, I saw Mike Walsh’s “Microsoft Loves Your Big Data” post. I like this because it cuts through the press announcements and talks about what is really going on: interoperability; and becoming a player themselves. Critically:

“They aren’t copying, or borrowing or trying to redo… they are embracing”

And that is what I really think makes a refreshing change.
