Seven technology trends to watch 2017-2020

Just over a week ago, risual held its bi-annual summit at the risual HQ in Stafford – the whole company back in the office for a day of learning with a new format: a mini-conference called risual:NXT.

I was given the task of running the technical track – with 6 speakers presenting on a variety of topics covering all of our technical practices: Cloud Infrastructure; Dynamics; Data Platform; Unified Intelligent Communications and Messaging; Business Productivity; and DevOps – but I was also privileged to be asked to present a keynote session on technology trends. Unfortunately, my 35-40 minutes of content had to be squeezed into 22 minutes… so this blog post summarises some of the points I wanted to get across but really didn’t have time to cover.

1. The cloud was the future once

For all but a very small number of organisations, not using the cloud means falling behind. Customers may argue that they can’t use cloud services because of regulatory or other reasons but that’s rarely the case – even the UK Police have recently been given the green light (the blue light?) to store information in Microsoft’s UK data centres.

Don’t get me wrong – hybrid cloud is more than tactical. It will remain part of the landscape for a while to come… that’s why Microsoft now has Azure Stack to provide a means for customers to run a true private cloud that looks and works like Azure in their own datacentres.

Thankfully, there are fewer and fewer CIOs who don’t see the cloud forming part of their landscape – even if it’s just commodity services like email in Office 365. But we need to think beyond lifting and shifting virtual machines to IaaS and running email in Office 365.

Organisations need to transform their cloud operations because that’s where the benefits are – embrace the productivity tools in Office 365 (no longer just cloud versions of Exchange/Lync/SharePoint but a full collaboration stack) and look to build new solutions around advanced workloads in Azure. Microsoft is way ahead in the PaaS space – machine learning (ML), advanced analytics, the Internet of Things (IoT) – there are so many scenarios for exploiting cloud services that simply wouldn’t be possible on-premises without massive investment.

And for those who still think they can compete with the scale at which Microsoft (and Amazon, and Google) operate, this video might provide some food for thought…

(and for a similar video from a security perspective…)

2. Data: the fuel of the future

I hate referring to data as “the new oil”. Oil is a finite resource. Data is anything but finite! It is a fuel though…

Data is what provides an economic advantage – there are businesses without data and those with. Data is the business currency of the future. Think about it: Facebook and Google are entirely based on data that’s freely given up by users (remember, if you’re not paying for a service – you are the service). Amazon wouldn’t be where it is without data.

So, thinking about what we do with that data: the first wave of the Internet was about connecting computers, the second was about connecting people, and the third is about connecting devices.

Despite what you might read, IoT is not about connected kettles/fridges. It’s not even really about home automation with smart lightbulbs, thermostats and door locks. It’s about gathering information from billions of sensors out there. Then, we take that data and use it to make intelligent decisions and apply them in the real world. Artificial intelligence and machine learning feed on data – they are yin and yang to each other. We use data to train algorithms, then we use the algorithms to process more data.

The Microsoft Data Platform is about analytics and data driving a new wave of insights and opening up possibilities for new ways of working.

James Watt’s 18th Century steam engine led to an industrial revolution. The intelligent cloud is today’s version – moving us to the intelligence revolution.

3. Blockchain

Bitcoin is just one implementation of something known as a blockchain – in this case, as a digital currency.

But Blockchain is not just for monetary transactions – it’s more than that. It can be used for anything transactional. Blockchain is about a distributed ledger. Effectively, it allows parties to trust one another without knowing each other. The ledger is a record of every transaction, signed and tamper-proof.

The magic of a blockchain is that each new block is cryptographically linked to the one before it, so tampering with any past transaction means recomputing every block that follows. The longer the chain gets – and the more widely it is replicated – the harder tampering becomes, giving the ledger exceptionally strong integrity.
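That linkage is easy to see in miniature. Here’s an illustrative sketch in Python (a toy, nothing like a production blockchain – no mining, no consensus): each block stores the hash of its predecessor, so altering one historical record breaks every link after it.

```python
import hashlib
import json

def make_block(data, prev_hash):
    """Create a block whose identity depends on its data AND its predecessor."""
    block = {"data": data, "prev_hash": prev_hash}
    block_bytes = json.dumps(block, sort_keys=True).encode()
    block["hash"] = hashlib.sha256(block_bytes).hexdigest()
    return block

def verify(chain):
    """Check every link: each block must reference its predecessor's hash."""
    for prev, current in zip(chain, chain[1:]):
        if current["prev_hash"] != prev["hash"]:
            return False
    return True

# Build a tiny ledger
chain = [make_block("genesis", prev_hash="0" * 64)]
chain.append(make_block("Alice pays Bob 5", chain[-1]["hash"]))
chain.append(make_block("Bob pays Carol 2", chain[-1]["hash"]))
print(verify(chain))   # the chain is intact

# Tamper with history and recompute that one block's hash...
chain[1]["data"] = "Alice pays Bob 500"
chain[1]["hash"] = hashlib.sha256(json.dumps(
    {"data": chain[1]["data"], "prev_hash": chain[1]["prev_hash"]},
    sort_keys=True).encode()).hexdigest()
print(verify(chain))   # ...and the later links no longer match
```

In a real blockchain, the ledger is also replicated across many parties, so a tamperer would need to rewrite not just the later blocks but everyone else’s copies too.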

(Read more in Jamie Skella’s “A blockchain explanation your parents could understand”.)

Blockchain is seen as strategic by Microsoft and by the UK government. It’s early days, but expect it to appear wherever integrity and data resilience matter: databases – anything transactional – can be anchored to a blockchain.

A group of livestock farmers in Arkansas is using blockchain technology to trace products from ‘farm to fork’, aiming to give consumers information about the origin and quality of the meat they buy – so customers can tell where their dinner comes from.

Blockchain is finding new applications in the enterprise and Microsoft has announced the CoCo Framework to improve performance, confidentiality and governance characteristics of enterprise blockchain networks (read more in Simon Bisson’s article for InfoWorld). There’s also Blockchain as a service (in Azure) – and you can find more about Microsoft’s plans by reading up on “Project Bletchley”.

(BTW, Bletchley is a town in Buckinghamshire that’s now absorbed into Milton Keynes. Bletchley Park was the primary location of the UK Government’s wartime code-cracking efforts that are said to have shortened WW2 by around 2 years. Not a bad name for a cryptographic technology, hey?)

4. Into the third dimension

So we’ve had the ability to “print” in three dimensions for a while, but now 3D is going further: we’re taking the physical world into the virtual world and augmenting it with information.

Microsoft doesn’t like the term augmented reality (because it’s being used for silly faces on photos) and they have coined the term mixed reality to describe taking untethered computing devices and creating a seamless overlap between physical and virtual worlds.

To make use of this we need to be able to scan and render 3D images, then move them into a virtual world. 3D is built into the next Windows 10 release (the Fall Creators Update, due on 17 October 2017). This will bring Paint 3D, a 3D Gallery and View 3D for our phones – so we can scan any object and import it into a virtual world. Given the adoption rates of new Windows 10 releases, that puts 3D in front of millions of PCs.

This Christmas will see lots of consumer headsets in the market. Mixed reality will really take off after that. Microsoft is way ahead in the plumbing – all whilst we didn’t notice. They held their HoloLens product back to be big in business (so that it wasn’t a solution without a problem). Now it can be applied to field worker scenarios, visualising things before they are built.

To give an example: recently, I had a builder quote for a loft extension at home. He described how the stairs would work and sketched a room layout – but what if I could have visualised it in a headset? Then imagine picking the paint, sofas, furniture, wallpaper, etc.

The video below shows how Ford and Microsoft have worked together to use mixed reality to shorten and improve product development:

5. The new dawn of artificial intelligence

All of the legends of AI are set by sci-fi (Metropolis, 2001: A Space Odyssey, Terminator). But AI is not about killing us all! Humans vs. machines: IBM’s Deep Blue beating people at chess and Watson winning at Jeopardy!, then Google taking on Go. AI heading into the economy and displacing jobs, automating business processes and economic activity. Mass unemployment?

Let’s take a more optimistic view! It’s not about sentient/thinking machines or giving human rights to machines. That stuff is interesting but we don’t know where consciousness comes from!

AI is a toolbox of high-value tools and techniques. We can apply these to problems and appreciate the fundamental shift from programming machines to machines that learn.

AI is not about programming logical steps – we can’t do that when we’re recognising images, speech, etc. Instead, our inspiration is biology: neural networks – and using maths to train complex layers of neural networks led to deep learning.
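The shift from programming to learning can be sketched at its smallest scale: a single artificial neuron learning the logical AND function from examples, by nudging its weights to reduce error (gradient descent). This is an illustrative toy in plain Python, not a real framework – but stack millions of these weights into layers and the same idea becomes deep learning.

```python
import math
import random

# Training data for logical AND: inputs -> expected output.
# Nobody programs the rule; the neuron discovers it from examples.
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]

random.seed(0)
w = [random.uniform(-1, 1), random.uniform(-1, 1)]  # weights
b = 0.0                                             # bias

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(x):
    return sigmoid(w[0] * x[0] + w[1] * x[1] + b)

# Gradient descent: nudge the weights towards less error on each example
for epoch in range(10000):
    for x, target in data:
        out = predict(x)
        grad = (out - target) * out * (1 - out)  # derivative of squared error
        w[0] -= 0.5 * grad * x[0]
        w[1] -= 0.5 * grad * x[1]
        b    -= 0.5 * grad

for x, target in data:
    print(x, round(predict(x)), target)  # learned outputs match the targets
```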

Image recognition was “magic” a few years ago but now it’s part of everyday life. Nvidia’s shares are growing massively due to GPU requirements for deep learning and autonomous vehicles. And Microsoft is democratising AI (in its own applications – with an intelligent cloud, intelligent agents and bots).

NVIDIA Corporation stock price growth fuelled by demand for GPUs

So, about those bots…

A bot is a web app with a conversational user interface. We use them because natural language processing (NLP) and AI are here today. And because messaging apps rule the world. With bots, we can use human language as a new user interface; bots are the new apps – our digital assistants.

We can employ bots in several scenarios today – including customer service and productivity – and this video is just one example, with Microsoft Cortana built into a consumer product:

The device is similar to Amazon’s popular Echo smart speaker and a skills kit is used to teach Cortana about an app: ask “skillname to do something”. The beauty of Cortana is that it’s cross-platform, so the skill can show up wherever Cortana does. More recently, Amazon and Microsoft have announced Cortana-Alexa integration (meanwhile, Siri continues to frustrate…).
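At its simplest, a skill is a mapping from an utterance to an action. Here’s a deliberately naive sketch in Python – keyword matching rather than the NLP models a real assistant (or the actual Cortana Skills Kit) would use, and the skill names are made up:

```python
import re

# Toy "skills": an invocation pattern mapped to a handler function.
# Purely illustrative - real assistants use trained language models, not regexes.
def weather_skill(place):
    return f"Here's the forecast for {place}..."

def music_skill(artist):
    return f"Playing some {artist}..."

SKILLS = [
    (re.compile(r"ask weather for (?P<place>.+)", re.I), weather_skill),
    (re.compile(r"ask music to play (?P<artist>.+)", re.I), music_skill),
]

def handle(utterance):
    """Route a natural-language request to the first matching skill."""
    for pattern, skill in SKILLS:
        match = pattern.match(utterance)
        if match:
            return skill(**match.groupdict())
    return "Sorry, I don't know that skill."

print(handle("Ask weather for Stafford"))
print(handle("ask music to play Elbow"))
```

The cross-platform point is the interesting one: because the routing lives in the cloud, the same skill can answer from a speaker, a phone or a PC.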

AI is about augmentation, not replacement. It’s true that bots may replace humans for many jobs – but new jobs will emerge. And it’s already here. It’s mainstream. We use recommendations for playlists, music, etc. We’re recognising people, emotions, etc. in images. We already use AI every day…

6. From silicon to cells

Every cell has a “programme” – DNA. And researchers have found that they can write code in DNA and control proteins/chemical processes. They can compile code to DNA and execute, creating molecular circuits. Literally programming biology.

This is absolutely amazing. Back when I was an MVP, I got the chance to see Microsoft Research talk about this in Cambridge. It blew my mind. That was in 2010. Now it’s getting closer to reality and Microsoft and the University of Washington have successfully used DNA for storage:

The benefits of DNA are that it’s very dense and that it lasts for thousands of years, so it can still be read far into the future. And we’re just storing 0s and 1s – that’s much simpler than what DNA stores in nature.
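To see why storing 0s and 1s in DNA is conceptually simple: DNA has four bases, so each base can carry two bits. This is an illustrative sketch only – the actual Microsoft/University of Washington scheme is far more sophisticated (with error correction and constraints on which sequences can be synthesised):

```python
# Map two bits to one nucleotide: four bases encode four 2-bit values.
TO_BASE = {"00": "A", "01": "C", "10": "G", "11": "T"}
TO_BITS = {base: bits for bits, base in TO_BASE.items()}

def encode(text):
    """Text -> bit string -> synthetic DNA sequence."""
    bits = "".join(f"{byte:08b}" for byte in text.encode("utf-8"))
    return "".join(TO_BASE[bits[i:i + 2]] for i in range(0, len(bits), 2))

def decode(dna):
    """DNA sequence -> bit string -> text."""
    bits = "".join(TO_BITS[base] for base in dna)
    data = bytes(int(bits[i:i + 8], 2) for i in range(0, len(bits), 8))
    return data.decode("utf-8")

strand = encode("Hi")
print(strand)          # "CAGACGGC" - 2 bytes become 8 bases
print(decode(strand))  # round-trips back to "Hi"
```

One base per two bits is what makes DNA so dense: a gram of DNA can, in principle, hold hundreds of petabytes.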

7. Quantum computing

With massive data storage… the next step is faster computing – that’s where Quantum computing comes in.

I’m a geek and this one is tough to understand… so here’s another video:

Quantum computing is starting to gain momentum. Dominated by maths (quantum mechanics), it requires thinking in equations, not translating into physical things in your head. It has concepts like superposition (multiple states at the same time) and entanglement. Instead of gates being turned on/off it’s about controlling particles with nanotechnology.

A classical bit is simply on or off; one quantum bit (a qubit) holds multiple states at the same time. Qubits can be used to solve difficult problems (the RSA-2048 challenge problem would take a billion years on a supercomputer but, reportedly, just 100 seconds on a 250-bit quantum computer). This can be applied to encryption and security, health and pharma, energy, biotech, the environment, materials and engineering, and AI/ML.
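Superposition sounds mystical but, for a single qubit, it’s just a pair of amplitudes. This illustrative Python sketch is what a simulator does (a real quantum computer manipulates physical particles; the simulator just does the linear algebra): a Hadamard gate puts the “0” state into an equal superposition, and measurement collapses it.

```python
import math
import random

# A qubit is a pair of amplitudes (a, b) with |a|^2 + |b|^2 = 1.
zero = (1.0, 0.0)  # the classical "0" state

def hadamard(q):
    """Put a qubit into an equal superposition of 0 and 1."""
    a, b = q
    s = 1 / math.sqrt(2)
    return (s * (a + b), s * (a - b))

def measure(q):
    """Measurement collapses the superposition: 0 or 1, weighted by amplitude."""
    a, _ = q
    return 0 if random.random() < abs(a) ** 2 else 1

q = hadamard(zero)
print(q)  # roughly (0.707, 0.707): equal weight on 0 and 1

random.seed(42)
samples = [measure(q) for _ in range(10000)]
print(sum(samples) / len(samples))  # close to 0.5 - half the readings give 1
```

The catch is that n qubits need 2^n amplitudes, which is why classical simulation runs out of steam at a few dozen qubits – and why the hardware race matters.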

There’s a race for quantum computing hardware taking place and China sees this as a massively strategic direction. Meanwhile, the UK is already an academic centre of excellence – now looking to bring quantum computing to market. We’ll have usable devices in 2-3 years (where “usable” means that they won’t be cracking encryption, but will have initial applications in chemistry and biology).

Microsoft Research is leading a consortium called Station Q and, later this year, Microsoft will release a new quantum computing programming language, along with a quantum computing simulator. With these, developers will be able to both develop and debug quantum programs implementing quantum algorithms.

Predicting the future?

Amazon, Google and Microsoft each invest over $12bn p.a. on R&D. As demonstrated in the video above, their datacentres are not something that many organisations can afford to build but they will drive down the cost of computing. That drives down the cost for the rest of us to rent cloud services, which means more data, more AI – and the cycle continues.

I’ve shared 7 “technology bets” – and there are others that I haven’t covered, like the use of graphene – and my list is very much influenced by my work with Microsoft technologies and services. We can’t always predict the future but all of these are real… the only bet is how big they are. Some are mainstream, some are up and coming – and some will literally change the world.

Credit: Thanks to Rob Fraser at Microsoft for the initial inspiration – and to Alun Rogers (@AlunRogers) for helping place some of these themes into context.

The “wheel of fortune”

Last week, I wrote about the White Book of Big Data – a publication I co-authored last year at Fujitsu.

One of the more interesting (for me) sections of the document was an idea from one of my colleagues, providing a model to determine next steps in forming a strategy for embracing a new approach (in this case to move forward towards gaining value from the use of a big data solution but it can be applied to other scenarios too).

The model starts with a “wheel” diagram and, at the centre is the first decision point. All organisations exist to generate profit (even non-profits work on the same principles, they just don’t return those profits to shareholders).  There are two ways to increase profit: reducing cost; or increasing revenue.

For each of the reduce cost/increase revenue sectors, there are two more options: direct or indirect.

These four selections lead to a number of other opportunities and these may be prioritised to determine which areas to focus on in a particular business scenario.

With those priorities highlighted, a lookup table can be used to suggest appropriate courses of action to take next.
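The wheel boils down to a small decision structure: two profit levers, each split into direct and indirect routes, feeding a lookup table of next steps. As a hypothetical sketch in Python (the entries below are placeholders of my own invention – the real wheel and its lookup table are in the book):

```python
# Decision path: (profit lever, route) -> suggested courses of action.
# The actions here are illustrative placeholders, NOT the actual
# content of the White Book of Big Data's lookup table (see page 37).
WHEEL = {
    ("reduce cost", "direct"):      ["automate a manual process"],
    ("reduce cost", "indirect"):    ["improve forecasting to cut waste"],
    ("increase revenue", "direct"): ["personalise offers to customers"],
    ("increase revenue", "indirect"): ["mine feedback to improve products"],
}

def next_steps(lever, route):
    """Walk the wheel from the centre outwards to a course of action."""
    return WHEEL[(lever, route)]

print(next_steps("reduce cost", "direct"))
```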

It’s one of those models that’s simple and, I think, quite elegant. I’ll be looking to adopt this in other scenarios in future and I thought that readers of this blog might find it useful too…

Take a look at the book if you want to see this working in practice – “the wheel” is on page 37.

The White Book of Big Data

Almost exactly a year ago, I was part of a team at Fujitsu that wrote a short publication called the White Book of Big Data.

This was the third book in the successful “white book” series, aimed at helping CIOs to cut through vendor hype on technology and business trends, following on from the White Book of Cloud Adoption and the White Book of Cloud Security.

At the time, I was keen to shout about this work but couldn’t track down an externally-visible link (and I was asked not to publish it directly myself).  Now, when big data has become such an incredibly over-hyped term (so much so that I try not to use the term myself), I’ve found that the book has been available for some time via the Cloud Solutions page on the Fujitsu website!

Irrespective of the time it’s taken for me to be able to write about this (and any bias I may have as one of the authors) I still think it’s a useful resource for anyone trying to cut through the vendor hype.  At no point does it try to directly sell Fujitsu products – and I’d be interested in any feedback that anyone has after reading it.  If you’d like to read the book, you can download a PDF.

As I’ve changed roles since the book was published, I think it’s unlikely I’ll be involved in any future publications of this type (I always wanted to create a White Book of “Bring Your Own” Computing) – unless I can encourage any of my marketing colleagues to sponsor a White Book of Messaging!

The annotated world – the future of geospatial technology? (@EdParsons at #DigitalSurrey)

Tonight’s Digital Surrey was, as usual, a huge success with a great speaker (Google’s @EdParsons) in a fantastic venue (Farnham Castle).  Ed spoke about the future of geospatial data – about annotating our world to enhance the value that we can bring from mapping tools today but, before he spoke of the future, he took a look at how we got to where we are.

What is geospatial information? And how did we get to where we are today?

Geospatial information is very visual, which makes it powerful for telling stories and one of the most famous and powerful images is that of the Earth viewed from space – the “blue marble”. This emotive image has been used many times but has only been personally witnessed by around 20 people, starting with the Apollo 8 crew, 250,000 miles from home, looking at their own planet. We see this image with tools like Google Earth, which allows us to explore the planet and look at humankind’s activities. Indeed, about 1 billion people use Google Maps/Google Earth every week – that’s about a third of the Internet population, roughly equivalent to Facebook and Twitter combined [just imagine how successful Google would be if they were all Google+ users…]. Using that metric, we can say that geospatial data is now pervasive – a huge shift over the last 10 years as it has become more accessible (although much of the technology has been around longer).

The annotated world is about going beyond the image and pulling out otherwise invisible information so that, in a digital sense, it’s now possible to have a map at 1:1 scale or even beyond. For example, in Google Maps we can look at Street View and even see annotations of buildings. This can be augmented with further information (e.g. restrictions on the directions in which we can drive, details about local businesses) to provide actionable insight. Google also harvests information from the web to create place pages (something that could be considered ethically dubious, as it draws people away from the websites of the businesses involved) but it can also provide additional information from image recognition – for example, identifying the locations of public wastebins or adding details of parking restrictions (literally from text recognition on road signs). The key to the annotated web is collating and presenting information in a way that’s straightforward and easy to use.

Using other tools in the ecosystem, mobile applications can be used to easily review a business and post it via Google+ (so that it appears on the place page); or Google MapMaker may be used by local experts to add content to the map (subject to moderation – and the service is not currently available in the UK…).

So, that’s where we are today… we’re getting more and more content online, but what about the next 10 years?

A virtual (annotated) world

Google and others are building a virtual world in three dimensions. In the past, Google Earth pulled data from many sets (e.g. building models, terrain data, etc.) but future 3D images will be based on photographs (just as, apparently, Nokia has done for a while). We’ll also see 3D data being used to navigate inside buildings as well as outside. In one example, Google is working with John Lewis, who have recently installed Wi-Fi in their stores, to determine a user’s location and combine it with maps to navigate the store. The system is accurate to about 2-3 metres [and sounds similar to Tesco’s “in store sat-nav” trial] and apparently it’s also available in London railway stations, the British Museum, etc.

Father Ted would not have got lost in the lingerie department if he had Google's mapping in @! says @ #DigitalSurrey
@markwilsonit
Mark Wilson

Ed made the point that the future is not driven by paper-based cartography, although there were plenty of issues taken with this in the Q&A later, highlighting that we still use ancient maps today, and that our digital archives are not likely to last that long.

Moving on, Ed highlighted that Google now generates map tiles on the fly (it used to take 6 weeks to rebuild the map) and new presentation technologies allow for client-side rendering of buildings – for example, St Paul’s Cathedral in London. With services such as Google Now (on Android), contextual information may be provided, driven by location and personality.

With Google’s Project Glass, that becomes even more immersive with augmented reality driven by the annotated world:

Although someone also mentioned to me the parody which also raises some good points:

Seriously, Project Glass makes Apple’s Siri look way behind the curve – and for those who consider the glasses to be a little uncool, I would expect them to become much more “normal” over time – built into a normal pair of shades, or even into prescription glasses… certainly no more silly than those Bluetooth earpieces that we used to use!

Of course, there are privacy implications to overcome but, consider what people share today on Facebook (or wherever) – people will share information when they see value in it.

Big data, crowdsourcing 2.0 and linked data

At this point, Ed’s presentation moved on to talk about big data. I’ve spent most of this week co-writing a book on this topic (I’ll post a link when it’s published) and nearly flipped when I heard the normal big data marketing rhetoric (the 3 Vs)  being churned out. Putting aside the hype, Google should know quite a bit about big data (Google’s search engine is a great example and the company has done a lot of work in this area) and the annotated world has to address many of the big data challenges including:

  • Data integration.
  • Data transformation.
  • Near-real-time analysis using rules to process data and take appropriate action (complex event processing).
  • Semantic analysis.
  • Historical analysis.
  • Search.
  • Data storage.
  • Visualisation.
  • Data access interfaces.

Moving back to Ed’s talk, what he refers to as “Crowdsourcing 2.0” is certainly an interesting concept. Citing Vint Cerf (Internet pioneer and Google employee), Ed said that there are an estimated 35bn devices connected to the Internet – and our smartphones are great examples, crammed full of sensors. These sensors can be used to provide real-time information for the annotated world: average journey times based on GPS data, for example; or even weather data if future smartphones were to contain a barometer.

Linked data is another topic worthy of note which, at its most fundamental level, is about making the web more interconnected. A lot of work has been done on ontologies, categorising content, etc. [Plug: I co-wrote a white paper on the topic earlier this year] but Google, Yahoo, Microsoft and others are supporting schema.org, a collection of schemas – tags that websites can use to mark up content in a way that’s recognised by major search providers. For example, a tag like <span itemprop="addressCountry">Spain</span> might be used to indicate that Spain is a country, with further tags to show that Barcelona is a city and that the Nou Camp is a place to visit.
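To see roughly what a search engine extracts from that markup, here’s a small sketch using Python’s standard html.parser to collect itemprop annotations (illustrative only – real crawlers do vastly more, and this toy markup is my own):

```python
from html.parser import HTMLParser

class MicrodataParser(HTMLParser):
    """Collect (itemprop, text) pairs from schema.org-style markup."""
    def __init__(self):
        super().__init__()
        self.current_prop = None
        self.items = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "itemprop":
                self.current_prop = value

    def handle_data(self, data):
        if self.current_prop and data.strip():
            self.items.append((self.current_prop, data.strip()))
            self.current_prop = None

page = """
<div itemscope itemtype="https://schema.org/Place">
  <span itemprop="addressCountry">Spain</span>
  <span itemprop="addressLocality">Barcelona</span>
</div>
"""

parser = MicrodataParser()
parser.feed(page)
print(parser.items)  # [('addressCountry', 'Spain'), ('addressLocality', 'Barcelona')]
```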

Ed’s final thoughts

Summing up, Ed reiterated that paper maps are dead and that they will be replaced with more personalised information (of which, location is a component that provides content). However, if we want the advantages of this, we need to share information – with those organisations that we trust and where we know what will happen with that info.

Mark’s final thoughts

The annotated world is exciting and has stacks of potential if we can overcome one critical stumbling point that Ed highlighted (and I tweeted):

In order to create a more useful, personal, contextual web, organisations need to gain our trust to share our information #DigitalSurrey
@markwilsonit
Mark Wilson

Unfortunately, there are many who will not trust Google – and I find it interesting that Google is an advocate of consuming open data to add value to its products but I see very little being put back in terms of data sets for others to use. Google’s argument is that it spent a lot of money gathering and processing that data; however it could also be argued that Google gets a lot for free and maybe there is a greater benefit to society in freely sharing that information in a non-proprietary format (rather than relying on the use of Google tools). There are also ethical concerns with Google’s gathering of Wi-Fi data, scraping website content and other such issues but I expect to see a “happy medium” found, somewhere between “Don’t Be Evil” and “But we are a business after all”…

Thanks as always to everyone involved in arranging and hosting tonight’s event – and to Ed Parsons for an enlightening talk!

Big data according to the Oracle

After many years of working mostly with Microsoft infrastructure products, the time came for me to increase my breadth of knowledge and, with that, comes the opportunity to take a look at what some of the other big players in our industry are up to.  Last year, I was invited to attend the Oracle UK User Group Conference where I had my first experience of the world of Oracle applications; and last week I was at the Oracle Big Data and Extreme Analytics Summit in Manchester, where Fujitsu was one of the sponsors (and an extract from one of my white papers was in the conference programme).

It was a full day of presentations and I’m not sure that reproducing all of the content here makes a lot of sense, so here’s an attempt to summarise it… although even a summary could be a long post…

Big data trends, techniques and opportunities

Tim Jennings (@tjennings) from Ovum set the scene and explained some of the ways in which big data has the potential to change the way in which we work as businesses, citizens and consumers (across a variety of sectors).

Summing up his excellent overview of big data trends, techniques and opportunities, Tim’s key messages were that:

  1. Big data is characterised by volume, variety and velocity [I’d add value to that list].
  2. Big data represents a change in the mentality of analytics, away from precise analysis of well-bound sources to rough-cut exploratory analysis of all the data that’s practical to aggregate.
  3. Enterprise should identify business cases for big data and the techniques and processes required to exploit them.
  4. Enterprises should review existing business intelligence architectures and methods and plan the evolution towards a broader platform capable of handling the big data lifecycle.

And he closed by saying that “If you don’t think that big data is relevant to your organisation, then you are almost certainly missing an opportunity that others will take.”

Some other points I picked up from Tim’s presentation:

  • Big data is not so much unstructured as variably-structured.
  • The mean size of an analytical data set is 3TB (growing but not that huge) – don’t think you need petabytes of data for big data tools and techniques to be relevant.
  • Social network analytics is probably the world’s largest (free) marketing focus group!

Big Data – Are You Ready?

Following the analyst introduction, the event moved on to the vendor pitch.  This was structured around a set of videos which I’ve seen previously, in which a fictitious American organisation grapples with a big data challenge, using an over-sized actor (and an under-sized one) to prove their point. I found these videos a little tedious the first time I saw them, and this was the second viewing for me.  For those who haven’t had the privilege, the videos are on YouTube and I’ve embedded the first one below (you can find the links on Oracle’s Data Warehouse Insider blog).


The key points I picked up from this session were:

  • Oracle see big data as a process towards making better decisions based on four stages: decide, acquire, organise and analyse.
  • Oracle considers that there are three core technologies for big data: Oracle NoSQL, Hadoop, and R; brought together by Oracle Engineered Systems (AKA the “buy our stuff” pitch).

Cloudera

Had I been at the London event I would have been extremely privileged to see Doug Cutting, Hadoop creator and now Chief Architect at Cloudera speak about his work in this field.  Doug wasn’t available to speak at the Manchester event so Oracle showed us a pre-recorded interview.

For those who aren’t familiar with Cloudera (I wasn’t), it’s effectively a packaged open source big data solution (based on Hadoop and related technologies) providing an enterprise big data solution, with support.

The analogy given was that of a “big data operating system” with Cloudera doing for Hadoop what Red Hat does for Linux.

Perhaps the most pertinent of Doug Cutting’s comments was that we are at the beginning of a revolution in data processing, where people can afford to save data and use it to learn, to get a “higher resolution picture of what’s going on and use it to make more informed decisions”.

Capturing the asset – acquire and organise

After a short pitch from Infosys (who have a packaged data platform, although personally, I’d be looking to the cloud…) and an especially cringeworthy spoof Lady Gaga video (JavaZone’s Lady Java) we moved on to enterprise NoSQL. In effect, Oracle has created a NoSQL database using the Berkeley key value database and a Java driver (containing much of the logic to avoid single points of failure) that they claim offers a simple data model, scalability, high availability, transparent load balancing and simple administration.

Above all, Oracle’s view is that, because it’s provided and maintained by Oracle, there is a “single throat to choke”.  In effect, in the same way that we used to say no-one got fired for buying IBM, they are suggesting no-one gets fired for buying Oracle.

That may be true, but it’s my understanding that big data is fuelled by low-cost commodity hardware (infrastructure as a service) and open source software – and whilst Oracle may have a claim on the open source front, the low-cost commodity hardware angle is not one that sits well in the Oracle stable…

Through partnership with Cloudera (which leaves some wondering if  that will last any longer than the Red Hat partnership did?), Oracle is positioning a Hadoop solution for their customer base:

Oracle describe Cloudera as the Redhat for Hadoop, but also say they won't develop their own release; they said that for Linux originally
@debralilley
Debra Lilley

Despite (or maybe in spite of) the overview of HDFS and MapReduce, I’m still not sure how Cloudera  sits alongside Oracle NoSQL but their “big data appliance” includes both options. Now, when I used to install servers, appliances were typically 1U “pizza box” servers. Then they got virtualised – but now it seems they have grown to become whole racks (Oracle) or even whole containers (Microsoft).

Oracle’s view on big data is that we can:

  1. Acquire data with their Big Data Appliance.
  2. Organise/Analyse aggregated results with Exadata.
  3. Decide at “the speed of thought” with Exalytics.

That’s a lot of Oracle hardware and software…

In an attempt not to position Oracle’s more traditional products as old hat, the next presenter suggested that big data is complementary – not really about old and new, but about familiar and unfamiliar. Actually, I think he has a point: at some point “big” data just becomes “data” (and gets boring again?). This session gave an overview of an information architecture challenge as new classes of data (videos and images, documents, social data, machine-generated data, etc.) create a divide between transactional data and big data – which is not really unstructured but better described as semi-structured – and which uses sandboxes to analyse and discover new meaning from data.

Oracle has big data connectors to integrate with other (Oracle) solutions including: a HiveQL-based data integrator; a loader to move Hadoop data into Oracle 11g; a SQL-HDFS connector; and an R connector to run scripts with API access to both Hadoop and more traditional Oracle databases. There are also Oracle products such as GoldenGate to replicate data in heterogeneous data environments.

[My view, for what it’s worth, is that we shouldn’t be moving big data around, duplicating (or triplicating) data – we should be linking and indexing it to bridge the divide between the various silos of “big” data and “traditional” data.]
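To illustrate what I mean, here’s a toy Python sketch (the customer data and silo names are entirely made up, not any real product): rather than copying records between stores, an index records where each entity’s data lives, so queries can be routed across the silos without duplication.

```python
# Two "silos": a traditional relational store and a big data clickstream.
relational = {"cust-1": {"name": "Alice"}, "cust-2": {"name": "Bob"}}
clickstream = [("cust-1", "/home"), ("cust-1", "/buy"), ("cust-2", "/home")]

# Build a link index: for each customer, record which silos hold their data.
# No records are moved or duplicated -- only locations are tracked.
index = {}
for cust in relational:
    index.setdefault(cust, []).append("relational")
for cust, _page in clickstream:
    silos = index.setdefault(cust, [])
    if "clickstream" not in silos:
        silos.append("clickstream")

print(index)  # each customer maps to the silos that hold their records
```

A real implementation would obviously need shared keys (or entity resolution) across the silos, which is where linked data techniques come in.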

Finding the value – analyse and decide

Speaking of a race to gain insight – with analytics becoming the CIO’s top priority for 2013 and business intelligence usage doubling by 2014 – the next session looked at some business analytics techniques and characteristics, which can be summarised as:

  • I suspect something – a data scientist or analyst needs to find proof and turn it into a predictive model to deploy into a business process (classification).
  • I want to know if that matters – “I wish I knew” (visual exploration and discovery).
  • I want to make the best decision now – decisions at the speed of thought in the context of a business process.

This led on to a presentation about the rise of the data scientist and making maths cool (except it didn’t, especially with a demo of some not-very-attractive visualisations run on an outdated Windows XP platform) and an introduction to the R language for statistical analysis and visualisation.

Following this was a presentation about Oracle’s recently-acquired Endeca technology which actually sounds pretty interesting as it digests a variety of data sources and creates a data model with an information-discovery front-end that promises “the simplicity of search plus the power of BI”.

The last presentation of this segment looked at Oracle’s Exalytics in-memory database servers (a competitor to SAP HANA), bundling business intelligence software, adaptive in-memory caching (and columnar compression) with information discovery tools.

Wrap-up

I learned a lot about Oracle’s view of big data but that’s exactly what it was – one vendor’s view on this massively hyped and expanding market segment. For me, the most useful session of the day was from Ovum’s Tim Jennings and, if that was all I had taken away, the day would still have been worthwhile.

In fairness, it was good to learn some more about the Oracle solutions too, but I do wish vendors (including my own employer) would sometimes drop the blatant product marketing and consider the value of some vendor-agnostic thought leadership. I truly believe that, by showing customers a genuine understanding of their business, the issues that they face and the directions in which business and technology are heading, the solutions will sell themselves if they truly provide value. On the other hand, by telling me that Oracle has a complete, open and integrated solution for everything and that what I really need is to buy more technology from the Oracle stack… well, I’d better have a good story to convince the CFO that it’s worthwhile…

Slidedecks and other materials from the Oracle Big Data and Extreme Analytics Summit are available on the Oracle website.

Short takes: Flexible working and data protection for mobile devices

It’s been another busy week and I’m still struggling to get a meaningful volume of blog posts online so here are the highlights from a couple of online events I attended recently…

Work smarter, not harder… the art of flexible working

Citrix Online has been running a series of webcasts to promote its GoToMeeting platform and I’ve attended a few of them recently. The others have been oriented towards presenting but, this week, Lynne Copp from the Work Life Company (@worklifecompany) was talking about embracing flexible working. As someone who has worked primarily from home for a number of years now, it would have been great to get a bit more advice on how to achieve a better work/life balance (it was touched upon, but most of the session seemed to be targeted at how organisations need to change to embrace flexible working practices), but some interesting resources were made available.

Extending enterprise data protection to mobile devices

Yesterday, I joined an IDC/Autonomy event looking at the impact of mobile devices on enterprise data protection.

IDC’s Carla Arend (@carla_arend) spoke about how IDC sees four forces of IT industry transformation in cloud, mobility, big data/analytics and social business. I was going to say “they forgot consumerisation” but then it was mentioned as an overarching topic. I was certainly surprised that the term used to describe the ease of use that many consumer services provide was that we have been “spoiled” but the principle that enterprise IT often lags behind is certainly valid!

Critically, the “four forces of IT industry transformation” are being driven by business initiatives – and IT departments need to support those requirements. The view put forward was that IT organisations that embrace these initiatives will be able to get funding, whilst those who still take a technology-centric view will be forced to continue down the line of doing more with less (which seems increasingly unsustainable to me…).

This shift has implications for data management and protection – managing data on-premises and in the cloud, archiving data generated outside the organisation (e.g. in social media, or other external forums), managing data on mobile devices, and deciding what to do with big data (store it all, or just some of the results?).

Looking at BYOD (which is inevitable for most organisations, with or without the CIO’s blessing!) there are concerns about: who manages the device; who protects it (IDC spoke about backup/archive but I would add encryption too); what happens to data when a device is lost/stolen, or when the device is otherwise replaced; and how can organisations ensure compliance on unmanaged devices?

Meanwhile, organisational application usage is moving outside traditional office applications too, with office apps, enterprise apps, and web apps running on increasing numbers of devices and new machine (sensor) and social media data sets being added to the mix (often originating outside the organisation). Data volumes create challenges too, as well as the variety of locations from which that data originates or resides. This leads to a requirement to carefully consider which data needs to be retained and which may be deleted.

Cloud services can provide some answers and many organisations expect to increasingly adopt cloud services for storage – whether that is to support increasing volumes of application data, or for PC backups. IDC is predicting that the next cloud wave will be around the protection of smart mobile devices.

There’s more detail in IDC’s survey results (European Software Survey 2012, European Storage Survey 2011) but I’ve certainly given the tl;dr view here…

Unfortunately I didn’t stick around for the Autonomy section… it may have been good, but the first few minutes felt too much like a product pitch to me (and to my colleague who was also online)… sometimes I want views, opinions and strategy – thought leadership rather than sales – and I did say it’s been a busy week!

Linked data: connecting and exploiting big data

Earlier this year, I gave a lightning talk on Structuring Big Data at the CloudCamp London Big Data Special – the idea being that, if we’re not careful, big data will provide yet another silo of information to manage and that linked data could be useful to connect the various data sources (transactional databases, data warehouses, and now big data too).

At the time I mentioned that this was part of a white paper that I was writing with my manager, Ian Mitchell (@IanMitchell2) and our paper on using linked data to connect and exploit big data has now been published on the Fujitsu website.

This week Oracle kicks off its Big Data and Extreme Analytics Summit and Fujitsu are one of the sponsors. An excerpt from the paper is included in the conference brochure and I’ll be at the Manchester event next Tuesday – do come along and say hello if you’re at the event and, even if you’re not, please do check out the paper – I’d love to hear your feedback.

More on NoSQL, Hadoop and Microsoft’s entry to the world of big data

Yesterday, my article on Microsoft’s forays into the world of big data went up on Cloud Pro. It’s been fun learning a bit about the subject (far more than is in that article – because big data is a big theme in my work at the moment) and I wanted to share some more info that didn’t fit into my allotted 1000 words.

Microsoft Fellow Dr David DeWitt gave an excellent keynote on Day 3 of the SQL PASS 2011 summit last month and it’s a great overview of how Hadoop works. Of course, he has a bias towards use of RDBMS systems but the video is well worth watching for its introduction to NoSQL, the differences between key/value stores and Hadoop-type systems, and the description of the Hadoop components and how they fit together (skip the first 18 minutes and, if the stream doesn’t work, try the download – the deck is available too). Grant Fritchey and Jen McCown have written some great notes to go with Dr DeWitt’s keynote too. For more about when you might use Hadoop, Jeremiah Peschka has a good post.

Microsoft’s SQOOP implementation is not the first – Cloudera have been integrating SQL and Hadoop for a couple of years now. Meanwhile, Buck Woody has a great overview of Microsoft’s efforts in the big data space.

I also mentioned Microsoft StreamInsight (formerly code-named “Austin”) in the post (the Complex Event Processing capability inside SQL Server 2008 R2) and Microsoft’s StreamInsight Team has posted what they call “the basics” of event processing. It seems to require coding, but is probably useful to anyone who is getting started with this stuff. For those of us who are a little less code-oriented, Andrew Fryer’s overview of StreamInsight (together with a more general post on CEP) is worth a read, together with Simon Munro’s post on where StreamInsight fits in.

Shortly after I sent my article to Cloud Pro’s Editor, I saw Mike Walsh’s “Microsoft Loves Your Big Data” post. I like this because it cuts through the press announcements and talks about what is really going on: interoperability; and becoming a player themselves. Critically:

“They aren’t copying, or borrowing or trying to redo… they are embracing”

And that is what I really think makes a refreshing change.

SQL Server and Hadoop – unlikely bedfellows but a powerful combination

Big Data is hard to avoid – what does Microsoft’s embrace of Hadoop mean for IT Managers?

There are two words that seem particularly difficult to avoid at the moment: big data. Infrastructure guys instinctively shy away from data but such is its prevalence that big data is much more than just the latest IT buzzword – it is becoming a major theme in our industry right now.

But what does “big data” actually mean? It’s one of those phrases that, like “cloud computing” before it, is being adopted by vendors to mean whatever they want it to.

The McKinsey Global Institute describes big data as “the next frontier for innovation, competition and productivity” but, put simply, it’s about analysing masses of unstructured (or semi-structured) data which, until recently, was considered too expensive to do anything with.

That data comes from a variety of sources including sensors, social networks and digital media and it includes text, audio, video, click-streams, log files and more. Cynics who scoff at the description of “big” data (what’s next, “huge” data?) miss the point that it’s not just about the volume of the data (typically many petabytes) but also the variety and frequency of that data. Some even refer to it as “nano data” because what we’re actually looking at is massive sets of very small data.

Processing big data typically involves distributed computer systems and one project that has come to the fore is Apache Hadoop – a framework for development of open-source software for reliable, scalable distributed computing.

Over the last few weeks though, there have been some significant announcements from established IT players, not all of whom are known for embracing open source technology. This indicates a growing acceptance for big data solutions in general, and specifically for solutions that include both open- and closed-source elements.

When Microsoft released a SQL Server-Hadoop (SQOOP) Connector, there were questions about what this would mean for CIOs and IT Managers who may previously have viewed technologies like Hadoop as a little esoteric.

The key to understanding what this means lies in understanding the two main types of data: structured and unstructured. Structured data tends to be stored in a relational database management system (RDBMS) – for example Microsoft SQL Server, IBM DB2, Oracle 11g or MySQL.

By structuring the data with a schema, tables, keys and all manner of relationships, it’s possible to run queries (with a language like SQL) to analyse the data, and techniques have been developed over the years to optimise those queries. By contrast, unstructured data has no schema (at least not a formal one) and may be as simple as a set of files. Structured data offers maturity, stability and efficiency but unstructured data offers flexibility.
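As a rough illustration (a Python sketch using SQLite for the structured side; the table and data are invented), asking the same question of a structured store and of raw text shows the trade-off – the RDBMS leans on its schema and query engine, whilst the unstructured scan pushes all of the interpretation into the application:

```python
import sqlite3

# Structured: a schema is declared up front and the engine can optimise queries.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, product TEXT, qty INTEGER)")
db.executemany("INSERT INTO orders (product, qty) VALUES (?, ?)",
               [("widget", 3), ("gadget", 5), ("widget", 2)])
total = db.execute("SELECT SUM(qty) FROM orders WHERE product = 'widget'").fetchone()[0]

# Unstructured: just lines of text -- flexible, but every query is a full scan
# and the interpretation lives in the application, not in the store.
log_lines = ["widget 3", "gadget 5", "widget 2"]
scan_total = sum(int(line.split()[1]) for line in log_lines
                 if line.startswith("widget"))

print(total, scan_total)  # both 5
```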

Secondly, there needs to be an understanding of the term “NoSQL”. Commonly misinterpreted as an instruction (no to SQL), it really means “not only SQL” – i.e. there are some types of data that are not worth storing in an RDBMS. Rather than following the database model of extract, transform and load (ETL), in a NoSQL system the data arrives and the application knows how to interpret it, providing a faster time to insight from data acquisition.

Just as there are two main types of data, there are two main types of NoSQL system: key/value stores (like MongoDB or Windows Azure Table Storage) can be thought of as NoSQL OLTP; Hadoop is more like NoSQL data warehousing and is particularly suited to storing and analysing massive data sets.
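The distinction can be sketched in a few lines of Python: a toy key/value store handles single-record reads and writes by key (the NoSQL “OLTP” case), in contrast to the whole-data-set scans that suit Hadoop. The class and key names here are purely illustrative, not any real product’s API.

```python
# A toy key/value store: point lookups and writes by key, like the NoSQL
# "OLTP" systems (MongoDB, Windows Azure Table Storage) described above.
class KVStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

store = KVStore()
store.put("session:alice", {"cart": ["widget"]})
print(store.get("session:alice"))  # fast single-record access by key
```

A Hadoop-style workload, by contrast, would read every record to compute an aggregate – hence the “NoSQL data warehousing” characterisation.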

One of the key elements towards understanding Hadoop is understanding how the various Hadoop components work together. There’s a degree of complexity so perhaps it’s best to summarise by saying that the Hadoop stack consists of a highly distributed, fault tolerant, file system (HDFS) and the MapReduce framework for writing and executing distributed, fault tolerant, algorithms. Built on top of that are query languages (like Hive and Pig) and then we have the layer where Microsoft’s SQOOP connector sits, connecting the two worlds of structured and unstructured data.
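The MapReduce part of that stack can be illustrated with the classic word count, here collapsed into a single Python process. Real Hadoop distributes the map, shuffle and reduce phases across a cluster over HDFS; this sketch only shows the shape of the algorithm.

```python
from collections import defaultdict
from itertools import chain

def map_phase(line):
    # Map: emit a (key, value) pair for every word seen.
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    # Shuffle: group all values by key (Hadoop does this between nodes).
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    # Reduce: combine each key's values into a final result.
    return {word: sum(counts) for word, counts in groups.items()}

lines = ["big data big ideas", "big data"]
pairs = chain.from_iterable(map_phase(l) for l in lines)
print(reduce_phase(shuffle(pairs)))  # {'big': 3, 'data': 2, 'ideas': 1}
```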

The trouble is that SQOOP is just a bridge – and not a particularly efficient one either: working on SQL data in the unstructured world involves subdividing the SQL database so that MapReduce can work correctly.
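A sketch of what that subdivision looks like, in Python rather than SQOOP’s actual internals (the table and column names are invented): pick a split column, compute ranges over it, and give each mapper its own bounded query so the table can be read in parallel.

```python
def split_queries(table, column, lo, hi, num_mappers):
    """Divide [lo, hi] on a split column into one bounded query per mapper."""
    step = (hi - lo + num_mappers) // num_mappers  # ceiling division
    queries = []
    for start in range(lo, hi + 1, step):
        end = min(start + step - 1, hi)
        queries.append(
            f"SELECT * FROM {table} WHERE {column} BETWEEN {start} AND {end}"
        )
    return queries

# Four mappers each get a disjoint slice of the (hypothetical) orders table.
for q in split_queries("orders", "id", 1, 100, 4):
    print(q)
```

The inefficiency the article alludes to follows naturally: every slice is still a full SQL query against the source database before MapReduce can even start.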

Because most enterprises have both structured and unstructured data, we really need tools that allow us to analyse and manage data in multiple environments – ideally without having to go back and forth. That’s why there are so many vendors jumping on the big data bandwagon, but it seems that a SQOOP connector is not the only work Microsoft is doing in the big data space.

In our increasingly cloudy world, infrastructure and platforms are rapidly becoming commoditised. We need to focus on software that allows us to derive value from data to gain some business value. Consider that Microsoft is only one vendor, then think about what Oracle, IBM, Fujitsu and others are doing. If you weren’t convinced before, maybe HP’s Autonomy purchase is starting to make sense now?

Looking specifically at Microsoft’s developments in the big data world, it therefore makes sense to see the company get closer to Hadoop. The world has spoken and the de facto solution for analysing large data sets seems to be HDFS/MapReduce/Hive (or similar).

Maybe Hadoop’s success comes down to HDFS and MapReduce being based on work from Google, whilst Hive and Pig are supported by Facebook and Yahoo respectively (i.e. they all come from established Internet businesses). But, by embracing Hadoop (together with porting its tools to competitive platforms), Microsoft is better placed to support the entire enterprise with both its structured and unstructured data needs.

[This post was originally written as an article for Cloud Pro.]

The future Internet and the Intelligent Society

Last week, I spent an evening with the British Computer Society’s Internet Specialist Group, where I’d been asked to present on where I see the Internet developing in future – an always-on, connected vision of joined-up services to deliver greater benefit across society.

I started out with a brief retrospective of the last 42 years of Internet development and a look at the way we use the Internet today, before I introduced the concept of human-centric computing and, in particular, citizen-centric computing as featured in Rebecca MacKinnon’s TED talk about the need to take back the Internet. This shows how we need any future Internet to evolve in a citizen-centric manner, building a world where government and technology serve people, and it leads nicely into some of the concepts introduced in the Technology Strategy Board‘s Future Internet Report.

After highlighting the explosion in the volumes of data and the number of connected devices, I outlined the major enabling components for the future Internet – far more than just “bigger pipes”: a capable access mechanism; infrastructure for the personalisation of cloud services and for machine-to-machine (M2M) transactions; and, finally, convergence that delivers a transformational change in both public and private service delivery.

Our vision is The Intelligent Society: bringing physical and virtual worlds into harmony to deliver greater benefit across society. As consumerisation takes hold, technology is becoming more accessible, even commoditised in places, for delivery of on-demand, stateless services. Right now we have a “perfect storm” where a number of technologies are maturing and falling into alignment to deliver our vision.

These technologies break down into: the devices (typically mobile) and sensors (for M2M communications); the networks that join devices to services; and the digital utilities that provide on demand computing and software resources for next-generation digital services. And digital utilities are more than just “more cloud” too – we need to consider interconnectivity between clouds, security provision and the compute power required to process big data to provide analytics and smart responses.

There’s more detail in the speaker notes on the deck (and I should probably write some more blog posts on the subject) but I finished up with a look at Technology Perspectives – a resource we’ve created to give a background context for strategic planning.

As we develop “the Internet of the future” we have an opportunity to deliver benefit, not just in terms of specific business problems, but on a wide scale that benefits entire populations. Furthermore, we’ve seen that changing principles and mindsets are creating the right conditions for these solutions to be incubated and developed alongside the maturing technologies that enable this vision and make it a reality.

This isn’t sci-fi, this is within our reach. And it’s very exciting.

[This post originally appeared on the Fujitsu UK and Ireland CTO Blog.]