How the cloud can help to cope with big data applications

Advice Adrian Bridgwater Jun 27, 2011

Lots of companies are talking about how the cloud is going to handle big data applications - but what does that actually mean?

It seems very natural to talk about ‘big data’ and the cloud in the same sentence; after all, cloud computing is conspicuously renowned for its scalability and extensibility. But what does ‘big data’ in the cloud really mean to companies working with massive dataset challenges?

Shall we start with a definition of big data? Of course, no single piece of data is necessarily big. The ‘bigness’ comes from the collection and collation of many pieces of data into a so-called dataset. As such, the tools, processes and procedures around the data itself essentially define what big data is. When a company chooses (or needs) to handle terabytes (or even petabytes) of potentially dynamic, fast-moving data, we’re in big data territory.

We can also define big data by the difficulties it causes in terms of capture, storage, search, sharing and analysis. When these tasks start to strain conventional tools, more often than not it’s because we have big data challenges ahead of us. So where do we find it?

Where do we find big data?
Both the UK Met Office and the US National Oceanic and Atmospheric Administration collect vast amounts of data as they analyse climatic conditions around the planet, so this is big data in its purest form. For a more down-to-earth example, consider a large retailer or a financial institution with huge Complex Event Processing (CEP) challenges; a large amount of transaction-related data needs to be managed, analysed and intelligently stored at light speed. Big data is here and it’s here to stay.
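To make the CEP idea a little more concrete, here is a minimal, stream-style sketch in Python: flag any payment card that produces a burst of transactions inside a sliding time window. It is an illustration only, not any particular vendor’s CEP engine, and the window length, threshold and card identifiers are invented assumptions for the example.

from collections import deque, defaultdict
from datetime import datetime, timedelta

# Illustrative rule (hypothetical values): flag a card that generates more
# than MAX_EVENTS transactions within a sliding WINDOW of time.
WINDOW = timedelta(seconds=60)
MAX_EVENTS = 5

recent = defaultdict(deque)  # card_id -> timestamps of recent transactions


def process(card_id, timestamp, alerts):
    """Ingest one transaction event and record an alert if the rule fires."""
    window = recent[card_id]
    window.append(timestamp)
    # Evict events that have fallen out of the sliding window.
    while window and timestamp - window[0] > WINDOW:
        window.popleft()
    if len(window) > MAX_EVENTS:
        alerts.append((card_id, timestamp, len(window)))


# Example: a burst of transactions on one card triggers an alert.
alerts = []
start = datetime(2011, 6, 27, 12, 0, 0)
for i in range(8):
    process("card-42", start + timedelta(seconds=5 * i), alerts)
print(alerts[0] if alerts else "no alerts")

A real CEP platform would, of course, do this across millions of events per second and many concurrent rules, but the shape of the problem (continuous ingest, windowed state, immediate action) is the same one that drives the storage and compute demands described here.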

As compliance and regulatory pressures further compel businesses to store data archives, the big data mountain grows and the crags and escarpments become more treacherous. A new compass and set of cloud-based crampons is needed if we are going to work with big data and still be able to breathe oxygen.

For chief information and technology officers approaching new and as yet uncharted big data challenges, the cloud computing model of IT delivery promises to answer some of the immediate scalability needs. But handling big data in the cloud as a core business resource is not a plug-and-play affair. The architectural approach used in a ‘normal’ non-cloud, data-intensive grid is, for example, very different from the virtualised infrastructure companies will need to use to drive big data in the cloud.

"These [architectures] have different server/storage configurations, different environmental (power and Heating Ventilation and AC) profiles and different data ingest/migration patterns. In a number of larger enterprises (especially those with global needs like major financial institutions and retail organisations), IDC expects to see the emergence of separate data centres designed specifically for big data workloads. Concentration of data streams and compute resources makes more sense for both performance and telecommunications cost reasons. Basically, the data centre becomes the ‘big data’ system,” says Richard Villars, VP of storage and IT executive strategies with IDC.