SQL Server and Hadoop: unlikely bedfellows, but a powerful combination
Big Data is hard to avoid – what does Microsoft’s embrace of Hadoop mean for IT Managers?
Because most enterprises have both structured and unstructured data, we really need tools that allow us to analyse and manage data in multiple environments – ideally without having to go back and forth. That’s why so many vendors are jumping on the big data bandwagon. For Microsoft, a Sqoop connector is not the only work under way in the big data space:
- SQL Server 2008 R2 includes a complex event processing (CEP) capability called StreamInsight. The principle is that streams of data can be monitored, managed and mined for particular events (instead of running queries across data, run the data through a set of queries looking for matches) and this can help organisations to respond quickly to new opportunities – maybe even adopting a predictive business model.
- The next version of SQL Server will include a new data analysis tool called Power View, which will even be supported on competing mobile operating systems (including iOS and Android).
- Windows Azure includes table storage – a key/value pair storage solution with partitioning.
- Also on Azure, Microsoft is creating a new Data Explorer tool for building rich data sets that can be published as a service, as well as an iterative MapReduce runtime codenamed “Daytona” for scaling data analytics across hundreds of processing cores.
- Microsoft is also creating new implementations of the Hadoop stack for Windows Azure and Windows Server (including a Hive ODBC driver and a Hive Add-in for Excel). It also has a competing technology called LINQ to HPC (formerly codenamed Dryad), which allows a Windows High Performance Compute (HPC) cluster not only to perform parallel computing but also to integrate with Azure (the theory behind this is that big data jobs are typically I/O-bound, rather than compute-bound).
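The complex event processing principle mentioned in the list above (instead of running queries across stored data, run the data through a set of queries looking for matches) can be illustrated with a minimal sketch. This is plain Python, not the StreamInsight API, and the names Event, StandingQuery and run_stream are hypothetical, chosen only for illustration:

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Iterator, Sequence, Tuple


@dataclass(frozen=True)
class Event:
    """One item arriving on the stream (hypothetical shape)."""
    source: str
    value: float


@dataclass(frozen=True)
class StandingQuery:
    """A query registered up front; events are tested against it as they arrive."""
    name: str
    matches: Callable[[Event], bool]


def run_stream(
    events: Iterable[Event], queries: Sequence[StandingQuery]
) -> Iterator[Tuple[str, Event]]:
    # Each event passes through every standing query exactly once;
    # nothing is stored and re-queried later.
    for event in events:
        for query in queries:
            if query.matches(event):
                yield query.name, event


# Assumed example queries and data, purely for illustration.
queries = [
    StandingQuery("high_temperature", lambda e: e.source == "thermo" and e.value > 30.0),
    StandingQuery("negative_reading", lambda e: e.value < 0),
]
stream = [Event("thermo", 21.5), Event("thermo", 35.0), Event("meter", -1.0)]

alerts = list(run_stream(stream, queries))
for name, event in alerts:
    print(name, event)
```

Because run_stream is a generator, matches are reported as each event arrives rather than after the whole data set has been collected, which is the property that lets an organisation respond quickly.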
In our increasingly cloudy world, infrastructure and platforms are rapidly becoming commoditised. We need to focus on software that allows us to derive business value from data. Consider that Microsoft is only one vendor, then think about what Oracle, IBM, Fujitsu and others are doing. If you weren’t convinced before, maybe HP’s Autonomy purchase is starting to make sense now?
Looking specifically at Microsoft’s developments in the big data world, it therefore makes sense to see the company get closer to Hadoop. The world has spoken and the de facto solution for analysing large data sets seems to be HDFS/MapReduce/Hive (or similar).
Maybe Hadoop’s success comes down to HDFS and MapReduce being based on work from Google whilst Hive and Pig are supported by Facebook and Yahoo respectively (i.e. they are all from established Internet businesses). But, by embracing Hadoop (together with porting its tools to competing platforms), Microsoft is better placed to support the entire enterprise with both its structured and unstructured data needs.
Mark Wilson is a Strategy Manager for a major systems integrator and an independent technology writer. With almost two decades’ experience of the IT industry, Mark has a background in leading large IT infrastructure projects in the UK, mainland Europe and Australia and now focuses on providing thought leadership to help customers shape business and technology strategy.
Follow Mark on Twitter @markwilsonit.