What SQL Server 2012 does for cloud data

By Mary Branscombe | Apr 17, 2012

Microsoft is pushing System Center and Windows Server as strands of its private cloud philosophy - what does this mean for big data?

Big data is the quintessential digital shoebox: instead of filtering data into a data warehouse by cherry-picking what you know is useful and putting it into a cube designed for what you know you’re going to do with it, you collect masses of unstructured data in the hope of finding something new and useful in it.

Microsoft technical fellow and long-time SQL Server architect Dave Campbell jokes that the most important question to ask when building a large data warehouse is “have you figured out which 90 percent of the data to throw out yet?” Big data is the opposite: you need all of it for when you figure out new models to build from it, and storage is cheap.

Say you’re tracking the location of ambulances so you can dispatch the closest unit to an emergency; is it worth keeping that data? It could be extremely useful if you can correlate patient outcomes with the response time of the ambulance that took them to hospital, model where you’d have to put ambulance stations to get them within five minutes’ drive of the most emergency calls, and then model whether the answer is the same during the day, when people are at work, as in the evening, when they’re at home and traffic is quieter.

But you can’t store all that data the way you store other business information. HP’s new Enterprise Database Consolidation Appliance is a SQL Server private cloud in a box. It gives you up to 1,000 SQL Server databases on an eight-server Hyper-V cluster, with 58TB of data in each rack and up to ten racks per appliance, with built-in load balancing. That’s perfect for a big database, especially with the new AlwaysOn high-availability support in SQL Server 2012, which combines the strengths of database mirroring and log shipping and uses multiple secondary replicas for near-seamless failover (plus a virtual network name that applications and services connect through, so they can reconnect without knowing they’re talking to a different server).
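
As a rough sketch of what that looks like in T-SQL (the server names, database name and IP address here are hypothetical, and a real deployment needs mirroring endpoints and a Windows failover cluster already in place):

    -- Minimal sketch: an availability group with one automatic-failover
    -- secondary, plus a listener that gives applications a stable
    -- virtual network name to connect through.
    CREATE AVAILABILITY GROUP SalesAG
        FOR DATABASE SalesDB
        REPLICA ON
            N'SQLNODE1' WITH (
                ENDPOINT_URL = N'TCP://sqlnode1.contoso.local:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE = AUTOMATIC),
            N'SQLNODE2' WITH (
                ENDPOINT_URL = N'TCP://sqlnode2.contoso.local:5022',
                AVAILABILITY_MODE = SYNCHRONOUS_COMMIT,
                FAILOVER_MODE = AUTOMATIC);

    -- Applications connect to 'SalesListener' rather than to a
    -- physical node, so failover doesn't break connection strings.
    ALTER AVAILABILITY GROUP SalesAG
        ADD LISTENER N'SalesListener' (
            WITH IP ((N'10.0.0.25', N'255.255.255.0')),
            PORT = 1433);

After a failover the listener name follows the new primary, so clients keep the same connection string throughout.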

Private cloud with SQL Server 2012 gives you resource pooling, elasticity, self-service (through System Center 2012) and usage-based pricing for chargeback. You can also scale out to SQL Azure for hybrid cloud, using the new Visual Studio-based SQL Server Data Tools to develop and deploy the same databases on premises and on SQL Azure (the schemas have to be different, but the development environment validates against the platform you’re deploying to, so migrating database applications to the Azure cloud, temporarily or permanently, is less work). You get cloud scalability by sharding database tables across multiple databases with SQL Azure Federations, and SQL Azure Data Sync gives you two-way synchronisation between on-premises and cloud databases in close to real time.
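
To give a flavour of how federation works (the federation, table and key names below are made up for illustration), the root database defines the federation and its distribution key, connections are routed to the member holding a given key value, and a member can be split as it grows:

    -- In the root database: create a federation keyed on customer id.
    CREATE FEDERATION CustomerFederation (cust_id BIGINT RANGE)
    GO

    -- Route this connection to the federation member holding cust_id 42.
    USE FEDERATION CustomerFederation (cust_id = 42) WITH RESET, FILTERING = OFF
    GO

    -- Tables created inside a member are sharded on the federation key.
    CREATE TABLE Orders (
        order_id  BIGINT NOT NULL,
        cust_id   BIGINT NOT NULL,
        placed_on DATETIME,
        PRIMARY KEY (order_id, cust_id)
    ) FEDERATED ON (cust_id = cust_id)
    GO

    -- Elastic scale-out: back in the root, split a member in two
    -- when it grows too big for a single database.
    USE FEDERATION ROOT WITH RESET
    GO
    ALTER FEDERATION CustomerFederation SPLIT AT (cust_id = 1000000)
    GO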

If you’re not interested in public cloud, the HP appliance is a way of getting all the private cloud advantages without the hard work of building your own database infrastructure.

But it’s not what you need for big data, because you can’t process big data in the same way. Rather than normalising and replicating to get the ‘one version of the truth’ that businesses rely on a data warehouse for, Campbell points out, you have to accept that “in this new world there are multiple versions of the truth. There's the version of the truth that's tidy; that's the one in your data warehouse. There are also gritty versions of the truth from your telemetry – all of the things that could be talking to you, things that, if you knew how to process them, could tell you if someone is hacking you. There’s the social one that your customers and employees are creating – you can monitor Twitter for sentiment because you hope you will find someone who is not satisfied before they pick up the phone and call us, or find a way they're using something that we hadn't thought of.”

Big data, not big databases
That’s why SQL Azure and, soon, the new SQL Server 2012 support Hadoop as well as the familiar relational databases and cubes, with tools both to run quick-and-dirty MapReduce jobs (in a JavaScript console on a web page, or at the PowerShell command line) and to explore the data visually in the PowerPivot and Power View BI tools as if it were coming out of a structured database.

Using the wizard interface, you’ll be able to pull millions of rows of data out of Hadoop in a matter of seconds (compared to the hours it would take to read that much data off a single drive), combine it with other data sources (from OData feeds or your data warehouse) in an in-memory OLAP-like model, and analyse it as if it had come from a SQL Server cube in the first place.
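
Under the covers, that kind of connection typically goes through the Hive ODBC driver Microsoft provides for Hadoop; as a sketch (the table and column names here are hypothetical), a HiveQL query like this would do the heavy aggregation on the cluster, so only the summarised result has to travel into the in-memory model:

    -- Hypothetical HiveQL: condense raw web-log rows into a daily
    -- summary small enough to pull into PowerPivot's in-memory model.
    SELECT to_date(request_time) AS day,
           page_url,
           COUNT(*)              AS hits,
           AVG(response_ms)      AS avg_response_ms
    FROM   web_logs              -- Hive table over the raw log files
    GROUP  BY to_date(request_time), page_url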