What SQL Server 2012 does for cloud data
Microsoft is pushing System Center and Windows Server as strands of its private cloud philosophy. What does this mean for big data?
Whether you’re working through Hive or pushing the data out of Hadoop and into SQL Server, those MapReduce commands are pulling structured answers out of information stored in Hadoop that’s usually called unstructured. It would be more accurate to call it data that hasn’t yet been structured: if there isn’t a way of structuring it, you’re not going to get anything useful out of it. MapReduce lets you figure out quickly which of many possible ways of structuring the data gives you useful information.
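That idea of trying out a candidate structure over raw data can be sketched in a few lines of Python. This toy map/reduce pass imposes one hypothetical structure (treating a field of each log line as an event type) on text that starts out with no schema at all; the log format and field positions are assumptions for illustration, not anything from a real Hadoop job.

```python
from collections import defaultdict

# Raw lines with no schema yet -- the kind of data often called "unstructured".
raw_logs = [
    "2012-03-01 10:00:01 login alice",
    "2012-03-01 10:00:05 search alice sql server",
    "2012-03-01 10:01:12 login bob",
    "2012-03-01 10:02:40 search bob hadoop",
]

def map_phase(line):
    """Impose one candidate structure: treat the third field as an event type."""
    fields = line.split()
    yield (fields[2], 1)

def reduce_phase(pairs):
    """Aggregate the mapped key/value pairs, as a reducer would."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)

counts = reduce_phase(kv for line in raw_logs for kv in map_phase(line))
print(counts)  # a structured answer pulled out of unstructured text
```

If this structuring turns out to be the useful one, it’s exactly what you would then apply each time, or bake into a structured store.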
Once you’ve done that, you might carry on using MapReduce to apply the structure each time, or you might apply the structure once and pull the data across into the structured environment of SQL Server, where you can index it. There you get performance benefits like the new columnstore index in SQL Server 2012, which lets users query just the columns they’re interested in far more quickly than doing the processing to pull those out of the row-wise tables every time.
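The column-wise advantage can be sketched with a toy model in Python (this is an illustration of the general idea, not how SQL Server implements columnstore indexes): summing one column of a row store means walking every field of every row, while a columnar layout keeps each column in its own contiguous array, so a query reads only the data it needs.

```python
# Toy model of row-wise vs columnar storage (not SQL Server internals).
rows = [
    {"id": 1, "region": "EMEA", "sales": 100},
    {"id": 2, "region": "APAC", "sales": 250},
    {"id": 3, "region": "EMEA", "sales": 175},
]

# Row store: answering a question about "sales" still visits whole rows.
row_total = sum(row["sales"] for row in rows)

# Column store: pivot the rows so each column is one contiguous list;
# a scan over "sales" now touches only that column's values.
columns = {key: [row[key] for row in rows] for key in rows[0]}
col_total = sum(columns["sales"])

print(row_total, col_total)  # same answer, far less data touched column-wise
```

On disk the difference is I/O: the columnar scan skips `id` and `region` entirely, which is where the speed-up comes from.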
Working with a Hadoop cluster on Azure has two advantages. First, it takes ten minutes to create a cluster using a wizard, rather than setting one up through configuration files alone (which even database professionals can find confusing). Second, the Metro-style interface makes it simple to see the status of your cluster: how much data you’re storing, which ports are open for transferring data (from Azure Storage, Amazon S3 or any other source), and the progress of any jobs, including how much disk I/O and processing is going on.
Working with Azure Hadoop
The Azure Hadoop service is currently in its second Community Technology Preview, which gives you a command prompt for configuration; you’ll soon be able to use PowerShell to manage an Azure Hadoop cluster as well.
But you don’t have to use Azure to take advantage of Hadoop with SQL Server; you can use the Hive ODBC driver and Excel add-in (also in a second Community Technology Preview) to work live against the MapReduce results you’re already generating from your Hadoop system.
And with either option, when you’ve found the structure you want to keep applying to the Hadoop data, the new free SQL Connector for Hadoop (which is also available for Parallel Data Warehouse) can take data directly into SQL Server from Hadoop (again, bypassing Hive). The Hadoop distribution for both Azure and on-premises SQL Server running on Windows Server won’t be available straight away, but it will ship in 2012. And it’s going to stay compatible with other Hadoop distributions through Microsoft’s partnership with Yahoo spin-out Hortonworks, which gives Microsoft a stronger voice in proposing contributions back into Apache Hadoop.
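What the “apply the structure once and keep it” step looks like can be sketched in Python, using the standard library’s sqlite3 module as a stand-in for SQL Server (the connector’s actual API isn’t shown here; the table name and the sample MapReduce output are assumptions for illustration).

```python
import sqlite3

# Structured results produced by a MapReduce job (hypothetical output).
mapreduce_output = [("login", 2), ("search", 2)]

# Load the results into a relational table and index it, much as you would
# after pulling Hadoop results across into SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE event_counts (event TEXT, total INTEGER)")
conn.executemany("INSERT INTO event_counts VALUES (?, ?)", mapreduce_output)
conn.execute("CREATE INDEX idx_event ON event_counts (event)")

# Once the data is structured and indexed, plain SQL answers questions
# directly, with no MapReduce pass needed.
total = conn.execute(
    "SELECT total FROM event_counts WHERE event = ?", ("search",)
).fetchone()[0]
print(total)
```

The point of the connector is exactly this handover: the exploratory structuring happens in Hadoop, and the structure you decide to keep lives in an indexed relational store.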
As always, Microsoft is taking an extremely pragmatic approach to new technologies. If you think of NoSQL less as a philosophical stance and more as ‘not just SQL’, and of big data as just another source of information, then with SQL Server 2012 and SQL Azure Microsoft is turning all of these yet-to-be-structured data sources into just another data source that businesses already relying on SQL Server can take advantage of.