SAP jacks up Business Warehouse to in-memory speed
SAP adds in-memory processing and greater Hadoop integration to keep existing customers, not to draw new ones
![SAP jacks up Business Warehouse to in-memory speed](http://www.infoworld.com/sites/infoworld.com/files/media/image/speedometer.jpg)
Credit: iStockphoto
In-memory databases are fast becoming the rule rather than the exception. Oracle, IBM, Microsoft, Pivotal -- there's barely a major database vendor or analytics technology that isn't getting a shot of in-memory processing power.
Now add SAP's Business Warehouse product to that list, with SAP using the term "in-memory data fabric" to describe the kind of in-memory processing it has built. In truth, it's mostly a way to keep SAP's existing customer base securely locked in, rather than a play to draw in greenfield customers.
The big advantage to in-memory work is speed, which SAP is touting as a major selling point for Business Warehouse 7.4. SAP's claim is not just that transactions on individual nodes are speedy, but that queries across multiple nodes also remain fast and scale well. SAP says it has run 8-million-row queries across 111 SAP HANA instances with results returned in 330 milliseconds, compared to a single instance returning results in 250 milliseconds.
SAP's other major claim to fame here is how it can keep these results speedy by using SAP HANA's Smart Data Access technology, which allows data stored remotely in many other sources to be processed as if it were being kept locally in HANA. When a query is executed, HANA determines what part of the query can be executed on the remote target and returns only the relevant data. Not all kinds of data and not all data sources work the same way, but the idea is to provide this kind of behavior across as many data sources as possible.
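The federation idea behind Smart Data Access -- push as much of the query as possible down to the remote source so only the relevant rows travel back -- can be sketched in a few lines. This is a conceptual illustration only; the class and method names below are made up for the sketch and are not HANA APIs.

```python
# Sketch of query pushdown in a federated setup: with pushdown, the
# remote source applies the filter itself and ships back only matching
# rows; without it, every row crosses the wire and the caller filters.
# "RemoteSource" and "scan" are illustrative names, not HANA APIs.

class RemoteSource:
    def __init__(self, rows):
        self.rows = rows  # data that lives on the remote system

    def scan(self, predicate=None):
        if predicate is not None:
            # Pushdown: remote side evaluates the predicate locally.
            return [r for r in self.rows if predicate(r)]
        # No pushdown: ship everything to the caller.
        return list(self.rows)

orders = RemoteSource([{"id": 1, "region": "EU"},
                       {"id": 2, "region": "US"},
                       {"id": 3, "region": "EU"}])

# Without pushdown: fetch all 3 rows, then filter on the local side.
local = [r for r in orders.scan() if r["region"] == "EU"]

# With pushdown: only the 2 relevant rows come back from the remote side.
pushed = orders.scan(lambda r: r["region"] == "EU")

assert local == pushed  # same answer, far less data moved in the second case
```

The payoff is exactly what SAP describes: the answer is identical either way, but the pushdown path moves only the rows the query actually needs.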
If this sounds vaguely like the way Hadoop works, you're not far off. Recently, HANA added Hadoop as a data processing target, with HANA either leveraging Hadoop's batch processing power or using it for raw HDFS storage as needed. SAP's reseller agreements with the likes of Hortonworks are a big indicator of how it feels about having SAP HANA and Hadoop as complementary, not competitive, technologies.
In-memory processing is one of those technologies that has a basic definition but can be implemented multiple ways. The core idea is consistent enough: When processing one or more queries against a given data source, keep as much of the data resident in memory as possible, rather than copying it selectively from storage. But the products that use it vary widely. The way Microsoft SQL Server 2014 does it, for instance, is probably nothing like the way Pivotal HD does it, if only because Pivotal HD was built from the ground up to perform such analytics, while SQL Server has to integrate in-memory processing with its other behaviors.
SAP's approach is to treat in-memory processing as a low-level substrate on which the rest of the product is built, rather than as an adjunct bolted onto how data is otherwise handled (the way SQL Server lets you mark individual tables for in-memory processing, for example).
When I spoke to SAP's Neil McGovern, senior director for product innovation, he noted that one of the issues with a conventional relational database is how much of its code is optimized for overcoming the fact that data is by default being stored to disk. "If you say, 'OK, this [table] is going to be memory resident,' you're still doing all that optimization for disk even though the data is in memory. That's why you'd see only a 30 times speed improvement in SQL Server where you could see 30 times 30 in HANA, or even more. The paradigm is built up from in-memory to start with."
If SAP's approach to in-memory processing is to make it an underlying technology, how likely is it that it will face competition from projects like Cloudera's commercial version of Apache Spark, which performs real-time analytics inside Hadoop? What about GridGain and ScaleOut, two other Hadoop-based solutions for real-time analytics? It's tempting to say that Hadoop could be, in the hands of the right vendors and with the right software on top of it, made into a more direct competitor for what SAP is cooking up.
But like Oracle, SAP will be tough to displace in part because both companies also offer comprehensive applications as their big sell, not just a low-level technology. And both companies have their existing customer bases sewn up tight with their software licensing and maintenance contracts -- obstacles nearly every third-party competitor for Oracle/SAP's business has come up against time and again.
Based on the way SAP has expanded partnerships with hardware vendors and offered subscription pricing for HANA, it has every intention of keeping the territory it's staked out for itself. Disrupting SAP will involve more than just having in-memory processing as a technology; it'll require a whole new approach -- maybe one like Infor's -- that will make existing SAP and Oracle customers feel like they're missing out.