EMC has announced the Federation Business Data Lake, a new converged Big Data solution jointly developed by EMC and its federation partners, Pivotal and VMware, to greatly simplify the building and deployment of a Data Lake for customers. The technology is very complex, and only EMC’s more sophisticated partners are likely to be able to handle it, but it should be a strong opportunity for them.
“This is a pretty big move by us from just selling and promoting products to promoting and selling outcomes for our customers,” said Jeremy Burton, EMC’s President, Products and Marketing, in a video made for the launch. “Federation Business Data Lake is our second Big Data solution. It builds on top of our Federation Enterprise Hybrid Cloud.”
Burton described the context in which a Data Lake is needed: a world that will have somewhere between 30 and 200 billion devices by 2020, and where machines will no longer be just inanimate things that do a task – like a tractor – but intelligent things. The tractor, for instance, will be able to determine a plan for planting while sensing conditions in the land, such as moisture levels, helping bring about a precision farming revolution that will increase yields and farmer profits. Burton said this is happening now in some places, with companies like Monsanto, and will eventually happen everywhere.
“It won’t be just about the product, but about the data the product generates,” Burton said.
Big Data analytics can combine context and intent to drive these decisions in real time, instead of working from old information – last year’s data, in the case of farming. Companies with successful Data Lakes can use the data and predictive models to build new products, applications and business models that redefine their industry, moving far beyond traditional data warehousing.
“Instead of a world where there was a lot of structured collection of this data, and you worked on summarization or aggregates, you now are storing all of the baseline detail data, and instead of standard reporting and measures to figure out what’s happening, you’re moving into the realm where everything is trending towards predictive and analytic insight, machine learning techniques and doing those things in real time,” said Scott Yara, President and Head of Products, Pivotal. “That’s what’s fundamentally different about this new world of Big Data, and that’s why we are seeing this explosion of applications and creation of all types of new services.”
Yara said that doing this effectively is also very challenging, and it raises many questions for organizations about how to build a data platform for it. A Business Data Lake contains both structured and unstructured data from a wide variety of sources, along with predictive analytics that draw on that data. Deploying a Data Lake means deploying and configuring the right analytics platform, and the right corresponding storage, for each analytics use case, from Hadoop to real-time. Once the environment is created, data must be loaded, with the right access rights and governance applied to the data sets. This process of operationalizing the data has been a complex and time-consuming task.
“The real question is, how do we make this easy?” Yara said.
That, of course, was the segue to the Federation Business Data Lake Solution, and how EMC, Pivotal and VMware could come together to make the task of building Data Lakes much easier.
The Data Lake Storage Foundation is built on EMC Isilon, EMC’s scale-out file storage solution, and EMC Elastic Cloud Storage, the object foundation for the Data Lake. Running on top of this is an analytics foundation fully virtualized by VMware: the Pivotal Big Data Suite. The suite includes Pivotal HD, a Hadoop engine for analytics; GemFire, a NoSQL database that does real-time analytics; HAWQ, a massively parallel SQL interface to Hadoop; and Greenplum, a scale-out MPP database which EMC believes to be leaps and bounds above traditional warehousing technologies. On top of the analytics framework comes Pivotal Cloud Foundry, the foundation for building applications, which lets customers simply deploy those applications to their cloud. The core technology also comes with an open, fully supported ecosystem.
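For readers who find it easier to see the layering as a structure, the stack described above can be sketched roughly as follows. This is only an illustrative model built from the component names and roles given in this article; the class names (`Component`, `Layer`), the layer labels and the helper functions are hypothetical and do not correspond to any EMC, Pivotal or VMware API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    """One product in the stack, with the role the article gives it."""
    name: str
    role: str


@dataclass(frozen=True)
class Layer:
    """A named tier of the stack (layer labels here are illustrative)."""
    name: str
    components: tuple


# Layers are listed bottom-up, in the order the article describes them.
STACK = (
    Layer("storage foundation", (
        Component("EMC Isilon", "scale-out file storage"),
        Component("EMC Elastic Cloud Storage", "object storage"),
    )),
    Layer("analytics foundation (virtualized by VMware)", (
        Component("Pivotal HD", "Hadoop engine for analytics"),
        Component("GemFire", "NoSQL database for real-time analytics"),
        Component("HAWQ", "massively parallel SQL interface to Hadoop"),
        Component("Greenplum", "scale-out MPP database"),
    )),
    Layer("application foundation", (
        Component("Pivotal Cloud Foundry", "building and deploying applications"),
    )),
)


def describe(stack):
    """Return a plain-text outline of the stack, bottom layer first."""
    lines = []
    for depth, layer in enumerate(stack, start=1):
        lines.append(f"{depth}. {layer.name}")
        for component in layer.components:
            lines.append(f"   - {component.name}: {component.role}")
    return "\n".join(lines)


def layer_of(stack, component_name):
    """Return the name of the layer holding a component, or None if absent."""
    for layer in stack:
        for component in layer.components:
            if component.name == component_name:
                return layer.name
    return None


if __name__ == "__main__":
    print(describe(STACK))
```

Calling `layer_of(STACK, "HAWQ")` returns the analytics layer's label, which reflects the point of the design as described: each analytics engine sits above shared storage and below the application platform.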
“It’s a fully engineered solution,” said Josh Kahn, SVP Global Solutions at EMC. “First, we did all the integration work and testing, to make sure it all works. Second, we built a predefined set of analytics use cases, to help customers navigate the mapping of use cases to the right analytic platform, to the right storage platform. And then finally, we built automated provisioning and configuration to make sure the IT organization can simply and easily establish that environment and get up and running quickly.”
Kahn said that EMC will also deliver two additional Business Data Lakes, one for Cloudera and one for Hortonworks, so that customers can integrate their choice of Hadoop distribution.
Kahn then laid out ambitious plans for expanded functionality in 2015. These include expanding the Platform Manager and adding new modules, giving it a plug-and-play capability that allows different engines to plug into the management layer. An Index Engine will sit on top of that, providing the ability to search across the Data Lake – and data outside the Data Lake as well. EMC is also adding a Data Governor, to apply policies to all the information in the Data Lake.
“Last, but not least, definitely not least, we are going to provide self-service capabilities, because this is what everybody expects today,” Kahn said. “The ability to go in, ask for their own environment, ask for their own data, and be up and running in an instant.” Thus, while the original version of Data Lake will be run by IT, the not-so-long-term plan is to empower business users to build their own Data Lakes.
The market for the Federation Business Data Lake will not be limited to very large organizations.
“It’s certainly not something suitable for smaller businesses, but it is for the middle to large to very large – and not just for very large,” said Michael Kerr, director of channels at EMC Canada. “All these businesses face the same challenges.”
Kerr said that while the role of vendors and their channel has always been to help partners get to the Holy Grail, promising that in the past has always been easier than delivering it.
“With this, however, we are getting closer to that final road map that lets us combine incredibly complex technologies,” he said. “We don’t want to suggest it’s simple to do. It’s not. This is complex technology. However, this is a clear road map on a definitive path to full data analytic leverage of information in what we call the Data Lake. Our sophisticated partners are moving to this direction as well. They’ve been in integration for a long time, but they see a shift to end-user business outcome selling. This gives them the ability to assist IT to transform to a service entity. This also gives partners more consultative respect, as well as a longer term annuity.”
Kerr said that all the individual components of this solution are already available to partners, and while most resellers will not be capable of selling or installing it, some will.
“It’s likely just the top six in Canada max, and even they may need to work with EMC Consulting to work with it tomorrow,” he said. “These partners are looking for this kind of leadership with us. They know their business model is changing under their feet, that they cannot just continue to sell stuff, even though that model will continue for a long time to come. The top six are looking for that next play.”
Kerr said that the Federation Business Data Lake isn’t likely to become a pure cloud offering, because of security and governance issues, but that their hybrid cloud offering has been picking up momentum in Canada.
“We are starting to see adoption of converged and hybrid cloud gather steam now, and we will certainly be leading with this,” he said.
The Federation Business Data Lake will be offered in Directed Availability in April 2015.