IBM Bets On Apache Spark As 'The Future Of Enterprise Data' - InformationWeek

The key problem Spark resolves is access to data across the enterprise. IBM initiatives include providing courses to train 1 million data scientists and engineers to use it.

IBM is making a major commitment to the future of Apache Spark, with a series of initiatives announced today. IBM will offer Apache Spark as a service on Bluemix; commit 3,500 researchers to work on Spark-related projects; donate IBM SystemML to the Spark ecosystem; and offer courses to train 1 million data scientists and engineers to use Spark.

The commitment to Spark is "right in the heart of what [IBM] has been doing," said Rob Thomas, VP for product development for IBM Analytics, in an interview. That heritage harks back to IBM's earlier commitment to Linux, and even further back to its DB2 database product, he said. But it is rare for IBM to make a technological bet such as the one on Spark, he added.

"This is the future of enterprise data," Thomas continued. "Anyone using data will have to leverage Spark."

(Image: Geralt via Pixabay)

The key problem Spark resolves is access to data across the enterprise. A typical large corporation will have hundreds, if not thousands, of data sets residing in different databases across its IT systems.

A data scientist can certainly craft an algorithm to plumb the depths of any database. But "it takes a data scientist 90 days of work" to craft that algorithm, Thomas said. "Today, if you port it to another system, you are talking about another 90 days of work" to re-craft and adjust that algorithm so that it works there. Spark "eliminates that second 90 days," he said. A Spark-based system can seamlessly and transparently access and analyze any database, without additional development and delay.

Another virtue Spark possesses is ease of use. Developers can concentrate on building the solution, instead of building an engine from scratch.

IBM recently sponsored a hackathon during which more than 100 teams crafted new Spark-based apps in about 10 days. One team made a genomic cloud system to analyze DNA samples; another created a search engine to gauge public opinion based on sentiments perceived in text. Thomas pointed to these projects as "proof of concept" to show how quickly a competent team of two or three people can complete a project using Spark.

"The weakest part of Spark is the machine learning piece," Thomas noted. To that end, IBM will make available its SystemML machine learning technology to add learning capability to Spark apps, working with partner Databricks. This is not an algorithm library, but an engine that understands algorithms, Thomas said of SystemML.

While Spark looks promising, nothing will come of it without sufficient numbers of data scientists who actually use it. And data scientists don't grow on trees. IBM wants to educate about 1 million new users through a series of partnerships with AMPLab, DataCamp, MetiStream, Galvanize, and the Big Data University MOOC. The goal here is to make available a "data scientist's work bench" where users who know the R programming language can pick up Spark and its uses very quickly, Thomas said.

Ultimately, it falls to enterprises to make the best use of big data technology such as Spark. "Knowing the problem to solve—that will drive significant business value," Thomas said. CEOs are only beginning to understand how their data can be put to best use. Thomas offered the example of Moneyball, the 2003 book on how the Oakland Athletics sharpened their play of baseball through statistical analysis. "Data can make you think differently," Thomas said. And therein lies the quest for the advantages of insight.

William Terdoslavich is an experienced writer with a working understanding of business, information technology, airlines, politics, government, and history, having worked at Mobile Computing & Communications, Computer Reseller News, Tour and Travel News, and Computer Systems ...
