MapR Streams: Big Data Analysis In Real-Time - InformationWeek

MapR is again beefing up its real-time efforts for big data with the release of MapR Streams. Here's how it works.

Big data vendors are all trying to give their customers something called "situational awareness" -- delivering systems that provide real-time insight into sales, transactions, and other data. MapR, one of the top three Apache Hadoop distribution companies (Cloudera and Hortonworks are the other two), will get closer to that goal with the release of MapR Streams.

The company describes the technology as a real-time "global event streaming system," which will be delivered as part of its MapR Converged Data Platform in early 2016.

MapR Streams connects and tracks multiple data streams among multiple sources. Developers can use Streams to build scalable high-volume systems that can handle billions of messages among millions of topics spread out over thousands of locations.
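To make the topic-and-message model concrete, here is a minimal in-memory sketch of a publish/subscribe log. It is a hypothetical illustration, not MapR's API: the `TopicLog` class and its `publish` and `read` methods are invented names, and a real system would persist, partition, and replicate each topic across a cluster. Each topic is an append-only list, producers append messages to it, and each consumer tracks its own read offset, so many independent consumers can read the same stream at their own pace.

```python
from collections import defaultdict


class TopicLog:
    """Hypothetical in-memory stand-in for a set of stream topics (not MapR's API)."""

    def __init__(self):
        # topic name -> append-only list of messages, in publish order
        self._topics = defaultdict(list)
        # (consumer id, topic name) -> offset of the next message to read
        self._offsets = defaultdict(int)

    def publish(self, topic, message):
        """Append a message to a topic and return the offset it was stored at."""
        log = self._topics[topic]
        log.append(message)
        return len(log) - 1

    def read(self, consumer, topic, max_messages=10):
        """Return up to max_messages unread messages and advance this consumer's offset."""
        start = self._offsets[(consumer, topic)]
        batch = self._topics[topic][start:start + max_messages]
        self._offsets[(consumer, topic)] = start + len(batch)
        return batch


stream = TopicLog()
stream.publish("sales/store-42", {"sku": "A100", "amount": 19.99})
stream.publish("sales/store-42", {"sku": "B200", "amount": 5.00})

# Two independent consumers each see the messages in publish order.
print(stream.read("dashboard", "sales/store-42"))
print(stream.read("fraud-check", "sales/store-42", max_messages=1))
```

Because offsets are tracked per consumer rather than deleted on read, adding another consumer (say, an analytics job) does not disturb the ones already running.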

"The operating system does a great job with processing, but lags as a data platform," Jack Norris, Chief Marketing Officer at MapR, told InformationWeek in an interview. While analysis is possible once data has been collected, it has been much harder to do while the data is still streaming and in use. Norris said customers want to know what is happening now, rather than finding out at sundown what happened during the day.

MapR Streams aims to eliminate that time delay by allowing real-time analysis of data, regardless of source. The strategy is to concentrate processing in the layer between the data and the apps, and to identify what type of data is being analyzed (files, tables, documents, or streams) rather than classify it according to the silo it was drawn from.

(Image: NorthernStock/iStockphoto)

This requires a change in architecture rather than a rearrangement of existing apps, Norris said, and that poses a challenge. "Some of it is scale. Some of it is the disparity of sources. Some of it is global synchronization, which requires sophisticated replication."

"Big data is generated one event at a time," Norris said. "The sum total of all that is big data." Without applying analysis to the live stream, the data simply goes into a repository and sits there until analyzed. With MapR Streams, consolidation happens in minutes, not at the end of the day.

That consolidation provides a view of "what is happening," and that view alone can lead to an operational paradigm shift in certain industries, depending on how the technology is used.
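The difference between end-of-day and in-stream consolidation can be sketched with a tumbling-window aggregate: each event updates the total for its time window the moment it arrives, so a current figure is always available instead of waiting for a nightly batch. This is a generic stream-processing technique, not a description of MapR's internals; the one-minute window and the `TumblingWindowTotals` class are illustrative assumptions.

```python
from collections import defaultdict

WINDOW_SECONDS = 60  # assumed window size: one-minute tumbling windows


def window_start(timestamp):
    """Map an event timestamp (in seconds) to the start of its tumbling window."""
    return timestamp - (timestamp % WINDOW_SECONDS)


class TumblingWindowTotals:
    """Keeps a running total per window as events arrive, instead of one end-of-day batch."""

    def __init__(self):
        self.totals = defaultdict(float)

    def add(self, timestamp, amount):
        """Fold one event into its window; return (window start, running total)."""
        start = window_start(timestamp)
        self.totals[start] += amount
        return start, self.totals[start]


agg = TumblingWindowTotals()
events = [(0, 10.0), (30, 5.0), (61, 7.5), (119, 2.5), (120, 1.0)]
for ts, amount in events:
    start, running = agg.add(ts, amount)
    print(f"window starting at {start}s: running total {running}")
```

Shrinking or growing `WINDOW_SECONDS` trades freshness against the number of windows kept; the point is that the answer is up to date after every event.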

The system enables developers to unite analytics, transactions, and stream processing while reducing data duplication and minimizing cluster sprawl. Cross-site replication allows developers to build global real-time apps with reliable message delivery and consistent message ordering. MapR Streams can interface with Apache Software Foundation big data projects, including Spark Streaming, Apache Storm, Apache Flink, and Apache Apex.
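Reliable delivery with order consistency across sites usually comes down to applying replicated messages in the source's order even when the network delivers them late, out of order, or more than once. The sketch below shows one common way to do that, buffering by offset until any gap is filled; it is an illustrative assumption, not MapR's replication protocol, and `OrderedReplica` is a hypothetical name.

```python
class OrderedReplica:
    """Hypothetical replica that applies a remote site's messages strictly in offset order."""

    def __init__(self):
        self.log = []       # messages applied so far, in source order
        self._pending = {}  # offset -> message that arrived ahead of a gap

    def receive(self, offset, message):
        """Accept a replicated message that may be duplicated or out of order."""
        if offset < len(self.log) or offset in self._pending:
            return  # duplicate delivery: already applied or already buffered
        self._pending[offset] = message
        # Apply every buffered message whose predecessor has now been applied.
        while len(self.log) in self._pending:
            self.log.append(self._pending.pop(len(self.log)))


replica = OrderedReplica()
# Delivery order differs from source order, and offset 1 arrives twice.
for offset, msg in [(1, "b"), (0, "a"), (1, "b"), (3, "d"), (2, "c")]:
    replica.receive(offset, msg)
print(replica.log)  # ['a', 'b', 'c', 'd']
```

Ignoring duplicates by offset makes retries safe, which is what lets a sender resend on failure without breaking ordering at the receiving site.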

Norris called the new architecture "the biggest change in enterprise computing in decades."

Why? He gave several examples. Take online retailing, where a big data insight can suggest additional, related products to add to a transaction. Do that half a million times over the course of a year and one can realize significant additional revenue.
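The retail example depends on some model of which products go together. A simple, widely used stand-in is co-purchase counting: every completed transaction updates pair counts, and the current cart is scored against them. This is a hypothetical illustration of the idea, not the technique MapR or any particular retailer uses; `CoPurchaseRecommender` and the sample baskets are invented.

```python
from collections import Counter, defaultdict
from itertools import permutations


class CoPurchaseRecommender:
    """Suggests products that often appeared in the same past transactions."""

    def __init__(self):
        # product -> Counter of products bought alongside it
        self._co_counts = defaultdict(Counter)

    def observe(self, basket):
        """Update co-purchase counts from one completed transaction."""
        for a, b in permutations(set(basket), 2):
            self._co_counts[a][b] += 1

    def suggest(self, cart, k=2):
        """Rank products not already in the cart by how often they co-occurred with it."""
        scores = Counter()
        for item in cart:
            scores.update(self._co_counts.get(item, Counter()))
        for item in cart:
            scores.pop(item, None)  # never suggest something already in the cart
        return [product for product, _ in scores.most_common(k)]


rec = CoPurchaseRecommender()
for basket in [["tent", "stove", "fuel"], ["tent", "sleeping bag"],
               ["stove", "fuel"], ["tent", "stove", "fuel"]]:
    rec.observe(basket)
print(rec.suggest(["stove"]))  # ['fuel', 'tent']
```

Because `observe` runs once per transaction, the counts stay current as the stream flows, which is what makes a suggestion at checkout time possible.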

The technique can be applied to credit card transactions, using risk mitigation to avoid the cost of fraud. It can also be used in the oil and gas industry to monitor pipeline and refinery equipment and identify scheduling opportunities for preventive maintenance without disrupting continuous operations.
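Both the fraud and the equipment-monitoring examples come down to noticing, as each event arrives, that a value is far out of line with what came before. The sketch below uses Welford's online mean and variance algorithm with a simple standard-deviation threshold; the three-sigma threshold, the warm-up count, and the class names are assumptions for illustration, and production fraud or maintenance systems typically use far richer models.

```python
import math
from collections import defaultdict


class RunningStats:
    """Welford's online algorithm for mean and sample variance, one update per event."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the running mean

    def update(self, x):
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self._m2 += delta * (x - self.mean)

    def stddev(self):
        return math.sqrt(self._m2 / (self.n - 1)) if self.n > 1 else 0.0


class AnomalyFlagger:
    """Flags values far from a key's history (e.g. a card's spending or a pump's vibration)."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold  # assumed: flag beyond 3 standard deviations
        self.warmup = warmup        # assumed: observations needed before judging a key
        self._stats = defaultdict(RunningStats)

    def check(self, key, value):
        """Return True if value is anomalous for this key, then fold it into the history."""
        stats = self._stats[key]
        flagged = False
        if stats.n >= self.warmup:
            sd = stats.stddev()
            # A perfectly constant history has no spread to compare against.
            flagged = sd > 0 and abs(value - stats.mean) > self.threshold * sd
        stats.update(value)
        return flagged


flagger = AnomalyFlagger()
for amount in [42.0, 38.5, 40.0, 45.0, 41.0, 39.5]:
    flagger.check("card-1234", amount)
print(flagger.check("card-1234", 43.0))   # False
print(flagger.check("card-1234", 950.0))  # True
```

The same `check` call works whether the key is a card number or a sensor ID. Folding flagged values into the statistics keeps the sketch simple; a real system might hold them out of the baseline.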

William Terdoslavich is an experienced writer with a working understanding of business, information technology, airlines, politics, government, and history, having worked at Mobile Computing & Communications, Computer Reseller News, Tour and Travel News, and Computer Systems ...

Comments
Charlie Babcock,
User Rank: Author
12/9/2015 | 7:07:07 PM
Using lots of data in near real time is the goal
The goal is to get to data use in real time, and that goal is still a long way off. But MapR, Cloudera, Hortonworks, and the Spark project are all pushing in that direction. Compared to how hard it used to be to make use of a large amount of data quickly, we've come a long way on what will prove an exhausting journey.
Gary_EL,
User Rank: Ninja
12/8/2015 | 4:08:24 PM
Data, data, everywhere
Old news is no news at all. If all the titanic amounts of information being harvested can't be immediately analyzed and exploited, it is of significantly less value to the organizations that are spending big money to obtain it. More efforts such as described will be needed, as the IoT advances and collects more data still.