Tuesday, January 6, 2015

5 Big Data Trends to Watch for in 2015

(Reposted from Texas Enterprise Magazine)

Takeaway

  • Hadoop will lose its status as the go-to big data buzzword.
  • The low cost and high convenience of the cloud will become increasingly key to successful big data projects.
  • Security and the difficulty of managing big data projects in production will be the key challenges.
2014 saw big data IPOs, huge companies betting their future on big data, and a flood of new tools and technologies promising solutions to big data problems.
2015 will be an important year as well, as big data becomes more mainstream and the growing amount of stored information makes analytics increasingly important. Here are five trends to watch for in 2015:

1 – Hadoop is Dead! (Long live Hadoop!)

There is no single buzzword more associated with big data than Hadoop. The two terms have become almost interchangeable in some contexts. But 2015 will see Hadoop's importance wane, and executives everywhere will scramble for a new buzzword.
Hadoop has always had issues. It is too intimidating and does not do enough on its own. The first solutions to the Hadoop-is-scary problem were tools like Pig and Hive, which sat on top of Hadoop and made big data problems more manageable. Next, companies like Hortonworks, MapR, and Cloudera began providing a Hadoop stack that let companies focus less on IT and more on how they were going to actually solve their big data problems.
The natural progression for these Hadoop vendors is to continue to differentiate their offerings and expertise, with the goal of having potential customers ask for them by name rather than generic Hadoop. Cloudera arguably began this with Impala, and the pile of cash Hortonworks is sitting on post-IPO will let them invest heavily in R&D. Meanwhile, companies with big data problems not appropriate for a turnkey solution will look less towards Hadoop and more to next-generation technologies like Apache Spark. These new offerings build on Hadoop, so the technology isn’t going anywhere; it is the Hadoop name that will have clearly jumped the shark by the end of 2015.
In short, private Hadoop vendors will move towards marketing themselves as distinct solution providers rather than Hadoop implementers, and the organizations doing big data from scratch will increasingly find newer technologies that offer more power than Hadoop while being easier to wrangle.

2 – Big Data Continues to the Cloud

IT folk in the know understand that cloud is inevitable, and big data is no exception. What will make big data take up cloud even faster than other technologies is how well suited the cloud is to big data.
Almost all big data solutions (including Hadoop) are run on large clusters of essentially off-the-shelf computers. Many organizations have a need for a large cluster, consisting of dozens or even hundreds of computers, but they do not need all this power 24/7. And with normal rates of hardware failure, even a mid-sized cluster can require a full-time staff just to run around and swap defective parts. Both problems, paying for computing power that lies idle and staffing an IT department to keep it running, are instantly solved by a well-designed cloud platform. It simply does not make economic sense for an organization to maintain its own cluster, unless it’s dealing with Google-scale big data problems.
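To make the idle-capacity argument concrete, here is a rough back-of-the-envelope sketch in Python. Every number in it (node count, hardware price, staffing cost, hourly cloud rate) is a made-up placeholder rather than a real quote, and the function names are invented for this illustration. It ignores power, networking, storage and data-transfer costs, so treat it as a way to frame the question, not as an answer.

```python
# Back-of-the-envelope comparison of owning vs. renting a cluster.
# All figures are hypothetical placeholders, not real prices.


def on_prem_monthly_cost(nodes, hardware_cost_per_node,
                         amortization_months, staff_cost_per_month):
    """Owned cluster: every node is paid for all month, busy or idle."""
    hardware = nodes * hardware_cost_per_node / amortization_months
    return hardware + staff_cost_per_month


def cloud_monthly_cost(nodes, price_per_node_hour, busy_hours_per_month):
    """Rented cluster: nodes are paid for only while they are running."""
    return nodes * price_per_node_hour * busy_hours_per_month


def break_even_hours(nodes, hardware_cost_per_node, amortization_months,
                     staff_cost_per_month, price_per_node_hour):
    """Busy hours per month above which owning is cheaper than renting."""
    owned = on_prem_monthly_cost(nodes, hardware_cost_per_node,
                                 amortization_months, staff_cost_per_month)
    return owned / (nodes * price_per_node_hour)


if __name__ == "__main__":
    # Assumed scenario: 100 nodes, needed about 80 hours a month.
    owned = on_prem_monthly_cost(100, 3600, 36, 8000)
    rented = cloud_monthly_cost(100, 0.50, 80)
    threshold = break_even_hours(100, 3600, 36, 8000, 0.50)
    print(f"owned:  {owned:.0f} per month")
    print(f"rented: {rented:.0f} per month")
    print(f"owning wins only above {threshold:.0f} busy hours per month")
```

With these assumed inputs the owned cluster costs several times more than the rented one, and owning only pays off once the cluster is busy for roughly half of the month's 730 or so hours. Different inputs will move that threshold, which is exactly the calculation an organization should run with its own figures.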
Another trend in 2015 will also drive big data work to the cloud:

3 – Big Data as a Service

Traditionally big data on the cloud means virtual machines turned on or off by an organization as computing needs change (think Amazon’s Elastic MapReduce). But more and more, big data problems will be solved by interfaces and software living in the cloud rather than as virtual machines organizations need to manage on their own.
Google fired a warning shot last year when it announced upcoming public access to its internal big data service Dataflow, which essentially lets users run code without worrying about the management of big data ETL pipelines. And an ever-growing array of startups is offering big data solutions as a service. Ersatz Labs is one such startup, offering a simple Web interface to build deep learning models — a technique that as recently as a year ago was only known to academics and researchers. More and more, these big data-centric services will make running a Hadoop cluster unnecessary overhead for the majority of organizations.

4 – Security Slows Down Big Data

Information security has become a big deal. Last year’s Target breach showed just how vulnerable many companies are, and the recent Sony Pictures hack crippled a multi-billion dollar organization. Security, long considered an afterthought, is now at the forefront of business leaders’ minds moving into 2015. No part of corporate IT will be spared, including big data.
So how will a long overdue shift to security-centric IT affect big data? Unfortunately, it is going to make many things slower and more difficult. Many big data technologies are simply not built with security in mind. 2015 is likely the year we hear about a data breach of a Hadoop cluster, with hackers downloading an amount of data that makes Sony’s 11 lost terabytes look puny. Data breaches will lead to panic, which will lead to duct-tape solutions that may fix the security holes but will leave big data practitioners pulling their hair out as they contend with layers of security that do not exist today.

5 – Poorly Maintained Machine Learning Comes to Haunt Early Adopters

While many organizations still struggle to implement machine learning, some organizations are far along and have run into a new problem: large bases of machine learning models are incredibly difficult to maintain, much more so than a codebase.
Google published research this year detailing struggles with maintaining a large number of machine learning models. While Google has far more machine learning models in production than most organizations, this is not a problem that will go away. Google was just one of the first to encounter this problem, and when Google encounters a big data problem, it is a problem other organizations will face in the future. (Fun fact: over 10 years ago, Google was one of the first companies to publicly discuss Hadoop-scale problems.)
While 2015 will probably not see hundreds of organizations struggling with machine learning gone out of control, 2015 will be the beginning of this discussion — a discussion that will ultimately lead to a boom in demand for an almost impossible to find skill set: data scientists who understand high-level systems architecture.
Do you agree? Disagree? Are any key trends missing? What are your big data predictions for 2015?