Since springing from the pages of a 2004 Google research paper, the open source code we know today as “Hadoop” has created something of a competitive free-for-all. Whether by adding proprietary features or through market strategy, Hadoop vendors have sought to differentiate themselves in order to gain an edge in the melee.
But the field is narrowing as more of the fighters drop out or join forces. With about 73% of enterprises harboring near-term plans to invest in Big Data, according to Gartner surveys, the victorious few have a lot to look forward to. That is especially true because Hadoop is the de facto poster child for Big Data, to the point where many companies use the terms interchangeably.
The three you hear most about were founded to deliver solutions based on Hadoop: Cloudera, Hortonworks and MapR. But there are plenty of other companies also selling Hadoop solutions. Some, like Amazon Web Services (AWS), Pivotal, IBM or Intel, are big names adding a Hadoop capability to their portfolios. Others, like Altiscale, AtScale (gee, wonder how many have confused those two), Platfora, Qubole or Trifacta, are startups seeking to differentiate themselves from the Hadoop pack by tackling certain pieces of the problem.
There’s plenty of choice in the market, but the ties that bind can also be the ties that strangle. And all that freedom to differentiate has also become freedom to duplicate. There is plenty of wheel reinventing, needless replicating and customer confusion to go around. Consolidation had to come, and it has.
Intel (yes, that Intel) once had its own Hadoop distribution (or distro). Outside of China, it didn’t sell very well. Seeing the writing on the wall, Intel poured over $700 million into Cloudera a year ago and began moving customers from the Intel Hadoop distro to Cloudera’s.
Pivotal, which spun out of EMC and VMware back in 2013, also had a Hadoop distro, which it tweaked to function best with a range of former EMC tools, including the Greenplum database. Pivotal’s unique attributes, like HAWQ (a full-featured SQL engine running on HDFS), certainly attracted (and continue to attract) customer interest. But maybe the cost of maintaining and enhancing yet another Hadoop distro got tiring, as the company recently decided to go in a new direction. Pivotal, together with Hortonworks, launched the Open Data Platform last February.
It’s pretty clear that this trend still has legs, especially given that the code of the dominant Hadoop distros gets more and more similar every day. Some players, like Intel, realized the value of dropping out of the scrum, shedding their distro in favor of one of the market leaders. You can bet others will see the light now that the example has been set. Almost any time you have a growing ecosystem of related products like this, the community will eventually establish standards. As consensus builds, plenty of questions will be answered, such as how components like Apache Spark and Apache Ambari will fit with Hadoop. The key truth to remember is that the market is now 20 years into an open source software adoption that isn’t done providing tremendous value. That means more evolution is on the way. In 2015, we’ll see the continued evolution of a newer, more nuanced model of Hadoop that combines deep innovation with community development.