Apache Kafka – Usage Patterns

As the technology industry changes, new buzzwords appear: first Hadoop, then Spark, and now Kafka. What is Kafka? Kafka is an Apache top-level project. Apache Kafka is an open-source, unified, high-throughput, low-latency streaming platform that can handle real-time data...
SQL-on-Hadoop: The Paradox of Choice

Hadoop has been around for a little over 10 years now. It provides a scale-out, cost-effective solution for storing and processing large amounts of data, which we loosely refer to as “Big Data”. More enterprises are adopting Hadoop with an objective...
HDFS Heterogeneous Storage Model

HDFS has proven to be a scalable, fault-tolerant, distributed storage solution that is quickly being adopted across industries. Its distributed storage, along with the ability to scale out linearly, makes the entire Hadoop framework very cost...
Spark 2.0 – What’s New

Earlier this month, Databricks provided an overview of Apache Spark’s next major release, Spark 2.0. The following post shows some of the changes to the abstractions, API, and libraries. Spark 2.0 is expected to be released in early June 2016. What is Apache Spark...
Apache Spark — Sparking Interest

Over the past few years, we have all been enthralled by the buzz generated by IoT. Now it looks like it's time for Apache Spark to take its place in the lexicon of Big Data buzzwords. While researching trends on Google, I was surprised to find out that...