As the technology industry changes, new buzzwords appear: first Hadoop, then Spark, and now Kafka. What is Kafka? Kafka is an Apache top-level project. Apache Kafka is an open-source, unified, high-throughput, low-latency streaming platform that can handle real-time data...
Hadoop has been around for a little over 10 years now. It provides a scale-out, cost-effective solution for storing and processing large amounts of data, which we loosely refer to as “Big Data”. More enterprises are adopting Hadoop with an objective...
Earlier this month, Databricks provided an overview of Apache Spark’s next major release, Spark 2.0. This post covers some of the changes to the abstractions, APIs, and libraries. Spark 2.0 is expected to be released in early June 2016. What is Apache Spark?...
In a previous post, I discussed how to delete or remove a service from Ambari. That process used the Ambari API to delete the target service. In this article, we dive into the other side of the Ambari interface: the database...
This post is the second in a series of three showing how to use Hortonworks DataFlow (HDF), powered by Apache NiFi, to design a data flow. HDF is a powerful, easy-to-use tool for distributing and processing data. In our previous article we designed a...
With the ever-increasing buzz around the Internet of Things (IoT), which is now slowly moving toward the Internet of Any Thing (IoAT), we need an ideal solution for moving data between various processing platforms. The answer to this challenge is provided by...