CHALLENGES IN USING HADOOP VS SPARK



      Hadoop and Spark are both Apache projects: open-source, free software products, each designed to run on commodity, white-box server hardware. In terms of cost, the two are roughly equal and inexpensive.
      They are highly compatible with each other. Through JDBC and ODBC, Spark can share all of MapReduce's data sources and file formats.
      Spark is up to 10 times faster than MapReduce in batch processing and up to 100 times faster in in-memory analytics. The reason is that MapReduce operates in steps: read data from the cluster, perform an operation, write results back to the cluster, read the updated data from the cluster, perform the next operation, write the next results back, and so on. Spark, by contrast, performs all data-analytics operations in memory and in near real time: read data from the cluster, perform all of the requisite analytic operations, write the results back, done.
      MapReduce's processing style is just fine if our data operations and reporting requirements are mostly static. Streaming data, however, is not possible with MapReduce; for that we need to go for Spark.
      Spark is therefore used in most real-time applications, such as online product recommendation, cyber-security analytics, and machine-log monitoring.
      Failure recovery differs between the two, but both handle it well. Hadoop is naturally resilient to system faults or failures, since data is written to disk after every operation.
      Spark uses RDDs (Resilient Distributed Datasets), which are distributed across the cluster, to store data objects. These objects can be kept in memory or on disk, and the RDD lineage provides full recovery from faults or failures.
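The step-wise versus in-memory contrast described above can be sketched in plain Python (a toy illustration only; these are not the actual Hadoop or Spark APIs, and the function names are hypothetical). The "MapReduce-style" version writes every intermediate result to disk and reads it back before the next step, while the "Spark-style" version chains the same operations in memory and materializes a result only once:

```python
import json
import os
import tempfile
from functools import reduce

# MapReduce-style: persist the intermediate result to disk between steps,
# then read it back before the next operation.
def stepwise_sum_of_squares(values):
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "step1.json")
        squared = [v * v for v in values]           # step 1: map
        with open(path, "w") as f:                  # write results to the "cluster"
            json.dump(squared, f)
        with open(path) as f:                       # read updated data back
            squared = json.load(f)
        return reduce(lambda a, b: a + b, squared)  # step 2: reduce

# Spark-style: chain the same map and reduce in memory, no intermediate I/O.
def in_memory_sum_of_squares(values):
    return reduce(lambda a, b: a + b, (v * v for v in values))

data = [1, 2, 3, 4, 5]
print(stepwise_sum_of_squares(data))   # 55
print(in_memory_sum_of_squares(data))  # 55
```

Both versions compute the same answer; the difference is that the first pays disk I/O at every step, which is exactly the overhead Spark's in-memory pipeline avoids.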
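The lineage idea behind RDD fault recovery can also be sketched in plain Python (again a hypothetical toy, not the real Spark API): each dataset remembers the function and parent that produced it, so a lost in-memory partition can simply be recomputed rather than restored from a disk checkpoint.

```python
class ToyRDD:
    """Hypothetical stand-in for an RDD: records its lineage so lost
    data can be recomputed instead of read back from a checkpoint."""

    def __init__(self, compute, parent=None):
        self._compute = compute  # function that produces this dataset
        self._parent = parent    # upstream ToyRDD in the lineage, if any
        self._cache = None       # in-memory materialization

    def map(self, fn):
        # Derive a new dataset; only the lineage is recorded, nothing runs yet.
        return ToyRDD(lambda p: [fn(x) for x in p], parent=self)

    def collect(self):
        if self._cache is None:  # (re)compute from lineage on demand
            parent_data = self._parent.collect() if self._parent else None
            self._cache = self._compute(parent_data)
        return self._cache

    def lose_partition(self):
        self._cache = None       # simulate a node failure losing the data


base = ToyRDD(lambda _: [1, 2, 3])
doubled = base.map(lambda x: x * 2)
print(doubled.collect())  # [2, 4, 6]
doubled.lose_partition()  # node failure wipes the in-memory copy...
print(doubled.collect())  # ...and the lineage recomputes it: [2, 4, 6]
```

This is the essence of "full recovery": nothing needs to be checkpointed eagerly, because the recipe for rebuilding any partition travels with the dataset itself.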

Conclusion
From the comparison above, it seems that we can use either one without the other. Hadoop has its own Hadoop Distributed File System and a processing component called MapReduce, so it does not need Spark to do processing. Conversely, we can also use Spark without Hadoop: Spark does not have its own file management system, so it needs to be integrated with one, whether HDFS or another cloud-based data platform. Spark was designed to work with Hadoop, however, and they are better together. It might seem that Spark would be the default choice for any big data application, but that is not the case. MapReduce has made inroads into the big data market for businesses that need huge datasets brought under control by commodity systems. Spark's speed, agility, and relative ease of use are perfect complements to MapReduce's low cost of operation.

The truth is that Spark and MapReduce have a symbiotic relationship. Hadoop provides features that Spark does not possess, such as a distributed file system, while Spark provides real-time, in-memory processing for the datasets that require it. The ideal big data scenario is Hadoop and Spark working together on the same team.
