Posts

Showing posts with the label Big Data

CHALLENGES IN USING HADOOP VS SPARK

•  Hadoop and Spark are both Apache projects: open-source, free software designed to run on commodity, white-box server hardware. In terms of cost, the two are generally cheap and roughly equal.
•  They are highly compatible with each other. Through JDBC and ODBC, Spark shares all of MapReduce's data sources and file formats.
•  Spark can be up to 10 times faster than MapReduce for batch processing and up to 100 times faster for in-memory analytics, because MapReduce operates in steps: read data from the cluster, perform an operation, write results to the cluster, read the updated data from the cluster, perform the next operation, write the next results to the cluster, and so on. Spark instead performs all data analytics operations in memory and in near real time: read data from the cluster, perform all of the requisite analytic operations, write results to the clust...
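The difference between the two access patterns can be sketched in plain Python. This is only a toy model, not Hadoop or Spark code: the `Cluster` class, `run_stepwise`, and `run_in_memory` are hypothetical names introduced here, and the read/write counters stand in for round trips to cluster storage.

```python
class Cluster:
    """Toy stand-in for cluster storage that counts reads and writes."""

    def __init__(self, data):
        self.data = list(data)
        self.reads = 0
        self.writes = 0

    def read(self):
        self.reads += 1
        return list(self.data)

    def write(self, data):
        self.writes += 1
        self.data = list(data)


def run_stepwise(cluster, operations):
    """MapReduce-style pattern: read, apply one operation, write, repeat."""
    result = []
    for op in operations:
        data = cluster.read()
        result = [op(x) for x in data]
        cluster.write(result)
    return result


def run_in_memory(cluster, operations):
    """Spark-style pattern: read once, chain every operation in memory, write once."""
    data = cluster.read()
    for op in operations:
        data = [op(x) for x in data]
    cluster.write(data)
    return data


# Three illustrative operations applied to the same input.
operations = [lambda x: x + 1, lambda x: x * 2, lambda x: x - 3]

stepwise = Cluster([1, 2, 3])
in_memory = Cluster([1, 2, 3])
print(run_stepwise(stepwise, operations), stepwise.reads, stepwise.writes)
print(run_in_memory(in_memory, operations), in_memory.reads, in_memory.writes)
```

Both functions produce the same result, but the stepwise version touches storage once per operation in each direction, while the in-memory version reads and writes only once. The model ignores everything else that affects real performance, such as data size, shuffles, and available memory.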

INTRODUCTION OF HADOOP & SPARK

Hadoop and Apache Spark are both big-data frameworks, but a direct comparison is difficult: they do many of the same things, yet they do not overlap in some areas. Hadoop is essentially a distributed data infrastructure. It distributes massive data collections across multiple nodes within a cluster of commodity servers, which means you don't need to buy and maintain expensive custom hardware. It also indexes and keeps track of that data, enabling big-data processing and analytics far more effectively than was previously possible. Spark, on the other hand, is a data-processing tool that operates on those distributed data collections; it does not do distributed storage.

Hadoop has many modules that work together to create the Hadoop framework. The primary Hadoop framework modules are:
·  Hadoop Common
·  ...
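The idea of distributing a data collection across nodes while keeping track of where each piece lives can be illustrated with a small Python sketch. It is an assumption-laden toy, not how Hadoop itself places data: the `distribute` and `read_all` functions, the node names, and the round-robin placement are all made up for illustration, and replication is left out entirely.

```python
def distribute(records, nodes, block_size):
    """Split records into fixed-size blocks and assign them to nodes round-robin.

    Returns (storage, index):
      storage maps node name -> {block_id: list of records}
      index maps block_id -> node name, i.e. where each block lives
    """
    storage = {node: {} for node in nodes}
    index = {}
    for block_id, start in enumerate(range(0, len(records), block_size)):
        node = nodes[block_id % len(nodes)]
        storage[node][block_id] = records[start:start + block_size]
        index[block_id] = node
    return storage, index


def read_all(storage, index):
    """Reassemble the full collection by looking each block up in the index."""
    records = []
    for block_id in sorted(index):
        records.extend(storage[index[block_id]][block_id])
    return records


# Hypothetical three-node cluster holding ten records in blocks of four.
nodes = ["node-a", "node-b", "node-c"]
storage, index = distribute(list(range(10)), nodes, block_size=4)
print(index)
print(read_all(storage, index))
```

The index plays the role of the bookkeeping the post describes: a processing tool that works on the distributed collection only needs to consult it to find which node holds which block.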

INTRODUCTION TO BIG DATA

Big data is a general term for data sets so large or complex that traditional data-processing application software is insufficient to deal with them. Big data work is commonly associated with three major kinds of analysis: predictive analytics, descriptive statistics, and prescriptive analytics. Today we will discuss these terms and their application areas.

Predictive analytics: Predictive analytics is the use of data, statistical algorithms, and machine learning techniques to identify the likelihood of future outcomes based on historical data. The goal is to go beyond knowing what has happened and provide a best assessment of what will happen in the future. Any industry trying to reduce future risk can use this method, for example banking and financial services; retail; oil, gas, and utilities; government and the public sector; health care; health insurance; and manufacturing.

Descriptive statistics: Descriptive statistics are brief ...
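To make the contrast concrete, here is a minimal Python sketch that first summarizes historical data and then uses it to estimate a future value. The monthly sales figures are made up for illustration, `fit_line` is a hypothetical helper, and a straight-line least-squares fit is only one of many possible predictive techniques.

```python
from statistics import mean, median


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (
        sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
        / sum((x - x_bar) ** 2 for x in xs)
    )
    intercept = y_bar - slope * x_bar
    return slope, intercept


# Made-up historical data: sales per month.
months = [1, 2, 3, 4, 5]
sales = [10.0, 12.0, 14.0, 16.0, 18.0]

# Descriptive: summarize what has already happened.
print("mean:", mean(sales), "median:", median(sales))

# Predictive: estimate what is likely to happen next.
slope, intercept = fit_line(months, sales)
print("forecast for month 6:", slope * 6 + intercept)
```

The first print statement only describes the past, while the forecast extrapolates from it, which is the distinction the definitions above draw.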