COMPARISON OF IBM, GEPHI, TABLEAU, SIMBA, AZURE, LAMBDA, HADOOP, SPARK AND R

IBM WATSON ANALYTICS:
Watson Analytics is a smart data analysis and visualization service we can use to quickly discover patterns and meaning in our data, all on our own. With guided data discovery, automated predictive analytics and cognitive capabilities such as natural language dialogue, we can interact with data conversationally and get answers we understand. Whether we need to quickly spot a trend or we have a team that needs to visualize report data in a dashboard, Watson Analytics has us covered.
Anything involving textual data is a perfect scenario for this product, since its ultimate aim is to process text much the way humans would.
  • It parses textual data very well.
  • It offers strong features for analyzing textual data.
  • We don't need to know how to code to use it.

GEPHI:
Gephi is an interactive visualization and exploration platform for all kinds of social networks, complex systems, and dynamic and hierarchical graphs.
Gephi is a tool for people who have to explore and understand graphs. Like Photoshop, but for graphs: the user interacts with the representation and manipulates the structures, shapes and colors to reveal hidden properties. The goal is to help data analysts make hypotheses, intuitively discover patterns, and isolate structural singularities or faults during data sourcing. It is a complementary tool to traditional statistics, since visual thinking with interactive interfaces is now recognized to facilitate reasoning. It is software for exploratory data analysis, a paradigm that emerged from the Visual Analytics field of research.
  • Define strength of relationships based on custom weighting variables.
  • Common algorithms for measuring connectedness.
  • Visualizations run on a spreadsheet-formatted data table, which makes the tool more intuitive for non-programmers.
  • Gephi enables us to create beautiful, publication-ready graphical images.
  • We can write our own programs and plugins to be incorporated into Gephi (a minimal sketch of generating a graph file Gephi can open follows this list).
  • The open-source platform provides versatility and the GUI allows easy use.
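
To illustrate the last points, here is a minimal sketch, assuming Python with the networkx package installed, that builds a small weighted graph, computes a simple connectedness measure, and writes a GEXF file that Gephi can open for layout and styling. The node names and output path are placeholders.

# Minimal sketch: build a weighted graph, score connectedness, export for Gephi.
# Assumes Python with the networkx package installed (pip install networkx).
# Node names and the output path are placeholders for this example.
import networkx as nx

# The "weight" attribute acts as a custom relationship-strength variable.
G = nx.Graph()
G.add_edge("alice", "bob", weight=3.0)
G.add_edge("bob", "carol", weight=1.0)
G.add_edge("carol", "alice", weight=0.5)
G.add_edge("carol", "dave", weight=2.0)

# A common connectedness measure: weighted degree (node strength).
strength = dict(G.degree(weight="weight"))
print(strength)

# GEXF is Gephi's native exchange format; open this file in Gephi to lay out
# and style the network into a publication-ready image.
nx.write_gexf(G, "example_network.gexf")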

TABLEAU:
Tableau is a family of interactive data visualization and business intelligence software that lets us explore, visualize, and share data securely. If, while building our data warehouse, we ETL the data and create all our relationships and hierarchies in a separate tool, push them to the EDW, and then expect to report and visualize out of a reporting tool, Tableau is a perfect tool for that.
  • Tableau Reader is a free desktop application that we can use to open and interact with data visualizations built in Tableau.
  • Tableau Desktop helps us analyze data and get quick answers to questions through visualizations, interactive dashboards, and data tables. It connects directly to the data warehouse, Excel spreadsheets, and other data sources.
  • It's easy enough that any Excel user can learn it, but powerful enough to satisfy even the most complex analytical problems. Securely sharing our findings with others takes only seconds (a small sketch of reading shared workbooks programmatically follows this list).
  • Data source connectivity: Tableau provides a lot of connection options. It can connect to files (Excel, text, Access, CSV, etc.), databases (Microsoft SQL Server, Oracle, Amazon Redshift, etc.), ODBC connections, Google Analytics, SAP HANA and many more.
  • Creating visualizations is very easy; it is largely drag and drop.
  • Excellent mobile support. Tableau put a lot of effort into developing a robust mobile client; controls and reports are pixel-perfect.
  • Good customer support and low-cost maintenance compared to MicroStrategy, Business Objects, etc.
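
As a small illustration of working with shared Tableau content programmatically, the sketch below uses the tableauserverclient Python package to sign in to a Tableau Server (or Tableau Online) site and list the published workbooks. The server URL, credentials and site name are placeholders, and the exact authentication options depend on the server configuration.

# Minimal sketch: list workbooks published to a Tableau Server site.
# Assumes Python with the tableauserverclient package (pip install tableauserverclient).
# SERVER_URL, USERNAME, PASSWORD and SITE are placeholders for this example.
import tableauserverclient as TSC

tableau_auth = TSC.TableauAuth("USERNAME", "PASSWORD", site_id="SITE")
server = TSC.Server("https://SERVER_URL", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    # workbooks.get() returns the first page of workbooks plus pagination info.
    workbooks, pagination_item = server.workbooks.get()
    print(pagination_item.total_available, "workbooks on the site")
    for workbook in workbooks:
        print(workbook.name)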

SIMBA:
Simba is, essentially, the industry's choice for standards-based relational and multi-dimensional data access and analytics on in-memory, cloud and big data platforms. Host Analytics, the leader in cloud corporate performance management (CPM), has chosen Simba's cloud ODBC driver technology to provide customers with easy access and analytics capabilities on their enterprise data.
More than ever, finance professionals make critical business decisions that rely on integrating and analyzing growing volumes of data from marketing, operations, sales and other applications. Simba's cloud ODBC driver integrates the data from these applications so that customers can perform sophisticated business analytics and make better bottom-line financial decisions. As part of a growing partner network that enables seamless integration of enterprise applications with the Host Analytics CPM Suite, Simba provides a solution for finance and high-performance analytics in the cloud.
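
Since Simba's products are standards-based ODBC/JDBC drivers, the sketch below shows how an application would typically use such a driver from Python via the pyodbc package. The DSN name, credentials and query are placeholders; the actual driver and connection options depend on which Simba driver and data source are in use.

# Minimal sketch: query a data source through an ODBC driver (such as a Simba driver).
# Assumes Python with the pyodbc package (pip install pyodbc) and a configured ODBC DSN.
# "MyCloudDSN", the credentials and the SQL are placeholders for this example.
import pyodbc

conn = pyodbc.connect("DSN=MyCloudDSN;UID=USERNAME;PWD=PASSWORD")
cursor = conn.cursor()

# Standard SQL flows through the driver regardless of the underlying data source.
cursor.execute("SELECT region, SUM(revenue) FROM sales GROUP BY region")
for region, total in cursor.fetchall():
    print(region, total)

conn.close()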

MICROSOFT AZURE and AMAZON LAMBDA:
Both are serverless compute offerings; the table below compares AWS Lambda and Microsoft Azure Functions feature by feature.

Feature | AWS Lambda | MS Azure
Scalability & availability | Automatic scaling (transparent) | Manual or metered scaling (App Service Plan), or sub-second automatic scaling (Consumption Plan)
Max number of functions | Unlimited functions | Unlimited functions
Concurrent executions | 1,000 parallel executions per account, per region (soft limit) | No limit
Max execution time | 300 sec (5 min) | 300 sec (5 min)
Supported languages | JavaScript, Java, C#, and Python | C#, JavaScript, F#, Python, Batch, PHP, PowerShell
Dependencies | Deployment packages | npm, NuGet
Deployments | ZIP upload only (to Lambda or S3) | Visual Studio Team Services, OneDrive, local Git repository, GitHub, Bitbucket, Dropbox, external repository
Environment variables | Yes | App Settings and ConnectionStrings from App Services
Versioning | Versions and aliases | Cloud source branch/tag
Event-driven triggers | S3, SNS, SES, DynamoDB, Kinesis, CloudWatch, Cognito, API Gateway, CodeCommit, etc. | Blob, Event Hub, generic WebHook, GitHub WebHook, Queue, HTTP, Service Bus Queue, Service Bus Topic, Timer triggers
HTTP(S) invocation | API Gateway | HTTP trigger
Orchestration | AWS Step Functions | Azure Logic Apps
Logging | CloudWatch Logs | App Services monitoring
Monitoring | CloudWatch & X-Ray | Application Insights
In-browser code editor | Yes | Functions environment, App Service editor
Granular IAM | IAM roles | IAM roles
Pricing | 1M requests free, then $0.20 per 1M invocations, plus $0.00001667 per GB-sec | 1M requests free, then $0.20 per 1M invocations, plus $0.000016 per GB-sec
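
To make the comparison concrete, here is a minimal sketch of roughly equivalent HTTP-style handlers in Python for each platform. The greeting logic and parameter names are made up for the example, and each handler would still need the platform's own packaging and trigger configuration to deploy.

# Minimal sketch: comparable Python handlers for AWS Lambda and Azure Functions.
# The handler signatures follow each platform's conventions; the greeting logic
# and parameter names are placeholders for this example.
import json

import azure.functions as func  # requires the azure-functions package

# AWS Lambda: the handler receives an event dict and a context object.
# With an API Gateway proxy integration, query parameters arrive in the event.
def lambda_handler(event, context):
    name = (event.get("queryStringParameters") or {}).get("name", "world")
    return {
        "statusCode": 200,
        "body": json.dumps({"message": f"Hello, {name}"}),
    }

# Azure Functions: an HTTP-triggered function receives an HttpRequest and
# returns an HttpResponse (the trigger binding lives in function.json).
def main(req: func.HttpRequest) -> func.HttpResponse:
    name = req.params.get("name", "world")
    return func.HttpResponse(
        json.dumps({"message": f"Hello, {name}"}),
        mimetype="application/json",
        status_code=200,
    )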



HADOOP and SPARK:
  • Hadoop and Apache Spark are both big-data frameworks, but a direct comparison is difficult: they do many of the same things, yet they also cover non-overlapping areas.
  • Hadoop is essentially a distributed data infrastructure. It distributes massive data collections across multiple nodes in a cluster of commodity servers, which means we don't need to buy and maintain expensive custom hardware. It also indexes and keeps track of that data, enabling big-data processing and analytics far more effectively than was possible previously.
  • Spark, on the other hand, is a data-processing tool that operates on those distributed data collections; it does not do distributed storage.
  • Hadoop and Spark are both Apache projects, so they are open-source, free software products, and both are designed to run on commodity white-box server hardware. Cost-wise, they are generally cheap and roughly equal.
  • They are highly compatible with each other. Through JDBC and ODBC, Spark shares all of MapReduce's data sources and file formats.
  • Spark is about 10 times faster than MapReduce in batch processing and up to 100 times faster for in-memory analytics. MapReduce operates in steps: read data from the cluster, perform an operation, write results to the cluster, read the updated data, perform the next operation, write the next results, and so on. Spark instead performs all data-analytics operations in memory and in near real time: read data from the cluster, perform all the requisite analytic operations, write the results back, done.
  • MapReduce's processing style is fine if our data operations and reporting requirements are mostly static. For streaming data, MapReduce is not suitable and we need to go for Spark.
  • Spark is used in most real-time applications such as online product recommendation, cyber-security analytics and machine-log monitoring.
  • Failure recovery works differently in the two systems, but both are good. Hadoop is naturally resilient to system faults or failures, since data is written to disk after every operation.
  • Spark uses RDDs (Resilient Distributed Datasets), which are distributed across the cluster to store data objects, either in memory or on disk. RDDs also provide full recovery from faults or failures (a minimal PySpark sketch follows this list).
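
Here is a minimal PySpark sketch of the RDD ideas above: data is partitioned across the cluster, transformations run in memory, and an RDD can be cached for reuse. The application name and numbers are placeholders.

# Minimal sketch: in-memory processing with a Spark RDD.
# Assumes Python with the pyspark package (pip install pyspark) or access to a cluster.
from pyspark import SparkContext

sc = SparkContext(appName="rdd-sketch")

# parallelize() creates an RDD whose partitions are distributed across the cluster.
numbers = sc.parallelize(range(1, 1001))

# Transformations are lazy and run in memory; cache() keeps the result around
# so later actions reuse it instead of recomputing from the source.
squares = numbers.map(lambda x: x * x).cache()

# Actions trigger the actual computation.
print(squares.count())                      # 1000
print(squares.reduce(lambda a, b: a + b))   # sum of squares from 1 to 1000

sc.stop()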
R:
Oracle has adopted R as a language and environment to support statisticians, data analysts, and data scientists in performing statistical data analysis and advanced analytics, as well as generating sophisticated graphics. To address the enterprise and the need to analyze big data, Oracle provides R integration through four key technologies:
  • Oracle's supported redistribution of open-source R, provided as a free download from Oracle and enhanced with dynamic loading of high-performance linear-algebra libraries.
  • Integration of R with Oracle Database. A component of the Oracle Advanced Analytics Option, Oracle R Enterprise makes the open-source R statistical programming language and environment ready for the enterprise, with scalability, performance, and ease of production deployment.
  • High-performance native access to the Hadoop Distributed File System (HDFS) and the MapReduce programming framework for R users. Oracle R Advanced Analytics for Hadoop is a component of the Oracle Big Data Connectors software suite.
  • An open-source R package, maintained by Oracle and enhanced to use the Oracle Call Interface (OCI) libraries to handle database connections, providing a high-performance, native C-language interface to Oracle Database.

