HADOOP
BDI's Big Data Approach
For more details on Big Data, please contact us: Sales@bdisys.com

Why Big Data with BDI?

This document gives an insight into how BDI Systems approaches Big Data & Analytics. This technology can be applied across all verticals to deliver relevant Big Data reports.
Background:
In the classic analytics space we have, on one side, analytics companies whose people do algorithmic crunching and share the findings as a white paper, where the findings go stale after some time. On the other side we have top-notch vendors like SAP, Oracle, MSFT, and IBM, who charge licensing costs for the database, the platform, and the 20-25 different products they have acquired over the last 10 years. The Big 4 are coming up with good Big Data solutions, but these are all extremely costly and the available talent base is small. With the advent of Big Data and Hadoop-based infrastructure, today's analytics companies need to have all the skills: they should be able to set up a Big Data engineering lab with engineers who know the best methods for dealing with large amounts of data and who can act as data scientists as well. Finally, you need a strong BI visualization framework or tool whose dashboards can be viewed on all devices. Companies like BDI, which have all these capabilities and the ability to work end to end using home-grown utilities & open-source software, can do this job at the best cost and deliver real-time analytics.
 
What is Big Data?
Well, everybody knows this by now, but let's represent it in a diagram, the way experts often explain it, by putting all sorts of data into a data cobweb.
 
Why is Big Data Analytics required?
Everyone knows that data volumes are growing exponentially. What's not so clear is how to unlock the value they hold. To improve the health of a person we monitor all of their parameters, yet in an organization more than 50% of the data is unstructured or partially structured and we don't use it to check the organization's health. The health of an organization is always relative: in the age of the Internet of Things, we need to keep a close watch on what is happening in our business dimension vis-à-vis the competition. Organizations that don't do Big Data Analytics will probably perish in the next 10 years; put another way, the organizations that did Big Data in the last 5-10 years are ruling the internet world today. The Internet of Things will bring all business platforms onto the internet, and Big Data will decide the financial growth, competitiveness, and target markets of any progressive organization.
 
Key Skills to provide Big Data Analytics
This document will return repeatedly to these 4 skills. In my view, all 4 skills are essential to delivering a profitable, real-time Big Data platform to an enterprise.
BDI, either directly or through an SME partner, is able to provide all 4 of these skills. This way the customer can be completely sure that one vendor covers all 4 legs of Big Data Analytics and can provide world-class support for a long time.
 
What are the pre-requisites for a Big Data implementation?
Different analysts have been talking about the ROI of Big Data implementations. Some say a 25% return; some say 55 cents returned per $1 invested. All of these figures are baffling. In our view, a minimum of $30 returned per $1 invested over 5 years should be expected; otherwise something is grossly mistaken. One of the key points is to ensure that all of the following pre-requisites are met before you start a Big Data project.
You have a full-fledged BI implementation system and you are happy with the ROI it has delivered to your organization.
You have done 'What-IF' analysis of your key Financial Parameters and have been flexible enough to make changes in your organization.
You have been taking regular feedback or Survey from your customers, partners, vendors etc.
You have defined a clear problem which you want to solve by Big Data Implementation.
You have allocated a specific budget to solve this problem. Part of the budget is for the time of internal resources who can work with a Big Data partner/vendor like BDI.
 
Note: If you have not done 1, 2 & 3, we suggest you take BDI's BI consultancy services, carry out a complete BI implementation, and have your team adopt it.
Exception to Pre-Requisites
Suppose you have identified a separate module where you directly want to leverage a Hadoop-based data warehouse implementation to save cost per TB. Then you should go ahead, but this is not a Big Data implementation project; it is more like a POC.
1. Subject Matter Expert at Work
The domain expert understands the problem statement and restates it in a simpler form that the technical team can use to solve it.
Knows where the data resides, which data is useful, and which data can be used for which type of analytics.
Can design different scorecards, algorithmic charts, benchmarking analysis reports & dashboards.
Works with the end client to define the real problem, agree on a scope, and drive the technical team.
 
2. Unlock Big Data - Working on Data Collection Layer
Understand existing data sources.
Search and navigate data within existing systems.
Reading web data - crawling or scraping of data. Data can even be scraped from image, PDF, Doc, audio, and video files.
Reading social media data - we have a connector through which we can read data from FB, Twitter, LinkedIn etc. This is a tool developed by BDI; more details are in another blog on the BDI website.
Reading structured or unstructured data from web data apps like Salesforce, Google Analytics etc.
Providing end-to-end survey services where data can be captured as a normal survey or as a text-based survey. BDI has a complete end-to-end survey platform and services [www.BDIsurvey.com].
We have dynamic HTML5 forms through which metric data can be entered from mobile devices and used directly in dashboards. This is a tool provided by BDI, part of our HTML5 portal, which runs on all devices.
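The scraping step above can be sketched with Python's standard library alone. This is a minimal illustration of extracting visible text from raw HTML, not BDI's actual crawler or connector:

```python
from html.parser import HTMLParser

class TextScraper(HTMLParser):
    """Collects visible text from raw HTML, skipping script/style blocks."""
    def __init__(self):
        super().__init__()
        self.chunks = []
        self._skip = False

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip = True

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self._skip = False

    def handle_data(self, data):
        # Keep only non-empty text outside script/style tags
        if not self._skip and data.strip():
            self.chunks.append(data.strip())

def scrape_text(html):
    """Return the visible text of an HTML page as one string."""
    parser = TextScraper()
    parser.feed(html)
    return " ".join(parser.chunks)
```

A real crawler would fetch pages over HTTP and follow links; the parsing step, however, stays essentially the same.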
 
3. Data Processing Layer
Data clean-up - the unstructured data can be very large, and much of it may be meaningless on its own, yet its collective message is meaningful and impactful. One needs to filter this data: convert it to lower case, remove punctuation marks, stem words for exact matches, and so on. Use NLP (Natural Language Processing) techniques at this step.
Categorization & classification of data - use machine learning tools like Apache Mahout or Enterprise R; each provides different algorithms for clustering and classification. Automated text conversion is also used here for proper classification of unstructured data.
Finding the relations between different data sets and pushing the result into the Hadoop file system. This uses various tools and paves the way for modern data warehousing that will change how we think about a conventional database.
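The clean-up step above (lower-casing, stripping punctuation, stemming) can be sketched in a few lines. The suffix-stripping "stemmer" here is a deliberately crude stand-in for a real NLP library such as those mentioned above:

```python
import re

# Crude suffix list; a production pipeline would use a proper stemmer instead
_SUFFIXES = ("ing", "ed", "es", "s")

def clean_tokens(text):
    """Lower-case text, strip punctuation, and crudely stem each token."""
    text = text.lower()
    text = re.sub(r"[^\w\s]", " ", text)   # remove punctuation marks
    stemmed = []
    for tok in text.split():
        for suf in _SUFFIXES:
            # Only strip when a reasonable stem (>= 3 chars) remains
            if tok.endswith(suf) and len(tok) - len(suf) >= 3:
                tok = tok[: -len(suf)]
                break
        stemmed.append(tok)
    return stemmed
```

The point of the sketch is the shape of the pipeline, not the quality of the stems; the classification step would then run on these normalized tokens.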
Hadoop Framework -
Hadoop based DWH Implementation Benefits:-
In Hadoop you don't need to know what questions you will ask before designing the data warehouse - Hadoop is flexible
Simple Algorithms on Big Data outperform complex models
Powerful ability to analyse unstructured data
You can save Millions in TCO
10x faster, 100x cheaper as a long-term solution
Maintains the same SLAs as you have been maintaining
Changes can be implemented without impacting users
 
Data Organization Layer
The relevant data can now be moved into Hive [which presents data in rows and columns and is queried much like MySQL or Oracle].
A query can be written on this data [in Hive Query Language] to get the relevant data.
Hive's latency is high, so a data mart layer is created on top. In-memory engines like Spark/Shark are still in the R&D stage but will soon give an in-memory flavour to open-source databases.
Here, tools such as Cloudera Impala can also be used as an MPP [massively parallel processing] query engine.
Here we can arrange data from other Structured Databases as well.
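The querying step above can be illustrated by composing a HiveQL extract as a string. The table and column names below are hypothetical examples, and in practice the query would be submitted through a Hive client rather than just built:

```python
def build_mart_query(table, columns, partition_date):
    """Compose a HiveQL extract for a daily data-mart load.

    The table/column names passed in are illustrative; dt is assumed
    here to be the table's date partition column.
    """
    cols = ", ".join(columns)
    return (
        f"SELECT {cols} FROM {table} "
        f"WHERE dt = '{partition_date}'"
    )
```

Restricting the scan to one partition (`dt = ...`) is what keeps such extracts cheap on large Hive tables.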
 
Data Warehousing Layer
Bring in the Hive data using a cron job or another scheduler.
Bring in relevant structured data such as Finance, Production, Inventory, Sales, and HR from different systems.
Apply traditional DWH practices so the data can be read in the easiest possible way.
Inspect the existing DWH reports and improve them further for more effective Big Data reports.
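The scheduled Hive-to-warehouse load described above can be mimicked with Python's standard sched module. The sketch uses a simulated clock so it runs instantly (a real deployment would use cron, as noted above), and the load action is a hypothetical stand-in:

```python
import sched

class FakeClock:
    """Simulated clock so the sketch runs instantly; production code
    would pass time.time and time.sleep to sched.scheduler instead."""
    def __init__(self):
        self.now = 0.0
    def time(self):
        return self.now
    def sleep(self, seconds):
        self.now += seconds

def schedule_loads(scheduler, interval, count, action):
    """Queue `count` periodic runs of `action`, the way a daily cron
    entry would fire a Hive-extract load."""
    for i in range(count):
        scheduler.enter(interval * (i + 1), 1, action)

loaded = []  # records when each (hypothetical) load ran
clock = FakeClock()
s = sched.scheduler(clock.time, clock.sleep)
schedule_loads(s, interval=86400, count=3, action=lambda: loaded.append(clock.time()))
s.run()  # three "nightly" loads, one simulated day apart
```

The equivalent production setup is simply a cron entry invoking the load script once per day.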
 
Data Connector Layer of Analytic Engine
Once we have the relevant data, a connector lets us read it into an analytics engine - this can be R Server or any third-party analytics application like Tableau, Jasper, SAP BusinessObjects, or BDI's own.
The data is fed into R Server, and the results are pushed back into the server layer of the BI visualization framework.
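A connector of this kind can be sketched as a small adapter that packages analytics-engine output for the visualization server. The payload shape below is a hypothetical example, not BDI's actual format:

```python
import json

def to_dashboard_payload(rows, metric):
    """Package analytics-engine output as JSON for the BI visualization
    server. `rows` is assumed to be (label, value) pairs; the payload
    shape is an illustrative example only."""
    return json.dumps({
        "metric": metric,
        "points": [{"label": r[0], "value": r[1]} for r in rows],
    }, sort_keys=True)
```

In practice the analytics engine would produce the rows, and the visualization framework's server layer would consume this JSON to render the dashboard.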