BI Tools: The Do’s & Don’t’s of Integrating Hadoop

Big data can translate to big wins for your company, but making it work means working smarter. Hadoop makes it simple to store and process very large data sets across a cluster of machines. Make Hadoop work for you even further by pairing it with BI tools like Tableau, Excel, and Qlik. Read on for best practices for BI on Hadoop, organized as the following do’s and don’t’s.

Don’t

Move & copy data

Moving and copying data has been necessary since the beginning of data warehousing because data had to be put into a queryable form. With the advent of big data tools like Hadoop, and with companies like AtScale, data movement is no longer necessary, and side effects such as data redundancy can be avoided.

Instead of moving and copying data, query the data in one place with Hadoop. Rather than writing data three times, you should be able to write it once.

Have multiple definitions of reality

We started out wanting a single data warehouse to house big data. We’ve ended up with a series of data marts holding unconsolidated data. Each department or function builds its own vertical stack, so management of the data is spread across the different lines of business, which is difficult and costly. Business users also end up creating their own definitions of reality, which calls the integrity of the data into question.

Instead of having multiple definitions of reality, use Hadoop to create a single semantic layer. Hadoop can act as a centralized enterprise data warehouse, or a data lake. You don’t have to worry about pre-formatting data; you just store it. Worry about transforming it or making use of it later.
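As a rough illustration of what a single semantic layer buys you, the sketch below keeps each business metric defined in exactly one place and renders SQL from that shared definition. The metric names, the table name `sales.orders`, and the `build_query` helper are all hypothetical; a real semantic layer would be provided by your BI platform rather than hand-written like this.

```python
# Hypothetical shared metric definitions: one agreed formula per metric.
METRICS = {
    "revenue": "SUM(order_total)",
    "order_count": "COUNT(DISTINCT order_id)",
}


def build_query(metric: str, table: str, group_by: str) -> str:
    """Render a SQL query using the shared definition of `metric`."""
    expr = METRICS[metric]  # raises KeyError if the metric is not defined
    return (
        f"SELECT {group_by}, {expr} AS {metric} "
        f"FROM {table} GROUP BY {group_by}"
    )


# Two departments slice the data differently but share one definition of revenue.
finance_sql = build_query("revenue", "sales.orders", "region")
marketing_sql = build_query("revenue", "sales.orders", "campaign")
print(finance_sql)
print(marketing_sql)
```

Because both queries pull the formula from the same dictionary, changing the definition of revenue changes it for every department at once.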

Scale up with proprietary hardware

Scaling up limits you to the capacity of a single machine or server.

Instead of scaling up, scale out with your Hadoop cluster. With Hadoop, all data, regardless of form or function, lives in a single data architecture. This lets you scale the data infrastructure horizontally by adding hardware, without changing any processes, so performance can stay consistent as your data grows.
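To make the scale-out idea concrete, here is a simplified sketch that spreads records across nodes by hashing their keys and then compares the per-node load for a 4-node and an 8-node cluster. This is an assumption-laden toy model, not how HDFS actually places blocks; the `node_for` and `load_per_node` helpers and the record keys are invented for illustration.

```python
import hashlib
from collections import Counter


def node_for(key: str, node_count: int) -> int:
    """Map a record key to a node index with a stable hash (toy model)."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % node_count


def load_per_node(keys, node_count: int) -> Counter:
    """Count how many records each node would hold."""
    return Counter(node_for(key, node_count) for key in keys)


keys = [f"record-{i}" for i in range(10_000)]
small_cluster = load_per_node(keys, 4)
large_cluster = load_per_node(keys, 8)

# Doubling the node count roughly halves the work each node has to do.
print(max(small_cluster.values()), max(large_cluster.values()))
```

The application code is identical for both cluster sizes; only the node count changes, which is the point of scaling out rather than up.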

Do relational schemas

Relational schemas are limiting. Fitting business semantics into rows and columns keeps getting harder as the volume and variety of collected data grow.

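Instead of defining relational schemas up front, leverage Hadoop’s schema-on-demand (often called schema-on-read): store the data as it arrives and apply structure when you query it.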
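Schema-on-demand, more commonly known as schema-on-read, can be sketched in a few lines: raw records are stored untouched, and each consumer applies the schema it needs at read time. The sample events, field names, and the `read_with_schema` helper below are hypothetical and use only the Python standard library; on a real cluster an engine such as Hive or Spark would play this role.

```python
import json

# Raw events stored as-is; the records do not share the same fields.
RAW_EVENTS = [
    '{"user": "a", "amount": "12.50", "country": "US"}',
    '{"user": "b", "amount": "7", "device": "mobile"}',
]


def read_with_schema(lines, schema):
    """Apply `schema` (field name -> converter) to raw records at read time."""
    for line in lines:
        record = json.loads(line)
        yield {
            field: (convert(record[field]) if field in record else None)
            for field, convert in schema.items()
        }


# Two consumers read the same raw data through different schemas.
billing_view = list(read_with_schema(RAW_EVENTS, {"user": str, "amount": float}))
device_view = list(read_with_schema(RAW_EVENTS, {"user": str, "device": str}))
print(billing_view)
print(device_view)
```

Nothing was reshaped at write time, so adding a new view later means writing a new schema, not reloading the data.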

Lock yourself into proprietary stacks

Be open in terms of the engines you’re using to access data and the methods being used to query that data. Hadoop breaks the proprietary lock and allows you to do the same data distribution but in an open-source environment and at open-source costs. Use open-source engines and any BI tools.

As a reminder:

Do

Query your data in one place.
Create a single semantic layer.
Scale out, not up, with your Hadoop cluster.
Leverage Hadoop’s schema-on-demand.
Use open-source engines and any BI tool.

Find better insights into your big data analytics with a solid strategy. Hadoop delivers speed-of-thought performance so that you can understand where your big data is today and where it needs to be going tomorrow.
