Five Sure-Fire Ways to Completely Ruin Your Data

If you’ve been following our blog, you know that we love to share our best practices and industry expert advice with you. What should you avoid doing with your data? Five members of the AtScale team share their thoughts.

Name: Dave Mariani, AtScale Co-Founder and Chief Strategy Officer

A: I think one of the biggest mistakes data engineers make is what I call “premature aggregation”: summarizing data too early in the processing pipeline, which makes the fine-grained data difficult or impossible for business users and data scientists to access. Instead, I recommend that customers store all the data they capture “as is” as files in a data lake and summarize it only when absolutely necessary for performance reasons.
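To make the cost of premature aggregation concrete, here is a minimal sketch in Python. The event data and the function names (`daily_totals`, `median_amount_per_day`) are hypothetical and exist only for illustration. It shows that once events are reduced to daily totals, a later question such as "what was the median order amount per day?" can only be answered from the raw events.

```python
from collections import defaultdict
from statistics import median

# Hypothetical raw events as captured: (day, customer, amount).
raw_events = [
    ("2019-12-01", "alice", 10.0),
    ("2019-12-01", "bob", 30.0),
    ("2019-12-01", "alice", 50.0),
    ("2019-12-02", "bob", 20.0),
]


def daily_totals(events):
    """Summarize events to one total per day (the 'premature' aggregate)."""
    totals = defaultdict(float)
    for day, _customer, amount in events:
        totals[day] += amount
    return dict(totals)


def median_amount_per_day(events):
    """A later question that needs the fine-grained rows to be answered."""
    by_day = defaultdict(list)
    for day, _customer, amount in events:
        by_day[day].append(amount)
    return {day: median(amounts) for day, amounts in by_day.items()}


# If only `summary` had been stored, the medians below (and any
# per-customer breakdown) could not be derived from it.
summary = daily_totals(raw_events)
medians = median_amount_per_day(raw_events)
```

Keeping the raw events and deriving `summary` on demand, or as a performance optimization alongside the raw data, keeps both questions answerable.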

Name: Chris Oshiro, Field CTO

A: While there are advantages to denormalizing data for some multidimensional use cases, I believe one of the biggest mistakes is denormalizing by default and trying to denormalize at a very granular level. While this avoids JOINs, which can be costly, denormalization often creates very sparsely filled tables, because it combines datasets that don’t naturally belong in a single table. This affects all of your calculations; you have to be extra careful around things like NULL handling, for example. Also, different datasets are recorded at different levels of granularity, so denormalization often aggregates data up to the level of the least granular dataset. This loses a lot of detail and adds even more complexity to calculations, because you can lose proper counts.
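Both pitfalls can be illustrated with a small Python sketch. The table layout, column names, and helper functions here are assumptions made up for the example, not a description of any particular product. Per-order rows and a per-day store-traffic row are forced into one wide table, which leaves NULLs (`None`) in the columns that do not apply to a given row.

```python
# Hypothetical denormalized table: per-order rows and a per-day
# store-traffic row share one schema, so many cells are empty.
wide_rows = [
    {"day": "2019-12-01", "order_id": 1, "order_amount": 40.0, "store_visits": None},
    {"day": "2019-12-01", "order_id": 2, "order_amount": 60.0, "store_visits": None},
    {"day": "2019-12-01", "order_id": None, "order_amount": None, "store_visits": 200},
]


def naive_avg(rows, column):
    """Treats NULLs as zero and divides by all rows: a biased average."""
    return sum(r[column] or 0 for r in rows) / len(rows)


def null_aware_avg(rows, column):
    """Averages only the rows where the column is actually populated."""
    values = [r[column] for r in rows if r[column] is not None]
    return sum(values) / len(values) if values else None


def roll_up_to_day(rows):
    """Aggregates everything to the coarsest grain (one row per day)."""
    daily = {}
    for r in rows:
        d = daily.setdefault(r["day"], {"order_amount": 0.0, "store_visits": 0})
        d["order_amount"] += r["order_amount"] or 0.0
        d["store_visits"] += r["store_visits"] or 0
    return daily


naive = naive_avg(wide_rows, "order_amount")       # dragged down by the traffic row
aware = null_aware_avg(wide_rows, "order_amount")  # average over the two real orders
daily = roll_up_to_day(wide_rows)                  # the number of orders is gone
```

After the roll-up, the daily row still carries the total order amount and the visit count, but nothing in it says how many orders there were, so a metric like average order value can no longer be computed from that table alone.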

Name: Stella Valcheva, BI Expert

A: The absence of a company-wide data strategy leads to poor data quality and a lot of overhead for consolidation and analysis. If data is not handled according to a corporate standard, a company ends up dealing with data silos, misinterpretation and, eventually, wrong insights. When different departments or geographic locations understand the data and its role in the company differently, full data synchronisation becomes very costly, and the data is unlikely to be relied on for important decision making. Moreover, this is not a problem that can be solved solely by introducing an integrated IT solution. It is the company culture and the responsibility of each employee, beginning with the C-level executives, that turn data into an asset rather than a liability.

Name: Petar Staykov, Senior BI Architect

A: I think a common mistake companies make is organizing their data in project-related structures rather than domain-related ones. When they do, every project adds datasets according to its own goals without taking into account the datasets already delivered by other projects. Over time, data redundancy grows, and different projects run different transformations over similar datasets. Company data should be business-process-oriented, not project-specific. That way, every project contributes to the common semantic layer and supports the company’s data-driven strategy.

Name: Gergana Ilieva, BI Expert

A: Data is a double-edged sword, razor sharp on both sides. As organizations engage with increasing volumes of data, the need for company-wide governance becomes even more acute. Often data is locked away and accessible only to certain teams. Different business units operate in different ways and can’t see the value of data held elsewhere. Data is often stored in disjointed places with no clear structure, bits are missing, and there is no collection protocol. Companies often lack central oversight of the process and end up with rising expenses due to data redundancy, inaccurate or incomplete insights, ineffective operations, and data leaks.

