October 9, 2019

How Digital Transformation Enables Big Data Analytics
A version of this article originally appeared on the Cloudera VISION blog.
One of my favorite parts of my job at AtScale is that I get to spend time with customers and prospects, learning what’s important to them as they move to a modern data architecture. Lately, a consistent set of six themes has emerged during these discussions. The themes span industries, use cases and geographies, and I’ve come to think of them as the key principles underlying an enterprise data architecture.
Whether you’re responsible for data, systems, analysis, strategy or results, you can use these six principles of modern data architecture to help you navigate the fast-paced modern world of data and decisions. Think of them as the foundation for a data architecture that will let your business run efficiently today and into the future.
1. View data as a shared asset.
Enterprises that start with a vision of data as a shared asset ultimately outperform their competition, as CIO explains. Instead of allowing departmental data silos to persist, these enterprises ensure that all stakeholders have a complete view of the company. And by “complete,” I mean a 360-degree view of customer insights along with the ability to correlate valuable data signals from all business functions, including manufacturing and logistics. The result is improved corporate efficiency.
2. Provide the right interfaces for users to consume data in modern data analytics architectures.
Putting data in one place isn’t enough to achieve the vision of a data-driven culture. The days of relying on a single, monolithic data warehouse are long gone. Modern data architecture requires enterprises to combine data warehouses, data lakes, and data marts to meet scalability needs.
Your head might be spinning right now. How do warehouses, lakes, and marts function within a modern data analytics architecture? Here’s an easy breakdown:
- Data warehouses: The central repository for structured, processed data drawn from across the business
- Data lakes: A large repository of data stored in its raw, native format
- Data marts: The serving layer — a simplified database focused on a specific team or line of business
To take advantage of this structure, data needs to be able to move freely to and from warehouses, lakes, and marts. And for people (and systems) to benefit from a shared data asset, you need to provide the interfaces that make it easy for users to consume that data. This might be in the form of an OLAP interface for business intelligence, a SQL interface for data analysts, a real-time API for targeting systems, or the R language for data scientists. In the end, it’s about letting your people work with the tools they already know that are right for the job at hand.
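To make the idea concrete, here is a minimal sketch (using Python's built-in sqlite3 as a stand-in for a shared data platform, with a hypothetical `orders` table) of one shared dataset consumed through two different interfaces: raw SQL for an analyst, and a simple API-style function for a downstream system. Neither consumer copies the data; both read the same shared asset.

```python
import sqlite3

# Hypothetical shared dataset: a single "orders" table that several
# consumer interfaces read from without copying the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [("EMEA", 120.0), ("EMEA", 80.0), ("APAC", 50.0)])

# SQL interface for data analysts: ad hoc aggregation.
rows = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(rows)  # [('APAC', 50.0), ('EMEA', 200.0)]

# API-style interface for downstream systems: the same shared asset,
# wrapped in a function instead of exposed as raw SQL.
def revenue_for(region: str) -> float:
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM orders WHERE region = ?",
        (region,),
    ).fetchone()
    return total

print(revenue_for("EMEA"))  # 200.0
```

The point of the sketch is the design choice, not the tooling: the data lives in one place, and each audience gets an interface suited to how it works.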
3. Ensure security and access controls.
Unified data platforms like Snowflake, Google BigQuery, Amazon Redshift, and Hadoop necessitate enforcing data policies and access controls directly on the raw data, instead of in a web of downstream data stores and applications. Data security projects like Apache Sentry make this approach to unified data security a reality. Look to technologies that secure your modern data architecture and deliver broad self-service access, without compromising control.
4. Establish a common vocabulary.
By investing in an enterprise data hub, enterprises can now create a shared data asset for multiple consumers across the business. That’s the beauty of modern data analytics architectures. However, it’s critical to ensure that users of this data analyze and understand it using a common vocabulary. Product catalogs, fiscal calendar dimensions, provider hierarchies and KPI definitions all need to be common, regardless of how users consume or analyze the data. Without this shared vocabulary, you’ll spend more time disputing or reconciling results than driving improved performance.
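One lightweight way to picture a common vocabulary is a single registry of KPI definitions that every consumer computes against. The sketch below is a hypothetical example (the metric names and refund rule are illustrative assumptions, not a real product's semantics): because "revenue" is defined once, no two teams can silently disagree on what it excludes.

```python
# Hypothetical shared vocabulary: KPI definitions live in one place,
# so every tool and team computes "revenue" the same way.
KPI_DEFINITIONS = {
    # Revenue excludes refunded orders, by shared agreement.
    "revenue": lambda rows: sum(r["amount"] for r in rows if not r["refunded"]),
    "order_count": lambda rows: sum(1 for r in rows if not r["refunded"]),
}

def kpi(name, rows):
    """Compute a KPI using the one shared definition."""
    return KPI_DEFINITIONS[name](rows)

orders = [
    {"amount": 100.0, "refunded": False},
    {"amount": 40.0,  "refunded": True},
    {"amount": 60.0,  "refunded": False},
]

print(kpi("revenue", orders))      # 160.0
print(kpi("order_count", orders))  # 2
```

In practice this role is played by a semantic layer or metrics catalog, but the principle is the same: definitions are shared artifacts, not per-team spreadsheets.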
5. Curate the data.
Curating your data is essential to effectively implement a modern data analytics architecture. Time and time again, I’ve seen enterprises that have invested in Hadoop or a cloud-based data lake like Amazon S3 or Google Cloud Storage start to suffer when they allow self-serve access to the raw data stored in these clusters. Without proper data curation (which includes modeling important relationships, cleansing raw data and curating key dimensions and measures), end users can have a frustrating experience—which will vastly reduce the perceived and realized value of the underlying data. By investing in core functions that perform data curation, you have a better chance of realizing the value of the shared data asset.
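Two of the cleansing steps mentioned above can be sketched concretely. This is a minimal, hypothetical curation pass (the record shape and date formats are illustrative assumptions): it normalizes inconsistent date formats and drops duplicate records before the data is exposed for self-service access.

```python
from datetime import datetime

# Hypothetical raw records, as they might land in a data lake:
# duplicates and mixed date formats included.
raw = [
    {"id": 1, "signup": "2019-10-09"},
    {"id": 1, "signup": "2019-10-09"},   # duplicate record
    {"id": 2, "signup": "10/09/2019"},   # inconsistent date format
]

def parse_date(value: str) -> str:
    """Normalize a date string to ISO format, trying known formats."""
    for fmt in ("%Y-%m-%d", "%m/%d/%Y"):
        try:
            return datetime.strptime(value, fmt).date().isoformat()
        except ValueError:
            continue
    raise ValueError(f"unrecognized date: {value}")

def curate(records):
    """Deduplicate by id and normalize dates before serving the data."""
    seen, clean = set(), []
    for r in records:
        if r["id"] in seen:
            continue
        seen.add(r["id"])
        clean.append({"id": r["id"], "signup": parse_date(r["signup"])})
    return clean

print(curate(raw))
# [{'id': 1, 'signup': '2019-10-09'}, {'id': 2, 'signup': '2019-10-09'}]
```

Real curation pipelines also model relationships and conform dimensions, but even this small pass shows why end users see a shared asset rather than raw debris.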
6. Eliminate data copies and movement throughout your modern data architecture.
Every time data is moved there is an impact: cost, accuracy and time. Talk to any IT group, or business user for that matter, and they all agree: the fewer times data has to be moved, the better. Part of the promise of cloud data platforms and distributed file systems like Hadoop is a multi-structure, multi-workload environment for parallel processing of massive data sets. These data platforms scale linearly as workloads and data volumes grow. By eliminating the need for additional data movement, modern enterprise data architectures can reduce cost and effort, improve accuracy, increase “data freshness” and optimize overall enterprise data agility.
Regardless of your industry, the role you play in your organization or where you are in your big data journey, I encourage you to adopt and share these principles as a means of establishing a sound foundation for building a modern big data architecture. While the path can seem long and challenging, with the right framework and principles, you can successfully make this transformation sooner than you think.
Tell us about your core principles of modern data architecture. What do you insist on day in and day out to manage big data for your organization? We’d love to know your insights.
Ready to take the next step in your big data journey? Learn about implementing ways to scale data analytics via modern data models, like the hub-and-spoke approach.
About the Author: As head of product management, Josh drives AtScale’s product roadmap and strategy. He started his career in data and analytics as the product manager for the first “Datamart in a Box” at Broadbase, and he ran product management at Yahoo! for one of the largest data and analytics operations in the world. Josh joined AtScale from Pivotal, where he was responsible for data products such as Greenplum, Pivotal HD and HAWQ.