Data Literacy is a capability and set of skills that enables insights consumers, creators and enablers to understand what data is, how to use it and how to learn from it. That includes answering business questions in order to make decisions and take actions with greater clarity, confidence and alignment. In short, data literacy describes the ability to read, analyze, and argue with data.
The purpose of Data Literacy is to enable the enterprise to use data to improve how it understands, learns, answers business questions, plans, decides and acts.
Key Capabilities of a Data Literacy Program
Key capabilities to consider when implementing a Data Literacy program are as follows:
- Having a data-centric analytical mindset
- Understanding data as a means to answer questions more effectively, with greater clarity, confidence and shared interpretation
- Different types of data
- Common data sources
- Types of analysis
- Data hygiene
- Tools, techniques, and frameworks
Data literacy can help non-data professionals read and understand data and use it to inform their decision-making. As such, data literacy is increasingly important not just for executive leadership, but for managers and employees who want to increase the value they bring to their organization.
Key Components of a Data Literacy Program
Data analysis refers to reading and interpreting data to glean insights from it. While analysis can be conducted using statistical models, algorithms, and other complex tools and frameworks, you can also achieve it by simply reviewing data and drawing conclusions from it.
There are several types of data analysis you can use. Four of the most common are:
- Descriptive analysis, which seeks to explain or describe what has happened
- Diagnostic analysis, which seeks to explain or diagnose why something has happened
- Predictive analysis, which seeks to forecast what might happen
- Prescriptive analysis, which seeks to prescribe a course of action that will lead to a desired outcome
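The four types of analysis listed above can be illustrated on a single small dataset. The following is a minimal sketch, not a recommended method: the sales figures, the `fit_line` and `recommend_stock` helpers and the 10% safety margin are all hypothetical and invented for illustration, and the month-over-month comparison is only a crude stand-in for a real diagnostic analysis.

```python
from statistics import mean

# Hypothetical monthly sales figures (units sold), invented for illustration.
monthly_sales = [100, 110, 120, 130, 140, 150]
months = list(range(1, len(monthly_sales) + 1))

# Descriptive: what happened? Summarize the history.
average_sales = mean(monthly_sales)

# Diagnostic: why did it happen? Here, inspect month-over-month changes
# to see whether growth was steady or driven by a single month.
changes = [later - earlier for earlier, later in zip(monthly_sales, monthly_sales[1:])]

# Predictive: what might happen? Fit a least-squares line and extrapolate.
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    x_bar, y_bar = mean(xs), mean(ys)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    variance = sum((x - x_bar) ** 2 for x in xs)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

slope, intercept = fit_line(months, monthly_sales)
forecast_next_month = slope * (len(months) + 1) + intercept

# Prescriptive: what should we do? Turn the forecast into an action
# using an assumed business rule (stock the forecast plus a safety margin).
def recommend_stock(forecast, safety_margin=0.1):
    return round(forecast * (1 + safety_margin))

print(average_sales, changes, forecast_next_month, recommend_stock(forecast_next_month))
```

Each step builds on the previous one: the description feeds the diagnosis, the fitted trend feeds the forecast, and the forecast feeds the recommendation.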
Data wrangling is the act of transforming data from a raw state into a form that can be more readily used. The practice is also commonly known as data munging or data cleaning. While data wrangling can take many forms, the most common examples involve removing errors and filling gaps in data.
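As a minimal sketch of the two most common wrangling steps described above, the example below removes erroneous values and fills gaps. The records, the `parse_age` and `wrangle` helpers, the plausible age range of 0 to 120 and the choice of the median as the fill value are all assumptions made for illustration.

```python
from statistics import median

# Hypothetical raw records with hand-typed ages, including errors and gaps.
raw_records = [
    {"name": "Ana", "age": "34"},
    {"name": "Ben", "age": ""},      # gap: missing value
    {"name": "Cy", "age": "-5"},     # error: impossible value
    {"name": "Dee", "age": "41"},
    {"name": "Eli", "age": "n/a"},   # gap: placeholder text
    {"name": "Fay", "age": "38"},
]

def parse_age(text):
    """Return the age as an int, or None if it is missing or implausible."""
    try:
        age = int(text)
    except ValueError:
        return None
    return age if 0 <= age <= 120 else None

def wrangle(records):
    """Remove erroneous ages, then fill every gap with the median known age."""
    parsed = [{**record, "age": parse_age(record["age"])} for record in records]
    known_ages = [record["age"] for record in parsed if record["age"] is not None]
    fill_value = median(known_ages)
    return [
        {**record, "age": fill_value if record["age"] is None else record["age"]}
        for record in parsed
    ]

clean_records = wrangle(raw_records)
print(clean_records)
```

Whether to fill a gap, drop the record or flag it for review depends on how the data will be used; the median fill here is only one option.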
Data visualization is the process of creating graphical or visual representations of data and is often a crucial part of communicating insights effectively. Data visualizations play an important role in making data more accessible to others both inside and outside an organization, especially those who may not be data literate. An example of data visualization in action is a chart or graph that helps investors understand a company’s quarterly earnings report.
The Data Ecosystem
The concept of the data ecosystem refers to all of the components an organization leverages to collect, store, and analyze data. This includes physical infrastructure, such as server space and cloud storage solutions, and non-physical components, such as data sources, programming languages, code packages, algorithms, and software.
Data governance refers to the processes and practices an organization uses to formally manage its data assets. The concept can be likened to a rulebook specifically designed to ensure an organization’s data remains accurate, secure, and complete. In fact, many organizations distribute a company “data policy” to new hires alongside the employee handbook.
The Data Team
Finally, it’s important to understand who the key players are on your organization’s data team and the different roles they play—regardless of whether you directly work with them.
Data teams can be structured in several ways depending on your organization’s size and how prominently data is leveraged in day-to-day activities. That being said, most data teams include:
- Data scientists, who leverage advanced mathematics, programming, and tools to conduct and manage large-scale analyses
- Data engineers, who are responsible for building and maintaining datasets that are leveraged in data projects
- Data analysts, who conduct the majority of the analyses an organization requires
Primary Uses of Data Literacy
Data Literacy is used to increase an organization's ability to use data to improve understanding, learning, answering business questions, planning, deciding and taking action, with greater speed, clarity and confidence and with closer alignment in how data is interpreted.
Data literacy skills are not only required by the analytics or the IT team; all departments and roles within an organization can benefit from data literacy skills. Data literacy enables employees to ask the right questions, gather the right data and connect the right data points to derive meaningful and actionable business insights. It also ensures that all employees understand how to manage and use data in ways that are ethical and compliant.
Business Benefits of Building Data Literacy
The main benefits of Data Literacy are greater speed, clarity and confidence, and closer alignment in how data is interpreted, when understanding, learning, answering business questions, planning, deciding and taking action.
Common Roles and Responsibilities of a Data Literacy Program
Roles important to Data Literacy are as follows:
- Insights Consumers – Insights consumers (e.g. business leaders and analysts) are responsible for using insights and analyses created by insights creators to improve business performance, including through improved awareness, plans, decisions and actions.
- Insights Creators – Insights creators (e.g. data analysts) are responsible for creating insights from data and delivering them to insights consumers. Insights creators typically design the reports and analyses, and often develop them, including reviewing and validating the data. Insights creators are supported by insights enablers.
- Insights Enablers – Insights enablers (e.g. data engineers, data architects, BI engineers) are responsible for making data available to insights creators, including helping to develop the reports and dashboards used by insights consumers.
- Business Owner – There needs to be a business owner who understands the business needs for data and the subsequent reporting and analysis. This ensures accountability and actionability, as well as ownership of data quality and data utility based on the data model. The business owner and project sponsor are responsible for reviewing and approving the data model as well as the reports and analyses that OLAP will generate. For larger, enterprise-wide insights creation and performance measurement, a governance structure should be considered to ensure cross-functional engagement and ownership for all aspects of data acquisition, modeling and usage (reporting and analysis).
- Data Analyst / Business Analyst – A business analyst or, more recently, a data analyst is often responsible for defining the uses and use cases of the data, as well as providing design input on data structure, particularly the metrics, business questions / queries and outputs (reports and analyses) to be produced and improved. Responsibilities also include owning the roadmap for how data will be enhanced to address additional business questions and existing insights gaps.
Common Business Processes Associated with Data Literacy
The process for developing and deploying Data Literacy is as follows:
- Establish a data literacy program
- Appoint a data literacy leader who is influential with business users
- Prioritize program mobilization and expansion to key functions / users
- Enlist data analysts and data scientists to develop and deliver program
- Monitor and measure engagement, learning, utilization and impact
- Ensure that roles and responsibilities for using data to make decisions are clear
- Pick key use cases and functions that will deliver maximum impact with lowest amount of potential friction
- Use actual data and tools used by the organization for training
- Seek executive sponsorship and communication to convey the program's importance and progress
- Utilize feedback to improve program as well as supporting capabilities
Common Technologies Associated with Data Literacy
Technologies involved with Data Literacy are as follows:
- Data Products – A data product is a self-contained dataset that includes all elements of the process required to transform the data into a published set of insights. For a Business Intelligence use case, the elements are dataset creation, the data model / semantic model and the published results, including reports and analyses that may be delivered via spreadsheets or a BI application.
- Data Preparation – Data preparation involves enhancing and aggregating data to make it ready for analysis, including to address a specific set of business questions.
- Data Querying – Technologies called Online Analytical Processing (OLAP) are used to automate data querying, which involves making requests for slices of data from a database. Queries can also be made using standardized languages or protocols such as SQL. Queries take data as an input and deliver a smaller subset of the data in a summarized form for reporting and analysis, including interpretation and presentation by analysts for decision-makers and action-takers.
- Data Catalog – These applications make it easier to record and manage access to data, including at the source and dataset (e.g. data product) level.
- Semantic Layer – Semantic layer applications enable the development of a logical and physical data model for use by OLAP-based business intelligence and analytics applications. The Semantic Layer supports data governance by enabling management of all data used to create reports and analyses, as well as all data generated for those reports and analyses, thus enabling governance of the output / usage aspects of input data.
- Data Governance Tools – These tools automate the management of access to and usage of data. They can also be used to manage compliance by searching across data to determine whether the format and structure of the data being stored comply with policies.
- Business Intelligence (BI) Tools – These tools automate the OLAP queries, making it easier for data analysts and business-oriented users to create reports and analyses without having to involve IT / technical resources.
- Visualization tools – Visualizations are typically available within the BI tools and are also available as standalone applications and as libraries, including open source.
- Automation – Strong emphasis is placed on automating all aspects of the process for developing and delivering integrated data sets from hybrid-cloud environments.
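The Data Querying entry above describes a query as taking data as input and returning a smaller, summarized subset. A minimal sketch of that idea follows, using SQL through Python's built-in `sqlite3` module. The `sales` table, its columns and its values are hypothetical, and an in-memory SQLite database stands in for whatever database or OLAP engine an organization actually uses.

```python
import sqlite3

# Build a small in-memory database with hypothetical sales data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("East", "Q1", 120.0),
        ("East", "Q2", 150.0),
        ("East", "Q2", 50.0),
        ("West", "Q1", 90.0),
        ("West", "Q2", 110.0),
    ],
)

# Slice the data (second quarter only) and summarize it (revenue per region).
rows = conn.execute(
    "SELECT region, SUM(revenue) FROM sales "
    "WHERE quarter = ? GROUP BY region ORDER BY region",
    ("Q2",),
).fetchall()
conn.close()

for region, total in rows:
    print(region, total)
```

Five input rows become two summary rows, which is the form an analyst would typically pass on for reporting or visualization.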
Trends / Outlook for Data Literacy
Key trends for Data Literacy are as follows:
- Semantic Layer – The semantic layer is a common, consistent representation of the data used for business intelligence reporting and analysis, as well as for analytics. It matters because it defines data once, in multidimensional form, so that queries made from and across multiple applications, including multiple business intelligence tools, all run against one common definition. Without it, data models and definitions have to be recreated within each tool. A shared definition improves consistency and efficiency, reduces cost and creates an opportunity to improve query speed and performance.
- Automation – Vendors are placing increasing emphasis on ease of use and automation so that data governance management and monitoring can scale. This includes offering “drag and drop” interfaces for managing data-related permissions and usage.
- Observability – A host of new vendors now offer services referred to as “data observability”. Data observability is the practice of monitoring data to understand how it is changing and being consumed. This trend, often called “dataops”, closely mirrors the “devops” trend in software development, which tracks how applications are performing and being used in order to understand, anticipate and address performance gaps proactively rather than reactively.
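To make the observability idea above more concrete, the sketch below compares two snapshots of a dataset and raises alerts when the data changes sharply. It is an illustrative assumption rather than any vendor's product: the `profile` and `detect_drift` functions, the snapshots and both thresholds are hypothetical.

```python
def profile(rows, column):
    """Summarize a snapshot: row count and share of missing values in one column."""
    total = len(rows)
    missing = sum(1 for row in rows if row.get(column) is None)
    return {"rows": total, "null_rate": missing / total if total else 0.0}

def detect_drift(previous, current, max_row_change=0.3, max_null_rate=0.2):
    """Compare two profiles and return a list of human-readable alerts."""
    alerts = []
    if previous["rows"]:
        row_change = abs(current["rows"] - previous["rows"]) / previous["rows"]
        if row_change > max_row_change:
            alerts.append("row count changed sharply")
    if current["null_rate"] > max_null_rate:
        alerts.append("too many missing values")
    return alerts

# Hypothetical snapshots of the same dataset on two consecutive days.
yesterday = [{"amount": 10}, {"amount": 12}, {"amount": 9}, {"amount": 11}]
today = [{"amount": 10}, {"amount": None}]

alerts = detect_drift(profile(yesterday, "amount"), profile(today, "amount"))
print(alerts)
```

Running such a check on a schedule is what lets a team notice a broken feed before the people reading the reports do.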
AtScale and Data Literacy
AtScale’s semantic layer supports data literacy programs by enabling faster insights creation through rapid data modeling for AI and BI, with performance improved by automated query optimization. The semantic layer enables development of a unified, business-driven data model that defines what data can be used and supports the specific queries that generate data for visualization. This makes tracking and auditing easier: how data is defined, queried and rendered across dimensions, entities, attributes and metrics is known and tracked, including the source data and the queries used to produce output for reporting, analysis and analytics.
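The general idea of a semantic layer, one shared definition of metrics and dimensions that every tool queries through, can be sketched in a few lines. This is a generic illustration only and does not represent AtScale's product or API: `SEMANTIC_MODEL`, `build_query` and all table, metric and dimension names are hypothetical.

```python
# One shared definition of each metric and dimension (hypothetical names).
SEMANTIC_MODEL = {
    "table": "sales",
    "metrics": {"total_revenue": "SUM(revenue)", "order_count": "COUNT(*)"},
    "dimensions": {"region": "region", "quarter": "quarter"},
}

def build_query(metric, dimension, model=SEMANTIC_MODEL):
    """Translate a business-level request into SQL using the shared definitions.

    Raises KeyError if the metric or dimension is not defined in the model,
    so that consumers cannot silently invent their own definitions.
    """
    metric_sql = model["metrics"][metric]
    dimension_sql = model["dimensions"][dimension]
    return (
        f"SELECT {dimension_sql} AS {dimension}, {metric_sql} AS {metric} "
        f"FROM {model['table']} GROUP BY {dimension_sql}"
    )

# Two different consumers asking the same question get the same query.
dashboard_sql = build_query("total_revenue", "region")
notebook_sql = build_query("total_revenue", "region")
print(dashboard_sql)
```

Because both consumers go through the same model, a change to how "total_revenue" is calculated is made once and applies everywhere.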