Developing a culture that treats data as an asset, and committing to it across your organization, supports good governance and compliance with evolving privacy regulations. It also gives your business the insights it needs to thrive in a rapidly changing digital world.
Understand the lifecycle of data as an asset
To encourage good data management processes, it’s important to understand the lifecycle of a data asset that originates outside of IT. In these cases, data from multiple sources is blended and prepped for consumption, which typically includes steps to validate, cleanse, and optimize the data for its intended use. Because these processes happen outside of IT, be on the lookout for potential security or governance gaps.
While individual circumstances vary, the data asset development lifecycle generally follows these steps:
- Intake: Data assets can only be created or derived from other datasets to which the end user already has access. While traditionally this was more focused on internal datasets, blending with external data, such as market, weather, or social, is now more common. Ask:
  - How are new requests for information captured?
  - Once captured, how are they reviewed and validated?
  - How is the information grouped or consolidated?
  - How is the information prioritized?
- Design: Once the initial grouping takes place, seeing data as an asset requires thoughtful design that fits in with the structure of other datasets across the organization. Ask:
  - How will new datasets be rationalized against existing sets?
  - How will common dimensions be conformed?
  - How does the consumption architecture affect the homogeneity of the datasets being created?
- Curation: Depending on the source, data might be more or less reliable, but even lower-confidence information can be extremely valuable in aggregate, as we’ve seen historically with third-party cookies. The more varied the sources contributing to a data asset, the greater the need for curation, cleansing, and scoring. Ask:
  - How will the data be cleansed and groomed based on the consumer’s requirements?
  - Will different “quality” or certification levels of the data be needed?
- Output: Organizations that view data as an asset prioritize sharing across business units and between tools. Consider implementing standards for data asset creation that take connectivity and interoperability into account. Ask:
  - How will data be delivered?
  - Will it include a semantic layer that can be consumed by visualization tools?
  - Will the data asset feed into a more modern data marketplace where customers (end users) can shop for the data they need?
- Understanding: As a shared resource, data assets require standardized tagging to ensure maximum utility. Ask:
  - How will metadata (technical and business) be managed and made available to consumers of these datasets?
  - How is the business glossary populated and managed?
- Access: To maintain legal and regulatory compliance and avoid costly mistakes, good governance requires access management. Ask:
  - Who will have access to various delivered assets?
  - Will control require row- or column-level security, and if so, what’s the most efficient and secure way to implement those controls?
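To make the curation questions above more concrete, here is a minimal Python sketch of cleansing a record, scoring it, and mapping the score to a certification level. Everything in it is an assumption made for illustration: the source names, the confidence weights, the `Record` fields, and the thresholds are hypothetical and would need to reflect your own sources and consumer requirements.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical confidence weights per source (0.0 to 1.0); illustrative values only.
SOURCE_CONFIDENCE = {"internal_crm": 0.9, "market_feed": 0.7, "social": 0.4}


@dataclass
class Record:
    source: str
    customer_id: Optional[str]
    email: Optional[str]


def _clean(value: Optional[str]) -> Optional[str]:
    """Strip whitespace and treat empty strings as missing values."""
    if value is None:
        return None
    value = value.strip()
    return value or None


def cleanse(record: Record) -> Record:
    """Return a cleansed copy: trimmed fields and a lowercased email."""
    email = _clean(record.email)
    return Record(
        source=record.source,
        customer_id=_clean(record.customer_id),
        email=email.lower() if email else None,
    )


def score(record: Record) -> float:
    """Score = share of populated fields, weighted by source confidence."""
    fields = [record.customer_id, record.email]
    completeness = sum(f is not None for f in fields) / len(fields)
    # Unknown sources get a weight of 0.0, so they can never be certified.
    return round(completeness * SOURCE_CONFIDENCE.get(record.source, 0.0), 2)


def certification_level(value: float) -> str:
    """Map a score to a hypothetical three-level certification scheme."""
    if value >= 0.8:
        return "certified"
    if value >= 0.4:
        return "provisional"
    return "uncertified"


record = cleanse(Record("internal_crm", " C-1 ", " Ann@Example.com "))
print(certification_level(score(record)))  # certified
```

Keeping the weights and thresholds in plain data, separate from the scoring logic, is one way to let data stewards adjust them without touching code.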
Explore tools that streamline data asset preparation
In many organizations, the data asset lifecycle is no longer a linear journey in which all data proceeds from collection to analysis in an orderly progression of steps. With the advent of the data lake, the reference architecture for most companies now includes a “marshaling” or staging area where they can land vast amounts of structured, semi-structured, and unstructured data (what some collectively label “multi-structured” or “n-structured”) in a single location for later retrieval.
Data may later be consumed in its raw form, lightly curated to apply additional structure or transformation, or groomed into highly structured, validated, fit-for-purpose forms that resemble more traditional structures.
Podium Data developed a useful metaphor for these three levels of data asset creation:
- “Bronze” refers to the raw data ingested with no curation, cleansing, or transformations.
- “Silver” refers to data that has been groomed in some way to make it analytics-ready.
- “Gold” refers to data that has been highly curated, schematized, and transformed so that it is suitable for loading into a more traditional data mart or enterprise data warehouse (EDW) built on a relational database management system.
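As an illustration of the three levels, the following Python sketch moves a tiny in-memory CSV through bronze, silver, and gold stages. It is a simplified sketch, not Podium's implementation: the sample data, column names, and function names are hypothetical, and a real pipeline would read from and write to a data lake rather than Python lists.

```python
import csv
import io
from collections import defaultdict

# Hypothetical raw extract; note the stray whitespace, mixed case, and bad value.
RAW_CSV = """order_id,region,amount
1, east ,10.50
2,WEST,7.25
3,east,not_a_number
4,West,2.25
"""


def ingest_bronze(text):
    """Bronze: keep every row exactly as it arrived, with no cleansing."""
    return list(csv.DictReader(io.StringIO(text)))


def refine_silver(bronze_rows):
    """Silver: normalize text, cast types, and drop rows that fail validation."""
    silver = []
    for row in bronze_rows:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would likely quarantine these rows
        silver.append({
            "order_id": int(row["order_id"]),
            "region": row["region"].strip().lower(),
            "amount": amount,
        })
    return silver


def build_gold(silver_rows):
    """Gold: aggregate into a fixed, mart-style schema (one row per region)."""
    totals = defaultdict(float)
    for row in silver_rows:
        totals[row["region"]] += row["amount"]
    return [{"region": r, "total_amount": t} for r, t in sorted(totals.items())]


bronze = ingest_bronze(RAW_CSV)
silver = refine_silver(bronze)
gold = build_gold(silver)
```

Each stage takes the previous stage's output as input, so consumers can choose the level of curation they need without rerunning ingestion.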
To streamline the creation of assets at each of those levels, many organizations adopt self-service tools to ensure standard processes while democratizing asset creation. While the vendor landscape is wide in this area, the following three examples represent key functionality:
- Podium, like Microsoft and others, adopted a “marketplace” paradigm to describe developing data assets for consumption in a common portal where consumers can “shop” for the data they need. Podium provides its “Prepare” functionality to schematize and transform data residing in Hadoop for a marketplace type of consumption.
- AtScale is another Hadoop-based platform for the preparation of data. It enables the design of semantic models, meaningful to the business, for consumption by tools like Tableau. Unlike traditional OLAP semantic modeling tools, a separate copy of the data is not persisted in an instantiated cube. Rather, AtScale embraces OLAP more as a conceptual metaphor. For example, when Tableau interacts with a model created in AtScale on top of Hadoop, the behind-the-scenes VizQL (Tableau’s proprietary query language) is translated in real time to SQL on Hadoop, making the storage of the data in a separate instance unnecessary.
- Alteryx is also a powerful tool for extracting data from Hadoop, manipulating it, then pushing it back into Hadoop for consumption.
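The idea of a semantic model that is translated to SQL at query time, rather than persisted as a separate cube, can be sketched in a few lines of Python. This is a generic illustration and not AtScale's API or Tableau's VizQL: the model layout, the table and column names, and the `to_sql` function are all hypothetical.

```python
# Hypothetical semantic model: business-friendly names mapped to physical
# columns and aggregate expressions on a single fact table.
SEMANTIC_MODEL = {
    "table": "sales_fact",
    "dimensions": {"Region": "region_name", "Year": "order_year"},
    "measures": {
        "Total Sales": "SUM(sale_amount)",
        "Order Count": "COUNT(DISTINCT order_id)",
    },
}


def to_sql(model, dimensions, measures):
    """Translate a request expressed in business terms into a SQL string.

    Names that are not defined in the model raise KeyError, so consumers
    can only query the dimensions and measures the model exposes.
    """
    dim_cols = [model["dimensions"][d] for d in dimensions]
    select_parts = [f'{col} AS "{name}"' for name, col in zip(dimensions, dim_cols)]
    select_parts += [f'{model["measures"][m]} AS "{m}"' for m in measures]
    sql = f'SELECT {", ".join(select_parts)} FROM {model["table"]}'
    if dim_cols:
        sql += f' GROUP BY {", ".join(dim_cols)}'
    return sql


query = to_sql(SEMANTIC_MODEL, ["Region"], ["Total Sales"])
print(query)
```

Because the SQL is generated on demand against the underlying tables, no second copy of the data has to be stored for the model to be usable.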
Keep security in mind
It is worth noting that many self-service tools include a server component in their overall architecture that is used to implement governance controls. Both row-level security (RLS) and column-level security (sometimes referred to as perspectives) can be put in place, and that security can often be implemented in more than one way.
Many of these tools can leverage the group-level permissions and security already in place in your ecosystem. Work with a consulting services partner or the vendors themselves to understand recommended best practices for configuring the tools you have selected in your environment.
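As a rough illustration of how row-level and column-level security can be driven by group membership, consider the Python sketch below. It does not reflect any particular vendor's implementation: the group names, policies, and sample rows are hypothetical, and in practice these controls would usually be enforced by the tool's server component or the database itself rather than in application code.

```python
# Hypothetical sample data for a delivered data asset.
ROWS = [
    {"region": "east", "customer": "Acme", "revenue": 120, "email": "ops@acme.test"},
    {"region": "west", "customer": "Birch", "revenue": 80, "email": "it@birch.test"},
]

# Hypothetical policies keyed by an existing security group:
# a row filter (RLS) plus the set of visible columns (CLS).
GROUP_POLICIES = {
    "east_analysts": {
        "row_filter": lambda row: row["region"] == "east",
        "columns": {"region", "customer", "revenue"},
    },
    "finance": {
        "row_filter": lambda row: True,
        "columns": {"region", "revenue"},
    },
}


def apply_policy(rows, group):
    """Return only the rows and columns the given group may see."""
    policy = GROUP_POLICIES.get(group)
    if policy is None:
        return []  # deny by default when a group has no policy
    return [
        {key: value for key, value in row.items() if key in policy["columns"]}
        for row in rows
        if policy["row_filter"](row)
    ]


east_view = apply_policy(ROWS, "east_analysts")
finance_view = apply_policy(ROWS, "finance")
```

Returning nothing for unknown groups is a deliberate choice in this sketch: access has to be granted explicitly instead of being available by default.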
Whether you’re evaluating self-service data tools or looking for ways to shift your organization’s culture toward seeing data as an asset, we can help. Fusion’s team of data, technology, and digital experts can help you architect and implement a comprehensive data strategy, or help you get unstuck with a short call, workshop, or the right resources to reframe the questions at hand.