The exponential growth of data is forcing many IT shops to find solutions that reduce the amount of time their organization spends collecting, storing, analyzing and presenting information to end users.
Instead of focusing on the transformation of data into actionable insights, many organizations are drowning in their own data. Many cannot inventory and identify the data assets that provide business value, and they struggle with poor data quality, dark data or siloed data.
An increasingly popular solution for many IT shops is to use data catalogs to automate the discovery, analysis, classification and administration of metadata for data assets stored throughout the enterprise. In addition to harvesting and analyzing data, data catalogs increase operational efficiency by providing easy-to-use interfaces that help data consumers find the information they need.
In order to turn data into trusted insights to make effective business decisions, you need to collect information about its meaning, business use, source, lineage, context and relationships to other elements. This information is metadata, or data about data. Data specialists explore metadata to identify and better understand the data assets that are available for use. And metadata is what makes data catalogs helpful.
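To make the idea of metadata concrete, here is a minimal sketch of what a single catalog entry might hold, using the attributes listed above. The `CatalogEntry` class, its field names, the `search` helper and the sample assets are all hypothetical illustrations, not the schema of any particular product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CatalogEntry:
    """Hypothetical metadata record describing one data asset."""
    name: str
    meaning: str                # what the data represents
    business_use: str           # how the business uses it
    source: str                 # where the data originates
    lineage: List[str] = field(default_factory=list)         # upstream steps
    related_assets: List[str] = field(default_factory=list)  # linked assets


def search(entries, keyword):
    """Return names of entries whose meaning or business use mentions keyword."""
    kw = keyword.lower()
    return [
        e.name
        for e in entries
        if kw in e.meaning.lower() or kw in e.business_use.lower()
    ]


# Illustrative sample entries (invented for this sketch).
entries = [
    CatalogEntry(
        name="orders_daily",
        meaning="One row per customer order",
        business_use="Revenue reporting",
        source="sales_db.orders",
        lineage=["sales_db.orders", "etl.clean_orders"],
    ),
    CatalogEntry(
        name="web_sessions",
        meaning="Clickstream sessions",
        business_use="Marketing attribution",
        source="web_logs",
    ),
]

matches = search(entries, "revenue")
```

A data consumer searching for "revenue" would be pointed to `orders_daily` without needing to know which database holds it, which is the kind of lookup a catalog interface provides at scale.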
In recent years, there's been a major shift from on-premises deployments to the cloud for organizations' data catalogs. And there are many reasons for enterprises to shift to a cloud data catalog if they haven't already.
Automation in data catalogs
Since competition in the cloud data catalog space is fierce, vendors understand that a constant stream of enhancements to their SaaS products is required to increase their market share. As a result, you will find that vendors offering both cloud and on-premises data catalog implementations prioritize the release of new features and data store connectors for their SaaS platforms over their on-premises offerings.
Automated discovery features help reduce the amount of dark data and prevent the proliferation of data silos, which inhibit effective information sharing across the enterprise. Automated discovery and metadata generation also reduce the amount of time IT shops spend finding the data and evaluating its value to the business.
Data catalogs improve data quality by enabling analysts to easily evaluate the data's source, accuracy, lineage, completeness, consistency and timeliness. They also facilitate data consumer self-service, which enables all types of IT and line-of-business personnel to use the information.
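Two of the quality dimensions named above, completeness and timeliness, are simple enough to sketch in code. The example below is only an illustration under assumed definitions: completeness is the share of required field values that are present, and a data set is timely if it was updated within a chosen maximum age. The function names, sample rows and thresholds are hypothetical.

```python
from datetime import datetime, timedelta


def completeness(rows, required_fields):
    """Fraction of required field values that are present (not None or empty)."""
    total = len(rows) * len(required_fields)
    if total == 0:
        return 1.0  # nothing required, so nothing is missing
    filled = sum(
        1
        for row in rows
        for f in required_fields
        if row.get(f) not in (None, "")
    )
    return filled / total


def is_timely(last_updated, now, max_age):
    """True if the data set was refreshed within max_age of now."""
    return now - last_updated <= max_age


# Illustrative sample: the second row is missing its email value.
rows = [
    {"customer_id": 1, "email": "a@example.com"},
    {"customer_id": 2, "email": ""},
]
score = completeness(rows, ["customer_id", "email"])  # 3 of 4 values present

fresh = is_timely(
    last_updated=datetime(2024, 1, 2),
    now=datetime(2024, 1, 3),
    max_age=timedelta(days=2),
)
```

Commercial catalogs typically compute profiling statistics like these automatically; the sketch only shows the kind of measurement an analyst would be reading.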
Catalogs reduce data security and privacy risks by providing better insight into the organization's data stores. It is possible to lock protected data sets so they can't be accessed by those who aren't authorized.
AI in cloud data catalogs
IT shops are constantly searching for products that reduce the amount of time their support professionals spend managing data assets.
A growing number of cloud data catalog competitors offer time-reducing automations, AI and machine learning features that help organizations improve their data identification, metadata management, self-service and information governance capabilities.
Some of the top cloud data catalog vendors include Informatica, Alation, Collibra and Microsoft Azure.
Cloud data catalog product evaluation
To correctly select and implement the most appropriate data catalog for their organizations, IT shops must create and execute a carefully planned and detailed analysis of the competing offerings.
Select the appropriate evaluation team, perform a thorough needs analysis and create a comprehensive set of weighted evaluation metrics. Use the evaluation metrics to create a vendor short list and execute a deep-dive comparison of the remaining vendors.
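A weighted evaluation can be as simple as a scoring matrix. The sketch below assumes each vendor is scored from 1 to 5 on every criterion and that each criterion carries a weight reflecting its priority. The criteria, weights, vendor names and scores are invented for illustration.

```python
def weighted_score(scores, weights):
    """Weighted average of one vendor's criterion scores."""
    total_weight = sum(weights.values())
    return sum(scores[c] * w for c, w in weights.items()) / total_weight


def short_list(vendor_scores, weights, top_n):
    """Return the top_n vendor names, ranked by weighted score, best first."""
    ranked = sorted(
        vendor_scores,
        key=lambda v: weighted_score(vendor_scores[v], weights),
        reverse=True,
    )
    return ranked[:top_n]


# Hypothetical weights: higher means more important to this organization.
weights = {"connectors": 5, "automation": 3, "self_service": 2}

# Hypothetical 1-5 scores assigned by the evaluation team.
vendor_scores = {
    "Vendor A": {"connectors": 4, "automation": 3, "self_service": 5},
    "Vendor B": {"connectors": 2, "automation": 5, "self_service": 4},
    "Vendor C": {"connectors": 5, "automation": 4, "self_service": 2},
}

finalists = short_list(vendor_scores, weights, top_n=2)
```

With these numbers, Vendor C (4.1) and Vendor A (3.9) make the short list ahead of Vendor B (3.3), even though Vendor B scores highest on automation, because connector coverage was weighted most heavily.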
Develop the evaluation criteria by understanding and prioritizing your needs. How many different types of cloud and on-premises data stores will the product need to access? If you consolidate most of your cloud data on a single vendor's platform, you'll want to identify if your provider offers a data catalog.
Most leading cloud vendors provide data catalog products and customized features to enhance interactions with their internal data stores. What types of automation, machine learning and AI features does each product provide? Does it facilitate self-service? How strong are its data governance capabilities?
Use forums such as Gartner Peer Insights and analyst resources such as Gartner's Magic Quadrant and the Forrester Wave, as well as vendor websites, to find the cloud data catalog for you. Be sure to examine the automation options each one offers to help determine your best choice.
Successful data catalog implementation steps
Depending on the size of your organization and the number of locations, volumes and types of data the system will need to access, cloud data catalog projects may require long-term funding and commitment. You will need management buy-in for these projects to succeed.
Determine where you would like to install the data catalog. Where are you storing the majority of your data? If the bulk of your data continues to be in-house, an on-premises implementation may be a viable solution.
The challenge is that many industry-leading data catalog offerings are cloud-based systems. A SaaS solution will also enable you to take advantage of all the traditional benefits a cloud platform provides over its on-premises counterpart.
If you prefer the more popular SaaS implementation, here are a few areas that you will need to address:
- Security. Cloud data catalogs communicating with in-house data stores will require modifications to firewalls and other security components, such as intrusion detection and prevention systems, as well as network and endpoint security software.
- Network performance. Cloud data catalogs will increase the amount of data you transfer across the network and to the cloud. Can your network and cloud connection handle the increased volume? How expensive will it be to add bandwidth?
- Auditing and adherence to compliance frameworks. Does your organization adhere to any internal, industry-specific or governmental regulations? You will need to address how your organization will mitigate the additional risk of allowing a cloud platform to access and transfer information from sensitive data stores. In addition, you will need to determine if the cloud data catalog vendor's SaaS environment will adhere to the framework's control objectives.
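For the network performance question above, a rough back-of-the-envelope estimate can help frame the conversation with your network team. The sketch below assumes decimal units (1 GB = 8,000 megabits) and a hypothetical `utilization` factor for the share of the link that catalog traffic can actually use. Real throughput depends on protocol overhead, latency and competing traffic, so treat the result as an approximation.

```python
def transfer_hours(data_gb, link_mbps, utilization=0.5):
    """Estimate hours needed to move data_gb gigabytes over a link_mbps link.

    utilization is the assumed fraction of the link available to this traffic.
    """
    if link_mbps <= 0 or not 0 < utilization <= 1:
        raise ValueError("link_mbps must be positive and utilization in (0, 1]")
    megabits = data_gb * 8 * 1000           # decimal units: 1 GB = 8,000 Mb
    seconds = megabits / (link_mbps * utilization)
    return seconds / 3600


# Hypothetical scenario: 450 GB of scanned data over a 1 Gbps link
# with half the bandwidth available to the catalog.
hours = transfer_hours(data_gb=450, link_mbps=1000, utilization=0.5)
```

In this scenario the transfer takes about two hours. Rerunning the estimate with your own data volumes and link speeds shows quickly whether scans fit in an overnight window or whether more bandwidth is worth pricing out.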
Like many complex technologies, data catalogs have a learning curve. Your organization will need to develop a traditional project plan with goals, milestones and assigned action items. You will also need to identify the criteria your organization will use to evaluate success.
Establish standards for metadata quality, data exploration, experimentation and analysis. Data consumers should follow a standardized but flexible process to evaluate data and identify the use cases that will generate the most value to the business.