Data management best practices key to generative AI success

Generative AI can create data, empower decision-makers and help build competitive advantages. Organizations need strong data management practices to use it effectively.

Stephen Catanzano, Senior Analyst
Enterprise Strategy Group

Published: 03 Aug 2023

Generative AI creates new content, such as text, images and music. To use it effectively, however, businesses need a good understanding of data management best practices for data collection, cleansing, labeling, security and governance:

Data collection. Generative AI models are trained on large amounts of data, which can come from a variety of sources, such as internal systems, social media, sensors and IoT devices. Businesses must collect and store this data so that it is easily accessible and manageable, using technologies such as databases, data warehouses and data lakes. When collecting data for generative AI, consider a few key factors. The first is the type of data needed: for example, a business that wants to generate text needs a text data set. Also consider the size and diversity of the data set. Larger data sets generally produce better results, and a diverse data set can help reduce bias in the generative AI model.
Data cleansing. The data used to train generative AI models must be clean and consistent, which means any errors or inconsistencies in it must be corrected. Businesses can use data-cleansing tools to identify and correct errors, along with data observability tools to monitor data quality on an ongoing basis.
Data labeling. Data used to train generative AI models often must be labeled, particularly for fine-tuning. Labeling means tagging the data with the correct information, such as its type, its source and its context. Businesses can use data-labeling tools for this task.
Data security. Generative AI models can create sensitive data, such as personal or financial data. Businesses must store this data securely and restrict access to it to authorized users.
Data governance. Businesses need a data governance framework to ensure that data management is consistent and secure. This framework should include policies and procedures for collecting, storing and using data.
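To make the cleansing and labeling steps more concrete, here is a minimal Python sketch, assuming training records arrive as plain text strings. It is not a production pipeline or any particular vendor's tool: the `Record` class, the `clean_and_label` function and their fields are hypothetical names chosen for illustration. The sketch normalizes whitespace, drops empty and duplicate rows, and tags each surviving record with its source, type and context.

```python
import unicodedata
from dataclasses import dataclass, field


@dataclass
class Record:
    """One training example plus metadata labels.

    Hypothetical structure: the field names mirror the article's examples
    (type, source and context of the data), not any specific tool's schema.
    """
    text: str
    source: str
    data_type: str = "text"
    context: dict = field(default_factory=dict)


def normalize(text: str) -> str:
    # Unicode-normalize, then collapse runs of whitespace into single spaces.
    return " ".join(unicodedata.normalize("NFKC", text).split())


def clean_and_label(raw_rows, source, context=None):
    """Cleanse raw text rows, then label the survivors with metadata."""
    seen = set()
    records = []
    for row in raw_rows:
        text = normalize(row)
        if not text:
            continue  # empty after normalization: nothing to train on
        key = text.casefold()
        if key in seen:
            continue  # case-insensitive exact duplicate of an earlier row
        seen.add(key)
        # Give each record its own copy of the context dict.
        records.append(Record(text=text, source=source, context=dict(context or {})))
    return records


# Usage: two of the four rows survive cleansing.
rows = ["Hello  world", "hello world", "   ", "Second\tdoc"]
for rec in clean_and_label(rows, source="crm_export", context={"region": "EU"}):
    print(rec)
```

Dedicated data-cleansing and data-labeling tools go much further (fuzzy deduplication, schema validation, human review of labels), but the order is the same: clean the data first, then attach metadata that governance and security policies can act on.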

These are some data management practices to consider when implementing or embedding generative AI in an organization. In a recent study conducted by TechTarget's Enterprise Strategy Group, participants were asked what their organization's most important expected outcome was from implementing data preparation tools or services. Thirty percent expected increased performance, 26% expected increased data insights, 21% expected increased data-driven end-user satisfaction and 12% expected a risk reduction.

These results underscore the need for a resilient data management practice to get effective results from generative AI. Businesses can use generative AI to create new data for various purposes, including improving customer service, generating marketing content and creating new products.

In addition to these data management practices, businesses should also weigh the following ethical issues when using generative AI:

Data privacy. Generative AI models can create data that closely resembles real data, so protecting the confidentiality of the data used to train them is essential. Businesses must ensure that this data is anonymized or pseudonymized before it is used to train generative AI models.
Data bias. Generative AI models can be biased if trained on biased data. It is important to use a diverse data set to train generative AI models. Businesses should be aware of the potential for bias in their data and take steps to mitigate it.
Data security. Generative AI models can create sensitive data. It is crucial to protect the security of this data, so businesses must use encryption and other security measures to protect it.
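Pseudonymization can be as simple as replacing direct identifiers with a keyed hash before records enter a training set. The sketch below uses Python's standard `hmac` and `hashlib` modules; the function names and the choice of HMAC-SHA256 are assumptions for illustration, not a method the article prescribes. Note that this is pseudonymization, not anonymization: anyone holding the key can hash known values and match them to tokens, so the key itself needs the access controls and encryption described above.

```python
import hashlib
import hmac


def pseudonymize(value: str, key: bytes) -> str:
    """Replace an identifier with a keyed HMAC-SHA256 digest (hex string).

    The same input and key always yield the same token, so records about
    one person can still be joined. Without the key, an attacker cannot
    hash guessed values (e.g. a list of known email addresses) to
    re-identify people, as they could with a plain unkeyed hash.
    """
    normalized = value.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def pseudonymize_record(record: dict, fields, key: bytes) -> dict:
    """Return a copy of `record` with the named identifier fields tokenized."""
    out = dict(record)  # copy, so the caller's original record is untouched
    for name in fields:
        if out.get(name) is not None:
            out[name] = pseudonymize(str(out[name]), key)
    return out


# Usage: the email is tokenized; non-identifying fields pass through.
key = b"example-key-kept-outside-the-training-environment"
row = {"email": "Ana@Example.com", "note": "renewal inquiry"}
print(pseudonymize_record(row, ["email"], key))
```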

By following these data management practices and ethical considerations, businesses can ensure that they use generative AI responsibly and ethically.

In the same Enterprise Strategy Group survey, organizations reported that they are overwhelmingly looking to implement AI practices, including generative AI, to create competitive advantages. When participants were asked how they assess the impact of data analysis on their organization for decision-making or as a competitive advantage, 33% responded "exceptionally positive," with another 66% responding "positive."

There are many reasons why organizations want to develop generative AI practices and why technology vendors are embedding AI and generative AI into their products. I'm a big believer that responsible AI can result in more resilient data management practices with the ability to effectively extract and utilize the value of data from any source.

This can empower decision-makers with trusted and governed data to operate at the speed of the business. Operational efficiencies, cost savings, faster innovations, competitive advantages and data-empowered business and consumer decision-making are some of the many benefits organizations can achieve. However, the data management layer must be correct to get there.

Enterprise Strategy Group is a division of TechTarget. Its analysts have business relationships with technology vendors.
