Date: 2024-02-02

AI and machine learning success hinges on the reliability of the underlying data. As these tools and systems become increasingly popular across enterprises, securing trustworthy data becomes all the more necessary.

Data sets are like the oil in a well-running AI machine. Without quality data -- and enough of it -- AI systems struggle to learn the functions they are expected to perform. But problems such as primary key inconsistency and data duplication mean that data quality is never a given. By devoting time and resources to techniques that ensure trusted data, organizations can build more trustworthy AI tools and systems.

Untrustworthy data in the modern enterprise

Blair Kjenner, founder of information management firm Method1 Enterprise Software, spoke about the pitfalls that can lead to untrusted software data at the November 2023 Estes Park Group meeting hosted by Semantic Arts.

Kjenner explained that inconsistencies in primary key methods, core data models, and row and column headers lead to an inability to fully integrate data. In turn, the inability to fully integrate data leads to systems proliferation, with system-level functions being duplicated repeatedly rather than reused.

Systems proliferation -- or sprawl -- leads to data siloing, Kjenner said. This siloing creates issues such as duplicated records across different systems, which makes it difficult to identify the most current or accurate version. Haphazardly loading data into a data lake, warehouse or large language model can exacerbate the problem, leading to overwhelming levels of complexity and a lack of confidence in shared data.

Dave McComb, president of information systems consulting firm Semantic Arts, added that manufacturers of physical goods can achieve economies of scale by doubling production and reducing cost per unit. But in a software system, every line of code must coexist with all the other code in the system, he said. Therefore, adding more code leads to more complexity. Whenever enterprises add a new software subscription or build a new application, they must accommodate more new code, another primary key method and a different data model.

Dan DeMers, CEO and co-founder of data collaboration platform Cinchy, calls this tradeoff the "integration tax." With more complexity, overwhelming amounts of data and inconsistent methods of storing it, untrusted data can run rampant in the enterprise.
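
To make the duplicated-records problem concrete, here is a minimal Python sketch. It is not drawn from Kjenner's or McComb's material. The two systems, their field names (`customer_id`, `acct_no`), the key formats and the sample values are all hypothetical, and using a normalized email address as the shared key is an illustrative assumption rather than a recommended matching rule.

```python
from datetime import date

# Hypothetical records for the same customer held in two siloed systems.
# Each system uses its own primary key method, so the keys cannot be joined.
crm_records = [
    {"customer_id": 1042, "email": "Pat.Lee@example.com", "updated": date(2023, 3, 1)},
]
billing_records = [
    {"acct_no": "C-001042", "email": "pat.lee@example.com ", "updated": date(2023, 9, 15)},
]


def normalize_email(email: str) -> str:
    """Trim and lowercase an email so it can act as a stand-in shared key."""
    return email.strip().lower()


def latest_by_shared_key(*systems):
    """Group records from several systems by normalized email, keeping the newest."""
    latest = {}
    for source, records in systems:
        for record in records:
            key = normalize_email(record["email"])
            candidate = {"source": source, **record}
            if key not in latest or candidate["updated"] > latest[key]["updated"]:
                latest[key] = candidate
    return latest


merged = latest_by_shared_key(("crm", crm_records), ("billing", billing_records))
for key, record in merged.items():
    print(key, "-> most recent copy is in", record["source"])
```

Neither primary key reveals that the two rows describe the same customer. The duplicate only surfaces once a common attribute is normalized, and every additional system with its own key method adds another matching rule of this kind to maintain.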