Ng: Biggest benefit of AI may be unlocking unstructured data
Tech entrepreneur Andrew Ng says that in addition to autonomous action, one of the most beneficial applications of AI is enabling easy access to valuable unstructured data.
Amid the hype about autonomous agents making business decisions and performing previously time-consuming tasks in seconds, the biggest benefit of AI may be its ability to unlock unstructured data.
That's according to tech entrepreneur Andrew Ng, founder of DeepLearning.AI and executive chairman of LandingAI -- among other roles -- who spoke on Nov. 4 during a panel discussion at Snowflake Build, a virtual conference for developers hosted by data management vendor Snowflake. Additional panelists included Snowflake CEO Sridhar Ramaswamy and AWS vice president of agentic AI Swami Sivasubramanian.
For years, unstructured data has been the Holy Grail of analytics.
Enterprises have historically used structured data such as financial records and sales statistics to inform decisions. But structured data represents a small percentage of an organization's overall data. The majority -- potentially up to 90% -- is unstructured data such as text in PDFs and emails, audio files from customer service interactions and images.
Using only structured data, enterprises have been deriving insights and basing key strategic initiatives on a partial understanding of their operations rather than the whole. Unstructured data, in conjunction with structured data, provides the complete view that has been missing, and AI is the means by which enterprises can easily access that unstructured data.
"We've spent the last 20 years architecting structured data," Ng said. "I think a big unlock will be unstructured data, which AI can finally make sense of."
Opening unstructured data
Many data management and analytics platforms historically focused on only structured data. Some still mainly support only structured data. Others, however, are using generative AI and agentic AI to make it easy to operationalize and analyze unstructured data as well.
For example, analytics platforms such as Microsoft Power BI, Tableau and Qlik now enable users to query and analyze unstructured data. Similarly, data management platforms from tech giants AWS, Google Cloud and Microsoft all support unstructured data, as do those from more specialized vendors such as Databricks, Snowflake and Informatica.
To make unstructured data accessible with databases, data lakes and data lakehouses, it has to be given some form of structure, such as a vector -- a numerical representation of data -- that makes it discoverable.
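As a rough illustration of how a vector makes text discoverable, the sketch below turns each document into a numerical representation and ranks documents by how closely they match a query. It is a toy under stated assumptions: the `embed`, `cosine` and `search` functions are hypothetical names, and plain word counts stand in for the learned embedding models and vector databases that production systems typically use.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': a sparse vector of word counts.

    Assumption for illustration only; real systems use learned models.
    """
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[term] * b[term] for term in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def search(query: str, documents: list[str]) -> list[tuple[float, str]]:
    """Return (score, document) pairs, most similar first."""
    query_vector = embed(query)
    scored = [(cosine(query_vector, embed(doc)), doc) for doc in documents]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical snippets standing in for text pulled from emails or PDFs.
documents = [
    "Invoice for the third quarter consulting services",
    "Customer called to complain about a late delivery",
    "Quarterly sales figures for the retail division",
]

results = search("late delivery complaint from customer", documents)
best_score, best_doc = results[0]
print(best_doc)
```

Because the query and the second snippet share several words, that snippet ranks first. A learned embedding would also catch matches that share meaning but not exact wording, which word counts cannot do.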
Developing algorithms that assign vectors to unstructured data is complex, and building data and AI pipelines that include vector databases is labor-intensive. Agents and other AI applications, however, drastically simplify accessing and operationalizing unstructured data.
They can take care of the data preparation that makes unstructured data accessible. In addition, through natural language interfaces and text-to-code translation capabilities, AI can make it simple for developers to build applications that include unstructured data and for analysts to query their organization's entire data estate.
Audio from customer service interactions can be analyzed for sentiment, and emails can be scanned for similar insights. Both are valuable. But the information previously locked in PDFs, and now reachable, is perhaps most valuable to enterprises, according to Ng.
"I think the single most valuable form of unstructured data that is sitting in all of our businesses is PDF files," Ng said. "There are so many of them. I'm really excited about the work that's being done … on agentic document extraction to take massive PDF files in finance and healthcare that have tons of value and extract it out. The number of use cases is skyrocketing."
Before agentic AI, finding relevant data in a large PDF file was essentially manual, according to Ramaswamy, who moderated the panel discussion.
Complex configurations could extract valuable information. But often, it was left to the individual searching for data to find it themselves.
"I ask people, 'Do you know what my most powerful PDF search engine was for much of my life? Command F,'" Ramaswamy said, referencing the keyboard shortcut to find words or phrases within documents, web pages and applications.
Development strategy
While AI tools make it easy to derive value from unstructured data, developing agents that make unstructured data accessible -- or agents that carry out any task -- remains complicated.
It begins with data, according to Sivasubramanian. As developer tools and foundation models improve, the pace at which enterprises can develop cutting-edge applications is increasing exponentially, he noted. But without a strong data foundation, agents, chatbots and other AI tools won't benefit the business.
"Shipping net new features is really coming down to not just building proofs of concept but prioritizing what matters to your business and then also focusing on getting your data architecture right and building all these AI solutions," Sivasubramanian said. "Then you are setting yourselves up for the future."
Meanwhile, the right capabilities for developing agents and other AI tools depend as much on an enterprise's industry, and its business model within that industry, as on the capabilities themselves, Sivasubramanian continued. For example, one developer framework or large language model may be better suited for enterprises in manufacturing than retail, and vice versa.
"To me, the best model is your business model," Sivasubramanian said. "Know what it is that is going to drive value for your customers and for your business and work backwards toward it."
Regarding the cost of AI development, which can be prohibitive for some organizations, open source models are improving and are an increasingly appealing option that can reduce "choke points to innovation," Ng said. In addition, open source capabilities can help enterprises avoid getting locked into a vendor's technology that may unintentionally isolate data, he added.
Even agents themselves, when tasked with calling on models to carry out their responsibilities, can now pick the optimal model in the background so they don't spend prohibitively, Sivasubramanian noted.
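One simple way to picture that kind of background model selection is a router that picks the cheapest model able to handle a given task. The sketch below is a minimal illustration, not a description of how any vendor's agents work. The model names, capability scores, per-call costs and the `pick_model` function are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelOption:
    name: str             # hypothetical model identifier
    capability: int       # hypothetical quality score, 1 (weak) to 10 (strong)
    cost_per_call: float  # hypothetical cost in dollars


def pick_model(options: list[ModelOption], required_capability: int) -> ModelOption:
    """Return the cheapest model whose capability meets the requirement."""
    eligible = [m for m in options if m.capability >= required_capability]
    if not eligible:
        raise ValueError(f"no model meets capability {required_capability}")
    return min(eligible, key=lambda m: m.cost_per_call)


# Hypothetical catalog; real prices and capabilities vary by provider.
CATALOG = [
    ModelOption("small-model", capability=3, cost_per_call=0.001),
    ModelOption("medium-model", capability=6, cost_per_call=0.01),
    ModelOption("large-model", capability=9, cost_per_call=0.10),
]

# A simple lookup goes to the cheapest model; harder work escalates.
print(pick_model(CATALOG, required_capability=2).name)
print(pick_model(CATALOG, required_capability=8).name)
```

In practice the hard part is estimating how much capability a task needs. This sketch takes that number as a given.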
Ultimately, developing AI tools that serve the needs of customers and employees should begin with knowing what they want, according to Ng. Only then should developers begin to look for the appropriate capabilities.
"The biggest challenge is identifying and building a product that customers love, not cost," he said. "Just work to build that good product. Sometimes the costs start climbing and look scary, but at that time, you can find engineering methods to bring the costs back down. … The costs usually come down fast enough, so it's a wonderful problem to have."
Eric Avidon is a senior news writer for Informa TechTarget and a journalist with more than 25 years of experience. He covers analytics and data management.