SharePoint Syntex automatically uncovers document metadata

AI can help organizations automatically uncover metadata and tag documents. Discover how Microsoft's SharePoint Syntex achieves this using machine learning and machine teaching.

Searching through business files in an enterprise content management system is a pain point for many organizations. If the content isn't tagged with the appropriate metadata, it won't turn up in enterprise search results.

AI can assist in the tagging process, which is the purpose of Microsoft's SharePoint Syntex, a product announced at Microsoft Ignite 2020. Syntex is the first product of Project Cortex, a service within Microsoft 365 that uses AI to reason about content and pull meaningful insights from it.

SharePoint Syntex is generally available in SharePoint Online as an add-on for Microsoft 365 plans at $5 per user, per month.

There are numerous AI products on the market to uncover metadata, but what Microsoft is doing is significantly different, said Alan Pelz-Sharpe, founder and principal analyst at Deep Analysis.

"Rather than training on mountains of data, [Microsoft] leverages the skills and experience of the employee to select a very small amount of data to train with," Pelz-Sharpe said.

How SharePoint Syntex works

Syntex is a new add-on feature within SharePoint Online. Its job is to automatically tag content with metadata after a user has trained the AI. Syntex can be trained using either of the following methods:

  • Machine learning. A business user feeds Syntex comparable sample documents so the AI can look for similarities, surface that information and show the user what it found. The user indicates whether the information is correct, and Syntex then applies the resulting model to other content. For example, a user might feed the system 20 contracts that all follow a similar format, plus one document that is not a contract. This shows Syntex which documents are contracts and which are not.
  • Machine teaching. A business user feeds Syntex a form, telling it what it will find in each field and the format of each -- such as a phone number and an address. The form-processing system in Syntex has a point-and-click interface that enables the user to tell it what data it needs to uncover from that form.

Syntex can also process digital images and perform object recognition, as it has a visual library of more than 10,000 images. It can determine whether an image contains a mountain or car, for example. It can also recognize handwritten text and put that into metadata.

An example of the form processing interface within SharePoint Syntex.

Benefits of SharePoint Syntex

Syntex is a Microsoft feature geared toward providing businesses with all the tools they need to work efficiently in the cloud, said Antonio Maio, director and senior enterprise architect at Protiviti, a global technology consulting firm in California that has been using Project Cortex in private preview.

For example, a large company that produces 200,000 purchase orders a week might want to bring those orders into SharePoint. But at that volume, it is extremely difficult for the business to analyze the content quickly and get meaningful insights from it, such as the dollar value of the orders and who the customers are.

After some initial training, Syntex will enable that business to reason over the purchase orders automatically, extracting details such as the purchase amount on each form. The business could then create a workflow that automates approvals for purchase orders over a certain dollar amount.
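The approval rule described above boils down to a threshold check on one extracted field. The sketch below is purely illustrative and assumes the purchase-order metadata has already been extracted into plain records; the `PurchaseOrder` type, its field names, the `route` and `summarize` helpers and the 10,000 threshold are all hypothetical and are not part of Syntex or any Microsoft API. In practice, such a rule would be configured in the business's workflow tooling rather than hand-written.

```python
from dataclasses import dataclass
from decimal import Decimal


# Hypothetical record holding metadata already extracted from one purchase order.
@dataclass
class PurchaseOrder:
    order_id: str
    customer: str
    amount: Decimal  # purchase amount pulled from the form


# Hypothetical threshold; each business would pick its own.
APPROVAL_THRESHOLD = Decimal("10000")


def route(order: PurchaseOrder, threshold: Decimal = APPROVAL_THRESHOLD) -> str:
    """Send orders strictly above the threshold into an approval workflow."""
    return "route_for_approval" if order.amount > threshold else "no_approval_needed"


def summarize(orders: list[PurchaseOrder]) -> dict[str, Decimal]:
    """Total dollar value of orders per customer."""
    totals: dict[str, Decimal] = {}
    for order in orders:
        totals[order.customer] = totals.get(order.customer, Decimal("0")) + order.amount
    return totals


# Example with made-up orders.
orders = [
    PurchaseOrder("PO-1", "Contoso", Decimal("12500")),
    PurchaseOrder("PO-2", "Fabrikam", Decimal("800")),
    PurchaseOrder("PO-3", "Contoso", Decimal("200")),
]
print([route(o) for o in orders])
# ['route_for_approval', 'no_approval_needed', 'no_approval_needed']
print(summarize(orders))
# {'Contoso': Decimal('12700'), 'Fabrikam': Decimal('800')}
```

Using `Decimal` rather than `float` avoids rounding surprises when summing currency amounts.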

An example of the machine learning interface within SharePoint Syntex.

"As a one-stop-shop, it allows you to configure all of that right within the one [Microsoft] platform," Maio said.

A major benefit of Syntex is that regular business users within an organization can train it without needing to get an expert involved, Maio said.

"The interface is easy to use right within SharePoint, where I can either go the machine learning route or the machine teaching route," said Maio, who is also a Microsoft SharePoint MVP.

Creating taxonomy codes and putting meta tags on content have always been extremely beneficial to enterprise search because they enable users to find the right content, said Reda Chouffani, a SharePoint administrator.

Though beneficial, this was manual work that required a lot of upfront effort from SharePoint administrators to create taxonomy codes and agree on which metadata to use, Chouffani said. Administrators also relied on users to tag everything they uploaded to SharePoint appropriately.

"The benefit here is likely to be for administrators, taking away the burden of having to constantly tag documents," Chouffani said.

After the initial training, the AI can categorize and tag documents on its own, because it learns the format of each document type.

Another benefit is how little time it takes to train Syntex compared with other software on the market. Other products that automatically capture content metadata include GetGuru, Coveo and KMS Lighthouse.

"I don't have to feed it 10,000 documents the way you have to with other machine learning solutions," Maio said. "I can feed it 20 to 30 -- sometimes 50 -- documents to get some meaningful results."

Training takes approximately two hours, whereas other software has taken weeks to produce meaningful results, Maio said.

Potential challenges

Microsoft needs to ensure its customers are trained to use Syntex for it to function properly, Pelz-Sharpe said.

"Buyers and users cannot rely on AI being 'fairy dust' -- they will have to roll up their sleeves and do some work to make [Syntex] a success," Pelz-Sharpe said. "Poor data and overreliance on the intelligence of AI is the downfall of most AI projects."

To ensure SharePoint Syntex functions properly, organizations need to continuously monitor and guide the AI over time. They also need to monitor the accuracy and relevancy of enterprise search results -- something that should always be under the control of the business itself, Pelz-Sharpe said. Ideally, organizations will have a designated knowledge manager to keep an eye on things.

"Syntex has been designed that way, but there is no guarantee that organizations will follow that guidance," Pelz-Sharpe said.
