Times of crisis such as the Russian attack on Ukraine and the COVID-19 pandemic demonstrate the importance of metadata as a means of telling truth from fiction.
History is rife with examples of disinformation being used to sway people's opinions, from the Nazis' use of propaganda during their reign in Germany through the United States' domestic war on drugs and foreign interference in recent U.S. presidential elections.
Social media platforms like Facebook and Twitter let anyone post media and claim it is true, and let others share it without verifying it. Combined with technologies like Photoshop that enable anyone with a computer to doctor images, those platforms have only made it harder to discern what is real and what is not.
Therefore, it is difficult to determine the legitimacy of photographs, videos and articles coming out of Ukraine, news related to COVID-19 or articles on American politics about which cries of "fake news" are heard daily.
Metadata, defined as data about data, is not the sole answer to the problem. But its importance lies in adding context to data, and in that sense, it can be a powerful part of determining the integrity of information.
"It's very similar to the data world," said Stijn Christiaens, co-founder and chief data citizen at data management vendor Collibra. "You have to look at a lot of factors -- where is the data coming from, is the data qualitative, who is the source of the data, who's sharing it? These are things you can do to check if a report is trustworthy and you can apply in this context as well."
Similarly, Satyen Sangani, co-founder and CEO of Alation, another data management vendor, said the importance of metadata lies in providing context, a principle that applies to data in the enterprise as well as to the information people consume from news organizations and on social media.
"For any given bit of information, you need context to determine whether it's appropriate or correct for use," he said. "You're more likely to believe an article if it's in The New York Times, just as you're more likely to believe something that comes from a manager, a CEO or someone else you trust as an expert.
"Having the context gives you the ability to know whether or not you can trust a bit of information," Sangani added.
Getting metadata for images and articles, however, is not as simple and straightforward as it is in the enterprise.
Metadata in the enterprise
Enterprise metadata is broad and has many applications.
Some of its most basic elements are data management attributes such as table names, schemas, columns and database names.
Once users create files, metadata includes file attributes such as the name of the author, the date the file was created, the dates of any modifications and the file size. Reports, dashboards and data models carry additional usage metadata such as the number of views and shares.
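As a rough illustration of the file attributes described above, the following Python sketch reads a file's size and last-modified time from the file system using only the standard library. The function name `file_metadata` and the throwaway sample file are assumptions made for this example. Attributes such as the author's name are typically stored by the application that created the file, not by the file system, so they are not shown here.

```python
import os
import tempfile
from datetime import datetime, timezone


def file_metadata(path):
    """Return basic file-system metadata for the file at `path`."""
    info = os.stat(path)
    return {
        "name": os.path.basename(path),
        "size_bytes": info.st_size,
        # st_mtime is the last-modification time in seconds since the epoch.
        "modified_utc": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    }


# Demonstration on a throwaway file so the sketch runs anywhere.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as handle:
    handle.write("quarterly report")
    sample_path = handle.name

sample = file_metadata(sample_path)
os.remove(sample_path)  # clean up the temporary file
print(sample["name"], sample["size_bytes"], sample["modified_utc"].isoformat())
```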
Metadata also includes data governance, data privacy policies, and even data about an organization's social media interactions and online support portals and chat windows.
All that metadata then comes together to contextualize what is happening within the enterprise, according to Jitesh Ghai, executive vice president and chief product officer at Informatica, a specialist in metadata management. Metadata can play a similar role by helping news and social media consumers contextualize what's going on in the larger world, such as in the Russia-Ukraine war.
"Across these dimensions of data -- technical, business, operational, usage, social -- [metadata is] able to provide a complete view of an enterprise's various data," Ghai said.
With metadata, enterprises can catalog data for future use too.
Metadata informs an organization's data consumers -- data scientists and analysts, as well as business users -- about the lineage of the data they're using to inform their reports, dashboards and models. They can see who has used the data, whether it's been updated to reflect the most current conditions, and if and how it's been manipulated.
Metadata also informs data quality and can be used to establish trust scores so that data consumers can know whether a data set is reliable enough to use in building a report, dashboard or model.
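One way to picture a trust score is as a weighted combination of metadata signals. The Python sketch below is purely illustrative: the `DatasetMetadata` fields, the equal weights and the 30-day freshness window are all assumptions chosen for the example, not a description of how any vendor mentioned here computes its scores.

```python
from dataclasses import dataclass


@dataclass
class DatasetMetadata:
    """Hypothetical metadata signals for one data set."""
    has_known_owner: bool
    days_since_update: int
    lineage_documented: bool
    null_fraction: float  # share of missing values, 0.0 to 1.0


def trust_score(meta, freshness_window_days=30):
    """Combine metadata signals into a score between 0 and 1.

    The four signals are weighted equally; that choice is an assumption.
    """
    # Freshness decays linearly to zero over the window.
    freshness = max(0.0, 1.0 - meta.days_since_update / freshness_window_days)
    # Completeness is the share of values that are present.
    completeness = 1.0 - min(max(meta.null_fraction, 0.0), 1.0)
    score = (
        0.25 * float(meta.has_known_owner)
        + 0.25 * float(meta.lineage_documented)
        + 0.25 * freshness
        + 0.25 * completeness
    )
    return round(score, 3)


fresh = DatasetMetadata(True, 0, True, 0.0)
stale = DatasetMetadata(False, 90, False, 1.0)
print(trust_score(fresh), trust_score(stale))
```

A data consumer could then compare the score against a threshold the organization agrees on before using the data set in a report, dashboard or model.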
"Metadata is what we capture, and it's what we leverage to organize and provide a simple searchlike experience for data engineers, data scientists and nontechnical business users to search, browse and discover data wherever it resides, whatever its structure," Ghai said.
Essentially, metadata informs the development of data assets, according to Christiaens.
"In the data world, data architects and modelers will use it to design databases and build their applications, to move their data around," he said. "We use metadata to make data trustworthy itself."
But the importance of metadata goes beyond enabling enterprises to measure data quality and use it to build the assets that inform data-driven decisions.
Just as metadata is used to give context to enterprise data and help measure its quality -- its truthfulness -- it can give context to images and accounts during times of crisis and be used as a means to help verify their truthfulness.
It's a critical source for news organizations attempting to report accurately on worldwide events.
Digital images have metadata. Photos and videos typically carry such information as the time they were taken, the device used to take them and, when location services are enabled, the GPS location where they were shot, though that metadata can be stripped or altered after the fact. When using photos and videos coming out of Ukraine, for example, news organizations doing due diligence can use that metadata to verify images from journalists on the ground before publishing the images.
But not all photos and videos are published by news organizations, and not all organizations purporting to report the news are legitimate.
Many outlets or individuals have an agenda, and use disinformation to deliberately mislead people and further that agenda. By posting photos and videos that look legitimate, they play to people's emotions in an attempt to convince others to adopt their own ideology as fact.
However, by combining critical thinking and metadata, people can keep from falling prey to that type of disinformation, according to Christiaens.
"We've all been susceptible to misinformation, whether it's tweets you're seeing or some other source," he said. "You're getting a message or picture, and it's evoking a certain emotion -- anger, sadness -- and then the emotions can lead us to forget our critical thinking. You're immediately jumping into action and maybe sharing that information forward rather than first doing a little due diligence."
It's unlikely, though, that most consumers of news will be able to dig into an image's metadata to discern when and where a photo or video was shot and who shot it.
CNN enables people to save images from its website to their desktop and view metadata such as when the image was taken and the device used to take it. But The New York Times and The Washington Post don't enable photos to be saved or metadata to be viewed beyond what's in the photo caption. Neither do Fox News and MSNBC.
However, by right-clicking on any photo -- without first saving the image to a desktop -- people can select Inspect and view the webpage code surrounding the image, and those who know code may be able to discern a bit of metadata about the image. But for the most part, particularly when images and articles are shared on social media platforms like Facebook and Twitter, due diligence entails gathering bits of information and piecing them together rather than getting to an image's attributes.
Those bits of gathered information are metadata in themselves and provide context.
"You have to read around the margins," Sangani said.
Metadata around the margins
When investigating the veracity of reports about worldwide events, metadata matters because it informs critical thinking.
Just as in the enterprise, the source of an article or image is one critical piece of metadata. Whether it's the mainstream news, a social media post or something forwarded by a friend, that source information informs trust.
Most people are likely to believe in the veracity of an image or article posted by a mainstream news organization, though there are certainly some who have their doubts. Corroboration can therefore be key metadata. If CNN, Fox and MSNBC all report the same thing -- with perhaps different editorial spins -- news consumers can usually determine that the general facts are believable.
The source of a social media post carries with it important metadata as well.
"You're looking for people who are renowned," Sangani said. "You're looking for some element of social proof, like who else is linking to this information -- are they experts you might trust or are they disreputable websites that may not be trustworthy -- and who are they linking to or citing as well?"
If the poster is known to the consumer, that relationship carries with it contextual information. A friend may have strong political leanings, and all their posts may be aimed at advancing an agenda. The same goes for a celebrity or politician.
If the poster is unknown, a quick examination of their profile can be informative. If their handle is the name of a famous person slightly altered -- for example, a zero in their name instead of the letter O -- that could be a sign that something is amiss. If all their posts are about one subject, that could be a tell that they have an agenda and shouldn't be believed.
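The altered-handle check described above can be sketched in a few lines of Python. The substitution table, the function name `is_lookalike` and the sample handles are hypothetical and far from exhaustive; a real check would need a much larger set of confusable characters.

```python
# Common character swaps seen in lookalike handles (illustrative, not exhaustive).
LOOKALIKES = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def is_lookalike(handle, known_handles):
    """Return True if `handle` differs from a known handle only by lookalike characters."""
    plain = handle.lower().lstrip("@")
    normalized = plain.translate(LOOKALIKES)
    for known in known_handles:
        target = known.lower().lstrip("@")
        # An exact match is the real account, so only near matches are flagged.
        if plain != target and normalized == target.translate(LOOKALIKES):
            return True
    return False


known = ["@JohnDoe"]  # hypothetical verified account
print(is_lookalike("@J0hnDoe", known))  # zero in place of the letter O
print(is_lookalike("@JohnDoe", known))  # the genuine handle
```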
Beyond the source, reactions to posts can be informative and are another form of metadata. The ratio of comments to likes is telling. If there are more comments than likes, it could be a signal that the post is controversial.
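The comments-to-likes signal is simple arithmetic. A minimal Python sketch follows, assuming a threshold of 1.0 (more comments than likes), which mirrors the rule of thumb above; the function names and the sample counts are made up for illustration.

```python
def comment_like_ratio(comments, likes):
    """Return comments divided by likes, handling posts with no likes."""
    if likes == 0:
        return float("inf") if comments > 0 else 0.0
    return comments / likes


def looks_controversial(comments, likes, threshold=1.0):
    """Flag a post whose comments outnumber its likes by more than `threshold`."""
    return comment_like_ratio(comments, likes) > threshold


print(looks_controversial(300, 100))  # far more comments than likes
print(looks_controversial(20, 400))   # mostly likes
```

A high ratio is only a hint that a post deserves a closer look, not proof that it is false.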
Meanwhile, some social media websites now provide small bits of metadata. For example, WhatsApp is attempting to combat disinformation by enabling users to see how many times a message has been forwarded.
"Just with these small pieces of metadata, you can still know a lot," Christiaens said. "You can infer, and if you have enough metadata over time, it turns into data itself. You can actually tell something with the metadata."
The role of data science
While most news consumers are on their own when it comes to contextualizing the images and articles they see, the most technologically savvy can go further.
Though orchestrated disinformation campaigns originate with humans, much of the disinformation the campaigns disseminate is automatically generated by bots and computers before it is shared on social media platforms and other media. Trained data scientists and citizen data scientists can write algorithms and train models to spot autogenerated images and articles.
Just as teachers can check students' work for plagiarism by using a computer program, programs can discern whether news is legitimate or not. In fact, programs like this already are available.
Computer- or bot-generated images and articles have certain metadata characteristics that reveal they've been autogenerated.
They're often difficult to spot -- one small anomaly amid everything else that seems legitimate. But just as data scientists can build and train models using augmented intelligence and machine learning to detect anomalies in enterprise data, they can build and train models not only to spot news fakes, but also to get better at it the more the models are used.
"We're looking at various attributes to identify that this behavior is unusual," Ghai said. "It's metadata that serves as clues, and in aggregate you find multiple dimensions for certain media, and train AI and machine learning models to identify what is deemed suspect. That's ultimately an immense value creator in finding the signal in the sea of noise."
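To give a sense of what flagging unusual behavior from a single metadata attribute can look like, here is a deliberately simple Python sketch. It uses a plain z-score test as a stand-in for the trained machine learning models described above, and the attribute (posts per hour), the 2.5 threshold and the sample values are all assumptions made for the example.

```python
from statistics import mean, pstdev


def flag_outliers(values, z_threshold=2.5):
    """Return the indexes of values more than `z_threshold` standard deviations from the mean."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return []  # every value is identical, so nothing stands out
    return [i for i, v in enumerate(values) if abs(v - mu) / sigma > z_threshold]


# Hypothetical posts-per-hour figures for 11 accounts; the last one posts far more often.
posts_per_hour = [2, 3, 1, 4, 2, 3, 2, 1, 3, 2, 250]
print(flag_outliers(posts_per_hour))
```

A production system would combine many such attributes and learn what counts as suspect from labeled examples instead of relying on one fixed threshold.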
Metadata provides breadcrumbs -- clues to follow when determining truthfulness, he continued.
There will always be small clues that reveal whether something has been autogenerated, whether it's an image, text or a social media post. Those clues, when aggregated, can lead to a high degree of confidence in something's integrity.
And as researchers build up a library of images or articles that have been doctored or autogenerated -- a data set -- that library can be used to train a machine learning model, according to Ghai.
"The beauty of the digital world we live in is that everything turns into information theory. It turns into 1s and 0s. It turns into math," he said. "That's the magic of the digital world versus the analog world. And if it turns into information, that information can be interpreted."
Ultimately, deciding whether an image or article about Russia's attack on Ukraine or the ongoing COVID-19 pandemic -- or anything else -- is believable is up to the consumer.
Metadata's importance lies in the context it adds. It can serve as a guide, but even in the enterprise, it's not 100% accurate and doesn't lead to 100% certainty. It can even be manipulated by someone with nefarious intent.
Therefore, deductive reasoning informed by metadata on the margins is key.
"If humans discard and forget about their critical thinking skills and don't ask the right questions, then metadata is mostly useless," Christiaens said. "In the end, the human is an important factor in trust."
According to Ghai, metadata is akin to using a GPS navigation system.
"Metadata is the source of truth, especially in various crises where there is a flurry of information where there is more disparate data than ever before," he said. "If you're driving on a digital highway, metadata is the GPS you need to ensure you're headed in the right direction and you get to a trusted destination."
Even GPS, however, sometimes takes a driver the wrong way down a one-way street, or doesn't know there's construction in a certain location and traffic is being diverted. When following GPS, there is a human element -- just as there's still a human element at the end of the data-driven decision-making process.
Metadata can lay out a series of facts and get a consumer to the point of making a decision about the authenticity of news. But then, no matter what direction the data may seem to lead, there's still an interpretation to be made at the end.
"It ultimately does come down to your literacy," Sangani said. "We talk a lot about data literacy [in the enterprise]. It's your ability to both statistically discern information and also, through logic, deduct whether you think something is actually true."