Date: 2024-10-23

Have you always wanted to be a teacher? Your dream might be coming true -- but not necessarily how you hoped.

That latest photo you posted on Instagram might be used to train an AI model or used in an AI-generated image. Your resume data on LinkedIn might be fed to an AI model. Your face might even appear in an ad if you use a feature on Snapchat.

AI companies rely on the internet to train their models because of the massive amounts of data they need. Not only are there vast amounts of data on the internet -- including social media sites -- it's also free.

Whether you want to train AI with your data or not, you have options.

How does AI scrape data from social media?

AI models consume training data faster than humans can produce it. AI companies scrape the internet for information so their models can learn how to respond to questions. AI chatbots -- such as ChatGPT -- use the information they pull from the web to formulate answers to questions. Companies also mine social media for language data that helps large language models understand how people converse and keep up with the latest trends.

"AI models rely on unstructured data from social media, including text, images and videos. Through techniques such as natural language processing and computer vision, AI attempts to understand and categorize this data," said Matt Hasan, CEO of AIResults Inc., an AI-powered marketing and CRM company. "But social media is chaotic, spanning multiple languages and contexts, which makes it difficult for AI to learn accurately. AI can easily misinterpret what it sees."

Companies also use AI to capture people's social media posts for targeted ads, analyzing posts, likes and actions to learn more about you. They want to reach you and use AI to figure out what appeals to you, said Rogers Jeffrey Leo John, co-founder and CTO of DataChat, a generative AI platform for instant analytics.

Why you should opt out of AI training on social media

Transparency and disclosure are important. If you don't clearly understand how a platform uses your data, you are better off opting out, said Kamal Ahluwalia, president of Ikigai Labs, a generative AI data platform.

There are several reasons you should consider taking steps to prevent your information from being used to train AI models, including the following:

No control over how your information -- including images or private information -- is used.

Plagiarism of your thoughts and text posts.

Spreading of false information -- including misinformation and disinformation.

Lack of privacy.

"Once a model is trained on your data, there's no way to make it 'unlearn' or erase it, making it safer to exclude such data from AI training to protect privacy," John said.

Issues with training AI

Social media sites might not provide the highest quality data for AI model performance. For reliable and accurate output, companies need high-quality and diverse data. Using social media data might introduce biased information, slang, jargon, harmful content and disinformation.

The quality of data also varies across platforms. LinkedIn tends to have higher-quality career posts, while Reddit might have more diverse perspectives. Companies that train models on this information need to identify misinformation, as well as disinformation that is purposely spread to harm the public. This could become a safety hazard.

John said companies need to filter data because it is often biased and misleading. Social media also holds vast amounts of private data -- such as birthdate, relationship status, and contact and employment information -- which has been exploited by malicious actors.

When reviewing products or companies, people tend to share negative experiences more freely. There is a good chance there will be more negative than positive commentary, even though more people have positive experiences.

"Negativity seems to percolate these days a lot faster," Ahluwalia said.

This negative commentary about products and services can give an inaccurate picture of a product launch in a sentiment analysis.

Ahluwalia also said there's a lot of noise in social media content, which mixes data generated by people with data generated by machines. It's neither good nor bad, but models and developers don't know how to remove it.

"It's genuinely a lot of garbage in, and it's hard to take that garbage out," Ahluwalia said.

Is it ethical for AI to use social media information without permission?

Privacy is a major concern. Opting out of AI training on social media isn't straightforward, as most platforms include your data by default, Hasan said. Users are often unaware that their data is being used to build and train AI models, which is a fairness issue. And often, platforms profit from user data without compensating individuals.

Ahluwalia said he thinks anyone training models or otherwise using people's data should get permission. He said EU AI regulations make it very clear that companies must obtain consent from users before using their data for AI training and must communicate the specific purpose of the training.