What is voice squatting?
Voice squatting is an attack vector for voice user interfaces, or VUIs, that exploits homophones -- words that sound the same but are spelled differently -- and input errors -- words that are mispronounced. Voice squatting is also known as skill squatting because Amazon refers to third-party apps as skills.
Voice squatting is similar to text-based typosquatting, a cybersquatting exploit that takes advantage of users typing incorrect website addresses. Attackers purchase these misspelled domains to carry out attacks, often to steal users' personally identifiable information (PII).
The exploit was dubbed voice squatting in 2018 following the release of a paper by researchers at Indiana University, the Chinese Academy of Sciences and the University of Virginia.
Amazon and Google have said they have protections in place to guard against voice squatting.
How does voice squatting work?
Virtual AI assistants, such as Amazon Alexa, Apple Siri, Google Assistant and Microsoft Copilot, use voice keywords to open third-party applications. Voice squatting attackers register fake third-party apps with voice keywords that sound similar to legitimate third-party apps.
For example, if there is a legitimate app called Library, an attacker might create an app and register it as libary, a common mispronunciation. Or an attacker might see there is a legitimate banking app called Goldman Sachs and register the voice keywords goldmine sacks to trick users and the AI into opening the attacker's app instead of the legitimate app.
The attackers' intent is that when a user requests the legitimate app, the AI will open the counterfeit app instead.
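The core of the attack is that two different spellings can map to the same sounds. A rough way to see this is with a phonetic encoding such as Soundex, which reduces words to a letter plus three digits based on how they sound. The sketch below is purely illustrative -- real voice assistants use far more sophisticated speech models, not Soundex -- but it shows how "Goldman Sachs" and "goldmine sacks" can collide under a phonetic key.

```python
def soundex(word: str) -> str:
    """Encode a word with the classic Soundex algorithm: one letter + three digits."""
    codes = {
        **dict.fromkeys("bfpv", "1"),
        **dict.fromkeys("cgjkqsxz", "2"),
        **dict.fromkeys("dt", "3"),
        "l": "4",
        **dict.fromkeys("mn", "5"),
        "r": "6",
    }
    word = word.lower()
    first = word[0].upper()
    digits = []
    prev = codes.get(word[0], "")
    for ch in word[1:]:
        if ch in "hw":
            continue  # h and w do not separate consonants with the same code
        code = codes.get(ch, "")  # vowels get "" and reset prev
        if code and code != prev:
            digits.append(code)
        prev = code
    return (first + "".join(digits) + "000")[:4]

def phrase_key(phrase: str) -> str:
    """Phonetic key for a multi-word invocation phrase."""
    return " ".join(soundex(w) for w in phrase.split())

print(phrase_key("Goldman Sachs"))   # G435 S200
print(phrase_key("goldmine sacks"))  # G435 S200 -- same key, so a naive
                                     # phonetic matcher cannot tell them apart
```

If an assistant resolved invocation phrases by anything as coarse as this, the attacker's registration would be indistinguishable from the legitimate one; the real defenses involve vetting skill names and comparing recognized phrases against known skills.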
Voice squatting is dangerous because apps and skills can run undetected in the background for long periods of time. In addition to recording users without their permission or knowledge, voice squatting could be used to broadcast fake news, conduct phishing scams or trick users into divulging PII.