Voice squatting is an attack vector for voice user interfaces (VUIs) that exploits homophones (words that sound alike but are spelled differently) and common mispronunciations. The attack is analogous to the text-based exploit known as typosquatting.
Virtual assistants like Amazon Echo's Alexa use voice keywords to open third-party applications. An attacker who is voice squatting registers a bogus third-party app under a voice keyword that sounds similar to that of a legitimate app, in the hope that when an end user requests the legitimate app, Alexa will open the counterfeit one instead. For example, if there is a legitimate app called library, an attacker might create a listening app and register it with Amazon under the voice keyword libary, a common mispronunciation of the word. Similarly, an attacker might notice a genuine banking app called Goldman Sachs and register the voice keywords goldmine sacks to trick Alexa into opening the attacker's app instead of the legitimate banking app.
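One way to see why such keyword pairs collide is to compare them with a simple string-similarity measure. The sketch below uses Python's standard-library difflib; it is only an illustration of how close these spellings are to each other, not a description of how Alexa actually matches invocation names, which Amazon does not publicly document:

```python
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two invocation phrases."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

# A squatted keyword only needs to be "close enough" to the real one.
pairs = [
    ("library", "libary"),                # common mispronunciation/misspelling
    ("goldman sachs", "goldmine sacks"),  # homophone-style collision
]

for real, squat in pairs:
    print(f"{real!r} vs {squat!r}: similarity {similarity(real, squat):.2f}")
```

Both example pairs from this article score well above what two unrelated phrases would, which is exactly the property a voice squatter relies on.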
Voice squatting is also known as skill squatting because Amazon refers to third-party Alexa apps as skills. The attack is dangerous because a counterfeit skill can run undetected in the background for long periods. In addition to recording users without their permission or knowledge, voice squatting could be used to broadcast fake news or prompt users to divulge personally identifiable information (PII).