Smart speakers record you more often than you think
It is well known that voice assistants aren’t perfect and will sometimes start recording even when you don't say their trigger word, but a team of researchers wanted to quantify how often these activations happen and what the devices hear when they do.
The research was conducted by a team consisting of Daniel J. Dubois (Northeastern University), Roman Kolcun (Imperial College London), Anna Maria Mandalari (Imperial College London), Muhammad Talha Paracha (Northeastern University), David Choffnes (Northeastern University), and Hamed Haddadi (Imperial College London).
According to the report, smart speakers accidentally activate between 1.5 and 19 times per day. The recordings often last 6 seconds or more, and the longest run from 20 to 43 seconds.
How frequently do devices activate? The average rate of activations per device is between 1.5 and 19 times per day (24 hours) during our experiments. HomePod and Cortana devices activate the most, followed by Echo Dot series 2, Google Home Mini, and Echo Dot series 3.
Are activations long enough to record sensitive audio from the environment? Yes, we have found several cases of long activations. Echo Dot 2nd Generation and Invoke devices have the longest activations (20-43 seconds). For the Homepod and the majority of Echo devices, more than half of the activations last 6 seconds or more.
Researchers tested five types of speakers: Google Home Mini, Apple HomePod, Microsoft’s Harman Kardon Invoke and Amazon Echo Dots (second- and third-generation).
For the experiment, the speakers were made to binge-listen to several television shows in an attempt to trigger false activations:
...we came up with a much simpler approach: we turn to popular TV shows containing reasonably large amounts of dialogue. Namely, our experiments use 125 hours of Netflix content from a variety of themes/genres, and we repeat the tests multiple times to understand which non-wake words consistently lead to activations and voice recording.
|Show|Genre|
|---|---|
|Gilmore Girls|Comedy, Drama|
|Grey’s Anatomy|Medical drama|
|The L Word|Drama, Romance|
|Dear White People|Comedy, Drama|
|Riverdale|Crime, Drama, Mystery|
|Jane the Virgin|Comedy|
|Friday Night Tykes|Reality TV|
|Big Bang Theory|Comedy, Romance|
|The West Wing|Political Drama|
The report is part of a larger study that is still in progress, which will also look into what happens to all the data these voice-activated assistants collect:
There are several other important open questions that we are in the process of answering, such as:
- How many activations lead to audio recordings being sent to the cloud vs. processed only on the smart speaker?
- Do cloud providers correctly show all cases of audio recording to users?
- Do activations depend on the TV show character’s accent, ethnicity, gender, or other factors?
- Do smart speakers adapt to observed audio and change whether they activate in response to certain words over time?