Valerio Mulas published an interesting analysis of the security of Android-based Smart TVs.
The analysis highlights that the default configuration of most Android-based TVs allows anyone to enable ADB (Android Debug Bridge), install unsigned applications and, in theory, gain full control of the device.
The attack process resembles what a Rubber Ducky does when plugged into a USB port of a device: it injects a scripted sequence of inputs as if a user were typing them.
In this attack, dubbed TVoodoo, the only difference is the channel used to carry the attack: infrared communication.
The attack succeeds because a smart TV should be regarded as an unattended computer with root shell access, with the infrared remote acting as its keyboard.
By sending a sequence of button presses over infrared, an attacker is able to:
- Invoke the main menu of the Android TV
- Enable the Developer Mode
- Enable ADB
- Enable Unknown Sources
- Force the TV to connect to the hotspot of the attacker
Then, the attacker can connect to the TV via ADB and install any app, compromising the system and installing backdoors, remote control tools and environmental tapping software.
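Seen from the defender's side, the steps above amount to flipping a handful of system toggles, so a TV can be checked for them. The following is a minimal sketch, not taken from the original analysis: the helper `audit_settings` is hypothetical, and the setting names are assumptions about how these toggles are exposed on a given Android TV build (they may differ between Android versions). The values are assumed to have been collected beforehand, for example by the device owner through the Android `settings` shell command.

```python
# Hypothetical audit helper. The setting names below are assumptions about
# how the toggles from the list above are stored; verify them on the target
# Android version before relying on the result.
RISKY_SETTINGS = {
    "development_settings_enabled": "Developer Mode is enabled",
    "adb_enabled": "ADB is enabled",
    "install_non_market_apps": "Unknown Sources is enabled",
}


def audit_settings(settings):
    """Return one finding for every risky toggle whose value is "1".

    `settings` maps a setting name to its raw string value; a missing
    setting is treated as disabled.
    """
    findings = []
    for name, message in RISKY_SETTINGS.items():
        if str(settings.get(name, "0")).strip() == "1":
            findings.append(message)
    return findings
```

An empty result only means that none of the listed toggles were found enabled; it says nothing about apps that were already installed.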
One of the permissions granted to the malware covers the microphone, which makes it possible to listen to any conversation in the room and send the audio to the botnet master node. There, the attacker could listen to the recordings in search of specific content, such as personal information about people or business information such as agreements, contracts and partners.
However, with thousands of infected Smart TVs, the manual approach described above does not scale.
AWS offers a service named Transcribe that is capable of converting audio to text and that might automate the process and solve the “scalability” problem.
Transcribe works as follows: the attacker uploads the audio files to an S3 bucket and then runs Transcribe against the targeted audio file, getting the speech-to-text conversion in return.
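The upload-then-transcribe flow is the ordinary way any Transcribe user processes a recording, and it can be sketched in a few lines. This is an illustrative sketch, not code from the original paper: the function `transcribe_file` and all of its parameter names are hypothetical, and the two client arguments are assumed to be boto3 clients created with `boto3.client("s3")` and `boto3.client("transcribe")`.

```python
import time


def transcribe_file(s3, transcribe, local_path, bucket, key, job_name,
                    media_format="wav", language_code="en-US",
                    poll_seconds=5.0, max_polls=120):
    """Upload one audio file to S3, transcribe it, return the transcript URI.

    `s3` and `transcribe` are assumed to be boto3 clients; everything else
    is a hypothetical parameter chosen for this sketch.
    """
    # Step 1: upload the audio file to the S3 bucket.
    s3.upload_file(local_path, bucket, key)

    # Step 2: start a transcription job that points at the uploaded object.
    transcribe.start_transcription_job(
        TranscriptionJobName=job_name,
        Media={"MediaFileUri": f"s3://{bucket}/{key}"},
        MediaFormat=media_format,
        LanguageCode=language_code,
    )

    # Step 3: poll the job until it completes, fails, or we give up.
    for _ in range(max_polls):
        response = transcribe.get_transcription_job(
            TranscriptionJobName=job_name)
        job = response["TranscriptionJob"]
        status = job["TranscriptionJobStatus"]
        if status == "COMPLETED":
            return job["Transcript"]["TranscriptFileUri"]
        if status == "FAILED":
            raise RuntimeError(job.get("FailureReason", "transcription failed"))
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_name!r} did not finish in time")
```

The returned URI points to a JSON document that contains the recognized text. Passing the clients in as arguments keeps the sketch independent of how credentials and regions are configured.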
At this point, the attacker can easily start to index and classify the various targets. Having this amount of audio converted into text solves the first part of the scalability problem, but the attacker still has to go through a huge number of words to find something interesting.
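Indexing and classifying text at this point is plain keyword search. As a rough illustration of why the remaining work is easy to automate, the sketch below builds a simple inverted index over transcripts and looks up a list of keywords. It is an assumption about how such indexing could be done, not the method described in the paper; the function names `build_index` and `find_matches` and the sample data are hypothetical.

```python
import re
from collections import defaultdict


def build_index(transcripts):
    """Map each lowercase word to the set of transcript ids containing it."""
    index = defaultdict(set)
    for doc_id, text in transcripts.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(doc_id)
    return index


def find_matches(index, keywords):
    """Return {transcript id: alphabetically sorted list of matched keywords}."""
    matches = defaultdict(list)
    for keyword in sorted({k.lower() for k in keywords}):
        # .get() avoids creating empty entries in the defaultdict index.
        for doc_id in index.get(keyword, ()):
            matches[doc_id].append(keyword)
    return dict(matches)


# Hypothetical sample data, for illustration only.
sample = {
    "tv-1": "We signed the contract with the new partners today.",
    "tv-2": "What do you want for dinner?",
}
hits = find_matches(build_index(sample), ["contract", "agreement", "partners"])
```

Here `hits` contains only `tv-1`, with the keywords `contract` and `partners`. Exact word matching like this misses inflections and synonyms, so it is a baseline rather than a full classifier.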
Valerio also published a video of the attack flow:
For more technical details, you can refer to the full paper published on Medium.