The prediction accuracy decreased to 93% when Zoom was used to train the sound classification algorithm, still exceedingly good and a record for that medium.
Such an attack has a significantly adverse impact on the users’ data security since it is capable of exposing users' passwords, conversations, messages, and other sensitive information to nefarious outsiders.
When compared to the other side attacks that need specific circumstances and are susceptible to data rate and distance restrictions, these acoustic attacks are easier to operate because of the popularity of devices that are now equipped with high-end microphones.
This makes sound-based side-channel attacks achievable and far more hazardous than previously thought, especially given the rapid advances in machine learning.
The attack is initiated in order to acquire keystrokes on the victim’s keyboard, since the data is required for the prediction algorithm to work. This can be done via a nearby microphone or by accessing the microphone on the target's phone, which may have been compromised by malware.
Additionally, keystrokes can also be recorded via Zoom call, in which, rogue meeting attendee compares the messages entered by the target with the auditory recording of that person.
The researchers acquired training data by pressing 36 keys on a modern MacBook Pro, 25 times each, further recording the sounds produced on each press.
The spectrogram images were used to train the image classifier "CoAtNet," and it took some trials and errors with the epoch, learning rate, and data splitting parameters to get the best prediction accuracy outcomes.
The same laptop, whose keyboard has been present in all Apple laptops over the past two years, an iPhone 13 mini positioned 17 cm from the target, and Zoom were utilized in the researchers' tests.
The CoatNet classifier gained 95% accuracy in the smartphone recordings and 93% from the content captured via Zoom. Skype, on the other, produced comparatively lower accuracy, i.e. 91.7%.
In order to protect oneself from side-channel attacks, users are advised to try “altering typing styles,” or generating passwords with randomized keys.
Another safety measure includes utilizing software in order to generate keystroke sounds, white noise, or software-based keystroke audio filters.
Moreover, since the attack model proved highly efficient even against a very silent keyboard, installing sound dampeners to mechanical keyboards or shifting to membrane-based keyboards is unlikely to help in any way.
Finally, using password managers to avoid manually entering sensitive information and using biometric authentication whenever possible also serve as mitigating factors.