My advisor Dan Boneh, colleague Gabi Nakibly and I have recently published a paper “Gyrophone: Recognizing Speech from Gyroscope Signals”.
It was presented at the 23rd USENIX Security conference in San Diego, and at BlackHat Europe 2014 in Amsterdam.
For a quick idea of what this research is about, the following video should do:
We show that the MEMS gyroscopes found on modern smartphones are sufficiently sensitive to measure acoustic signals in the vicinity of the phone. The resulting signals contain only very low-frequency information (< 200 Hz). Nevertheless, we show, using signal processing and machine learning, that this information is sufficient to identify speaker information and even parse speech. Since iOS and Android require no special permissions to access the gyro, our results show that apps and active web content that cannot access the microphone can nevertheless eavesdrop on speech in the vicinity of the phone.
This research attracted quite a bit of media attention. The first piece to be published was an article on Wired.com: The Gyroscopes in Your Phone Could Let Apps Eavesdrop on Conversations. They interviewed us directly, and that article is probably the most technically accurate (Engadget and many others followed and cited this original article).
We were also sent some questions by a French journalist, which I answered in some detail by email. So, to clarify certain points regarding this work, I’m pasting the Q&A here:
Q: What are the best results that you get in terms of recovering sound? What are the limits of your work so far? What sort of sounds couldn’t you recover? Could you recover a complete human conversation, for example?
A: To be precise, we currently do not recover the original sound in a way that would be understandable to a human. Rather, we try to tell what the original word was (in our case, digits) based on the gyroscope measurements. The fact that the recording is not comprehensible to the human ear doesn’t mean a machine cannot understand it, and that’s exactly what we do.
We managed to reach a recognition accuracy of 65% for a specific speaker, for a set of 11 different words, with a single mobile device, and 77% when combining measurements from two devices. While that is far from full speech recognition, the important point is that we can still identify potentially sensitive information in a conversation this way.
We also outline a direction for potential reconstruction of the original sound using multiple phones, but that requires further research, and we make no claim yet as to whether it is possible with smartphones. Another, no less important, result is the ability to identify the gender of the speaker, or to identify a particular speaker among a group of possible users of the mobile device.
Q: Why has nobody worked on and proposed this approach before? Is it because the technical tools (algorithm) weren’t available? I mean, what is really the most performance? What was the most difficult? The algorithm? What are the advantages in comparison to traditional microphones?
A: I’m not completely sure nobody has, but definitely no prior work demonstrated the capabilities to this extent. The fact itself, that gyroscopes are susceptible to acoustic noise, was known. Manufacturers were aware of it, but they didn’t look at it from a security point of view; they saw it rather as an effect that might just add noise to the gyro measurements. We think there hasn’t been enough awareness of the possibility of sensing speech frequencies, and of its security implications. In smartphones in particular, access to the gyro doesn’t require any permission from the user, which makes it a good candidate for leaking information and, as such, an interesting problem to look into. That is also the advantage compared to the regular microphone.
The hardest part, apart from the idea itself, was to adapt speech processing algorithms to work with the gyroscope signals and obtain results despite the low sampling frequency and noise.
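One reason a low sampling frequency makes the signal hard to work with is aliasing: any tone above half the sampling rate folds back into the low band and shows up at a different frequency. The sketch below only illustrates that standard folding rule. It is not code from the paper, the function name is made up for illustration, and the 200 Hz sampling rate and the tone frequencies are hypothetical values chosen for the example.

```python
def aliased_frequency(tone_hz, sample_rate_hz):
    """Return the frequency at which a pure tone appears after sampling.

    Tones above half the sampling rate (the Nyquist frequency) fold back
    into the band [0, sample_rate_hz / 2].
    """
    folded = tone_hz % sample_rate_hz
    return min(folded, sample_rate_hz - folded)


if __name__ == "__main__":
    SAMPLE_RATE_HZ = 200  # hypothetical sensor sampling rate, for illustration only
    for tone_hz in (50, 150, 280):  # hypothetical tone frequencies
        apparent = aliased_frequency(tone_hz, SAMPLE_RATE_HZ)
        print(f"{tone_hz} Hz tone appears at {apparent} Hz")
```

Whatever the input tone, the sampled signal never contains anything above half the sampling rate, so a recognizer has to work with a folded, low-frequency version of the speech.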
Q: What are the applications for Gyrophone that you imagine for the future? Spying?
A: The application can definitely be eavesdropping on specific words in a conversation, or knowing who is near the mobile device at a certain moment.
Q: What are the next steps in your work? I mean what is your next work now to progress in this direction? Do you plan to publish soon with new results?
A: The next steps in this direction would be to better study the limits of this attack: What physical range is possible? Can the recognition accuracy be improved? Is there a way to synchronize two or more mobile devices to potentially recover sound?
Currently we don’t plan to publish new results for this attack, but rather to explore more ways to leak sensitive information from mobile phones by unexpected means.
Q: Do you imagine that in the future such a system could be used by everybody? To do what?
A: It is not that easy to make such a system work for practical attacks on a large scale, although more research effort in this direction might yield more surprising results.
Q: How could we avoid this spying risk?
A: The attack is not so hard to prevent, either by limiting the sampling rate available to applications, by requiring specific permissions, or by filtering out the high frequencies.
Our hope is that the general issue of side-channel attacks will be addressed by mobile device manufacturers in a way that will make such attacks impossible.
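To give a feel for the filtering option, here is a minimal sketch of a low-pass filter applied to gyroscope-like samples before they would reach an application. It is an illustration under stated assumptions, not a description of what any platform does: the single-pole filter, the 400 Hz sampling rate, the 20 Hz cutoff, and the two test tones (2 Hz standing in for ordinary device motion, 150 Hz standing in for speech) are all hypothetical choices.

```python
import math


def low_pass(samples, sample_rate_hz, cutoff_hz):
    """Single-pole IIR low-pass filter (exponential smoothing)."""
    dt = 1.0 / sample_rate_hz
    rc = 1.0 / (2.0 * math.pi * cutoff_hz)
    alpha = dt / (rc + dt)  # smoothing factor in (0, 1)
    filtered = []
    y = samples[0]
    for x in samples:
        y += alpha * (x - y)  # move the output a fraction of the way to the input
        filtered.append(y)
    return filtered


def rms(values):
    """Root-mean-square amplitude of a list of samples."""
    return math.sqrt(sum(v * v for v in values) / len(values))


def tone(freq_hz, sample_rate_hz, seconds):
    """Unit-amplitude sine wave used as a synthetic test signal."""
    n = int(sample_rate_hz * seconds)
    return [math.sin(2.0 * math.pi * freq_hz * i / sample_rate_hz) for i in range(n)]


def retained_fraction(freq_hz, sample_rate_hz=400, cutoff_hz=20, seconds=2.0):
    """Fraction of a tone's RMS amplitude that survives the filter."""
    signal = tone(freq_hz, sample_rate_hz, seconds)
    return rms(low_pass(signal, sample_rate_hz, cutoff_hz)) / rms(signal)


if __name__ == "__main__":
    # Hypothetical frequencies: slow device motion vs. a speech-range tone.
    for freq_hz in (2, 150):
        print(f"{freq_hz} Hz: {retained_fraction(freq_hz):.2f} of amplitude retained")
```

With these assumed parameters the slow motion component passes almost unchanged while the speech-range tone is strongly attenuated, which is the trade-off such a defense relies on: applications that only need orientation or gesture data keep working, while the acoustic content is suppressed.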
The project page http://crypto.stanford.edu/gyrophone provides access to our code and dataset, as well as a link to the published paper.