Gyrophone Attack: Q&A part 2

A journalist from the Danish Consumer Council contacted me recently regarding our Gyrophone attack on mobile devices (published in 2014). I’m posting the questions and my answers to them. Note that I haven’t been keeping up with the most recent developments around gyroscope access on mobile devices, so I encourage you to verify the current state of affairs.

Q: Can Android phones with gyroscopes still sample at up to 200 hertz, which is within the spectrum of the human voice?
A: There’s no one Android phone. Android is an operating system with many versions currently in use, and access depends not only on the operating system but on the hardware capabilities as well. As far as I know, gyroscope measurements are still accessible to applications without any special permission.
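To make this concrete, here is a minimal Mono for Android (C#) sketch of the kind of access any installed application gets; the class and names are mine, but note that nothing here requires a permission entry in the manifest:

using Android.App;
using Android.Hardware;
using Android.OS;

// Minimal sketch: any application can register for gyroscope updates at the
// fastest rate the hardware exposes, without declaring any permission.
[Activity(Label = "GyroListener")]
public class GyroActivity : Activity, ISensorEventListener
{
    SensorManager sensorManager;

    protected override void OnCreate(Bundle bundle)
    {
        base.OnCreate(bundle);
        sensorManager = (SensorManager)GetSystemService(SensorService);
        var gyro = sensorManager.GetDefaultSensor(SensorType.Gyroscope);
        // SensorDelay.Fastest asks for the highest rate available; the actual
        // rate (often on the order of 100-200 Hz) depends on the device.
        sensorManager.RegisterListener(this, gyro, SensorDelay.Fastest);
    }

    public void OnSensorChanged(SensorEvent e)
    {
        // e.Values holds angular velocity around the x, y, z axes (rad/s);
        // these are the samples an eavesdropping app would record.
        float x = e.Values[0], y = e.Values[1], z = e.Values[2];
    }

    public void OnAccuracyChanged(Sensor sensor, SensorStatus accuracy) { }
}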
I’m not aware of protective measures taken by software or hardware vendors to limit the available sampling rate. However, as we already noted in the paper, the Chrome browser, and other browsers based on the WebKit framework, impose a software limitation on the sampling rate available from JavaScript, bringing it down to 25 Hz. So at least Chrome protects its users from malicious access to gyroscopes.
What’s important to note is that most of the energy of human speech lies above the gyroscope’s sampling frequency, and it is picked up due to an effect called “aliasing”. Low-pass filtering can mitigate the attack. It seems, for instance, that Samsung Galaxy phones use hardware that applies some low-pass filtering (I’m not sure at what cutoff frequency), and our phrase recognition attack did not perform as well on those phones as it did on Nexus 4 devices. Since we did not test many different models, it remains to be studied how well the attack performs on them.
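To illustrate aliasing with a toy example (my own sketch, unrelated to the paper’s code): a 250 Hz tone sampled at 200 Hz produces exactly the same samples as a 50 Hz tone, which is how speech energy above the Nyquist frequency folds back into the gyroscope’s band unless it is filtered out first.

// Toy aliasing demo: a 250 Hz tone sampled at 200 Hz produces exactly the
// same samples as a 50 Hz tone (250 - 200 = 50).
using System;

class AliasingDemo
{
    static void Main()
    {
        const double fs = 200.0;          // sampling rate (Hz), similar to a phone gyroscope
        const double fTone = 250.0;       // tone above the Nyquist frequency (fs / 2)
        const double fAlias = fTone - fs; // frequency the tone folds back to: 50 Hz

        for (int n = 0; n < 8; n++)
        {
            double t = n / fs;
            double original = Math.Cos(2 * Math.PI * fTone * t);
            double alias = Math.Cos(2 * Math.PI * fAlias * t);
            // The two columns are identical up to numerical precision.
            Console.WriteLine("{0:F6}  {1:F6}", original, alias);
        }
    }
}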

Q: In 2014 the technique was not refined enough to pick up more than a fraction of the words spoken near a phone. What about now? Have gyroscopes in smartphones evolved enough to pick up entire sentences and conversations? From how far away can the gyroscopes pick up conversations?
A: The statement “the technique was not refined enough to pick up more than a fraction of the words” is inaccurate. In our experiments we purposely trained the algorithms on small sets of phrases to recognize. Since we did not conduct experiments with larger dictionaries, it remains to be studied how well the attack can perform on larger sets of phrases.
The task (like many other machine learning tasks) definitely becomes harder as the dictionary grows. Our aim was to show a proof of concept; the full potential of the attack can only be understood by conducting more experiments.
The distance at which this can work depends on the loudness of the signal. If the sound comes from a more distant source but is loud enough, the attack may still work. In our experimental setup the source of the sound was quite close to the device and fairly loud. What can amplify the attack is a resonant surface that responds to sound waves and conducts them well.
To put things in perspective, I believe this attack won’t work well with most gyroscopes when the speakers are several meters away from the device. However, this is a very general claim, and it all depends on the particular hardware model and its characteristics.

Q: Does the user of an Android phone need to give permission for recording before gyroscopes pick up sound and words? Does Google require the user to give permission for recording?
A: Android phones notify the user when an application requires access to the microphone. However (and that’s the point of our work), they don’t notify the user when an application accesses the gyroscope, which is what enables stealthy eavesdropping. I’m not sure what you mean by Google, since this is a property of the Android operating system (which is mostly maintained by them), and it’s important not to confuse the two. There are many Android distributions that come without pre-installed Google applications, and Google doesn’t have access to the data on those phones, so it’s important to phrase this accurately. Since the Android OS is mostly maintained by Google, it might be a natural expectation that they would address such issues; however, since it is an open-source system, technically inclined users can compile their own version of it and mitigate the attack.

Q: Can the user choose not to be recorded (through gyroscopes)?
A: As far as I know, the user of a standard Android distribution doesn’t have an option to block access to the gyroscopes.

Q: What do Google do with the sound data they pick up via gyroscopes?
A: I don’t have any evidence that Google collects such data or does anything with it. And since Google has default access to Android devices anyway, we need to worry about the general lesson from this attack rather than about Google in particular. More importantly, any Android application can collect gyroscope data. So rather than worrying about Google (which has a reputation to maintain), we should worry about malicious third parties that have the same access to our data.

Q: Do you know if it is stored someplace, or whether Google uses it for voice recognition?
A: Again, there’s no evidence that Google records gyroscope data, stores it, or uses it anywhere other than on the phone itself. The point is that our work shows new implications of the ability to access this data.

Glass Goggles

So I was naive enough to purchase Google Glass, hoping that everyone around would want one soon enough. In the meantime I wanted to try to develop something for it, something somewhat useful. I thought it would be cool if I could walk around, look at something, and Google Glass would tell me what it is I’m looking at.
Since the Google Goggles service (the Android app) doesn’t have an open API, I decided to use Google’s “search by image” service.

The result is a Glass application that captures a photo using the camera and uploads it to temporary storage so that its URL is accessible to Google’s search by image. It then parses the retrieved results and annotates the image with the leading guess provided by Google. Here are two examples: one is for a poster of Daft Punk DJ’ing, and the other is a poster of Hokusai’s “Great Wave” (with Mount Fuji in the background, of course).

[Photos: the Daft Punk and Hokusai posters, each annotated with Google’s best guess]
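The lookup step can be sketched roughly as follows (this is not the actual Glass code; the query URL reflects how the “search by image” endpoint worked at the time, and the “Best guess” parsing is a hypothetical simplification of scraping the result page):

// Rough sketch of the lookup step: given a publicly reachable image URL,
// ask Google's "search by image" endpoint and scrape the "Best guess" label.
// The endpoint and page structure are assumptions about how the service
// behaved at the time; there is no official API.
using System;
using System.Net.Http;
using System.Text.RegularExpressions;
using System.Threading.Tasks;

class SearchByImage
{
    public static async Task<string> BestGuessAsync(string imageUrl)
    {
        using (var client = new HttpClient())
        {
            // Pretend to be a desktop browser, otherwise a stripped-down page comes back.
            client.DefaultRequestHeaders.Add("User-Agent", "Mozilla/5.0");
            string query = "https://www.google.com/searchbyimage?image_url=" +
                           Uri.EscapeDataString(imageUrl);
            string html = await client.GetStringAsync(query);

            // Hypothetical parsing: the result page used to contain a
            // "Best guess for this image: <a ...>label</a>" fragment.
            var match = Regex.Match(html, @"Best guess for this image.*?>([^<]+)<");
            return match.Success ? match.Groups[1].Value : "(no guess found)";
        }
    }
}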

 

Gyrophone: Recognizing Speech from Gyroscope Signals

My advisor Dan Boneh, my colleague Gabi Nakibly and I have recently published a paper, “Gyrophone: Recognizing Speech from Gyroscope Signals”.
It was presented at the 23rd USENIX Security Symposium in San Diego, and at BlackHat Europe 2014 in Amsterdam.

To get a quick idea of what this research is about, the following video should do:

We show that the MEMS gyroscopes found on modern smart phones are sufficiently sensitive to measure acoustic signals in the vicinity of the phone. The resulting signals contain only very low-frequency information (< 200 Hz). Nevertheless we show, using signal processing and machine learning, that this information is sufficient to identify speaker information and even parse speech. Since iOS and Android require no special permissions to access the gyro, our results show that apps and active web content that cannot access the microphone can nevertheless eavesdrop on speech in the vicinity of the phone.
This research attracted quite a bit of media attention; the first piece to be published was an article on Wired.com, “The Gyroscopes in Your Phone Could Let Apps Eavesdrop on Conversations”. They interviewed us directly, and that article is probably the most technically accurate (Engadget and many others followed and cited this original article).

Here’s our BlackHat Europe talk that explains this work in more detail

A French journalist also approached us with some questions, which I answered in some detail by email. So, to clarify certain points regarding this work, I’m pasting the Q&A here:

Q: What are the best results you got in terms of recovering sound? What are the limits of your work so far? What sort of sounds couldn’t you recover? Could you recover a complete human conversation, for example?
A: To be precise, we currently do not recover the original sound in a way that would be understandable to a human. Rather, we try to tell what the original word was (in our case, digits) based on the gyroscope measurements. The fact that the recording is not comprehensible to a human ear doesn’t mean a machine cannot understand it, and that’s exactly what we do.
We managed to reach a recognition accuracy of 65% for a specific speaker, for a set of 11 different words, with a single mobile device, and 77% when combining measurements from two devices. While that is far from full speech recognition, the important point is that we can still identify potentially sensitive information in a conversation this way.
We also outline a direction for potential reconstruction of the original sound using multiple phones, but that requires further research, and we don’t yet claim whether it is possible with smartphone devices. Another, no less important, result is the ability to identify the gender of the speaker, or to identify a particular speaker among a group of possible users of the mobile device.

Q: Why has nobody worked on and proposed this approach before? Is it because the technical tools (algorithms) weren’t available? I mean, what really enabled this performance? What was the most difficult part? The algorithm? What are the advantages in comparison to traditional microphones?
A: I’m not completely sure nobody has, but definitely no prior work demonstrated the capabilities to this extent. The fact itself, that gyroscopes are susceptible to acoustic noise, was known. Manufacturers were aware of it, but they didn’t look at it from a security point of view; rather, they saw it as an effect that might just add noise to the gyro measurements. We think there hasn’t been enough awareness of the possibility of sensing speech frequencies and of its security implications. In particular, on smartphones, access to the gyro doesn’t require any permission from the user, which makes it a good candidate for leaking information, and as such, an interesting problem to look into. That is also the advantage compared to the regular microphone.
The hardest part, apart from the idea itself, was to adapt speech processing algorithms to work with the gyroscope signals and obtain results despite the low sampling frequency and noise.
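To give a rough flavor of what that adaptation involves (a simplified sketch of my own, not the paper’s actual pipeline): cut the gyroscope stream into short overlapping frames and compute a per-frame spectrum, which then serves as input to the classifiers.

// Simplified sketch of the preprocessing idea (not the paper's exact pipeline):
// frame the gyroscope samples and compute a magnitude spectrum per frame.
using System;
using System.Collections.Generic;

static class GyroFeatures
{
    // Naive DFT magnitude of one frame; fine for frames of ~200 samples.
    static double[] MagnitudeSpectrum(double[] frame)
    {
        int n = frame.Length;
        var mags = new double[n / 2];
        for (int k = 0; k < n / 2; k++)
        {
            double re = 0, im = 0;
            for (int t = 0; t < n; t++)
            {
                re += frame[t] * Math.Cos(2 * Math.PI * k * t / n);
                im -= frame[t] * Math.Sin(2 * Math.PI * k * t / n);
            }
            mags[k] = Math.Sqrt(re * re + im * im);
        }
        return mags;
    }

    // Cut the signal into overlapping frames and compute features per frame.
    public static List<double[]> Extract(double[] samples, int frameLen = 200, int hop = 100)
    {
        var features = new List<double[]>();
        for (int start = 0; start + frameLen <= samples.Length; start += hop)
        {
            var frame = new double[frameLen];
            Array.Copy(samples, start, frame, 0, frameLen);
            features.Add(MagnitudeSpectrum(frame));
        }
        return features;
    }
}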

Q: What are the applications for Gyrophone that you imagine for the future? Spying?
A: The application could definitely be eavesdropping on specific words in a conversation, or learning who is near the mobile device at a certain moment.

Q: What are the next steps in your work? I mean, what are you working on now to progress in this direction? Do you plan to publish new results soon?
A: The next steps in this direction would be to better study the limits of this attack: What physical range is possible? Can the recognition accuracy be improved? Is there a way to synchronize two or more mobile devices to potentially recover sound? Currently we don’t plan to publish new results for this attack; rather, we are exploring more ways to leak sensitive information from mobile phones by unexpected means.

Q: Do you imagine that in the future such a system could be used by everybody? To do what?
A: It is not that easy to make such a system work for practical attacks on a large scale, although more research effort in this direction might yield more surprising results.

Q: How could we avoid this spying risk?
A: The attack is not so hard to prevent: limit the sampling rate available to applications, require specific permissions, or filter out the high frequencies.
Our hope is that the general issue of side-channel attacks will be addressed by mobile device manufacturers in a way that will make it impossible.
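As an illustration of the filtering option (a sketch only; the 20 Hz cutoff is an arbitrary example, not a vetted recommendation), a single-pole low-pass filter applied to gyroscope samples before they reach applications would suppress most of the aliased speech-band energy:

// Sketch of the "filter the high frequencies" mitigation: a single-pole
// low-pass filter applied to gyroscope samples before apps see them.
// The 20 Hz cutoff used below is an arbitrary illustration.
using System;

class GyroLowPass
{
    readonly double alpha;
    double state;

    public GyroLowPass(double cutoffHz, double sampleRateHz)
    {
        // Standard single-pole RC low-pass coefficient.
        double rc = 1.0 / (2 * Math.PI * cutoffHz);
        double dt = 1.0 / sampleRateHz;
        alpha = dt / (rc + dt);
    }

    public double Filter(double sample)
    {
        state += alpha * (sample - state);
        return state;
    }
}

// Usage: var lp = new GyroLowPass(20.0, 200.0);
//        double cleaned = lp.Filter(rawGyroSample);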

The project page http://crypto.stanford.edu/gyrophone provides access to our code and dataset, as well as a link to the published paper.

Registering a specific file-type handler on Android

(My example is in C# since I was working with Mono for Android, but the same is applicable to Java applications as well, via the usual manifest XML.)

While many answers already exist on forums and several tutorials are available, I still ran into a problem when I tried to register a handler for my own file type (let’s call it .xyz). I added an IntentFilter attribute, specifying that I handle the View and Edit actions, and chose the Default category.
Since there is no existing MIME type for my extension, I used DataPathPattern for filtering and a wildcard for DataHost.
Overall the code looked like this:

[IntentFilter(new[] { Intent.ActionView, Intent.ActionEdit },
    Categories = new[] { Intent.CategoryDefault },
    DataPathPattern = ".*\\.xyz",
    DataHost = "*")]
public class MyActivity : Activity { ...

The problem was that my application was now taking over all possible intents, such as calling and messaging, and when I was browsing my contacts my application icon showed up there as a possible candidate for handling the contact entry (i.e. making the call). What finally solved the problem was specifying DataMimeType as “application/octet-stream”, like this:

[IntentFilter(new[] { Intent.ActionView, Intent.ActionEdit },
    Categories = new[] { Intent.CategoryDefault },
    DataPathPattern = ".*\\.xyz",
    DataHost = "*", DataMimeType = "application/octet-stream")]
public class MyActivity : Activity { ...
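For completeness, here is a sketch (mine, not part of the original problem) of how the activity can then pick up the file it was launched with, from the incoming intent’s data URI:

// Inside MyActivity: a sketch of reading the file the activity was opened with.
protected override void OnCreate(Bundle bundle)
{
    base.OnCreate(bundle);

    var uri = Intent.Data;  // the content:// or file:// URI of the .xyz file
    if (uri != null)
    {
        // ContentResolver handles both file:// and content:// URIs.
        using (var stream = ContentResolver.OpenInputStream(uri))
        {
            // ... parse the .xyz file from the stream ...
        }
    }
}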

3D on Android (Geekcon 2011)

I recently participated in Geekcon 2011. It is similar to Garage Geeks, but the difference is that people actually build stuff during the two and a half days they stay there.
My friends Ofer Gadish and Gil Shai from Acceloweb and I worked on displaying stereoscopic 3-D images on an Android device. Those were three exciting days of sleeping very little, coding a lot, soldering, drinking beer and having a lot of fun.

When we initially discussed the idea, we thought about using 3-D glasses controlled over Bluetooth, but we realized that in the short time we had we would probably not be able to both study the control protocol and figure out how to directly control the Bluetooth transmitter of the mobile device, if that is at all possible at such a low level from a user application.

Instead we chose to control the glasses through the audio jack output of the mobile phone. We found another pair of glasses controlled using a rather old VESA standard. The glasses come with a 3-pin mini-DIN receptacle. The idea is very simple: a high voltage (logical “1”) opens the left eye, and a low voltage (logical “0”) opens the right eye.

To supply ground, +5V and an accurate square-wave synchronization signal to the glasses, we did some soldering and connected the mini-DIN to an Arduino, which in turn received the output from the mobile’s audio jack.
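The synchronization signal itself can be produced on the phone as ordinary audio. Here is a rough reconstruction (a sketch, not our original Geekcon code) that loops a 60 Hz square wave through the audio output using AudioTrack:

// Rough sketch: generate a 60 Hz square wave on the audio output, which the
// Arduino turns into the left/right shutter signal. Not our original code.
using Android.Media;

class SyncTone
{
    public static AudioTrack Start(int switchHz = 60)
    {
        const int sampleRate = 44100;
        int samplesPerPeriod = sampleRate / switchHz;
        var period = new short[samplesPerPeriod];
        for (int i = 0; i < samplesPerPeriod; i++)
            period[i] = (short)(i < samplesPerPeriod / 2 ? short.MaxValue : short.MinValue);

        var track = new AudioTrack(Stream.Music, sampleRate,
            ChannelOut.Mono, Encoding.Pcm16bit,
            period.Length * sizeof(short), AudioTrackMode.Static);
        track.Write(period, 0, period.Length);
        track.SetLoopPoints(0, period.Length, -1);  // loop the single period forever
        track.Play();
        return track;
    }
}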


The Android software was a bit of a mess: reaching a switching rate of 60 Hz wasn’t simple, given slow performance that I can’t exactly attribute to anything specific, whether a slow refresh rate of the display or the technique we used to draw the images (although we accessed the canvas directly, bypassing any higher-level APIs for displaying a picture). By Saturday afternoon we had it running, with glitches every couple of seconds, but giving some feeling of 3-D depth. Or was it our exhausted imagination after not sleeping much during this crazy and awesome weekend?
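For reference, here is a rough reconstruction (again, a sketch rather than our original code) of what drawing on the canvas directly looked like: lock the SurfaceView’s canvas, draw the left or right image, post it, and alternate at the target rate.

// Rough reconstruction of the drawing loop: alternate left/right bitmaps by
// locking the SurfaceView's canvas directly. Meant to run on its own thread.
using System.Diagnostics;
using Android.Graphics;
using Android.Views;

class StereoDrawer
{
    public static void RunLoop(ISurfaceHolder holder, Bitmap left, Bitmap right, int switchHz = 60)
    {
        double frameMs = 1000.0 / switchHz;
        bool showLeft = true;
        var clock = Stopwatch.StartNew();
        while (true)
        {
            double start = clock.Elapsed.TotalMilliseconds;
            Canvas canvas = holder.LockCanvas();
            if (canvas != null)
            {
                canvas.DrawBitmap(showLeft ? left : right, 0, 0, null);
                holder.UnlockCanvasAndPost(canvas);
            }
            showLeft = !showLeft;
            // Spin until the next frame slot; crude, but in the hack-day spirit.
            while (clock.Elapsed.TotalMilliseconds - start < frameMs) { }
        }
    }
}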

Pango parking Android application

I’ve published my first Android application! Well, no need to get too excited… It’s a simple proxy for the Pango cellular parking service. The application uses the SMS commands supported by Pango to activate and deactivate parking. For now, only the default parking city and area are supported.
I’m planning to add a combo box for choosing the city and parking area, and later, perhaps add support for automatic location detection using GPS.
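Under the hood it boils down to sending the right text messages. A sketch of the idea in C# (the short code and command strings below are placeholders, not Pango’s actual number or format; sending SMS requires the SEND_SMS permission in the manifest):

// Sketch of the core idea: activate/deactivate parking by sending the SMS
// commands Pango supports. The number and message bodies are placeholders.
using Android.Telephony;

static class PangoProxy
{
    const string PangoShortCode = "4500";   // placeholder short code

    public static void StartParking()
    {
        SmsManager.Default.SendTextMessage(PangoShortCode, null, "start", null, null);
    }

    public static void StopParking()
    {
        SmsManager.Default.SendTextMessage(PangoShortCode, null, "stop", null, null);
    }
}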

Enjoy.

https://market.android.com/details?id=com.pango.mobile