Fuzzing a geolocation

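[Image: map with a red arrow at the precise location and a cluster of blue markers at the reported fuzzed locations]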
One of the features that we would like to support in Firefox is the ability to “fuzz” your geolocation. The use case is pretty clear: you want to be able to share your exact location with a mapping application for turn-by-turn directions, but you would rather not expose that much information when you simply want to find out what movies are playing in your area.

In the above picture, the red arrow at Embarcadero Road and Middlefield Road is my precise location (around 37.4419, -122.1419). I plotted that location by hand (it wasn’t reported to the Google Maps API via geolocation). The bunch of blue markers in the upper left is what my ‘fuzzed’ location was reported as. There are a bunch of blue icons because I let this little webapp run for a bit. Now, this demo will not work yet, because I haven’t finished the fuzzing implementation in Firefox. However, here is what we are planning to do, and I request that you comment here or in bug 454488.

The basic threat is that the web application you are going to can figure out your precise location even when you have “fuzzed” your location.

The first thing we will do when starting up for the first time is to generate two random numbers. One will be a distance from your actual position, and the other number will be a direction. So, for me, in the above example, the displacement magnitude was 1503, and the direction was 139 (in degrees).

Now, these two numbers will never change. The reason is that if a website saw enough different displacement vectors, it could average them out and discover your real location. Not so good for “fuzzing”.

The second thing we do is add a bit of randomness to the displaced location each and every time a location is passed to the web application. This results in the small cloud of blue dots you see above. What you can’t really see from this map is that the error field of the geolocation is pretty large. So, even though it is a pretty tight cluster, the webapp will see an error greater than the distance between the reported position and the actual position.

Other properties of the location, such as altitude, speed, etc., will be zeroed out.
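
The plan described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the Firefox implementation: the function names, the flat-earth approximation, the 2000 m maximum displacement, and the 50 m jitter radius are all made up for the example. The displacement magnitude is treated as meters, and the direction is assumed to be measured counterclockwise from east (which would put 139 degrees in the upper left, as in the picture); the post itself states neither.

```python
import math
import random

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for a flat-earth approximation


def offset(lat, lon, distance_m, angle_deg):
    """Shift a point by distance_m at angle_deg (assumed: counterclockwise from east)."""
    angle = math.radians(angle_deg)
    d_east = distance_m * math.cos(angle)
    d_north = distance_m * math.sin(angle)
    new_lat = lat + math.degrees(d_north / EARTH_RADIUS_M)
    new_lon = lon + math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return new_lat, new_lon


def make_displacement(rng, max_distance_m=2000.0):
    """Step 1: pick a distance and a direction once, then store and reuse them."""
    return rng.uniform(0.0, max_distance_m), rng.uniform(0.0, 360.0)


def fuzz(lat, lon, displacement, rng, jitter_m=50.0):
    """Step 2: apply the fixed displacement, then a small fresh jitter per report."""
    distance_m, angle_deg = displacement
    moved_lat, moved_lon = offset(lat, lon, distance_m, angle_deg)
    out_lat, out_lon = offset(moved_lat, moved_lon,
                              rng.uniform(0.0, jitter_m), rng.uniform(0.0, 360.0))
    return {
        "latitude": out_lat,
        "longitude": out_lon,
        # Reported error is at least as large as the true distance to the real position.
        "accuracy": distance_m + jitter_m,
        # Everything else is zeroed out.
        "altitude": 0.0,
        "speed": 0.0,
    }
```

Calling `fuzz(37.4419, -122.1419, (1503.0, 139.0), random.Random())` repeatedly would produce a small cloud of points around one fixed displaced location, similar in spirit to the blue markers described above.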

I hope that this helps a bit. Let me know what you think.


18 Comments

  1. Perry Lorier
    Posted October 14, 2008 at 9:05 pm | Permalink

    If the website can ever guess your real location (e.g. they ask you to enter a delivery address, and you appear to be “nearby”), they can learn your particular pair of offsets, and those offsets will then be accurate forever.

    Even small fuzzing that’s applied afterwards can be, as you point out, averaged away.

    Why not round *everybody* to some specific grid system? This means that every Firefox user within, say, the block will always report the same location in that block no matter where they are. If the user is moving at a reasonable (but constant) speed, you might be able to estimate where they are fairly accurately by timing when they move from “block” to “block”, but as soon as they change their velocity (especially if they slow down or stop), you lose the ability to estimate where they are more accurately than the size of the block. If you want to provide jitter (I’m not sure why you’d bother), you can return points anywhere inside that “block”; the average will be the center of the “block”, not your real location.

    Blocks could be a lat/long rounded to the nearest 100m/1k/10k or whatever. This is consistent for all Firefox users, and still provides the same level of anonymity.
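
The grid idea in the comment above could look something like this minimal sketch. The function name, the 1 km default cell size, and the meters-per-degree constant are assumptions for illustration; the sketch also ignores the poles, where the longitude cell width blows up.

```python
import math

METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def snap_to_grid(lat, lon, cell_m=1000.0):
    """Return the center of the fixed grid cell that contains (lat, lon)."""
    cell_lat = cell_m / METERS_PER_DEGREE_LAT
    row = math.floor(lat / cell_lat)
    center_lat = (row + 0.5) * cell_lat
    # A degree of longitude shrinks with latitude, so the column width is
    # derived from the row's center latitude; everyone in a row agrees on it.
    cell_lon = cell_m / (METERS_PER_DEGREE_LAT * math.cos(math.radians(center_lat)))
    col = math.floor(lon / cell_lon)
    center_lon = (col + 0.5) * cell_lon
    return center_lat, center_lon
```

Because the grid does not depend on any per-user secret, every user inside a cell reports the same point, and there is no offset for a site to learn.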

  2. Posted October 15, 2008 at 12:30 am | Permalink

    A couple of thoughts about how you could ‘crack’ this approach to location fuzzing, based on knowing the geography of the area near the fuzzed location. I think this is a vulnerability, so long as the offset randomization is small/predictable.

    1. If your offset stayed the same, and the webapp took your fuzzed location as you moved along some of the larger roads on that map, then you would draw the shape of those roads, but offset by a certain amount. It would be pretty easy to figure out your offset and hence original location from this information.

    2. Also, if your randomized position lies near a known busy point, like an airport, then it’s a good guess that your real location is at this busy point. This inference could allow the offset to be guessed with high probability.

  3. Jesper Kristensen
    Posted October 15, 2008 at 12:46 am | Permalink

    Wouldn’t your traveling direction and speed be possible to calculate precisely if you use a constant displacement angle and magnitude?

  4. Posted October 15, 2008 at 4:03 am | Permalink

    Perry: it seems silly to predicate things based on a user giving a website an address. If you give a website your address, then clearly they know where you are. They don’t need the geolocation API to figure that out. I would think that “don’t give your address to websites that you don’t want to know where you live” is not a tough thing for people to understand.

  5. Posted October 15, 2008 at 5:31 am | Permalink

    Seeing that map did make me think “What about if you are moving”? If you are travelling along roads or geographic features (e.g. a river) that are larger than the local fuzz factor, then it may be possible to work out the shape of the route you have travelled, and compare that to a map of the local area, and hence calculate your offsets.

  6. Posted October 15, 2008 at 6:46 am | Permalink

    Yeah, like Perry, I was wondering whether it wouldn’t be easier to just truncate/round to an appropriate degree so that there’s no secret data to extract over time. Rounding to the nearest 5 arc-seconds is about 150m, which is probably pretty okay for “neighbourhood” I’d think, or you could round to the nearest thousandth in decimal-land, which would be what, 3.6 arc-seconds = 111m? Maybe that’s a little too close for comfort – nearest hundredth would be ~1.1km/0.7mi…

    You’ve thought more about this than I have, so I can totally believe that there is a reason to not bias towards gridlines?
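
The step sizes quoted in the comment above can be checked with a few lines of arithmetic. The constant below is a commonly used approximation for the length of one degree of latitude, and the helper name is made up for this sketch. It only covers the north-south direction; the east-west size of a step shrinks with the cosine of the latitude.

```python
METERS_PER_DEGREE_LAT = 111_320.0  # approximate length of one degree of latitude


def grid_step_meters(step_degrees):
    """North-south size, in meters, of a rounding step given in degrees."""
    return step_degrees * METERS_PER_DEGREE_LAT


steps = [
    ("5 arc-seconds", 5 / 3600),
    ("0.001 degree", 0.001),
    ("0.01 degree", 0.01),
]
for label, step in steps:
    print(f"{label}: {grid_step_meters(step):.0f} m")
```

This gives roughly 155 m, 111 m, and 1.1 km, in line with the figures in the comment.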

  7. skierpage
    Posted October 15, 2008 at 3:21 pm | Permalink

    Nice. But the “www.example.com wants to know where you are. Tell them” geolocation bar clearly needs a [Somewhere else] button so I can find info about where I *will* be; and once the UI has that I’ll probably just consistently lie about where I am — when I’m at home I’ll say I’m at the police station three streets away.

  8. Posted October 16, 2008 at 1:38 am | Permalink

    The displacement seems to be pretty large – six or eight streets. Does anyone need that amount of fuzz? Particularly if I’m on foot, “nearby restaurants” for the fuzzed location would be a lot less useful than “nearby restaurants” for the real location.

    I agree that there’s a risk that sites could use other data to work out your offsets. Could you regenerate them once a month, say? Or change them slowly over time?

  9. Posted October 17, 2008 at 11:54 am | Permalink

    I third Perry Lorier and Johnathan Nightingale’s question.

    Why not truncate the Latitude and Longitude (perhaps randomly picking between “ceiling”, “floor”, and “round”) to something like the nearest 33rd, 100th, 333rd, or 1000th of a degree?

    33rd would leave a margin of error in the range of 1.7-2.4 km (if my math is correct), depending on user latitude. 100th would be 580-780 m. 333rd would be 190-260 m. 1000th would be 58-78 m. Those seem pretty reasonable for “my town” through “almost where I am standing”.

    They could be called something cute like “Area, Neighborhood, Block, and Shouting Distance”.

  10. Posted October 22, 2008 at 7:23 am | Permalink

    You can’t just truncate, because if the site watches your location over a period of time and notices a transition from one “block” to another, it knows you just crossed the line, thereby localizing you much more than you’d want.

  11. James Napolitano
    Posted October 26, 2008 at 6:05 pm | Permalink

    One other possible approach is for the API to offer an option to request location data to within a certain margin of error, i.e. to nearest 0.1, 1, 10, 100, etc. km. The user could then choose how specific the information he is providing will be.

    >Now, these two numbers will never change, and the reason is that a website, if it sees enough of these displacement vectors, they can average them out and discover your real location. Not so good for “fuzzing”.

    One approach that might help is if, on a user’s first use, you generated another random number that doesn’t change. This number would be used in generating the random numbers used for fuzzing, so that the average of the random numbers isn’t 0. (i.e. have a weighted probability distribution that’s different for every user). This way, if a website averages out many reported values to try to get some info on the user, they end up with an incorrect answer.

  12. Posted October 27, 2008 at 9:42 am | Permalink

    @gerv the default displacement is about 1 mile. It is pretty much okay for getting your local theater.

    @ian yup, if you are moving, and there is only one street, someone probably could extrapolate that you are on the street and not off of the street.

  13. Martin Thomson
    Posted November 2, 2008 at 4:07 pm | Permalink

    Ian’s solution can be addressed by making the random offset a function of time.

    Of course, all the averaging and tracking methods are moot if you only give the site location once. That would be my suggested approach. Simply avoid giving the site any more information to use. But that probably comes down to a user decision; I don’t know how to properly impress upon a user that understanding.

    1 mile is pretty damned arbitrary. I’d hope that users can set their own “fuzz amount”. I’m also a big fan of SI units ;)

    One thing that this fails to take into consideration is uncertainty. Zeroing accuracy will give a false impression to the site. So you have to provide some value. Also, if you already have 3km uncertainty (common enough if you use the serving cell for location), there seems little point in further obscuring the location. All you are doing is making it worse.

    My suggestion: take the point (and accuracy) as a circle on the map. Pick another circle of the required size (1 mile can be your default if you like) and put it anywhere you like, as long as it covers the original circle. Use your stored random numbers to pick the offset.

    The advantage of this method is that if you get bad location information, you don’t end up making it worse.

    I see no point in adding additional randomization. If you aren’t moving, the site will be able to detect the false jitter and be able to detect the randomization.
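
The covering-circle suggestion in the comment above might be sketched like this. Everything here is an assumption for illustration: the function and parameter names, the flat-earth approximation, the angle convention (counterclockwise from east), and the idea of storing one fraction in [0, 1] plus one angle as the "stored random numbers".

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius, used for a flat-earth approximation


def covering_circle(lat, lon, accuracy_m, fuzz_radius_m, stored_fraction, stored_angle_deg):
    """Return (lat, lon, accuracy) of a larger circle that still covers the original one."""
    if accuracy_m >= fuzz_radius_m:
        # The fix is already at least as uncertain as requested; do not make it worse.
        return lat, lon, accuracy_m
    # The new center may move at most (fuzz_radius - accuracy) so that the
    # original circle stays entirely inside the reported one.
    shift_m = stored_fraction * (fuzz_radius_m - accuracy_m)
    angle = math.radians(stored_angle_deg)
    d_east = shift_m * math.cos(angle)
    d_north = shift_m * math.sin(angle)
    new_lat = lat + math.degrees(d_north / EARTH_RADIUS_M)
    new_lon = lon + math.degrees(d_east / (EARTH_RADIUS_M * math.cos(math.radians(lat))))
    return new_lat, new_lon, fuzz_radius_m
```

With a 3 km cell-based fix and a 1 mile (about 1609 m) fuzz radius, the position is returned unchanged, which is the "don't make bad data worse" property described above.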

  14. Doug Turner
    Posted November 4, 2008 at 8:02 pm | Permalink

    So, we had a quick brainstorming session. We questioned why not just truncate. As gerv mentioned above, we need to worry about solving the boundary problem. If we solve that, we might have a simpler and safer solution. So, here is the straw man:

    1) Remember the exact location that a geolocation provider returns. The first one we see is put into preferences and is not changed (*).

    2) Any time we are asked for a fuzzed location, we take the stored value, round/truncate it to some number of decimal places, increase the accuracy value (the reported error radius) on the Position object, and return that to the web.

    (*) For updating the stored value, we monitor the locations being sent in from the geolocation provider. When our stored location and the current location differ by more than 1/2 of the rounding value we are using, we update the stored geolocation.

    For example, suppose we are rounding:

    37.012601 -122.001795

    to:

    37.013 -122.002

    We would update the stored value when the actual position moves to:

    37.013101 -122.002295

    A couple things:

    1) this does solve the boundary problem
    2) everyone in a given area would be reported at the same position

    Thoughts?
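
A minimal sketch of the straw man above, under a few assumptions: the class and method names are hypothetical, the "differ by more than 1/2 of the rounding value" test is applied per coordinate (the comment does not say whether it is per axis or a distance), and the adjustment of the accuracy value is left out.

```python
class StickyFuzzer:
    """Rounds a stored location and only replaces it after a large enough move."""

    def __init__(self, decimals=3):
        self.decimals = decimals
        self.step = 10 ** -decimals  # rounding value, e.g. 0.001 degree
        self.stored = None           # last exact (lat, lon) that was latched

    def report(self, lat, lon):
        if self.stored is None:
            # The first location we see is stored as-is.
            self.stored = (lat, lon)
        else:
            stored_lat, stored_lon = self.stored
            moved_lat = abs(lat - stored_lat) > self.step / 2
            moved_lon = abs(lon - stored_lon) > self.step / 2
            if moved_lat or moved_lon:
                self.stored = (lat, lon)
        stored_lat, stored_lon = self.stored
        return round(stored_lat, self.decimals), round(stored_lon, self.decimals)


fuzzer = StickyFuzzer(decimals=3)
first = fuzzer.report(37.012601, -122.001795)   # latches and rounds the first fix
nearby = fuzzer.report(37.012400, -122.001795)  # small move: stored value is kept
```

In this sketch `nearby` equals `first`, even though plain rounding of 37.012400 would have produced a different grid value, so a small move across a rounding boundary is not visible to the site.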

  15. Doug Turner
    Posted November 5, 2008 at 3:31 pm | Permalink

    After lots of discussions, we have decided not to implement “fuzz my location” in Firefox at this time. We have attempted to provide our users with a way to reduce the accuracy of a geolocation request. However, we do not feel that we can safely reduce the accuracy in all cases. Over time, the approaches would result in giving out more information than intended, compromising the general reason to fuzz.

    This doesn’t mean Firefox will not have fuzzing at some point. It just isn’t good enough for our users.

  16. martyfmelb
    Posted July 9, 2009 at 3:12 am | Permalink

    I hate, absolutely despise this “all-or-nothing” approach. I choose “nothing”. If I only want a website to know I’m in Melbourne, I want to be able to just tell it, “Melbourne”, NOT 103 Warrigal Rd Hughesdale VIC by the way here’s the key to my house if you’d like to just walk in while I’m not home (which you’ll be able to find out soon enough with a little bit of cross-correlation, I’m sure – I’m looking at you, evil Google, the very arbiter of Firefox’s raw geolocation data).

    Please default the geolocation provider, next point-release, to *someone with less of a vested interest* in knowing exactly where we live / are / are thinking about being at / in real time / popped out for a smoko with a geolocation-sensitive website loaded on Fennec left running in my pocket by accident, I tend to do this 8:54 AM every workday, next Monday comes and there’s a friendly Zippo salesman 8:53 AM out back of work just as I step out … you see what I’m getting at. (It’s an over-the-top scenario only for the next 10 years, max, then it’ll be mainstream like radio talk-shows.)

    I am most wary of the security ramifications of having geolocation directly in the browser. This is the first time I have ever considered avoiding a Firefox update. Geographic information is marketing gold, and plenty of out-of-work programmers are desperate enough to try breaking Firefox’s geolocation permissions if it means food on the table.

  17. Posted July 9, 2009 at 9:00 am | Permalink

    Hi Marty,

    I think we share the same desire to improve the user experience around this. I do want a way for the user to specify where they are (or want to be). For example, I am in Mountain View, Ca. However, I really want to state that I am in San Jose, Ca. when I do searches. This should be possible.

    Btw, sadly websites already know that you are in Melbourne without ever having to ask you anything (they can use a technology called GeoIP; there are free libraries and databases that enable this). I would love to see GeoIP used without the user’s permission go away. Hopefully Geolocation in the browser will encourage web sites to ask the user before using GeoIP.

    Thanks for your comments!

  18. Posted August 8, 2009 at 9:28 pm | Permalink

    FTW!

One Trackback

  1. [...] your general location, or fuzzing your location, using the Geolocation plug-in is a work in progress that should be resolved soon.  Right now it [...]
