Case Study: Accuracy & Precision of Google Analytics Geolocation
Has Google invented teleportation? Find out for yourself.
I recently concluded a project with Quebec City Tourism. During my engagement, they attended a conference where a supposedly enlightened analytics consultant said, “geolocation data in Google Analytics is just plain wrong”. I despise such blanket statements, especially knowing there were no serious studies on this topic.
Not satisfied with it, I decided to tackle the challenge and it turned into a fairly ambitious endeavor. As I did when I revealed how a retailer left $700,000 on the racks on Black Friday, I’m sharing the step-by-step process from the problem statement (Define), down to gathering relevant data (Measure), uncovering patterns and correlations (Analyze) and concluding with possible Improvements and Control methods.
What Is The Hypothesis?
Articulating a hypothesis (or objective) is the first and certainly the most critical step in any analysis. It should be short, clear, and allow all stakeholders to understand what’s at hand and rally around a common goal or objective.
Hypothesis: Google Analytics geolocation data is accurate (it reports the proper city, region, and country) and precise (the inferred location falls within a reasonable distance, in km or miles, of the real one), and is therefore good enough to inform business decisions.
Collecting Relevant Data
Given there are no publicly available studies on this topic — but numerous anecdotes and opinions — the decision was to build a tool which would provide users with an appreciation of the accuracy and precision of their current location. While using the tool, observations are collected and contribute to the ongoing benchmark.
HTML5 and Google Maps Geolocation
The W3C introduced geolocation with HTML5 and it is supported by all modern browsers. Devices with a GPS such as iPhones can report much more accurate data, and this article reveals how accurate location data is derived from your own wireless network.
Did you know? Google initially used its Street View cars to gather Wi-Fi access points but eventually stopped doing so following a lawsuit. Boston-based Skyhook has been a pioneer in using Wi-Fi location-based technology instead of GPS and accused Google of disparaging its technology.
While HTML5 geolocation provides the latitude and longitude, the Google Maps Geocoding API can convert those into a complex structure of address components. The combination of the two gives us door-level accuracy in most cases. Privacy advocates will rightly point out the risk here, but remember user consent is required and this is the convenient functionality we use in Google Maps and others on a regular basis.
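To make the "complex structure of address components" more concrete, here is a minimal sketch in Python of how city, region and country could be pulled out of one geocoder result. The type names `locality`, `administrative_area_level_1` and `country` are the ones the Geocoding API uses. The function name `extract_place` and the sample result are illustrative assumptions, not the code behind the benchmark tool.

```python
# Component types of interest. These type names are used by the Google Maps
# Geocoding API for city, state/province and country respectively.
WANTED = {
    "city": "locality",
    "region": "administrative_area_level_1",
    "country": "country",
}


def extract_place(result: dict) -> dict:
    """Pick the city, region and country long names out of one geocoder result.

    Hypothetical helper: keeps the first component found for each type and
    leaves the value as None when a type is missing from the result.
    """
    place = {key: None for key in WANTED}
    for component in result.get("address_components", []):
        for key, type_name in WANTED.items():
            if place[key] is None and type_name in component.get("types", []):
                place[key] = component.get("long_name")
    return place


# Hand-written sample for illustration only (not captured from the API).
sample_result = {
    "address_components": [
        {"long_name": "Québec", "short_name": "Québec",
         "types": ["locality", "political"]},
        {"long_name": "Québec", "short_name": "QC",
         "types": ["administrative_area_level_1", "political"]},
        {"long_name": "Canada", "short_name": "CA",
         "types": ["country", "political"]},
    ]
}

print(extract_place(sample_result))
```

Results without a `locality` component (rural addresses, for instance) come back with `city` set to `None`, so they can be excluded from the city-level comparison instead of being counted as mismatches.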
Google Analytics Geolocation
Google Analytics uses your IP address to infer your location. This is convenient, but IP-based locations are approximate and far less accurate than the HTML5 approach. The question is: by how much?
Technically, an advanced Google Analytics implementation was necessary to uniquely identify each request and retrieve the data using the Google Analytics Real-Time API.
As a reminder, the Google Analytics Geolocation dimensions are listed here. Note the Real-Time API offers a limited subset of all dimensions and metrics and restricts us to country, region, city, latitude and longitude.
Did you know? Google Analytics rounds latitude and longitude to 4 decimal places, which caps the precision at roughly 11.1 m (the north-south length of 0.0001° of latitude).
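The 11.1 m figure can be checked with a little arithmetic. The sketch below assumes a spherical Earth with a mean radius of 6,371 km, which is an approximation, and the function name `rounding_step_m` is made up for this example.

```python
import math

EARTH_RADIUS_M = 6_371_000  # mean Earth radius; spherical approximation


def rounding_step_m(decimals: int, latitude_deg: float = 0.0) -> tuple:
    """Ground distance covered by one rounding step of a coordinate.

    Returns (north_south_m, east_west_m). The east-west step shrinks with
    the cosine of the latitude because meridians converge toward the poles.
    """
    step_deg = 10 ** -decimals
    meters_per_degree = math.pi * EARTH_RADIUS_M / 180
    north_south = step_deg * meters_per_degree
    east_west = north_south * math.cos(math.radians(latitude_deg))
    return north_south, east_west


north_south, east_west = rounding_step_m(4, latitude_deg=46.8)
print(f"north-south: {north_south:.1f} m, east-west: {east_west:.1f} m")
```

With 4 decimals the north-south step is about 11.1 m everywhere, while the east-west step is smaller away from the equator. Either way, rounding is negligible compared with the kilometers of error observed in IP-based locations.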
Inferring Accuracy and Precision
In order to assess accuracy, the Google Maps geocoded location is compared against the Google Analytics Real-Time API values for country, region and city. This poses an additional challenge since values for standard geographic dimensions are localized into the user’s preferred language, making it complicated to compare them. Some proxying to standardize localized terms had to be developed.
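As an illustration of the kind of standardization involved, here is a minimal Python sketch that strips accents, ignores case and hyphens, and maps known variants to a single key. The alias table and the function names are hypothetical; the actual benchmark tool may work differently, and fully translated names (for example a country name in another language) would need a much larger lookup table.

```python
import unicodedata

# Hypothetical alias table: maps known variants to one canonical key.
ALIASES = {
    "quebec city": "quebec",
    "ville de quebec": "quebec",
}


def normalize(name: str) -> str:
    """Reduce a place name to a comparable key."""
    # Decompose accented characters, then drop the combining marks.
    decomposed = unicodedata.normalize("NFKD", name)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # Ignore case, hyphens and repeated whitespace.
    key = " ".join(stripped.casefold().replace("-", " ").split())
    return ALIASES.get(key, key)


def same_place(name_a: str, name_b: str) -> bool:
    """True when two names normalize to the same key."""
    return normalize(name_a) == normalize(name_b)


print(same_place("Québec", "Quebec City"))      # True
print(same_place("Lévis", "Château-Richer"))    # False
```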
The precision is calculated directly from the latitude and longitude gathered from HTML5 and Google Analytics. The Vincenty formula is used to calculate and report the as-the-crow-flies (geodesic) distance in kilometers.
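For readers who want to reproduce the distance calculation, below is a sketch of the Vincenty inverse formula on the WGS-84 ellipsoid, written in Python. It is an independent illustration rather than the code used by the benchmark tool, and the function name `vincenty_km` and its parameters are assumptions. The iteration can fail to converge for nearly antipodal points, which is not a concern for the distances discussed here.

```python
import math

# WGS-84 ellipsoid parameters
A_AXIS = 6378137.0                  # semi-major axis, meters
FLATTENING = 1 / 298.257223563
B_AXIS = A_AXIS * (1 - FLATTENING)  # semi-minor axis, meters


def vincenty_km(lat1, lon1, lat2, lon2, max_iter=200, tol=1e-12):
    """Geodesic distance in km between two points given in decimal degrees."""
    if lat1 == lat2 and lon1 == lon2:
        return 0.0
    f = FLATTENING
    # Reduced latitudes
    u1 = math.atan((1 - f) * math.tan(math.radians(lat1)))
    u2 = math.atan((1 - f) * math.tan(math.radians(lat2)))
    big_l = math.radians(lon2 - lon1)
    sin_u1, cos_u1 = math.sin(u1), math.cos(u1)
    sin_u2, cos_u2 = math.sin(u2), math.cos(u2)

    lam = big_l
    for _ in range(max_iter):
        sin_lam, cos_lam = math.sin(lam), math.cos(lam)
        sin_sigma = math.hypot(cos_u2 * sin_lam,
                               cos_u1 * sin_u2 - sin_u1 * cos_u2 * cos_lam)
        if sin_sigma == 0:
            return 0.0  # coincident points
        cos_sigma = sin_u1 * sin_u2 + cos_u1 * cos_u2 * cos_lam
        sigma = math.atan2(sin_sigma, cos_sigma)
        sin_alpha = cos_u1 * cos_u2 * sin_lam / sin_sigma
        cos2_alpha = 1 - sin_alpha ** 2
        # cos(2 * sigma_m) is undefined for equatorial lines; use 0 there.
        cos_2sm = (cos_sigma - 2 * sin_u1 * sin_u2 / cos2_alpha
                   if cos2_alpha else 0.0)
        c = f / 16 * cos2_alpha * (4 + f * (4 - 3 * cos2_alpha))
        lam_prev = lam
        lam = big_l + (1 - c) * f * sin_alpha * (
            sigma + c * sin_sigma * (
                cos_2sm + c * cos_sigma * (2 * cos_2sm ** 2 - 1)))
        if abs(lam - lam_prev) < tol:
            break
    else:
        raise ValueError("Vincenty iteration did not converge")

    u_sq = cos2_alpha * (A_AXIS ** 2 - B_AXIS ** 2) / B_AXIS ** 2
    big_a = 1 + u_sq / 16384 * (4096 + u_sq * (-768 + u_sq * (320 - 175 * u_sq)))
    big_b = u_sq / 1024 * (256 + u_sq * (-128 + u_sq * (74 - 47 * u_sq)))
    delta_sigma = big_b * sin_sigma * (cos_2sm + big_b / 4 * (
        cos_sigma * (-1 + 2 * cos_2sm ** 2)
        - big_b / 6 * cos_2sm * (-3 + 4 * sin_sigma ** 2)
        * (-3 + 4 * cos_2sm ** 2)))
    return B_AXIS * big_a * (sigma - delta_sigma) / 1000


# One degree of longitude along the equator, about 111.32 km
print(round(vincenty_km(0.0, 0.0, 0.0, 1.0), 2))
```

In the benchmark setting, the first point would be the HTML5 position and the second the latitude and longitude reported by Google Analytics.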
Analyze: Uncover Patterns & Correlations
Note: These are preliminary results since the number of observations in our benchmark isn’t large enough. Please contribute by visiting the GA Location Benchmark tool.
- Countries and regions (states & provinces) are very accurate (100% accuracy so far), but city-level accuracy varies widely (i.e. the detected city names differ between Google Maps and Google Analytics);
- Typically, mobile-based results using the carrier data are much more accurate than Wi-Fi-based ones (the exact difference will be quantified later);
- Precision ranges from a few meters to, in some cases, as much as 250 km!
- Oddly enough, the city reported by the Real-Time API is often different from the one reported later by regular reporting. While the distance from the reference point seems about the same, the direction is very different (as shown below). Could this be a bug or intentional obfuscation for privacy reasons?
A Bug in Google Analytics Geolocation?
- Reference point accurately reported by HTML5 (a few meters away from the real location, using Chrome on Mac OS X over a Wi-Fi network); the location is accurate down to the door number!
- Google Analytics Real-Time API geolocation: 15 km north-east of the reference point; the city is wrongly reported as Château-Richer;
- Google Analytics Reporting API geolocation: 15 km south-west of the reference point; the city is wrongly reported as Lévis.
Potential Actions for Greater Accuracy and Precision
Most organizations don’t need city or metro-level accuracy. But clearly, if your business depends on precise geolocation, relying on Google Analytics below regional levels is risky. Here’s what you could do to improve the quality of your geolocation data:
- Enable HTML5 geolocation detection (and account for cases where users will not allow it);
- Google Analytics doesn’t permit overwriting the latitude and longitude values so you need to define new custom dimensions in Google Analytics for your own latitude & longitude;
- You will have to use the GA reporting API to retrieve those values and create your own mapping, possibly exploring the data with Tableau (coming up once we have enough observations in our benchmark).
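To make the custom dimension idea concrete, here is a hedged sketch of an event hit carrying the HTML5 latitude and longitude, built as a Universal Analytics Measurement Protocol payload in Python. Several things are assumptions for illustration: the dimension indexes 1 and 2 (they must match the custom dimensions defined in your own property), the event category and action names, the placeholder tracking ID, and the sample coordinates. In practice the hit would more likely be sent from the browser by the tracking snippet. This example only builds the payload and sends nothing.

```python
from urllib.parse import urlencode


def build_event_payload(tracking_id, client_id, latitude, longitude,
                        lat_index=1, lon_index=2):
    """Build a form-encoded event hit with latitude/longitude custom dimensions.

    lat_index and lon_index are hypothetical custom dimension indexes.
    """
    params = {
        "v": "1",                 # protocol version
        "tid": tracking_id,       # property ID
        "cid": client_id,         # anonymous client ID
        "t": "event",             # hit type
        "ec": "geolocation",      # event category (illustrative name)
        "ea": "html5-position",   # event action (illustrative name)
        f"cd{lat_index}": f"{latitude:.6f}",
        f"cd{lon_index}": f"{longitude:.6f}",
    }
    return urlencode(params)


# Placeholder property ID and illustrative coordinates
payload = build_event_payload("UA-XXXXX-Y", "555", 46.8139, -71.2080)
print(payload)
```

Storing the coordinates as text with a fixed number of decimals keeps the full HTML5 precision, instead of the rounded values found in the built-in latitude and longitude dimensions.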
Impact for Marketers
For marketers, the biggest risk related to the lack of accuracy of Google Analytics data is twofold.
Inappropriate Reporting
Avoid reporting on precise geographic locations (city or metro) and only report at regional levels if you have a significant volume of data. If accurate location is critical to your business, use the method explained above to collect more accurate latitude & longitude data, measure the distance (precision), and check whether city, region and country match (accuracy). At the very least, enable this for long enough to conduct an audit and establish the margin of error. (Note: I can help implement this solution, feel free to contact me.)
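One way to express the margin of error after such an audit is a box-plot style summary of the measured distances. The sketch below uses Python's standard `statistics.quantiles` with the "inclusive" method. Which quartile convention the benchmark itself uses is not stated, so treat that choice as an assumption, and note that the sample distances are made up for illustration and are not benchmark results.

```python
import statistics


def box_plot_summary(distances_km):
    """Return min, quartiles and max for a list of distances in km."""
    ordered = sorted(distances_km)
    q1, median, q3 = statistics.quantiles(ordered, n=4, method="inclusive")
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": ordered[-1],
    }


# Made-up distances (km) between real and inferred locations
observations = [0.4, 2.0, 5.0, 15.0, 250.0]
summary = box_plot_summary(observations)
print(summary)
```

The median and the interquartile range describe the typical error far better than the average, which a single 250 km outlier would dominate.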
Invalid Audiences
Given the same method is used by AdWords, targeting ads to a narrow geography is very risky. AdWords allows you to target specific cities or even postal codes. Although you can target mobile devices, which would typically be more accurate because of the use of GPS data, there is no way to distinguish between a Wi-Fi and a carrier data connection.
Other Digital Analytics Platforms and Ad Networks
Given there are no magic tricks, I’m confident other digital analytics platforms and other ad networks are facing the same limitations.
Next Steps
Once enough observations are added to the benchmark, I will revisit this article and share insight on the precision and accuracy for various countries, desktop vs mobile, and share some cool visualizations using Tableau Public.
If you find any issue, have comments or questions, don’t hesitate to contribute to the conversation.
Further Readings, References and Notes
The most relevant comments & notes from various social media outlets. These notes will be updated as needed.
- How to measure the accuracy of latitude and longitude? An interesting discussion on gis.stackexchange.com
- Is geolocation considered to be personally identifiable information? According to “The Glossary of Education Reform”, a site dedicated to journalists, parents, and community members, “Common forms of personally identifiable information include… geolocation data (e.g., real-time location data relayed by a smartphone)”.
- A user of the benchmark reported about 10% errors when attempting to target AdWords ads to specific cities, with distances of up to 62 km.
Technologies used in this project
- HTML5, JavaScript, jQuery, Bootstrap, and PHP;
- HTML5 geolocation;
- Google Maps API, including maps and geocoder;
- Google Analytics events with custom dimensions;
- Google Analytics Real-Time API;
- Highcharts;
- Vincenty distance formula; Quartiles formula to calculate box-plot values;
Images credit: http://www.pdpics.com/