My Take: Compliance with Verified Data
A forensic approach to Google Analytics data quality and compliance.
Much has been said about GDPR compliance and data quality. Broadly speaking, there are three areas that need to be addressed:
- Privacy and Cookie Policy Maintenance: the clauses and language of the law — keeping track of changes to the legal landscape and your corresponding responsibility to disclose everything you collect. After reviewing a handful of solutions, I opted for iubenda. This service takes care of my privacy and cookie policies on DaVinci.Tools as well as within the Chrome Extension itself;
- Cookie Solution: a method to ask for consent — cookie banner, prior blocking and asynchronous reactivation. CookiePro, by OneTrust, a leader in privacy management, nicely addresses this point: it will scan your site and find all the tools currently in use, whether they rely on cookies or local storage. For one of my clients, Quebec-Cite.com, we used Cookie Consent, an open source solution which we customized.
- Ongoing Compliance: monitoring that things are the way they should be and raising a flag when they aren't. Only the most expensive solutions offer this type of service.
For the third point, there's a newcomer that I like very much because it takes a unique approach, and on top of that, I know who's behind it: Verified-Data.com by Brian Clifton, a veteran in the field, focuses on Google Analytics data quality and compliance.
If you are going to make data-informed decisions, you had better be confident in your data! Verified Data addresses the complex task of making sure your configuration is actually right, so you can place reasonable trust in your data.
The Unique Value Proposition of Verified Data
While other solutions will merely check the presence of tags, Verified Data will leverage its access to your Google Analytics data to look at the actual configuration.
Verified Data uses a hybrid method — comparing what should be tracked with what is actually tracked. A full audit consists of around 200 unit tests, which makes it very good at establishing whether there are any holes in the setup or in GDPR compliance.
Setting Up a First Audit
Setting up the spider is straightforward: pick a Google Analytics view, and optionally set the crawl depth, URL filters and parameters to exclude or specifically keep.
The audit configuration is just as easy: set the noise threshold — by default, the audit will only analyze the first 10 rows of a given dimension. Set the look-back window (30 days is certainly sufficient to surface initial issues), and then set the weight of each audited item — or just leave them as they are for now. The weights affect the scoring of the Quality Index, so you can decide what's most important to you. By default, PII Compliance carries twice the weight of the other items.
The beauty of Verified Data is its ability to audit the configuration itself. For example, it will check who has access (domains and specific emails) to your data, an aspect too often overlooked when thinking about privacy and compliance.
Looking Under the Hood
Every account covers the Structure & Access of your Account/Property/Views, some Data Validity checks, Error Page tracking and Deployment & Coverage.
Features are then broken down into four main modules:
- GDPR & Privacy will check for the unwanted presence of PII, privacy and compliance, and perform monitoring every 4 hours;
- Marketing & Ads checks Google Ads tracking, visitor segments and campaign tracking;
- The Bespoke module focuses on things such as Site Search, File Download, Outbound Links, Events and Goals;
- And finally, E-Commerce looks at transactions, product dimensions, and shopping stages.
Once set up, the spider will crawl the site (up to the given page limit) looking for trackable content. The spider behaves just like a real visitor in that it waits for each page to load, then searches for clickable elements (e.g. file downloads, outbound links, search boxes, submission forms, etc.) and clicks on them. The purpose is to listen for any signals sent to Google Analytics. If the pageview or any of those click-throughs does not send a Google Analytics hit, then you know there is a potential problem…
Next, Google Analytics API requests are issued to query your data. Heuristic tests look for oddities such as duplicate data points, missing data, and inaccurate or incorrect data. It also stress-tests your governance and hunts for traces of personal information — which, of course, you should not be collecting.
A nice touch from the headless crawler is that it won't actually fire the tags — it lets the whole logic of your tracking implementation do what it has to do, then intercepts the request before it's sent to Google.
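To make the idea concrete, here is a minimal sketch of such an interception pattern — my own illustration under assumptions, not Verified Data's actual code. The crawler lets the page's tracking logic run, then wraps the transport (e.g. `navigator.sendBeacon`) so GA hits are recorded instead of forwarded:

```javascript
// Hypothetical sketch of intercepting GA hits in a headless crawler.
// This is an illustration, not Verified Data's actual implementation.

// Decide whether a request URL is a Google Analytics hit.
function isGaHit(url) {
  return /https?:\/\/(www\.)?google-analytics\.com\/(collect|batch|g\/collect)/.test(url);
}

// Wrap a sendBeacon-like transport: record GA hits and drop them,
// pass everything else through untouched.
function interceptTransport(sendBeacon, capturedHits) {
  return function (url, data) {
    if (isGaHit(url)) {
      capturedHits.push({ url: url, data: data });
      return true; // pretend the hit was sent, so page logic proceeds normally
    }
    return sendBeacon(url, data);
  };
}
```

In a real browser context, the wrapper would be assigned back onto `navigator.sendBeacon` (with similar hooks on `XMLHttpRequest` and `fetch`) before the page scripts run, so the implementation logic fires exactly as it would for a real visitor.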
You might wonder, as I did, how AI is being used. For now, it’s used in detecting Personally Identifiable Information (PII). AI gives Verified Data two very big advantages:
- Firstly, PII comes in many different shades, and by its nature issues are low in volume — typically much less than 0.01% of total hits. That makes them hard to spot with human eyes, cumbersome to code for, and inefficient to catch with a rules-based approach.
- Secondly, AI lets them spot PII in different languages. What constitutes PII varies greatly by region and country. Imagine having to manage a large-scale localized brand!
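To see why a purely rules-based approach breaks down, here is a naive sketch — my own illustration, not Verified Data's detector. A regex catches well-formed email addresses, but names, addresses, or country-specific ID formats slip right through, and each new PII shape means another hand-written rule per locale:

```javascript
// Naive rules-based PII scan (illustration only, not Verified Data's detector).
// One regex handles well-formed emails; every other PII shape would need
// its own rule, in every language and locale the site serves.
var EMAIL_RE = /[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/;

function findPii(pagePath) {
  var findings = [];
  if (EMAIL_RE.test(pagePath)) {
    findings.push("email");
  }
  // A name like "Jane Doe" in a query parameter is missed entirely...
  return findings;
}
```

The miss on the name is the point: rule lists scale badly across languages and regions, which is why a trained classifier is a better fit here.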
Back in the days of the WASP crawler I always wanted to find a way to avoid crawling a whole site, and instead, use a sample — but knowing which pages would constitute a good enough sample was a challenge. Machine Learning could be used to gradually uncover the pages that are the most representative of the overall website content and features.
Understanding the Results
Verified Data claims 60% of Google Analytics implementations are so poor you shouldn’t even trust the data. Let’s forget about being “data driven”…
Making decisions based on misleading data is worse than forgoing analytics completely and going with experience and guts.
After an audit you get a nice scorecard with a snapshot of all the checks which were done behind the scenes. As you run new audits, you will get additional columns so you can easily see how things are improving.
If we drill down into the Data Validity item, which has a perfect score, we see:
- Single Data Source
- Filters are being used
- Account, Property or View filters are manageable (less than 20)
- Subdirectory filters are avoided
- Multiple includes are avoided
- There’s a filter to exclude Internal/Staff visits
- A Hostname include filter is present
- Spiders & Bots are excluded
- Spam Referrals are excluded
- There are no “not set” Landing Pages
- There are no self-referrals
- The view doesn’t contain mixed case URLs
- There are no intra-site campaign links
On the other hand, the item “Deployment & Coverage” is red. Here, the role of a tool like Verified Data is to raise the flag so we can pay attention to what matters.
In this specific case, the GA tracking code coverage wasn't detected correctly because the audited website, Quebec-Cite.com, uses GTM with Cookie Consent and some custom code which checks the origin of the traffic and, if it is from the EU, prevents GA from firing unless consent is explicitly given.
I modified the JavaScript of my Custom HTML Tag to detect whether the userAgent contains “verified-data.com” and, if so, pretend consent was given so the crawler could do its job correctly.
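A hedged sketch of that tweak — the function name and parameters are mine for illustration; the actual consent logic in the Custom HTML Tag on Quebec-Cite.com is site-specific:

```javascript
// Sketch of the GTM Custom HTML Tag tweak (names are illustrative;
// the real Cookie Consent integration is site-specific).
function shouldFireGa(userAgent, consentGiven) {
  // Let the Verified Data crawler through as if consent was given,
  // so it can observe the GA hits it needs to audit.
  if (userAgent.indexOf("verified-data.com") !== -1) {
    return true;
  }
  // Everyone else still needs explicit consent before GA fires.
  return consentGiven;
}
```

Inside the tag, `userAgent` would come from `navigator.userAgent` and `consentGiven` from the Cookie Consent state.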
Like any similar service, the configuration of the crawler and other parameters is critical. For example, under the Event Tracking category, you will be warned if there are more than 10 different event categories, and get a red flag if there are more than 20. Those are rules of thumb based on experience. The logic is aligned with the Radical Analytics philosophy of “minimum viable product” and quick iterative improvements: apart from pageviews, track only what you need — i.e. start small and then expand as your needs grow. Typically 10 categories should be enough, but if the setup has been thought through, the thresholds can be increased as you see fit.
But! If you want to tweak all parameters, head over to Audit Settings and you’ll be able to adjust everything you want — the weight of each category, and even individual thresholds.
Typically, audits obey the 80/20 Pareto rule. Most of the issues will be surfaced through initial audits and can reasonably easily be solved. Start there, and use the Quality Index as a way to show how data quality is being addressed and improved over time.
Apart from governance, the point of auditing is to get people to think about data quality and data noise, then adjust as necessary. Generally, that thought process is just not happening…
My Take
Things have come a long way since I created WASP a decade ago. At the time, the goal was to identify tags in situ to help debug implementations. Then came cloud-based crawlers like ObservePoint and Hub’Scan, which check the presence of tags and run scenarios, but both fail at actually checking the validity of the implementation itself (beyond defining which metrics and dimensions you expect). ObservePoint being an American company, their offering doesn’t even address GDPR or privacy compliance in general, and their price point is pretty high. Hub’Scan merely uses its scenario technology to record the expected steps and validate against those. Other tools I like, such as GTMetrix and DareBoost for website performance analysis, cover other aspects, but not GA implementation. None of those solutions find trackable content within pages, such as outbound links, file downloads, etc.
This really sets Verified Data apart as the solution for auditing Google Analytics implementations — and not only for GDPR compliance. This is also where the AI and forensic aspects built into the solution are interesting, as they will allow it to evolve and detect new patterns in the data.
It’s certainly worth giving it a try: even with limited features, it costs nothing and requires no credit card. In its latest release, the Base Configuration Checks are free for 50 pages, so you can appreciate the power and potential of Verified Data. Of course, auditing and compliance should be a continuous process, not a one-off… If you go for the full service with 10 projects (typically 10 Google Analytics views), it will cost about 500€ a month, while an entry level of 2 projects with GDPR and governance compliance, marketing and bespoke validations will cost 250€ per month. Those price points are quite reasonable compared to the competition.
Verified Data is clearly on the right track. Since it’s crawling the site anyway, capturing other tools and allowing us to create our own checks would make this solution even more powerful. Given the tool is still in beta, and based on my conversations with Brian, we can expect the Verified Data service to quickly evolve and improve in the coming months.
Stéphane Hamel is a seasoned independent consultant, teacher and speaker. He shares his passion for digital marketing and analytics — be it technical ‘how to’ or assessing organizations’ digital capabilities and maturity. His latest venture is Da Vinci Tools — Google Marketing Platform power tools for smart digital marketers and analysts.
If you enjoyed this article, you should follow me on Facebook or Twitter, connect on LinkedIn and while you’re at it, why not click the nice little clapping hands on the left and follow me on Medium!