What is Google Analytics data sampling, and what’s so bad about it?
Google (2019) explains what data sampling is:
“In data analysis, sampling is the practice of analysing a subset of all data in order to uncover the meaningful information in the larger data set.”
In other words, instead of analysing all of your data, GA analyses only up to a set threshold. Anything beyond that threshold is estimated from patterns found in the sample.
Google’s (2019) data sampling thresholds:
This threshold is limiting because your data in GA may become more inaccurate as the traffic to your website increases.
Say you’re looking through all your traffic data from the last year and find you have 5 million page views. Only 500K of that 5 million is actually analysed! The figures for the remaining 4.5 million (90%) are an estimate based on the 500K sample.
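To make the arithmetic concrete, here is a minimal Python sketch of the calculation above. The function name `sampling_share` is hypothetical, and the 500K default is simply the figure from this example, not a value read from any GA API.

```python
def sampling_share(total_hits: int, threshold: int = 500_000) -> tuple[float, float]:
    """Return (analysed_share, estimated_share) as fractions of total_hits.

    threshold is an assumption taken from the example in this article.
    """
    if total_hits <= 0:
        raise ValueError("total_hits must be positive")
    # Nothing is estimated while traffic stays at or below the threshold.
    analysed = min(total_hits, threshold)
    analysed_share = analysed / total_hits
    return analysed_share, 1 - analysed_share


analysed, estimated = sampling_share(5_000_000)
print(f"{analysed:.0%} analysed, {estimated:.0%} estimated")
# prints "10% analysed, 90% estimated"
```

Because the threshold stays fixed, the analysed share shrinks as total traffic grows.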
This is a key weapon Google uses to sell to large businesses. In order to increase that threshold for more accurate reporting, upgrading to premium Google Analytics 360 for approximately US$150,000 per year seems to be the only choice.
What’s so bad about data sampling?
It’s unfair to say sampled data is to be disregarded completely. There is a calculation ensuring it is representative and can allow you to get good enough insights. However, we don’t encourage it as we don’t just want “good enough” data. We want the actual facts.
In a recent survey sent to Matomo customers, we found a large proportion of users switched from GA to Matomo due to the data sampling issue.
The two reasons why data sampling isn’t preferable:
- If the selected sample size is too small, you won’t get a good representation of all the data.
- The bigger your website grows, the more inaccurate your reports will become.
Here’s an example of why we don’t fully trust sampled data. Say you have an ecommerce store and your GA revenue reports don’t match your actual sales data because of data sampling. In GA you may see revenue for the month as $1 million when actual sales were $800K.
The sampling here has caused an inaccuracy that could have negative financial implications. What you get in the GA report is an estimated dollar figure rather than the actual sales. Making decisions based on inaccurate data can be costly in this case.
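As a rough illustration of how a scaled-up sample can drift from the real total, the Python sketch below estimates revenue from a random sample of orders and compares it with the actual sum. It is not GA’s sampling algorithm. The order values and the helper `estimate_total` are hypothetical and chosen only to show the effect.

```python
import random


def estimate_total(values, sample_size, rng):
    """Scale up the sum of a random sample to estimate the sum of all values."""
    sample = rng.sample(values, sample_size)
    return sum(sample) * len(values) / sample_size


rng = random.Random(42)  # fixed seed so repeated runs give the same numbers

# Hypothetical order values: mostly small orders plus a few large ones.
orders = [rng.choice([20, 35, 50]) for _ in range(9_900)] + [5_000] * 100
actual = sum(orders)

for size in (100, 1_000, len(orders)):
    estimate = estimate_total(orders, size, rng)
    error = (estimate - actual) / actual
    print(f"sample={size:>6}  estimate={estimate:>12,.0f}  error={error:+.1%}")
```

When the sample covers every order, the estimate equals the actual total. With smaller samples, whether a handful of large orders happens to be included can move the estimate noticeably.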
Another disadvantage of sampled data is that you might miss opportunities you would have noticed with a view of the whole. For example, you may not be able to see the real patterns in your data because what you’re shown has already been estimated.
Not getting a chance to see things as they are, and only being able to rely on the conclusions and assumptions made by GA, is risky. The bigger your business grows, the less you can risk making business decisions based on assumptions that could be inaccurate.
If you feel you could be missing out on opportunities because your GA data is sampled data, get 100% accurately reported data.
The benefits of 100% accurate data
Matomo doesn’t use data sampling on any of our products or plans. You get to see all of your data and not a sampled data set.
Data quality is necessary for high impact decision-making. It’s hard to make strategic changes if you don’t have confidence that your data is reliable and accurate.
Take the challenge!
Compare your Google Analytics data (sampled data) against your Matomo data, or if you don’t have Matomo data yet, sign up to our 30-day free trial and start tracking!