How is funnel data archived?
Like most analytics in Matomo, funnel data must be archived before it can be presented in the application. Funnel data is archived during the regular scheduled archiving process, but there are some key differences. Rather than being calculated and inserted directly into the archive tables, the initial calculated values are first stored in an intermediary table, log_funnel. That table is then queried when the data is inserted into the archive tables.
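The two-stage flow described above can be sketched as follows. This is a minimal illustration only, not the plugin's actual code: the column names (idvisit, idfunnel, step_position), the archive_funnel function, and the use of SQLite are assumptions made for the example, and the real log_funnel schema may differ.

```python
import sqlite3

def archive_funnel(conn, matched_steps):
    """Stage 1: store per-visit step matches; stage 2: aggregate them.

    matched_steps is a list of (idvisit, idfunnel, step_position) tuples.
    All column names here are hypothetical.
    """
    conn.execute(
        "CREATE TABLE IF NOT EXISTS log_funnel ("
        "idvisit INTEGER, idfunnel INTEGER, step_position INTEGER)"
    )
    # Start from an empty table so the sketch is repeatable.
    conn.execute("DELETE FROM log_funnel")
    conn.executemany("INSERT INTO log_funnel VALUES (?, ?, ?)", matched_steps)

    # Stage 2: query the intermediary table to build the archived metrics,
    # here the number of distinct visits that reached each step.
    rows = conn.execute(
        "SELECT idfunnel, step_position, COUNT(DISTINCT idvisit) "
        "FROM log_funnel GROUP BY idfunnel, step_position "
        "ORDER BY idfunnel, step_position"
    ).fetchall()
    return {(funnel, step): visits for funnel, step, visits in rows}

conn = sqlite3.connect(":memory:")
matches = [(1, 7, 1), (1, 7, 2), (2, 7, 1), (3, 7, 1), (3, 7, 2), (3, 7, 3)]
archive = archive_funnel(conn, matches)
print(archive)
```

Keeping the per-visit rows in an intermediary table means the expensive matching work is done once, and the aggregation into archive records becomes a plain GROUP BY query.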
Why do we archive funnel data differently?
Since funnels are defined as a series of steps, such as specific URL patterns, we need to filter visit actions against those patterns. This makes Funnels one of the most complicated and resource-intensive plugins to archive data for.
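To give a feel for why this filtering is costly, here is a simplified sketch of matching a visit's actions against step patterns. It assumes every step is a regular expression and that a step counts as reached when any action URL matches it; the steps_reached function and the example URLs are hypothetical, and the plugin's real matching rules are more involved.

```python
import re

def steps_reached(action_urls, step_patterns):
    """Return the 1-based positions of the funnel steps this visit matched."""
    compiled = [re.compile(pattern) for pattern in step_patterns]
    reached = []
    for position, pattern in enumerate(compiled, start=1):
        # Every action may need to be checked against every step pattern.
        if any(pattern.search(url) for url in action_urls):
            reached.append(position)
    return reached

steps = [r"/cart$", r"/checkout$", r"/thank-you$"]
visit = [
    "https://shop.example/",
    "https://shop.example/cart",
    "https://shop.example/checkout",
]
reached = steps_reached(visit, steps)
print(reached)
```

Even in this toy version the work grows with the number of actions multiplied by the number of steps, and it has to be repeated for every visit in the period being archived.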
How do we handle funnel archiving efficiently?
As mentioned in the page introduction, we use the log_funnel table to record the initial count of visits per step of the funnel. We then query that table to get the final archived data.
Starting with version 4.1.2 of the Funnels plugin, we limit how frequently we archive data for the current day. The limit is based on the time_before_today_archive_considered_outdated config setting, which defaults to 900 seconds but is 21600 seconds for cloud instances. The time elapsed is calculated using records in the option table with the funnel_archiving_today prefix. This same-day limit was meant to apply in earlier versions, as it does in other parts of the application, but appears to have been overlooked.
Another change in version 4.1.2 is that we no longer reuse data in the log_funnel table. Reusing that data helped with efficiency but led to some inaccuracies: certain segments were sometimes archived with empty results because All Visits was expected to archive first, which isn’t always the case. To remedy this, we now recalculate the log_funnel data each time we archive, but we skip archiving if that same day has recently been archived and no new visit actions have been recorded for that date. We track this using records in the option table with the max_idlink_va prefix.
Finally, if multiple processes try to archive the same day, the first process must complete before the next can proceed. If a process is stuck waiting for more than 6 hours, it logs a warning and gives up.
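The skip decision for the current day can be sketched roughly as below. This is one reading of the rule described above, not the plugin's implementation: the function name, its parameters, and the idea of comparing a stored highest idlink_va against the current one are assumptions made for illustration.

```python
def should_skip_today(now, last_archived_at, stored_max_idlink_va,
                      current_max_idlink_va, ttl_seconds=900):
    """Decide whether archiving for the current day can be skipped.

    now / last_archived_at: Unix timestamps (last_archived_at may be None).
    stored_max_idlink_va: highest visit action id seen at the last archive.
    current_max_idlink_va: highest visit action id recorded for the day now.
    ttl_seconds: stands in for time_before_today_archive_considered_outdated.
    """
    if last_archived_at is None or stored_max_idlink_va is None:
        return False  # never archived before, so there is nothing to reuse
    recently_archived = (now - last_archived_at) < ttl_seconds
    no_new_actions = current_max_idlink_va <= stored_max_idlink_va
    return recently_archived and no_new_actions

# Archived 500 seconds ago and no new visit actions since: skip.
print(should_skip_today(1000, 500, 42, 42))
# A new visit action was recorded: archive again.
print(should_skip_today(1000, 500, 42, 43))
```

Passing ttl_seconds=21600 would model the cloud setting mentioned above.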
How is archive invalidation handled?
Invalidating archive data for funnels works as expected. However, starting with version 4.1.2, there is one difference: invalidation doesn’t invalidate data for today or the previous day, since archives for those two days might not be finalised yet.
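As a rough illustration of that rule, the following sketch filters a list of requested dates so that today and the previous day are left alone. The dates_to_invalidate function is hypothetical and only models the date filter, not how the plugin actually hooks into invalidation.

```python
from datetime import date, timedelta

def dates_to_invalidate(requested, today):
    """Drop today and yesterday from the dates requested for invalidation."""
    yesterday = today - timedelta(days=1)
    return [day for day in requested if day not in (today, yesterday)]

today = date(2024, 5, 10)
requested = [date(2024, 5, 8), date(2024, 5, 9), date(2024, 5, 10)]
kept = dates_to_invalidate(requested, today)
print(kept)
```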