How do I replay the traffic to Matomo and ingest logs of matomo.php (or piwik.php) requests?
Log Analytics lets you import any web server log file. In this FAQ we focus on one particular type of log that you may find useful to import into Matomo: the Matomo Tracking API logs.
What are Matomo Tracking API logs?
When users visit your websites, the Matomo JavaScript tracking code sends an HTTP(S) request to the matomo.php (or piwik.php) Tracking API endpoint. If you use one of the Tracking API clients to measure your mobile apps, games, or desktop apps, they also send requests to matomo.php (or piwik.php). The web server handling those requests writes access log files containing the tracking data that Matomo collects in your database.
Here is what an example access log line looks like:
12.10.30.51 - - [03/Feb/2020:16:40:31 +1300] "GET /matomo.php?idsite=1&rec=1&urlref=https://www...................... HTTP/1.1" 200 256 "-" "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0"
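To make the structure of such a line concrete, here is a minimal Python sketch that splits it into its fields and decodes the tracking parameters from the query string. It assumes your web server writes the common "combined" log format shown above; the function name parse_tracking_line and the pattern are illustrative and are not part of Matomo.

```python
import re
from datetime import datetime
from urllib.parse import urlsplit, parse_qs

# Pattern for the "combined" access log format (an assumption about your server's configuration).
LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<target>\S+) (?P<protocol>[^"]*)" '
    r'(?P<status>\d{3}) (?P<size>\S+) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_tracking_line(line):
    """Return the fields of one access log line as a dict, or None if it does not match."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    parts = urlsplit(match["target"])
    return {
        "ip": match["ip"],
        # Timestamps look like 03/Feb/2020:16:40:31 +1300 (English month names assumed).
        "time": datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z"),
        "method": match["method"],
        "path": parts.path,
        # Tracking API parameters such as idsite and rec, each mapped to a list of values.
        "params": parse_qs(parts.query),
        "status": int(match["status"]),
    }
```

Calling parse_tracking_line on a line like the one above returns the request path (/matomo.php), the timezone-aware timestamp, and the tracking parameters that a replay would send to Matomo again.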
Uses of replaying logs
Replaying logs is very useful, for example, when your database server breaks down and Matomo cannot write data for a few hours. In that case you can take the web server log entries matching /matomo.php (or /piwik.php) and replay them into Matomo. Replaying logs means that the Log Analytics tool goes through each line of the log and imports it into Matomo with its original date and time in the past. Replaying logs is also useful if you want to set up a high availability Matomo.
How to replay Tracking API logs? Steps to follow
1) First, prepare a log file containing only the requests that should be imported. Typically you would import only a given period of time. In these logs, all the request URLs would start with matomo.php or piwik.php. These are the requests we can replay next.
2) Second, make sure all requests in the file are sorted chronologically. This is especially important when you have merged data from different log files: after merging, order the log file by the date-time field.
3) Finally, replay the Tracking API logs by calling the log analytics importer with the --replay-tracking parameter, for example:
./misc/log-analytics/import_logs.py --url=piwik.example.net --replay-tracking /var/log/apache2/access.log
4) After replaying the logs, it is recommended to reprocess the data with the core:archive console command.
Once this is completed, congratulations: you have now recovered your missing web analytics data!
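Steps 1 and 2 above (keeping only the tracking requests for a given period and sorting them chronologically) can be sketched in Python as follows. This is a minimal illustration, not a Matomo tool: the function name prepare_replay_lines is hypothetical, and it assumes the combined log format with timestamps such as 03/Feb/2020:16:40:31 +1300. It matches request paths ending in /matomo.php or /piwik.php so that installations in a subdirectory are also covered.

```python
import re
from datetime import datetime

# Extracts the timestamp and the request target from a combined-format log line (assumed format).
TIME_PATTERN = re.compile(r'\[(?P<time>[^\]]+)\] "(?P<method>[A-Z]+) (?P<target>\S+)')
TRACKER_PATHS = ("/matomo.php", "/piwik.php")


def prepare_replay_lines(lines, start, end):
    """Return the tracking-request lines with start <= time < end, oldest first.

    start and end must be timezone-aware datetimes.
    """
    selected = []
    for line in lines:
        match = TIME_PATTERN.search(line)
        if match is None:
            continue  # not a recognisable access log line
        path = match["target"].split("?", 1)[0]
        if not path.endswith(TRACKER_PATHS):
            continue  # not a Tracking API request
        when = datetime.strptime(match["time"], "%d/%b/%Y:%H:%M:%S %z")
        if start <= when < end:
            selected.append((when, line))
    # Sort by the parsed date-time only; lines with equal timestamps keep their input order.
    selected.sort(key=lambda item: item[0])
    return [line for _, line in selected]
```

You could feed it the concatenated lines of several access log files and write the result to a single file, which is then passed to import_logs.py with --replay-tracking as in step 3. The function holds the selected lines in memory, so for very large logs a streaming or external sort may be a better fit.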
Limitations of logs replay
When replaying the logs, most of your log data will be replayed as expected (visits, pageviews, goals, ecommerce transactions, etc.) but there may be a few Tracking requests which are not replayed: specifically any log entries which are POST requests will not be replayed (because the POST request parameters are not stored in the access log files).
The Matomo JavaScript tracker is more likely to use POST requests when you use Form Analytics, Media Analytics, Heatmaps, or the Heartbeat timer, so some of these tracking requests may not be replayed.
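To estimate how much of a log is affected by this limitation, you can count the tracking requests per HTTP method before replaying. The following Python sketch is only an illustration under the same assumption of a combined-format access log; the function name count_tracking_methods is hypothetical and not part of Matomo.

```python
import re
from collections import Counter

# Matches the quoted request part of a combined-format log line, e.g. "GET /matomo.php?... (assumed format).
REQUEST_PATTERN = re.compile(r'"(?P<method>[A-Z]+) (?P<target>\S+)')
TRACKER_PATHS = ("/matomo.php", "/piwik.php")


def count_tracking_methods(lines):
    """Count Tracking API requests per HTTP method (GET, POST, ...)."""
    counts = Counter()
    for line in lines:
        match = REQUEST_PATTERN.search(line)
        if match is None:
            continue
        path = match["target"].split("?", 1)[0]
        if path.endswith(TRACKER_PATHS):
            counts[match["method"]] += 1
    return counts
```

A high share of POST entries in the result suggests that a noticeable part of the tracking data for that period cannot be recovered from the access log.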