How to Set up Auto-Archiving of Your Reports
If your website has more than a few hundred visits per day (bravo!), waiting for Matomo to process your data may take a few minutes. The best way to avoid these waiting times is to set up a cron job on your server so that your data is automatically processed every hour.
If you are using Matomo for WordPress, you don’t need to do this as it utilises the WP Cron.
If you are on the Matomo Cloud, this is automatically taken care of for you as well.
But if you are using Matomo On Premises, read on!
To automatically trigger the Matomo archives, you should set up a script that will execute every hour.
There are instructions below for Linux/Unix systems using a crontab, for Windows users with the Windows Task Scheduler, and for tools such as CPanel. If you don’t have access to the server, you can also set up a web cron.
Linux/Unix: How to Set up a Crontab to Automatically Archive the Reports
A crontab is a time-based scheduling service on Unix-like servers. The crontab requires php-cli or php-cgi to be installed. You will also need SSH access to your server in order to set it up. Let’s create a new crontab with the text editor nano:
nano /etc/cron.d/matomo-archive
and then add the lines:
MAILTO="youremail@example.com"
5 * * * * www-data /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /home/example/matomo-archive.log
Until now, your Matomo interface has been initiating this archiving process in an ad hoc fashion: users were clicking in the browser to view a report and, in many cases, the report needed to be calculated on demand. Unless your amount of data is very small, this quickly becomes unsustainable: users will experience sluggish performance and may encounter many instances where, instead of data, they see “Oops, there was an error with the request”. Worse, they may repeatedly click the interface, which just exacerbates the problem and makes them think Matomo is not working.
Therefore it is important that you disable this behavior. In your override file (config.ini.php), add this setting under the [General] section:
[General]
browser_archiving_disabled_enforce = 1
The new setting will be respected immediately: you don’t need to reset anything.
That’s it, you have successfully set up scheduled archiving.
The Matomo archive example script shown above will run every hour (at 5 minutes past). Generally, it completes in a few minutes when your site is new. As you acquire more data, it may take a few dozen minutes. On the very largest sites it can take hours, or up to a day on extremely large servers tracking multiple high-traffic sites. To achieve those levels of performance, consult the guides Requirements for Matomo On-Premise and Scalable Matomo Setup. If the execution takes an hour or more, you will need to rewrite the first line of your cron job so that it runs every 2 hours, every 6 hours, or even once per day. One common practice is to write a cron job that runs every 12 hours for a high-traffic site, plus a cron job that runs once per hour for all of your low-traffic sites.
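As a sketch, that split could look like the following crontab entries. The site ID, schedule, and paths are illustrative assumptions; the --force-idsites and --skip-idsites options restrict which site IDs each archiver processes:

```
# High-traffic site (assumed to have site ID 1): archive every 12 hours
15 */12 * * * www-data /usr/bin/php /path/to/matomo/console core:archive --force-idsites=1 --url=http://example.org/matomo/ > /home/example/matomo-archive-big.log
# All remaining low-traffic sites: archive hourly, skipping site ID 1
5 * * * * www-data /usr/bin/php /path/to/matomo/console core:archive --skip-idsites=1 --url=http://example.org/matomo/ > /home/example/matomo-archive.log
```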
If the archiving task executes completely and without errors, your reports will periodically refresh on schedule. Executing this refresh process is called archiving in our documentation. For clarity on why this refresh process is required, read more about the relationship of raw data to report data.
Important note: your most important job as an administrator will be to check regularly (daily or weekly) for any messages which arise during the archiving process. These will be sent to the email address you specify in the MAILTO line of your cron job. They will also be logged in the default location set in the global.ini.php file, which is
[log]
logger_file_path = tmp/logs/matomo.log
Or set your preferred logging location by adding an override in your config.ini.php file:
[log]
logger_file_path = example/path/to/your/preferred/location
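As a minimal sketch of that routine check, you could grep the archive log for error lines. A small sample log is generated first here so the commands are self-contained; in practice you would point LOG at your real log path:

```shell
# Create a small sample of archiver output (illustrative lines only).
LOG=/tmp/matomo-archive-sample.log
cat <<'EOF' > "$LOG"
INFO [2024-01-01 05:05:00] 1234 Archived website id = 1, 4 API requests, Time elapsed: 2.1s
ERROR [2024-01-01 05:05:03] 1234 Got invalid response from API request
EOF
# Print any ERROR lines; no output means the last run completed cleanly.
grep "ERROR" "$LOG"
```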
Because you will no longer need archiving to be triggered by users interacting with the browser, you should disable this feature (see the browser_archiving_disabled_enforce setting shown above).
Congratulations: you have successfully enabled scheduled archiving of your Matomo data.
Now, if you want to understand the console core:archive command in more detail, and see some special cases, either check our developer guide, or read on below.
Breakdown of the parameters:
- MAILTO="youremail@example.com": if there is an error during the script execution, the script output and error messages will be sent to the youremail@example.com address.
- www-data: the user that the cron job will be executed as. This user is sometimes “apache”. It is recommended to run your crontab as the same user as your web server (to avoid file permission mismatches).
- /usr/bin/php: the path to your PHP executable. It varies depending on your server configuration and operating system. You can execute the command “which php” in a Linux shell to find the path of your PHP executable. If you don’t know the path, ask your web host or sysadmin.
- /path/to/matomo/console: the path to your Matomo app on your server. For example it may be /var/www/matomo/console.
- --url=http://example.org/matomo/: the only required parameter in the script, which must be set to your Matomo base URL, e.g. http://analytics.example.org/ or http://example.org/matomo/.
- > /home/example/matomo-archive.log: the path where the script will write its output. You can replace this path with /dev/null if you prefer not to log the last Matomo cron output. The script output contains useful information such as which websites are archived, how long it takes to process each date & website, etc. This log file should be written to a location outside of your web server’s document root so that people cannot view it via their browser (it contains sensitive information about your Matomo installation). You can also replace > with >> in order to append the script output to the log file rather than overwrite it on each run (but then we recommend you rotate or delete this log file, e.g. once a week).
- 2> /home/example/matomo-archive-errors.log: the optional path where the script will write error messages. If you omit this from the crontab, errors will be emailed to your MAILTO address. If you include it, errors will be logged to this specified error log file, which should likewise be stored outside of your web server’s document root.
Description of the ‘linux cron’ utility: The cron utility uses two different types of configuration files: the system crontab and user crontabs. The only difference between these two formats is the sixth field.
- In the system crontab, the sixth field is the name of a user for the command to run as. This gives the system crontab the ability to run commands as any user.
- In a user crontab, the sixth field is the command to run, and all commands run as the user who created the crontab; this is an important security feature.
If you set up your crontab as a user crontab, you would instead write:
5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
This cron job will trigger the day/week/month/year archiving process at 5 minutes past every hour. This will make sure that when you visit your Matomo dashboard, the data has already been processed; Matomo will load quickly.
Test the cron command
Make sure the crontab will actually work by running the script as the crontab user in the shell:
su www-data -s /bin/bash -c "/usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/"
You should see the script output with the list of websites being archived, and a summary at the end stating that there was no error.
Launching multiple archivers at once
If you have multiple sites you may be interested in running multiple archivers in parallel for faster archiving. We recommend not starting them at the same time, but launching each one a few seconds or minutes apart to avoid concurrency issues. For example:
5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
6 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
In the above example one archiver will start at the minute 5 of each hour, the other starts one minute later. Alternatively, you can also start multiple archivers at the same time using a script which you then execute regularly through a cronjob.
#!/bin/bash
CONCURRENT_ARCHIVERS=2
for i in $(seq 1 $CONCURRENT_ARCHIVERS)
do
  (sleep $i && /path/to/matomo/console core:archive &)
done
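To execute that script regularly through a cron job as mentioned, a crontab entry could look like this (the script location /usr/local/bin/matomo-archivers.sh is an assumption for illustration):

```
0 * * * * www-data /usr/local/bin/matomo-archivers.sh > /dev/null
```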
Windows: How to Set up Auto-Archiving Using Windows Scheduler
-> Please see our dedicated FAQ for setting up a scheduled task in Windows.
Plesk: How to Set up the Cron Script using Plesk
Learn more about installing Matomo on Plesk and configuring the archiving crontab in the Plesk Matomo guide.
CPanel: How to Set up the Cron Script Using CPanel
It is easy to set up automatic archiving if you use a user interface such as CPanel, Webmin or Plesk. Here are the instructions for CPanel:
1. Log in to CPanel for the domain with the Matomo installation
2. Click on “Cron Jobs”
3. Leave email blank
4. In ‘Minutes’ put 00 and leave the rest blank
5. Paste in the path to the PHP executable, then the path to the Matomo /console script, then the parameter with your Matomo base URL --url=matomo.example.org/. Here is an example for a Hostgator install (in this example you would need to change ‘yourcpanelsitename’ to your particular domain’s cPanel username):
/usr/local/bin/php -f /home/yourcpanelsitename/public_html/matomo/console core:archive --url=example.org/matomo/ > /home/example/matomo-archive-output.log
“yourcpanelsitename” tends to be the first eight letters of your domain (unless you changed it when you set up your cPanel account).
6. Click “Add New Cron Job”
Matomo will process your reports automatically at the hour.
Web Cron When Your Web Host Does Not Support Cron Tasks
If possible, we highly recommend that you run a cron or scheduled task. However, on some shared hosting, or on particular server configurations, running a cron or scheduled task may not be easy or possible.
Some web hosts let you set up a web cron, which is a simple URL that the host will automatically visit at a scheduled time. If your web host lets you create a web cron, you can input the following URL in their hosting interface:
https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php?token_auth=XYZ
Replace XYZ with the super user’s 32-character token_auth. To find the token_auth, log in to Matomo as a super user, click on the Administration link in the top menu, go to Personal and click Security. Scroll down and you’ll find where to create a new token_auth.
Notes:
- For security, if possible we recommend you POST the token_auth parameter to the URL https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php (instead of sending the token_auth as a GET parameter).
- You can test the web cron by pasting the URL in your browser; wait a few minutes for processing to finish and then check the output.
- The web cron should be triggered at least once per hour. You may also use a ‘Website Monitoring’ service (free or paid) to automatically request this page every hour.
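As a sketch, such a POST request could be sent with curl, whose -d flag submits the data in a POST body (the URL and token are the placeholders from the example above):

```
curl -d "token_auth=XYZ" https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php
```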
Important Tips for Medium to High Traffic Websites
Disable browser triggers for Matomo archiving and limit Matomo reports to updating every hour
After you have set up the automatic archive script as explained above, you can set up Matomo so that requests in the user interface do not trigger archiving, but instead read the pre-archived reports. Log in as the super user, click on Administration > System > General Settings, and select:
- Archive reports when viewed from the browser: No
- Archive reports at most every X seconds: 3600 seconds
Click save to save your changes. Now that you have set up the archiving cron and changed these two settings, you can enjoy fast pre-processed near real-time reports in Matomo!
Today’s statistics will have a one-hour lifetime, which ensures the reports are processed every hour (near real time).
Increase PHP Memory Limit
If you receive this error:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate X bytes)
you must increase the memory allocated to PHP. To give Matomo enough memory to process your web analytics reports, increase the memory_limit setting in your php.ini file. Sites with less data or fewer features enabled can use 512M or 2G. If the problem persists, we recommend increasing the setting further; 8G is a common size for a medium to large Matomo instance.
memory_limit = 512M
To find where your php.ini file is located on your server, create a test.php file containing the following code:
<?php phpinfo(); ?>
and open it in a browser. It will show the php.ini file actually being read by PHP running on your web server, as well as your currently set max_execution_time value.
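If you cannot edit php.ini, PHP’s -d flag can raise the limit for the archiver process alone; a sketch based on the cron example above (paths and the 2G value are illustrative):

```
5 * * * * www-data /usr/bin/php -d memory_limit=2G /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /home/example/matomo-archive.log
```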
More High Traffic Server Tips!
It is possible to track millions of pages per month on hundreds or thousands of websites using Matomo. Once you have set up cron archiving as explained above, there are other important and easy steps to improve Matomo performance.
For more information, see How to configure Matomo for speed.
More Information About Matomo Archiving
- If you run archiving several times per day, it will re-archive today’s reports, as well as any reports for a date range which includes today: current week, current month, etc.
- Your Matomo database size will grow over time, this is normal. Matomo will delete archives that were processed for incomplete periods (i.e. when you archived a week in the middle of this week), but will not delete other archives. This means that you will have archives for every day, every week, every month and every year in the MySQL tables. This ensures a very fast UI response and data access, but does require disk space.
- Matomo archiving for today’s reports is not incremental: running the archiving several times per day will not lower the memory requirement for weeks, months or yearly archives. Matomo will read all logs for the full day to process a report for that day.
- Once a day/week/month/year is complete and has been processed, it will be cached and not re-processed by Matomo.
- If you don’t set up archiving to run automatically, archiving will occur when a user requests a Matomo report. This can be slow and provide a bad user experience (users would have to wait N seconds). This is why we recommend that you set up auto-archiving for medium to large websites, as explained above.
- By default, when you disable browser triggers for Matomo archiving, it does not completely disable the triggering of archiving as you might expect. Users browsing Matomo will still be able to trigger processing of archives in one particular case: when a custom segment is used. To ensure that users of your Matomo will never trigger any data processing, you must add the following setting below the [General] category in your config.ini.php file:
[General]
; disable browser trigger archiving for all requests (even those with a segment)
browser_archiving_disabled_enforce = 1
Help for core:archive command
Here is the help output for this command:
$ ./console help core:archive
Usage:
core:archive [--url="..."] [--skip-idsites[="..."]] [--skip-all-segments] [--force-idsites[="..."]] [--skip-segments-today] [--force-periods[="..."]] [--force-date-last-n[="..."]] [--force-date-range[="..."]] [--force-idsegments="..."] [--concurrent-requests-per-website[="..."]] [--concurrent-archivers[="..."]] [--max-websites-to-process="..."] [--max-archives-to-process="..."] [--disable-scheduled-tasks] [--accept-invalid-ssl-certificate] [--php-cli-options[="..."]] [--force-all-websites] [--force-report[="..."]]
Options:
--url Forces the value of this option to be used as the URL to Matomo.
If your system does not support archiving with CLI processes, you may need to set this in order for the archiving HTTP requests to use the desired URLs.
--skip-idsites If specified, archiving will be skipped for these websites (in case these website ids would have been archived).
--skip-all-segments If specified, all segments will be skipped during archiving.
--force-idsites If specified, archiving will be processed only for these Sites Ids (comma separated)
--skip-segments-today If specified, segments will be only archived for yesterday, but not today. If the segment was created or changed recently, then it will still be archived for today and the setting will be ignored for this segment.
--force-periods If specified, archiving will be processed only for these Periods (comma separated eg. day,week,month,year,range)
--force-date-last-n Deprecated. Please use the "process_new_segments_from" INI configuration option instead.
--force-date-range If specified, archiving will be processed only for periods included in this date range. Format: YYYY-MM-DD,YYYY-MM-DD
--force-idsegments If specified, only these segments will be processed (if the segment should be applied to a site in the first place).
Specify stored segment IDs, not the segments themselves, eg, 1,2,3.
Note: if identical segments exist w/ different IDs, they will both be skipped, even if you only supply one ID.
--concurrent-requests-per-website When processing a website and its segments, number of requests to process in parallel (default: 3)
--concurrent-archivers The number of max archivers to run in parallel. Depending on how you start the archiver as a cronjob, you may need to double the amount of archivers allowed if the same process appears twice in the `ps ex` output. (default: false)
--max-websites-to-process Maximum number of websites to process during a single execution of the archiver. Can be used to limit the process lifetime e.g. to avoid increasing memory usage.
--max-archives-to-process Maximum number of archives to process during a single execution of the archiver. Can be used to limit the process lifetime e.g. to avoid increasing memory usage.
--disable-scheduled-tasks Skips executing Scheduled tasks (sending scheduled reports, db optimization, etc.).
--accept-invalid-ssl-certificate It is _NOT_ recommended to use this argument. Instead, you should use a valid SSL certificate!
It can be useful if you specified --url=https://... or if you are using Matomo with force_ssl=1
--php-cli-options Forwards the PHP configuration options to the PHP CLI command. For example "-d memory_limit=8G". Note: These options are only applied if the archiver actually uses CLI and not HTTP. (default: "")
--force-all-websites Force archiving all websites.
--force-report If specified, only processes invalidations for a specific report in a specific plugin. Value must be in the format of "MyPlugin.myReport".
--help (-h) Display this help message
--quiet (-q) Do not output any message
--verbose (-v|vv|vvv) Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
--version (-V) Display this application version
--ansi Force ANSI output
--no-ansi Disable ANSI output
--no-interaction (-n) Do not ask any interactive question
--matomo-domain Matomo URL (protocol and domain) eg. "http://matomo.example.org"
--xhprof Enable profiling with XHProf
Help:
* It is recommended to run the script without any option.
* This script should be executed every hour via crontab, or as a daemon.
* You can also run it via http:// by specifying the Super User &token_auth=XYZ as a parameter ('Web Cron'),
but it is recommended to run it via command line/CLI instead.
* If you have any suggestion about this script, please let the team know at feedback@matomo.org
* Enjoy!
Note: The core:archive command option --url is deprecated since Matomo 5. It has been replaced with --matomo-domain.
Making sense of the core:archive output
The core:archive output log displays useful information about the archiver process, in particular which websites and segments are being processed. The output shows in particular:
- which website ID is currently being archived:
INFO [2020-03-31 21:16:29] 23146 Will pre-process for website id = 1, period = month, date = last3
- how many segments there are for this website (in this example there are 25 segments):
INFO [2020-03-31 21:16:29] 23146 - pre-processing segment 1/25 countryName!=Algeria March 29, 2022
- how many websites are left to be processed from this archiver’s queue of websites (in this example it has finished processing 2 out of 3 websites):
INFO [2020-03-31 21:17:07] 23146 Archived website id = 3, 4 API requests, Time elapsed: 18.622s [2/3 done]
- if you’re running multiple core:archive processes using --concurrent-archivers, you can tell the different concurrent archivers apart by looking at the number after the timestamp, e.g. 23146 in INFO [2020-03-31 21:17:07] 23146 [...]. Each concurrent archiver run will have a different number, so if you grep for this number across your logs you can find the output for that particular core:archive thread. You can also set --concurrent-archivers to -1, which indicates unlimited concurrent archivers.
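For example, the grep described above could look like this (a sample log with two interleaved archiver runs is created first so the snippet is self-contained; the numbers 23146 and 57210 are illustrative):

```shell
# Sample output from two concurrent archivers.
LOG=/tmp/matomo-concurrent-sample.log
cat <<'EOF' > "$LOG"
INFO [2020-03-31 21:16:29] 23146 Will pre-process for website id = 1
INFO [2020-03-31 21:16:30] 57210 Will pre-process for website id = 2
INFO [2020-03-31 21:17:07] 23146 Archived website id = 1
EOF
# Keep only the lines belonging to the archiver whose number is 23146.
grep -F " 23146 " "$LOG"
```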
If you have any questions or feedback, please use the feedback button below and we’ll do our best to get back to you.