How to Set up Auto-Archiving of Your Reports
If your website has more than a few hundred visits per day (bravo!), waiting for Matomo to process your data may take a few minutes. The best way to avoid these waiting times is to set up a cron job on your server so that your data is automatically processed every hour.
If you are using Matomo for WordPress, you don’t need to do this as it utilises the WP Cron.
If you are on the Matomo Cloud, this is automatically taken care of for you as well.
But if you are using Matomo On Premises, read on!
To automatically trigger the Matomo archives, you should set up a script that will execute every hour.
There are instructions below for Linux/Unix systems using a crontab, for Windows users with the Windows Task Scheduler, and for tools such as CPanel. If you don’t have access to the server, you can also set up a web cron.
Linux/Unix: How to Set up a Crontab to Automatically Archive the Reports
A crontab is a time-based scheduling service on Unix-like servers. The crontab requires php-cli or php-cgi to be installed. You will also need SSH access to your server in order to set it up. Let’s create a new crontab with the text editor nano:
nano /etc/cron.d/matomo-archive
and then add the lines:
MAILTO="youremail@example.com"
5 * * * * www-data /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /home/example/matomo-archive.log
Until now, your Matomo interface has been initiating this archiving process in an ad hoc fashion: users were clicking in the browser to view a report and, in many cases, the report needed to be calculated on demand. Unless your amount of data is very small, this quickly becomes unsustainable: users will experience sluggish performance and may encounter many instances where, instead of data, they see “Oops, there was an error with the request”. Worse, they may repeatedly click the interface, which just exacerbates the problem and makes them think Matomo is not working.
Therefore it is important that you disable this behavior. In your override file (config.ini.php), add this setting under the [General] section:
[General]
browser_archiving_disabled_enforce = 1
The new setting will be respected immediately: you don’t need to reset anything.
That’s it, you have successfully set up scheduled archiving.
The Matomo archive example script shown above will run every hour (at 5 minutes past). Generally, it completes in a few minutes when your site is new. As you acquire more data, it may take a few dozen minutes. On the very largest sites it can take hours, or up to a day on extremely large servers tracking multiple high-traffic sites. To achieve those levels of performance, consult the guides Requirements for Matomo On-Premise and Scalable Matomo Setup. If the execution takes an hour or more, you will need to rewrite the first line of your cron job so that it runs every 2 hours, every 6 hours, or even once per day. One common practice is to write a cron job that runs every 12 hours for a high-traffic site, plus a cron job that runs once per hour for all of your low-traffic sites.
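As a sketch, that split could look like the following crontab entries. The site ID, schedule, and paths are illustrative assumptions; the --force-idsites and --skip-idsites options restrict which site IDs each archiver processes:

```
# High-traffic site (assumed to have site ID 1): archive every 12 hours
15 */12 * * * www-data /usr/bin/php /path/to/matomo/console core:archive --force-idsites=1 --url=http://example.org/matomo/ > /home/example/matomo-archive-big.log
# All remaining low-traffic sites: archive hourly, skipping site ID 1
5 * * * * www-data /usr/bin/php /path/to/matomo/console core:archive --skip-idsites=1 --url=http://example.org/matomo/ > /home/example/matomo-archive.log
```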
If the archiving task executes completely and without errors, your reports will periodically refresh on schedule. Executing this refresh process is called archiving in our documentation. For clarity on why this refresh process is required, read more about the relationship of raw data to report data.
Important note: your most important job as an administrator will be to check regularly (daily or weekly) for any messages which arise during the archiving process. These will be sent to the email address you specify in the MAILTO line of your cron job. They will also be logged in the default location set in the global.ini.php file, which is
[log]
logger_file_path = tmp/logs/matomo.log
Or set your preferred logging location by adding an override in your config.ini.php file:
[log]
logger_file_path = example/path/to/your/preferred/location
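As a minimal sketch of that routine check, you could grep the archive log for error lines. A small sample log is generated first here so the commands are self-contained; in practice you would point LOG at your real log path:

```shell
# Create a small sample of archiver output (illustrative lines only).
LOG=/tmp/matomo-archive-sample.log
cat <<'EOF' > "$LOG"
INFO [2024-01-01 05:05:00] 1234 Archived website id = 1, 4 API requests, Time elapsed: 2.1s
ERROR [2024-01-01 05:05:03] 1234 Got invalid response from API request
EOF
# Print any ERROR lines; no output means the last run completed cleanly.
grep "ERROR" "$LOG"
```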
Because you will no longer need archiving to be triggered by users interacting with the browser, you should disable this feature (see the browser_archiving_disabled_enforce setting shown above).
Congratulations: you have successfully enabled scheduled archiving of your Matomo data.
Now, if you want to understand the console core:archive command in more detail, and see some special cases, either check our developer guide, or read on below.
Breakdown of the parameters:
- MAILTO="youremail@example.com": if there is an error during the script execution, the script output and error messages will be sent to the youremail@example.com address.
- www-data: the user that the cron job will be executed as. This user is sometimes “apache”. It is recommended to run your crontab as the same user as your web server (to avoid file permission mismatches).
- /usr/bin/php: the path to your PHP executable. It varies depending on your server configuration and operating system. You can execute the command “which php” in a Linux shell to find the path of your PHP executable. If you don’t know the path, ask your web host or sysadmin.
- /path/to/matomo/console: the path to your Matomo app on your server. For example it may be /var/www/matomo/console.
- --url=http://example.org/matomo/: the only required parameter in the script, which must be set to your Matomo base URL, e.g. http://analytics.example.org/ or http://example.org/matomo/.
- > /home/example/matomo-archive.log: the path where the script will write its output. You can replace this path with /dev/null if you prefer not to log the last Matomo cron output. The script output contains useful information such as which websites are archived, how long it takes to process each date & website, etc. This log file should be written to a location outside of your web server’s document root so that people cannot view it via their browser (it contains sensitive information about your Matomo installation). You can also replace > with >> in order to append the script output to the log file rather than overwrite it on each run (but then we recommend you rotate or delete this log file, e.g. once a week).
- 2> /home/example/matomo-archive-errors.log: the optional path where the script will write error messages. If you omit this from the crontab, errors will be emailed to your MAILTO address. If you include it, errors will be logged to this specified error log file, which should likewise be stored outside of your web server’s document root.
Description of the ‘linux cron’ utility: The cron utility uses two different types of configuration files: the system crontab and user crontabs. The only difference between these two formats is the sixth field.
- In the system crontab, the sixth field is the name of a user for the command to run as. This gives the system crontab the ability to run commands as any user.
- In a user crontab, the sixth field is the command to run, and all commands run as the user who created the crontab; this is an important security feature.
If you set up your crontab as a user crontab, you would instead write:
5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
This cron job will trigger the day/week/month/year archiving process at 5 minutes past every hour. This will make sure that when you visit your Matomo dashboard, the data has already been processed; Matomo will load quickly.
Test the cron command
Make sure the crontab will actually work by running the script as the crontab user in the shell:
su www-data -s /bin/bash -c "/usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/"
You should see the script output with the list of websites being archived, and a summary at the end stating that there was no error.
Launching multiple archivers at once
If you have multiple sites you may be interested in running multiple archivers in parallel for faster archiving. We recommend not starting them at the same time, but launching each one a few seconds or minutes apart to avoid concurrency issues. For example:
5 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
6 * * * * /usr/bin/php /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /dev/null
In the above example one archiver will start at the minute 5 of each hour, the other starts one minute later. Alternatively, you can also start multiple archivers at the same time using a script which you then execute regularly through a cronjob.
#!/bin/bash
CONCURRENT_ARCHIVERS=2
for i in $(seq 1 $CONCURRENT_ARCHIVERS)
do
  (sleep $i && /path/to/matomo/console core:archive &)
done
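To execute that script regularly through a cron job as mentioned, a crontab entry could look like this (the script location /usr/local/bin/matomo-archivers.sh is an assumption for illustration):

```
0 * * * * www-data /usr/local/bin/matomo-archivers.sh > /dev/null
```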
Windows: How to Set up Auto-Archiving Using Windows Scheduler
-> Please see our dedicated FAQ for setting up a scheduled task in Windows.
Plesk: How to Set up the Cron Script using Plesk
Learn more about installing Matomo on Plesk and configuring the archiving crontab in the Plesk Matomo guide.
CPanel: How to Set up the Cron Script Using CPanel
It is easy to set up automatic archiving if you use a user interface such as CPanel, Webmin or Plesk. Here are the instructions for CPanel:
1. Log in to CPanel for the domain with the Matomo installation
2. Click on “Cron Jobs”
3. Leave email blank
4. In ‘Minutes’ put 00 and leave the rest blank
5. Paste in the path to the PHP executable, then the path to the Matomo /console script, then the parameter with your Matomo base URL --url=matomo.example.org/. Here is an example for a Hostgator install (in this example you would need to change ‘yourcpanelsitename’ to your particular domain’s cPanel username):
/usr/local/bin/php -f /home/yourcpanelsitename/public_html/matomo/console core:archive --url=example.org/matomo/ > /home/example/matomo-archive-output.log
“yourcpanelsitename” tends to be the first eight letters of your domain (unless you changed it when you set up your cPanel account).
6. Click “Add New Cron Job”
Matomo will process your reports automatically at the hour.
Web Cron When Your Web Host Does Not Support Cron Tasks
If possible, we highly recommend that you run a cron or scheduled task. However, on some shared hosting, or on particular server configurations, running a cron or scheduled task may not be easy or possible.
Some web hosts let you set up a web cron, which is a simple URL that the host will automatically visit at a scheduled time. If your web host lets you create a web cron, you can input the following URL in their hosting interface:
https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php?token_auth=XYZ
Replace XYZ with the super user’s 32-character token_auth. To find the token_auth, log in to Matomo as a super user, click on the Administration link in the top menu, go to Personal and click Security. Scroll down and you’ll find where to create a new token_auth.
Notes:
- For security, if possible we recommend you POST the token_auth parameter to the URL https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php (instead of sending the token_auth as a GET parameter).
- You can test the web cron by pasting the URL in your browser; wait a few minutes for processing to finish and then check the output.
- The web cron should be triggered at least once per hour. You may also use a ‘Website Monitoring’ service (free or paid) to automatically request this page every hour.
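As a sketch, such a POST request could be sent with curl, whose -d flag submits the data in a POST body (the URL and token are the placeholders from the example above):

```
curl -d "token_auth=XYZ" https://matomo.your-server.example/path/to/matomo/misc/cron/archive.php
```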
Important Tips for Medium to High Traffic Websites
Disable browser triggers for Matomo archiving and limit Matomo reports to updating every hour
After you have set up the automatic archive script as explained above, you can set up Matomo so that requests in the user interface do not trigger archiving, but instead read the pre-archived reports. Log in as the super user, click on Administration > System > General Settings, and select:
- Archive reports when viewed from the browser: No
- Archive reports at most every X seconds: 3600 seconds
Click save to save your changes. Now that you have set up the archiving cron and changed these two settings, you can enjoy fast pre-processed near real-time reports in Matomo!
Today’s statistics will have a one-hour lifetime, which ensures the reports are processed every hour (near real time).
Increase PHP Memory Limit
If you receive this error:
Fatal error: Allowed memory size of 16777216 bytes exhausted (tried to allocate X bytes)
you must increase the memory allocated to PHP. To give Matomo enough memory to process your web analytics reports, increase the memory_limit setting in your php.ini file. Sites with less data or fewer features enabled can use 512M or 2G. If the problem persists, we recommend increasing the setting further; 8G is a common size for a medium to large Matomo instance.
memory_limit = 512M
To find where your php.ini file is located on your server, create a test.php file containing the following code:
<?php phpinfo(); ?>
and open it in a browser. It will show the php.ini file actually being read by PHP running on your web server, as well as your currently set max_execution_time value.
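If you cannot edit php.ini, PHP’s -d flag can raise the limit for the archiver process alone; a sketch based on the cron example above (paths and the 2G value are illustrative):

```
5 * * * * www-data /usr/bin/php -d memory_limit=2G /path/to/matomo/console core:archive --url=http://example.org/matomo/ > /home/example/matomo-archive.log
```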
More High Traffic Server Tips!
It is possible to track millions of pages per month on hundreds or thousands of websites using Matomo. Once you have set up cron archiving as explained above, there are other important and easy steps to improve Matomo performance.
For more information, see How to configure Matomo for speed.
More Information About Matomo Archiving
- If you run archiving several times per day, it will re-archive today’s reports, as well as any reports for a date range which includes today: current week, current month, etc.
- Your Matomo database size will grow over time, this is normal. Matomo will delete archives that were processed for incomplete periods (i.e. when you archived a week in the middle of this week), but will not delete other archives. This means that you will have archives for every day, every week, every month and every year in the MySQL tables. This ensures a very fast UI response and data access, but does require disk space.
- Matomo archiving for today’s reports is not incremental: running the archiving several times per day will not lower the memory requirement for weeks, months or yearly archives. Matomo will read all logs for the full day to process a report for that day.
- Once a day/week/month/year is complete and has been processed, it will be cached and not re-processed by Matomo.
- If you don’t set up archiving to run automatically, archiving will occur when a user requests a Matomo report. This can be slow and provide a bad user experience (users would have to wait N seconds). This is why we recommend that you set up auto-archiving for medium to large websites, as explained above.
- By default, when you disable browser triggers for Matomo archiving, it does not completely disable the triggering of archiving as you might expect. Users browsing Matomo will still be able to trigger processing of archives in one particular case: when a custom segment is used. To ensure that users of your Matomo will never trigger any data processing, you must add the following setting below the [General] category in your config.ini.php file:
[General]
; disable browser trigger archiving for all requests (even those with a segment)
browser_archiving_disabled_enforce = 1
Help for core:archive command
Here is the help output for this command:
$ ./console help core:archive
Usage:
core:archive [--url="..."] [--skip-idsites[="..."]] [--skip-all-segments] [--force-idsites[="..."]] [--skip-segments-today] [--force-periods[="..."]] [--force-date-last-n[="..."]] [--force-date-range[="..."]] [--force-idsegments="..."] [--concurrent-requests-per-website[="..."]] [--concurrent-archivers[="..."]] [--max-websites-to-process="..."] [--max-archives-to-process="..."] [--disable-scheduled-tasks] [--accept-invalid-ssl-certificate] [--php-cli-options[="..."]] [--force-all-websites] [--force-report[="..."]]
Options:
--url Forces the value of this option to be used as the URL to Matomo.
If your system does not support archiving with CLI processes, you may need to set this in order for the archiving HTTP requests to use the desired URLs.
--skip-idsites If specified, archiving will be skipped for these websites (in case these website ids would have been archived).
--skip-all-segments If specified, all segments will be skipped during archiving.
--force-idsites If specified, archiving will be processed only for these Sites Ids (comma separated)
--skip-segments-today If specified, segments will be only archived for yesterday, but not today. If the segment was created or changed recently, then it will still be archived for today and the setting will be ignored for this segment.
--force-periods If specified, archiving will be processed only for these Periods (comma separated eg. day,week,month,year,range)
--force-date-last-n Deprecated. Please use the "process_new_segments_from" INI configuration option instead.
--force-date-range If specified, archiving will be processed only for periods included in this date range. Format: YYYY-MM-DD,YYYY-MM-DD
--force-idsegments If specified, only these segments will be processed (if the segment should be applied to a site in the first place).
Specify stored segment IDs, not the segments themselves, eg, 1,2,3.
Note: if identical segments exist w/ different IDs, they will both be skipped, even if you only supply one ID.
--concurrent-requests-per-website When processing a website and its segments, number of requests to process in parallel (default: 3)
--concurrent-archivers The number of max archivers to run in parallel. Depending on how you start the archiver as a cronjob, you may need to double the amount of archivers allowed if the same process appears twice in the `ps ex` output. (default: false)
--max-websites-to-process Maximum number of websites to process during a single execution of the archiver. Can be used to limit the process lifetime e.g. to avoid increasing memory usage.
--max-archives-to-process Maximum number of archives to process during a single execution of the archiver. Can be used to limit the process lifetime e.g. to avoid increasing memory usage.
--disable-scheduled-tasks Skips executing Scheduled tasks (sending scheduled reports, db optimization, etc.).
--accept-invalid-ssl-certificate It is _NOT_ recommended to use this argument. Instead, you should use a valid SSL certificate!
It can be useful if you specified --url=https://... or if you are using Matomo with force_ssl=1
--php-cli-options Forwards the PHP configuration options to the PHP CLI command. For example "-d memory_limit=8G". Note: These options are only applied if the archiver actually uses CLI and not HTTP. (default: "")
--force-all-websites Force archiving all websites.
--force-report If specified, only processes invalidations for a specific report in a specific plugin. Value must be in the format of "MyPlugin.myReport".
--help (-h) Display this help message
--quiet (-q) Do not output any message
--verbose (-v|vv|vvv) Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
--version (-V) Display this application version
--ansi Force ANSI output
--no-ansi Disable ANSI output
--no-interaction (-n) Do not ask any interactive question
--matomo-domain Matomo URL (protocol and domain) eg. "http://matomo.example.org"
--xhprof Enable profiling with XHProf
Help:
* It is recommended to run the script without any option.
* This script should be executed every hour via crontab, or as a daemon.
* You can also run it via http:// by specifying the Super User &token_auth=XYZ as a parameter ('Web Cron'),
but it is recommended to run it via command line/CLI instead.
* If you have any suggestion about this script, please let the team know at feedback@matomo.org
* Enjoy!
Note: The core:archive command option --url is deprecated since Matomo 5. It has been replaced with --matomo-domain.
Making sense of the core:archive output
The core:archive output log displays useful information about the archiver process, in particular which websites and segments are being processed. The output shows in particular:
- which website ID is currently being archived:
INFO [2020-03-31 21:16:29] 23146 Will pre-process for website id = 1, period = month, date = last3
- how many segments there are for this website (in this example there are 25 segments):
INFO [2020-03-31 21:16:29] 23146 - pre-processing segment 1/25 countryName!=Algeria March 29, 2022
- how many websites are left to be processed from this archiver’s queue of websites (in this example it has finished processing 2 out of 3 websites):
INFO [2020-03-31 21:17:07] 23146 Archived website id = 3, 4 API requests, Time elapsed: 18.622s [2/3 done]
- if you’re running multiple core:archive processes using --concurrent-archivers, you can tell the different concurrent archivers apart by looking at the number after the timestamp, e.g. 23146 in INFO [2020-03-31 21:17:07] 23146 [...]. Each concurrent archiver run will have a different number, so if you grep for this number across your logs you can find the output for that particular core:archive thread. You can also set --concurrent-archivers to -1, which indicates unlimited concurrent archivers.
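For example, the grep described above could look like this (a sample log with two interleaved archiver runs is created first so the snippet is self-contained; the numbers 23146 and 57210 are illustrative):

```shell
# Sample output from two concurrent archivers.
LOG=/tmp/matomo-concurrent-sample.log
cat <<'EOF' > "$LOG"
INFO [2020-03-31 21:16:29] 23146 Will pre-process for website id = 1
INFO [2020-03-31 21:16:30] 57210 Will pre-process for website id = 2
INFO [2020-03-31 21:17:07] 23146 Archived website id = 1
EOF
# Keep only the lines belonging to the archiver whose number is 23146.
grep -F " 23146 " "$LOG"
```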
If you have any questions or feedback, please use the feedback button below and we’ll do our best to get back to you.