Analysing a slow archiver by auditing the subprocess times with XHProf
This guide explains how to troubleshoot a slow Matomo On-Premise archiving process using XHProf. It focuses on profiling a single archiveReports request (instead of core:archive) to identify performance bottlenecks.
Once bottlenecks are identified, you can reduce archiving time by disabling or simplifying the reports causing the most load (for example: segments, funnels, or custom reports).
Note: This is a minimal guide. It assumes XHProf is already installed and working on your Matomo server.
Introduction
On high-traffic Matomo instances, archiving can reach a point where it no longer completes within 24 hours. At this stage, simply adding more resources (CPU, memory, disk speed) often provides limited improvement, because the bottleneck typically shifts to the database layer.
Several features can significantly increase archiving time, including:
- Segments
- Funnels
- Reports with high cardinality (e.g. unique visits)
- Custom reports
Simpler things to try
You should try quicker remedies before running the profiler.
- Consider deleting some segments, using fewer conditions on the segments, or even just changing the segment definitions from “contains” to “starts with” or “is exactly”.
- Disable unused custom reports, or one or more funnels.
- Disable unique visits by month or unique visits by year.
- Enable MySQL slow query logging for one day to identify expensive queries.
- Finally, try adding resources: more or faster CPU cores and more I/O capacity on both your database server and your PHP server.
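As a rough sketch of the slow-query-logging step (the threshold and log path below are assumptions; adjust them for your environment, and make the settings permanent in my.cnf if you want them to survive a restart):

```shell
# Assumption: a MySQL user with privileges to change global variables.
# These settings apply immediately and last until the server restarts.
mysql -u root -p <<'SQL'
SET GLOBAL slow_query_log = 'ON';
SET GLOBAL long_query_time = 5;  -- log queries slower than 5 seconds (assumed threshold)
SET GLOBAL slow_query_log_file = '/var/log/mysql/slow.log';  -- assumed path
SQL
```

After a day of archiving, review the log file for the queries that dominate total time, then turn the log off again the same way (`SET GLOBAL slow_query_log = 'OFF';`).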
If the issue persists, you can profile the PHP execution of archiving using XHProf. This guide focuses on that approach.
What is XHProf?
XHProf is a lightweight, hierarchical, function-level profiler for PHP, originally developed by Facebook and open-sourced in March 2009.
It operates as a passive profiler implemented as a C-based PHP Zend extension, designed to collect performance data with minimal impact on application behavior, making it suitable for production environments. It is primarily used for identifying performance bottlenecks by tracking function-level call counts and metrics such as wall (elapsed) time, CPU time, and memory usage.
Why profile archiveReports instead of core:archive?
The core:archive command mainly orchestrates archiving work. The actual heavy processing happens in child requests, specifically CoreAdminHome.archiveReports.
Profiling core:archive alone will not give you useful insights into performance bottlenecks. Instead, you should profile a single archiveReports request directly.
Before you start
Make sure all of the following are true:
- XHProf is installed and enabled for PHP CLI. Several installation guides exist, including one from the Drupal project: https://www.drupal.org/docs/develop/development-tools/xhprof-code-profiler.
- enable_php_profiler = 1 is set in config/config.ini.php.
- xhprof.output_dir is configured and writable.
- You can run Matomo console commands.
- Archiving cron jobs are temporarily stopped (to avoid overlapping work).
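The first three checks above can be verified from the shell. A minimal sketch (adjust the PHP binary name if your CLI uses a versioned one such as php8.2):

```shell
# Check whether the xhprof extension is loaded for CLI PHP.
if ! command -v php >/dev/null 2>&1; then
  xhprof_status="php CLI not found in PATH"
elif php -m | grep -qi xhprof; then
  xhprof_status="xhprof extension loaded"
else
  xhprof_status="xhprof extension NOT loaded for CLI"
fi
echo "$xhprof_status"

# Print the configured profile output directory, if PHP is available.
command -v php >/dev/null 2>&1 && \
  php -r 'echo "xhprof.output_dir: ", ini_get("xhprof.output_dir") ?: "(not set)", PHP_EOL;' \
  || true
```

If the extension is reported as not loaded, fix that before continuing; the `xhprof=1` parameter used later will silently do nothing without it.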
1. Pick one affected site and date
- Choose:
  - One site (IDSITE)
  - One date (YYYY-MM-DD)
- Pick a combination that is known to be slow during archiving.
- Replace the placeholders below with your values:
- IDSITE: the affected site ID
- YYYY-MM-DD: the date to archive
- https://matomo.example.com: your Matomo URL
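If you will run the commands more than once, it can help to put the placeholders into shell variables first. The values below are purely illustrative assumptions; substitute your own:

```shell
# Illustrative values only; replace with your affected site, date, and URL.
IDSITE=1
DATE=2024-03-01
MATOMO_URL=https://matomo.example.com

echo "Profiling site $IDSITE for $DATE on $MATOMO_URL"
```

The commands in the following steps can then use $IDSITE, $DATE, and $MATOMO_URL instead of being edited by hand each time.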
2. Invalidate archives first
This step is required. If you skip it, Matomo may reuse existing archives and the profile will not be meaningful.
./console core:invalidate-report-data --dates=YYYY-MM-DD --sites=IDSITE --periods=day
Optional: verify that invalidation is queued
./console diagnostics:archiving-queue
3. Run one profiled climulti:request
php ./console climulti:request --superuser --matomo-domain=https://matomo.example.com 'module=API&method=CoreAdminHome.archiveReports&idSite=IDSITE&period=day&date=YYYY-MM-DD&format=json&trigger=archivephp&xhprof=1'
Key details:
- --superuser is required for archiveReports.
- trigger=archivephp ensures the request behaves like cron archiving.
- xhprof=1 enables profiling and outputs the result URL.
4. Collect the result
If successful, the command will output a URL like:
Profiler report is available at:
https://matomo.example.com/vendor/lox/xhprof/xhprof_html/?source=piwik&run=...
Open this URL in your browser and review or share it.
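If you want to keep the run identifier for later comparison (for example, to diff two runs), it can be extracted from the output. A small sketch, assuming the output format shown above (the run id here is made up for illustration):

```shell
# Assumption: the output contains a URL ending in "?source=piwik&run=<hex id>".
output='Profiler report is available at:
https://matomo.example.com/vendor/lox/xhprof/xhprof_html/?source=piwik&run=67e2a8b1c3d4'

run_id=$(printf '%s\n' "$output" | grep -o 'run=[0-9a-f]*' | head -n1 | cut -d= -f2)
echo "run id: $run_id"
```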
Interpreting results (practical guidance)
When reviewing the XHProf report:
- Sort by CPU time (exclusive) first.
- Look for:
- Expensive report generation functions
- Repeated calls with high cost
- Identify which features they correspond to (segments, custom reports, etc.)
Once identified, consider:
- Disabling or simplifying those reports.
- Reducing segmentation complexity.
- Limiting high-cardinality dimensions.
Yes, this may reduce reporting detail. That’s the tradeoff for an archiver that actually finishes.
Notes:
- This profiles a single site, date, and period. That’s intentional. Start small.
- If the run is still fast, you likely picked the wrong site or date.
- To repeat the test, invalidate the same date again before rerunning.
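The invalidate-then-profile cycle from steps 2 and 3 can be wrapped in a small helper for repeated runs. This is a sketch only: the function below just prints the two commands (remove the echoes to actually execute them from your Matomo root), and the site/date/URL values passed in are assumptions:

```shell
# Print the invalidate + profiled-request commands for one site and date.
# Sketch: the commands are echoed, not executed.
profile_day() {
  idsite=$1; date=$2; url=$3
  echo "./console core:invalidate-report-data --dates=$date --sites=$idsite --periods=day"
  echo "php ./console climulti:request --superuser --matomo-domain=$url 'module=API&method=CoreAdminHome.archiveReports&idSite=$idsite&period=day&date=$date&format=json&trigger=archivephp&xhprof=1'"
}

cmds=$(profile_day 1 2024-03-01 https://matomo.example.com)
printf '%s\n' "$cmds"
```

Running the pair as a unit makes it harder to forget the invalidation step, which would otherwise silently produce a meaningless profile of cached archives.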