How to set up BigQuery & Data Warehouse Exports
Matomo is a powerful analytics suite that enables you to extract, manipulate, and enrich your data by exporting it to a data warehouse platform, such as Google BigQuery. On-Premise users have direct access to their own databases, while Cloud customers can request to activate an add-on that enables fully hosted and automated exports of their Matomo analytics data.
This guide explains how to enable and configure the export in Matomo and how to import the data from Amazon S3 to BigQuery. With this feature, you can also export raw data to other supported data warehouse platforms.
Enable BigQuery & Data Warehouse Exports
This feature is available on request and requires a paid subscription that you can add to any Matomo Cloud plan. If you are a Cloud customer and want to enable BigQuery and Data Warehouse Exports, please contact support.
Configure Matomo for Data Warehouse Exports
The steps for enabling the export of your Matomo Analytics data to a data warehouse platform are described below.
Select which sites to enable
The first step is enabling the specific sites in Matomo that you would like to export and synchronise. As soon as you have enabled a site for export, it will begin processing data into export format on the backend.
Note: There are hourly data limits in place for this processing, so only enable sites that you plan on exporting to a data warehouse platform to ensure the fastest processing times.
1. Click on the cog icon to access your Administration settings in Matomo.
2. Click on BigQuery & Data Warehouse under the Export menu item.
3. Select the Site Id you’d like to export within the Sync Configuration section by clicking the checkbox.
4. Click the Enable Export button. This will start regular processing of your data into the hosted links shared further down the page in the Export Table Paths section.
How to backfill data for export from Matomo
When you first enable BigQuery and Data Warehouse Exports, Matomo begins regular processing of your analytics data from the date you enabled the feature. If you would like access to your historic data, you will need to configure the date ranges you’d like to see.
Note: This system uses Amazon S3 on the backend, which has strict data limits. Therefore, it will only process a certain amount of data per hour and the number of days this covers will vary depending on the amount of data your Matomo instance contains.
To ensure that you get access to your most important data as soon as possible, we recommend only enabling smaller date ranges that you plan to actively access in your data warehouse platform to avoid filling your hourly quotas with data that may not be necessary yet.
- Visit the BigQuery & Data Warehouse settings page under the Export menu in the Matomo Admin settings.
- Scroll down to the Export Jobs section.
- Click on the Add Backfills Export Job button.
- Select a start and end date that you want to access in your data warehouse and click on Submit to queue your backfill request.
- Once you have submitted your backfill request, it will enter Queued status. At this stage, you will need to wait for the data to be processed and uploaded to the Export Table Paths shown further down the page.
You will notice that each date is added as a unique job and the processing can take several hours, days or longer depending on the amount of data you have tracked. There are several potential statuses shown for backfill jobs in the Export Jobs section.
- Queued – This means a backfill has been requested for the date and you will need to wait.
- Processing – This means the backfill request is currently processing and should soon be available for importing to your data warehouse platform.
- Finished – This means that data for the specific date has finished processing and therefore the data should be available from the links in the Export Table Paths for use within your data warehouse platform.
How to import to your data warehouse
There are many data warehouse platforms available and we cannot provide specific guides for each. However, BigQuery is one of the most popular and Google usually offers free credits for getting started, so we have included a full guide for the Google BigQuery process below. Other data warehouse tools are likely to follow a similar methodology, so if you use another platform, you may be able to use this guide as a reference too.
Select which data to import to your data warehouse
Before you begin importing your Matomo analytics, it is important to make sure you are only importing data that you will actually access within your data warehouse. Not every site makes use of every Matomo feature, and there is no point wasting time setting up imports of empty tables for features you don’t use. For this reason, we have organised the tables into two lists below: recommended and optional. Everyone is likely to benefit from the recommended tables, whereas the optional tables are most relevant to people using the specific features described in each listing.
Recommended tables for all users
- log_action - This contains data about the actions available on your site and is recommended for most users.
- log_link_visit_action – This links data from the `log_visit` and `log_action` tables and is recommended for most users.
- log_visit – This table contains all visit data for your site and is recommended for most users.
- site – This table contains metadata about your site and is recommended for most users.
- goal - This contains data relating to the Goal Tracking feature. If you don’t use this feature you can safely ignore this table.
- log_conversion – This contains data on conversions and is recommended for most users.
Optional tables
- funnel - This contains data relating to the Funnels feature. If you don’t use this feature you can safely ignore this table.
- log_abtesting - This contains data relating to the AB Testing feature. If you don’t use this feature you can safely ignore this table.
- log_clickid – This contains data relating to the Advertising Conversion Export feature. If you don’t use this feature you can safely ignore this table.
- log_conversion_item - This contains data on ecommerce items as part of Ecommerce tracking in Matomo. If you don’t use this, you can safely ignore this table.
- log_form – This contains data relating to the Form Analytics feature. If you don’t use this, you can safely ignore this table.
- log_form_field - This contains data relating to the Form Analytics feature. If you don’t use this, you can safely ignore this table.
- log_form_page - This contains data relating to the Form Analytics feature. If you don’t use this, you can safely ignore this table.
- log_hsr - This contains data relating to the Heatmaps and Session Recording feature. If you don’t use this, you can safely ignore this table.
- log_hsr_blob - This contains data relating to the Heatmaps and Session Recording feature. If you don’t use this, you can safely ignore this table.
- log_hsr_event - This contains data relating to the Heatmaps and Session Recording feature. If you don’t use this, you can safely ignore this table.
- log_hsr_site - This contains data relating to the Heatmaps and Session Recording feature. If you don’t use this, you can safely ignore this table.
- log_media - This contains data relating to the Media Analytics feature. If you don’t use this, you can safely ignore this table.
- log_media_plays - This contains data relating to the Media Analytics feature. If you don’t use this, you can safely ignore this table.
- site_form - This contains data relating to the Form Analytics feature. If you don’t use this, you can safely ignore this table.
If you aren’t sure exactly which tables you need to include, you can set up imports for all 20 of them. Any tables that are not used will simply appear empty within your data warehouse as they would in the Matomo database itself. You can access deeper documentation on the database schema within our developer documentation.
How to configure your Matomo datasets and tables in Google BigQuery
1. To set up and configure your Matomo datasets, open Google Cloud Console BigQuery using this link: https://console.cloud.google.com/bigquery.
2. Enable the BigQuery API and the BigQuery Data Transfer API.
3. In BigQuery Studio (1), open or create a Project linked to a billing account from the selector at the top left of the screen (2).
4. Click on the three dots within the explorer to reveal a sub-menu and click Create dataset.
5. Define the settings for your dataset: provide a descriptive Dataset ID, choose the Region option, and select europe-west3 (recommended for the most efficient transfers).
6. Click Create Dataset to save the new dataset under your BigQuery project.
7. Click on the three dots next to the dataset and then click Create table.
8. Complete the options as shown in the screenshot below, making sure to select your project and dataset. We recommend using the same table name as in the source data to avoid confusion, for example `log_visit`.
9. You will notice in the image above that the Schema field is toggled to Edit as text. There is very specific text that you should include for each table. Navigate to the schema files available on GitHub and locate the schema for the table you are creating in BigQuery. Copy the schema and paste it into the Schema field in BigQuery's create table form.
10. Only once you have pasted the schema in, click the Create Table button to finalise.
11. Repeat steps 7 to 10 for each of the tables that you plan on importing from Matomo. If you prefer to script this part of the setup, see the SQL sketch below.
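As an alternative to clicking through the console, the same dataset and table structure can also be created with BigQuery SQL DDL. The snippet below is only a minimal sketch: the project name `my-project`, the dataset name `matomo_export` and the three example columns are placeholders for illustration, and the real column list for each table should still be copied from the Matomo schema files on GitHub mentioned above.

```sql
-- Minimal sketch (assumed names): create the dataset and one table via BigQuery SQL DDL.
-- Replace `my-project` and `matomo_export` with your own project and dataset names,
-- and replace the example columns with the full schema from the Matomo schema files.
CREATE SCHEMA IF NOT EXISTS `my-project.matomo_export`
  OPTIONS (location = 'europe-west3');

CREATE TABLE IF NOT EXISTS `my-project.matomo_export.log_visit` (
  idvisit INT64,                        -- example column only
  idsite INT64,                         -- example column only
  visit_first_action_time TIMESTAMP     -- example column only
);
```

These statements can be run from the BigQuery query editor and achieve the same result as the console steps above.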
How to configure data transfers from Matomo to Google BigQuery
1. To get started, visit the BigQuery and Data Warehouse page under the Export menu within Matomo.
2. Within the Credentials section, click on the Generate Credentials button and securely save these details for later.
3. Visit BigQuery in Google Cloud Console.
4. Click on the Data transfers tab and click to Create a transfer.
5. Select Amazon S3 as the Source.
6. Complete the Transfer configuration fields:
   - Display name: Something descriptive, for example the dataset and table name.
   - Repeat Frequency: You can leave this at the default of repeating every 24 hours.
   - Start date and run time: Set this to midnight to ensure full data for the prior day.
   - Dataset: This should be the dataset you have created within BigQuery.
   - Destination table: This should be a table that you’ve created within BigQuery and, if you followed the guide above, it will share the same name as the table in Matomo.
   - Amazon S3 URI: You will need to source this from the Export Table Paths section on the BigQuery and Data Warehouse page in Matomo for the relevant table. The URI looks something like the following: `s3://prod-processed-customer-exports/demo.matomo.cloud-id/site/*`
   - Access key ID: This is the access key ID you were provided when clicking Generate Credentials on the BigQuery and Data Warehouse page in Matomo.
   - Secret access key: This is the secret access key you were provided when clicking Generate Credentials on the BigQuery and Data Warehouse page in Matomo.
   - File format: Select JSON.
   - Write Disposition: For the majority of tables, you should select `WRITE_APPEND` to ensure that new data is added to the table instead of overwriting prior transfers. However, if you are configuring the `site`, `site_form`, `funnel` or `goal` tables, these can safely be configured to use `WRITE_TRUNCATE`.
7. Scroll down to the bottom and click Save to enable the data transfer.
8. To create another transfer, click the + CREATE TRANSFER link at the top of the page and then repeat steps 5 to 7 for each table that you want to import into BigQuery.
Once you have set up data transfers for all the tables, they should begin regular automated transfers according to your configured schedule from the date you set them up. One thing to keep in mind is that not every table will always contain new data every single day, so it can be normal for a transfer to run without importing new data if nothing has changed.
Note: If you would like to backfill data from prior dates, you will need to ensure that you have configured and completed the backfill process in Matomo. Then within BigQuery, click on the relevant data transfer and there is a link to SCHEDULE BACKFILL at the top right of the page. Clicking this will load a configuration pop-up that lets you select the date range that you would like to backfill.
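Once a scheduled backfill (or a regular daily transfer) has run, a quick way to confirm which dates have actually arrived in BigQuery is to count imported rows per day. The query below is only a sketch: it assumes the example project and dataset names used later in this guide and the `log_visit` table with its `visit_first_action_time` column, so adjust the names to match your own setup.

```sql
-- Count imported visits per day to confirm which dates have been transferred.
-- Assumes the example project/dataset names used elsewhere in this guide.
SELECT
  DATE(visit_first_action_time) AS visit_day,
  COUNT(*) AS visits
FROM `matomo-project.matomo_export.log_visit`
GROUP BY visit_day
ORDER BY visit_day
```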
How to run your first query in Google BigQuery
Once your first BigQuery data transfer has completed, you can verify that your tables contain data by running a query on each of the tables. Full support for BigQuery is beyond the scope of this guide; the official BigQuery documentation is available at https://cloud.google.com/bigquery/docs.
One example query you could run to check that records have been imported is:
SELECT * FROM `matomo-project.matomo_export.log_visit` LIMIT 1000
Make sure that you replace `matomo-project` with your actual project name, `matomo_export` with your dataset name, and `log_visit` with your table name.
As long as this query returns results, you can be sure that your data has begun transferring. For more help navigating the exported tables, we have provided more examples in our FAQ on how to write SQL queries for the Matomo database.
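As a slightly more involved sketch, the query below joins `log_link_visit_action` to `log_action` to list the most viewed page URLs. The project and dataset names are the same placeholders as in the example above, and the column names (`idaction_url`, `idaction`, `name`) come from the standard Matomo schema; adjust them if your setup differs.

```sql
-- Top page URLs by number of tracked actions, joining the action log to the
-- action lookup table. Assumes the example project/dataset names used above.
SELECT
  a.name AS page_url,
  COUNT(*) AS page_views
FROM `matomo-project.matomo_export.log_link_visit_action` AS lva
JOIN `matomo-project.matomo_export.log_action` AS a
  ON a.idaction = lva.idaction_url
GROUP BY page_url
ORDER BY page_views DESC
LIMIT 25
```

If this returns familiar page URLs, both tables are being populated by your transfers.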
Matomo Database Schema Text for Google BigQuery
When creating tables for the Matomo data in BigQuery, you will need to define a schema for each of them to allow for successful synchronisation. Currently, the most efficient way of doing this is to copy and paste the database schemas from this GitHub repository as text into the BigQuery editor.