Why are there duplicates in data exported using the Data Warehouse Connector?
Duplicate records can appear in Data Warehouse Connector exports when visitor information changes after an export has completed. This can happen in setups that use server-side or offline tracking.
When a visitor’s browser goes offline and later reconnects, or when data is sent separately from the visitor’s browser session (through backend processes that record actions such as purchases), the Visit ID (idvisit) remains the same, but the Visitor ID (idvisitor) may change.
Because the Data Warehouse Connector performs exports on a fixed schedule, any new or updated data that arrives in Matomo after an export run is included in the next scheduled export. When this happens, both the earlier and the updated record for the same visit may appear across different exports, leading to duplicate entries.
Example: A visit is included in the first export. Later, a delayed or server-side event is attributed to that same visit. In some cases, changes in cookie values or User ID tracking can also cause Matomo to assign a new Visitor ID, which may update or merge visit records. After receiving the new event, Matomo recalculates visit data (such as visit_total_actions or visit_last_action_time), and the next scheduled export captures the updated version of that visit record.
Deduplicate the data
If duplicate records appear across exports, the Visit ID can be used as a stable reference to identify and merge those records. For most analytics or BI use cases, you can keep only the latest record per Visit ID. This ensures your warehouse reflects the most recent state of the Matomo data.
- Use Visit ID (idvisit) as the primary key for merging and Visitor ID (idvisitor) as a contextual field only.
- You could also add a simple timestamp field to track when the export was generated.
This approach keeps the exported data consistent and, if you add a timestamp, lets you review how a visit was updated over time.
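As a minimal sketch of keeping only the latest record per Visit ID, the Python snippet below merges rows from several exports. It assumes each exported row is available as a dictionary and that you have added your own export timestamp; the field name exported_at, the function latest_per_visit, and the sample values are hypothetical and not part of the Data Warehouse Connector itself.

```python
from datetime import datetime


def latest_per_visit(rows, key="idvisit", ts="exported_at"):
    """Return one row per visit, keeping the copy with the newest export timestamp."""
    latest = {}
    for row in rows:
        current = latest.get(row[key])
        # Keep this row if the visit is new or this copy was exported later.
        if current is None or row[ts] >= current[ts]:
            latest[row[key]] = row
    return list(latest.values())


# Made-up rows from two export runs; "exported_at" is an assumed, user-added field.
rows = [
    {"idvisit": 101, "idvisitor": "a1b2", "visit_total_actions": 3,
     "exported_at": datetime(2024, 5, 1, 2, 0)},
    {"idvisit": 102, "idvisitor": "e5f6", "visit_total_actions": 1,
     "exported_at": datetime(2024, 5, 1, 2, 0)},
    # Same visit as the first row, re-exported after a delayed server-side event.
    {"idvisit": 101, "idvisitor": "c3d4", "visit_total_actions": 4,
     "exported_at": datetime(2024, 5, 2, 2, 0)},
]

deduped = latest_per_visit(rows)
by_visit = {row["idvisit"]: row for row in deduped}
```

Here idvisit acts as the merge key, while idvisitor is simply carried along from whichever copy of the visit is newest. In a warehouse, the same idea is usually expressed as an upsert or a "latest row per key" query rather than in application code.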