The new import_logs.py is now included in the Matomo core distribution (as of Matomo 1.7.2) in piwik/misc/log-analytics/import_logs.py.
Note: Apache2Matomo, described below, is deprecated. We recommend using import_logs.py instead.
Log files are known to contain a wealth of information about activity on a website, and are usually analyzed with tools such as AWStats or Webalizer. Being able to transfer this information to Matomo, a powerful web analysis tool, can greatly enhance data mining and presentation. This, in turn, means more control over your web property, better informed decisions and greater potential for optimization.
Importing visits, pages and Goal conversions from logs is very fast: the script processes thousands of log lines per second and can also read and process your log files in real time. Matomo (Piwik) reports generated from imported logs are missing a few data points compared to standard Matomo reports based on the JavaScript tracking code. However, compared to older log analysis software such as Webalizer or AWStats, Matomo reports are sharp and easy to understand, and let you focus on your analysis goals!
This page contains the following sections: Apache2Matomo requirements, How to use guide, List of missing reports when using log files, Performance of the script, and Credits.
Apache2Matomo Requirements
- access to Matomo installation
- access to Apache logs with read privileges (you can specify log format in settings.py)
- Python 2.6 with MySQLdb, GeoIP for Python and httpagentparser
How to import Apache logs in Matomo?
Follow these steps for a test import with Apache2Piwik:
- Important: create a backup of your Matomo MySQL database.
- Create `settings.py` as a copy of settings.py.sample and edit the Matomo MySQL database configuration.
- Execute apache2piwik.py (see examples below).
Example 1 – importing log file, all settings set in settings.py file:
$ python2.6 ./apache2piwik.py
Started processing /path/to/file/logfile1 file...
Finished in 2m16s.
Started processing /path/to/file/logfile2 file...
Finished in 2m59s.
$ python2.6 ./apache2piwik.py start
$ python2.6 ./apache2piwik.py stop
$ python apache2piwik.py -g
- Image files are automatically ignored. You can customize the ignored extensions in the settings.py file. You can also ignore specific log lines with regular expressions there.
- Search bots are not excluded at this stage. We might add a feature to exclude bots in a future version.
- When you import data from the past, or when you want to reprocess your reports from the logs, you can delete the piwik_archive_* tables. See more information in this FAQ.
- Apache2Matomo imports data into the idsite specified in settings.py. You can override this with the “-i [idsite]” command line parameter.
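To illustrate the kind of filtering described above, here is a minimal standalone sketch of skipping log lines by file extension or by regular expression. It is not taken from apache2piwik.py: the names `IGNORED_EXTENSIONS`, `IGNORED_PATTERNS`, `request_path` and `is_ignored` are hypothetical, and the actual option names and syntax in settings.py may differ.

```python
import re

# Hypothetical values for illustration; not the real settings.py keys.
IGNORED_EXTENSIONS = (".jpg", ".jpeg", ".png", ".gif", ".ico")
IGNORED_PATTERNS = [re.compile(r"^/server-status")]

# Matches the request part of an Apache common/combined log line,
# e.g. "GET /index.html HTTP/1.0", and captures the requested path.
REQUEST = re.compile(r'"(?:GET|POST|HEAD) (\S+)')


def request_path(log_line):
    """Return the requested path from a log line, or None if not found."""
    match = REQUEST.search(log_line)
    return match.group(1) if match else None


def is_ignored(log_line):
    """Return True if the line should be skipped during import."""
    path = request_path(log_line)
    if path is None:
        return True  # unparseable lines are skipped in this sketch
    # Compare the extension without the query string, case-insensitively.
    path_only = path.split("?", 1)[0].lower()
    if path_only.endswith(IGNORED_EXTENSIONS):
        return True
    return any(pattern.search(path) for pattern in IGNORED_PATTERNS)
```

With this sketch, a request for `/apache_pb.gif` would be skipped while a request for `/index.html` would be kept.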
Reporting differences between Server Logs and Javascript code
Apache log import Performance
- If your URLs contain session id, add a regular expression in URL_REGEXPR directive in settings.py to cut it out
- Do you have any monitoring or cron scripts that call some URLs every X minutes? If so, add them to the IGNORED_LOGS directive in settings.py.
- The script is designed more for a “single website” use case, or for a few websites. We haven’t tested it under a “web hosting” type of load at this stage, but we hope to in the future.
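As a rough illustration of the two tips above, the sketch below strips a session id from a URL with a regular expression and detects monitoring requests. It is a standalone example under stated assumptions: the parameter names (`PHPSESSID`, `sid`), the `/healthcheck` URL and the function names are hypothetical, and the exact format expected by the URL_REGEXPR and IGNORED_LOGS directives in settings.py may differ.

```python
import re

# Hypothetical session parameter names; adjust to what your site uses.
# The lookbehind ensures the name starts right after "?" or "&".
SESSION_PARAM = re.compile(r"(?<=[?&])(?:PHPSESSID|sid)=[^&#]*&?", re.IGNORECASE)

# Hypothetical monitoring URL called periodically by a cron script.
MONITORING = [re.compile(r'"GET /healthcheck\b')]


def strip_session_id(url):
    """Remove the session id parameter so identical pages share one URL."""
    cleaned = SESSION_PARAM.sub("", url)
    # Drop a dangling "?" or "&" left behind by the removal.
    return cleaned.rstrip("?&")


def is_monitoring_hit(log_line):
    """Return True if the log line comes from a monitoring request."""
    return any(pattern.search(log_line) for pattern in MONITORING)
```

Without such cleanup, every distinct session id would produce a distinct page URL in the reports, which inflates the number of rows to store and process.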
Credits
The project was initially developed for CLANMO GmbH, an award-winning mobile interactive agency from Köln, Germany.
If you have any suggestions, bug reports, or feedback about Apache2Piwik, please leave a comment directly on the page above.