Access logs from the Træfik reverse proxy are collected by a sidecar process, fluentbit.
fluentbit pushes the logs to the Monit Logs infrastructure, where they are later filtered and enriched by Logstash running on Monit Marathon.
Finally, the logs are pushed to HDFS (/project/monitoring/archive/s3/logs) and to Elasticsearch for storage and visualization.
fluentbit on S3 RadosGWs
Since late April 2022, we have used fluentbit on the RadosGWs+Træfik frontends, as it is much lighter on memory than Logstash (which we used previously).
fluentbit tails the log files produced by Træfik (both the HTTP access logs and the Træfik daemon logs), adds a few fields and some context through metadata, and pushes the records to the Monit Logs infrastructure at URI monit-logs-s3.cern.ch:10013/s3 using TLS encryption.
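To make "tailing" concrete, the Python sketch below shows the idea: keep a byte offset per file, read only what was appended since the last poll, and attach context fields to each line. It is an illustration, not how fluentbit is implemented; the context field names (hostname, log_file) are hypothetical, and resetting the offset when the file shrinks is a simplified stand-in for real log-rotation handling.

```python
import socket
from pathlib import Path


def tail_new_records(path, offset, extra_fields=None):
    """Return (records, new_offset) for complete lines appended after `offset`."""
    path = Path(path)
    # Simplified rotation handling: if the file shrank, start again from the top.
    if offset > path.stat().st_size:
        offset = 0

    # Hypothetical context fields attached to every record.
    context = {"hostname": socket.gethostname(), "log_file": str(path)}
    if extra_fields:
        context.update(extra_fields)

    with path.open("rb") as fh:
        fh.seek(offset)
        data = fh.read()

    # Consume only up to the last newline, so a line Traefik is still
    # writing is picked up whole on the next poll.
    end = data.rfind(b"\n") + 1
    records = []
    for raw in data[:end].splitlines():
        if not raw:
            continue  # skip blank lines
        record = dict(context)
        record["message"] = raw.decode("utf-8", errors="replace")
        records.append(record)
    return records, offset + end
```

Called in a loop, e.g. `records, offset = tail_new_records("/var/log/traefik/access.log", offset)`, it returns only the access-log lines written since the previous call.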
fluentbit on the RadosGWs+Træfik frontends is configured to tail two input files: the Træfik access log (/var/log/traefik/access.log) and the Træfik daemon log (/var/log/traefik/service.log). Each file gets its own tag and label; for example, logs from the daemon file are tagged as traefik.service.* and labelled as s3_daemon. Before the records are sent to the Monit infrastructure, each message is prepared to define the payload data and metadata (see monit.lua):
- s3 (used to build the path on HDFS); must be whitelisted on the Monit infrastructure;
- type defines whether the logs are access or daemon logs (used to build the path on HDFS);
- index_prefix defines the index for the logs (used by Logstash on Monit Marathon and by Elasticsearch).
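The message preparation done in monit.lua can be pictured with the following Python sketch. It is not a translation of monit.lua: the metadata key carrying the s3 value is not named above, so `producer` is an assumption, as are the data/metadata envelope layout and the `ceph_s3` default for index_prefix (chosen to match the Elasticsearch index names listed further down). The type is derived from which Træfik file the record was read from.

```python
# Which Traefik file a record came from decides its `type`.
LOG_TYPES = {
    "/var/log/traefik/access.log": "access",
    "/var/log/traefik/service.log": "daemon",
}


def prepare_monit_message(record, source_file, index_prefix="ceph_s3"):
    """Wrap a Traefik log record in a payload/metadata envelope.

    The envelope layout and the `producer` key are assumptions; monit.lua
    defines what is actually sent.
    """
    return {
        "data": record,
        "metadata": {
            # Assumed key name; the value s3 must be whitelisted on Monit
            # and is used to build the HDFS path.
            "producer": "s3",
            # access or daemon, also used to build the HDFS path.
            "type": LOG_TYPES[source_file],
            # Used by Logstash on Monit Marathon and by Elasticsearch.
            "index_prefix": index_prefix,
        },
    }
```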
Logstash on Monit Marathon
Logstash is the tool that reads the aggregated log stream from Kafka, does most of the transformation and writes to Elasticsearch.
This Logstash process runs in a Docker container on the Monit Marathon cluster (see Applications --> storage --> s3logs-to-es).
For debugging purposes, the stderr of the container is available at monit-spark-master.cern.ch:5050/; it cannot be accessed from Marathon.
The Dockerfile, configuration pipeline, etc., are stored in s3logs-to-es.
This Logstash instance:
- removes the additional fields introduced by the Monit infrastructure (metadata unused by us)
- parses the original message as a JSON document
- adds costing information
- adds geographical information of the client IP (geoIP)
- copies a subset of fields relevant for CSIR to a different index
- ...and pushes the results (full logs, and CSIR stripped version) to Elasticsearch
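The Logstash steps listed above can be sketched in Python as a single function. This is not the s3logs-to-es pipeline: the Monit wrapper field (`metadata`), the CSIR field selection (Træfik-style names such as `ClientHost`) and the output keys `cost` and `geoip` are hypothetical, and the costing and geoIP lookups are passed in as callables rather than implemented.

```python
import json

# Hypothetical: top-level fields added by the Monit infrastructure.
MONIT_FIELDS = {"metadata"}
# Hypothetical CSIR subset, using Traefik-style access-log field names.
CSIR_FIELDS = ("StartUTC", "ClientHost", "RequestMethod", "RequestPath", "DownstreamStatus")


def process_event(event, cost_of, geoip_lookup):
    """Return (full_doc, csir_doc) for one event read from Kafka."""
    # 1. Remove the Monit fields we do not use (and the raw message itself).
    kept = {k: v for k, v in event.items() if k not in MONIT_FIELDS and k != "message"}
    # 2. Parse the original Traefik message as a JSON document.
    doc = {**kept, **json.loads(event["message"])}
    # 3. Add costing information (computed by a caller-supplied function).
    doc["cost"] = cost_of(doc)
    # 4. Add geographical information about the client IP.
    doc["geoip"] = geoip_lookup(doc.get("ClientHost"))
    # 5. Copy the CSIR-relevant subset, destined for the -csir index.
    csir_doc = {k: doc[k] for k in CSIR_FIELDS if k in doc}
    return doc, csir_doc
```

Both returned documents would then be written to Elasticsearch, the second one to the stripped-down CSIR index.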
Elasticsearch
Finally, we have a dedicated Elasticsearch instance managed by the Elasticsearch Service.
There is not much to configure on our side, apart from a few useful links and the endpoint configuration repository.
Data is kept for:
- 10 days on fast SSD storage, local to the ES cluster
- a further 20 days (30 in total) on Ceph storage
- 13 months for CSIR purposes, as a stripped-down version with some fields filtered out (see below)
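As a quick check of the retention arithmetic, the helper below maps a document's age in days to where it is kept. It only encodes the numbers listed above; the actual lifecycle is handled by the Elasticsearch Service, and approximating 13 months as 390 days (30-day months) is an assumption of this sketch.

```python
def storage_tier(age_days, csir=False):
    """Return where a document of a given age is kept, or None once deleted."""
    if csir:
        # Stripped-down CSIR copy: 13 months, approximated as 13 * 30 days.
        return "csir" if age_days < 13 * 30 else None
    if age_days < 10:
        return "ssd"   # fast SSD storage local to the ES cluster
    if age_days < 30:
        return "ceph"  # a further 20 days on Ceph storage
    return None        # beyond 30 days the full logs are gone from ES
```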
Indexes on ES must start with ceph_s3: this is the only whitelisted pattern, and hence the only one allowed.
We currently use the following indexes:
- ceph_s3_access: Access logs for Gabe (s3.cern.ch)
- ceph_s3_daemon: Traefik service logs for Gabe
- ceph_s3_access-csir: Stripped down version of Gabe access logs for CSIR, retained for 13 months
- ceph_s3_fr_access: Access logs of Nethub (s3-fr-prevessin-1.cern.ch)
- ceph_s3_fr_daemon: Traefik service logs for Nethub
- ceph_s3_fr_access-csir: Stripped down version of Nethub access logs for CSIR, retained for 13 months
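The index names above follow a regular pattern (cluster base, log type, and a -csir suffix for the CSIR copy of access logs), which a small helper can reproduce and check against the whitelisted ceph_s3 prefix. The cluster keys "gabe" and "nethub" are hypothetical labels chosen for this sketch.

```python
ALLOWED_PREFIX = "ceph_s3"

# Hypothetical cluster keys mapped to the index bases listed above.
CLUSTER_BASE = {"gabe": "ceph_s3", "nethub": "ceph_s3_fr"}


def index_name(cluster, log_type, csir=False):
    """Build the ES index name for a cluster and log type."""
    if log_type not in ("access", "daemon"):
        raise ValueError(f"unknown log type: {log_type!r}")
    if csir and log_type != "access":
        raise ValueError("only access logs have a CSIR index")
    name = f"{CLUSTER_BASE[cluster]}_{log_type}" + ("-csir" if csir else "")
    # Only indexes matching the whitelisted prefix are accepted by ES.
    if not name.startswith(ALLOWED_PREFIX):
        raise ValueError(f"{name} does not start with {ALLOWED_PREFIX}")
    return name
```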
ES is also a data source for the Monit Grafana dashboards:
- Grafana uses basic auth to ES with user ceph_ro:<password> (the password is stored in Teigi);
- ES must have the internal user ceph_ro configured with read permissions.
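To check that the ceph_ro user can read what Grafana needs, one can prepare the same kind of request Grafana makes: an Elasticsearch _search call carrying an HTTP Basic Authorization header. In this sketch the endpoint URL is a placeholder (the real one is in the endpoint configuration repository), the default index is just one of those listed above, and the request is only built, not sent.

```python
import base64
import json
import urllib.request


def build_search_request(es_url, password, index="ceph_s3_access", user="ceph_ro"):
    """Prepare (but do not send) a read-only _search request with Basic auth."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    body = json.dumps({"size": 1, "query": {"match_all": {}}}).encode()
    return urllib.request.Request(
        f"{es_url.rstrip('/')}/{index}/_search",
        data=body,
        headers={
            "Authorization": f"Basic {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(req)` should return hits if the credentials and permissions are right; with security enabled, Elasticsearch typically answers 401 for bad credentials and 403 for missing read permissions.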
HDFS
HDFS is used solely as a storage backend to keep the logs for 13 months for CSIR purposes. As of July 2021, HDFS stores the full logs (it remains to be verified that they do not take up too much space on HDFS). To check or read the logs on HDFS, you must have access to the HDFS cluster (see prerequisites); then, from lxplus, run:
source /cvmfs/sft.cern.ch/lcg/views/LCG_99/x86_64-centos7-gcc8-opt/setup.sh
source /cvmfs/sft.cern.ch/lcg/etc/hadoop-confext/hadoop-swan-setconf.sh analytix 3.2 spark3
kinit
hdfs dfs -ls /project/monitoring/archive/s3/logs
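The same listing can be scripted from Python once the environment above is set up. The sketch builds directory paths under the archive root and shells out to `hdfs dfs -ls`; the `<type>/<YYYY>/<MM>/<DD>` layout below /project/monitoring/archive/s3/logs is an assumption (the metadata description says type is used in the path, the date levels are a guess), so verify it with a plain listing first.

```python
import shutil
import subprocess

HDFS_ROOT = "/project/monitoring/archive/s3/logs"


def hdfs_log_dir(log_type, year=None, month=None, day=None):
    """Build a directory path under the archive root.

    Assumed layout: <root>/<type>/<YYYY>/<MM>/<DD>; stops at the first
    date component that is None.
    """
    parts = [HDFS_ROOT, log_type]
    for value, width in ((year, 4), (month, 2), (day, 2)):
        if value is None:
            break
        parts.append(f"{value:0{width}d}")  # zero-padded, e.g. 5 -> "05"
    return "/".join(parts)


def list_hdfs(path):
    """Run `hdfs dfs -ls <path>` and return its output lines."""
    if shutil.which("hdfs") is None:
        raise RuntimeError("hdfs client not found; source the LCG setup first")
    result = subprocess.run(
        ["hdfs", "dfs", "-ls", path], capture_output=True, text=True, check=True
    )
    return result.stdout.splitlines()
```

For example, `list_hdfs(hdfs_log_dir("access", 2022, 5, 3))` would list one day of access logs, provided the assumed layout holds.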