Monitoring pacemaker with Zabbix

In this article I describe how to monitor the pacemaker cluster resource manager of a Linux cluster with the Zabbix monitoring system. The Zabbix server requests and analyses data from the pacemaker SNMP agent.

The Zabbix Template

Based on the MIB for the Linux cluster resource manager pacemaker, I developed a template for Zabbix that monitors the cluster and sets triggers in case of problems. I want to check two important parameters of the cluster:

  • How many nodes are online? If not all nodes in the cluster are online, I set a Warning trigger, and if no nodes are online, I set a Disaster trigger.
  • Are there resources with failures? One failure in any resource triggers a Warning, and more than one failure triggers a High trigger.

Online Nodes

The SNMP agent of pacemaker delivers the sys4PcmkOnlineNodes OID. This is the number of nodes in the online state. The total number of nodes is given in sys4PcmkTotalNodes. The triggers in the template compare both values.
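
The comparison of the two values can be sketched in a few lines of Python. This is only an illustration of the rule described above, not the trigger expression used in the template. The function name node_severity is hypothetical, and the two arguments are assumed to be the values already read from sys4PcmkOnlineNodes and sys4PcmkTotalNodes.

```python
def node_severity(online_nodes: int, total_nodes: int) -> str:
    """Map the two node counts to a severity name.

    Hypothetical helper: the arguments stand for the values of
    sys4PcmkOnlineNodes and sys4PcmkTotalNodes.
    """
    if online_nodes < 0 or online_nodes > total_nodes:
        raise ValueError("online_nodes must be between 0 and total_nodes")
    if online_nodes == 0:
        # No node is online at all.
        return "Disaster"
    if online_nodes < total_nodes:
        # At least one node is missing, but the cluster still has members.
        return "Warning"
    return "OK"


if __name__ == "__main__":
    for online, total in [(2, 2), (1, 2), (0, 2)]:
        print(online, total, node_severity(online, total))
```

The check for zero online nodes comes first so that a completely offline cluster is reported as Disaster rather than as a plain Warning.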

Resource Failures

During normal operation, resources in a cluster should not have any errors. Any non-zero fail counter in the cluster is a sign of problems that the admins have to take care of. So the total number of failures in a cluster, sys4PcmkResourceFailures, makes a perfect target for monitoring. The items in the template read the OID and trigger on one or more failures.
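
The failure rule can be sketched the same way. Again, this is a minimal illustration under assumptions, not the template's actual trigger definition. The function name failure_severity is hypothetical, and its argument is assumed to be the value read from sys4PcmkResourceFailures.

```python
def failure_severity(resource_failures: int) -> str:
    """Map the total failure count to a severity name.

    Hypothetical helper: the argument stands for the value of
    sys4PcmkResourceFailures.
    """
    if resource_failures < 0:
        raise ValueError("resource_failures must not be negative")
    if resource_failures == 0:
        return "OK"
    if resource_failures == 1:
        # A single failure somewhere in the cluster.
        return "Warning"
    # More than one failure.
    return "High"


if __name__ == "__main__":
    for failures in (0, 1, 3):
        print(failures, failure_severity(failures))
```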

Please find and download the Zabbix Cluster Template here.

Michael Schwartzkopff, 23 Jan 2014