Monitoring pacemaker with OpenNMS

In this article I describe how to monitor the Pacemaker cluster resource manager of a Linux cluster with the OpenNMS monitoring system. The OpenNMS server requests and analyses data from Pacemaker's SNMP agent.

Parameters to Measure

The most important parameters of a Pacemaker cluster to watch are the number of nodes online (sys4PcmkOnlineNodes) relative to the total number of nodes in the cluster (sys4PcmkTotalNodes), and whether any resource has failed (sys4PcmkResourceFailures).
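As a minimal sketch of how these three values relate, the following Python function turns them into a list of problems. The function name cluster_problems and its message strings are hypothetical and are not part of Pacemaker or OpenNMS; the sketch assumes the three values have already been read from the SNMP agent as integers.

```python
def cluster_problems(total_nodes, online_nodes, resource_failures):
    """Return a list of human-readable problems; an empty list means healthy.

    Hypothetical helper for illustration only. The arguments correspond to
    sys4PcmkTotalNodes, sys4PcmkOnlineNodes and sys4PcmkResourceFailures.
    """
    problems = []
    offline = total_nodes - online_nodes
    if offline > 0:
        problems.append(f"{offline} of {total_nodes} nodes offline")
    if resource_failures > 0:
        problems.append(f"{resource_failures} resource failure(s)")
    return problems


if __name__ == "__main__":
    print(cluster_problems(2, 2, 0))  # healthy two-node cluster
    print(cluster_problems(2, 1, 3))  # one node down, three failures
```

OpenNMS itself does not run such a function; it stores the collected values and compares them against thresholds.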

For the sake of simplicity I added the OIDs to the datacollection/netsnmp.xml file and created a new group:

<group name="pacemaker" ifType="ignore">
  <mibObj oid="." instance="0" alias="sys4PcmkTotNod" type="integer" />
  <mibObj oid="." instance="0" alias="sys4PcmkOnlNod" type="integer" />
  <mibObj oid="." instance="0" alias="sys4PcmkResFail" type="integer" />
</group>

Please note that the length of an alias is limited to 19 characters. After creating the group I can tell OpenNMS to check and collect the data if my nodes identify themselves as net-snmp:

<systemDef name="Net-SNMP">
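The 19-character alias limit mentioned above is easy to overlook when adding more objects. The following is a minimal sketch that checks a group definition with Python's standard xml.etree.ElementTree module. The helper name overlong_aliases is hypothetical, and the oid="." values are kept as placeholders exactly as they appear in the article.

```python
import xml.etree.ElementTree as ET

MAX_ALIAS_LENGTH = 19  # limit stated in the article


def overlong_aliases(group_xml, limit=MAX_ALIAS_LENGTH):
    """Return the aliases of all mibObj elements that exceed the limit.

    Hypothetical helper. Tags are compared by local name so the check also
    works if the surrounding file declares an XML namespace.
    """
    root = ET.fromstring(group_xml)
    too_long = []
    for elem in root.iter():
        local_name = elem.tag.rsplit("}", 1)[-1]
        alias = elem.get("alias", "")
        if local_name == "mibObj" and len(alias) > limit:
            too_long.append(alias)
    return too_long


# Group from the article; the OIDs are placeholders, not real values.
GROUP = """
<group name="pacemaker" ifType="ignore">
  <mibObj oid="." instance="0" alias="sys4PcmkTotNod" type="integer" />
  <mibObj oid="." instance="0" alias="sys4PcmkOnlNod" type="integer" />
  <mibObj oid="." instance="0" alias="sys4PcmkResFail" type="integer" />
</group>
"""

if __name__ == "__main__":
    print(overlong_aliases(GROUP))
```

Using the full MIB object name as an alias, for example sys4PcmkResourceFailures, would be reported by this check, which is why the aliases are abbreviated.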

With the jrobin command you can check if your OpenNMS gathers the correct data.


OpenNMS should also generate events when your cluster has problems. Add the following lines to the <group name="netsnmp" ...> section of thresholds.xml:

<threshold description="sys4PcmkResFail" type="high" ds-type="node" value="0.0"
  rearm="1.0" trigger="1" ds-label="sys4PcmkResFail" ds-name="sys4PcmkResFail"/>
<threshold description="sys4PcmkOnlNod" type="low" ds-type="node" value="2.0"
  rearm="1.0" trigger="1" ds-label="sys4PcmkOnlNod" ds-name="sys4PcmkOnlNod"/>

The first threshold triggers an event on any failure of a resource on a node. The second triggers an event if fewer than two nodes are online.
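To illustrate how the value, rearm and trigger attributes interact, here is a simplified Python model of a high/low threshold with hysteresis. It is a sketch under stated assumptions, not the OpenNMS implementation: the class name Threshold is hypothetical, the comparisons are assumed to be inclusive, and the numbers in the example are illustrative rather than copied from the configuration above.

```python
class Threshold:
    """Simplified high/low threshold with a trigger count and a rearm level.

    Hypothetical model for illustration; inclusive comparisons are an
    assumption of this sketch.
    """

    def __init__(self, kind, value, rearm, trigger=1):
        if kind not in ("high", "low"):
            raise ValueError("kind must be 'high' or 'low'")
        self.kind = kind
        self.value = value
        self.rearm = rearm
        self.trigger = trigger
        self.armed = True  # an armed threshold may fire
        self.hits = 0      # consecutive samples beyond the value

    def _exceeded(self, sample):
        if self.kind == "high":
            return sample >= self.value
        return sample <= self.value

    def _rearmed(self, sample):
        if self.kind == "high":
            return sample <= self.rearm
        return sample >= self.rearm

    def update(self, sample):
        """Feed one sample; return 'exceeded', 'rearmed' or None."""
        if self.armed:
            if self._exceeded(sample):
                self.hits += 1
                if self.hits >= self.trigger:
                    self.armed = False
                    self.hits = 0
                    return "exceeded"
            else:
                self.hits = 0
        elif self._rearmed(sample):
            self.armed = True
            return "rearmed"
        return None


if __name__ == "__main__":
    # Illustrative low threshold on the number of online nodes.
    online = Threshold("low", value=1.0, rearm=2.0, trigger=1)
    for sample in (2, 1, 1, 2):
        print(sample, online.update(sample))
```

In this model a threshold fires once, stays silent while the problem persists, and can fire again only after the value has returned to the rearm level.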

Michael Schwartzkopff, 23 Jan 2014