Monitoring Logfile Entries with logmatch

Abstract

Monitoring log files for abnormal events is daily business, and so is tracking normal events: they give good insight into how a system performs. But what makes an event normal, and when does it turn into something abnormal?

To define what is normal you need to measure and define your system's baseline. The baseline represents performance under normal conditions. Any deviation from it may indicate a problem - an abnormal event.

But how can you measure performance figures? How do you extract these figures from an application, and how do they get into the monitoring system? In this blog entry I describe the logmatch feature of the net-snmp suite.

Motivation

A general problem of system monitoring is how to collect an application's performance data and, once you have it, how to transport it over the network to the monitoring station. The Internet standard for transporting such data is the Simple Network Management Protocol (SNMP).

What remains is how to collect the data from the application. Many applications come with their own SNMP (sub-)agent. If your application has no SNMP support, you can parse its logfiles and count the lines that match specific regular expressions.

Configuration

net-snmp, the standard SNMP agent on UNIX systems, offers log line counting as a built-in feature. It has to be configured in the snmpd.conf configuration file.

The option is described in the man page as follows:

logmatch NAME FILE CYCLETIME REGEX
  monitors the specified file for occurrences of the specified pattern REGEX. The file
  position is stored internally so the entire file is only read initially, every
  subsequent pass will only read the new lines added to the file since the last read.

  NAME   name of the logmatch instance (will appear as logMatchName under
         logMatch/logMatchTable/logMatchEntry/logMatchName in the ucd-snmp MIB tree)

  FILE   absolute path to the logfile to be monitored. Note that this path can
         contain date/time directives (like in the UNIX 'date' command). See the manual
         page for 'strftime' for the various directives accepted.

  CYCLETIME   time interval for each logfile read and internal variable update in
         seconds. Note: an SNMPGET* operation will also trigger an immediate logfile
         read and variable update.

  REGEX  the regular expression to be used. Note: DO NOT enclose the regular
         expression in quotes even if there are spaces in the expression as the quotes will
         also become part of the pattern to be matched!
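
The "file position is stored internally" behaviour can be pictured with a small Python toy model. This is only a sketch of the idea, not net-snmp's actual code: the class name LogMatcher and its attributes are made up for illustration, and log rotation and half-written lines are ignored.

```python
import re


class LogMatcher:
    """Toy model of one logmatch entry (hypothetical, not net-snmp code)."""

    def __init__(self, path, pattern):
        self.path = path
        self.regex = re.compile(pattern)
        self.offset = 0        # byte position reached by the previous pass
        self.global_count = 0  # matches seen since this object was created

    def poll(self):
        """Read only the lines appended since the last pass.

        Returns the number of new matching lines found in this pass.
        """
        new_matches = 0
        # Binary mode keeps seek()/tell() simple and exact.
        with open(self.path, "rb") as handle:
            handle.seek(self.offset)
            for raw in handle:
                line = raw.decode("utf-8", errors="replace")
                if self.regex.search(line):
                    new_matches += 1
            self.offset = handle.tell()
        self.global_count += new_matches
        return new_matches
```

Calling poll() once per CYCLETIME corresponds to the periodic read; the first call scans the whole file, later calls only scan what was appended in the meantime.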

The man page also lists an example that monitors the number of accesses to an Apache web server:

logmatch apacheGet /var/log/apache2/access.log 300 GET /.* 200
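
Before putting a pattern into snmpd.conf it can help to try it against a few log lines. The following Python sketch does that for the pattern above and for a wrongly quoted variant. The log lines are made up for illustration, and Python's re module is not the regex engine the agent uses, so treat the result as an approximation that should hold for simple patterns like this one.

```python
import re

# The pattern as written in the snmpd.conf line, and the same pattern
# with surrounding quotes (which the man page warns against).
UNQUOTED = re.compile(r'GET /.* 200')
QUOTED = re.compile(r'"GET /.* 200"')

# Made-up access log lines, for illustration only.
OK_LINE = '192.0.2.1 - - [02/Apr/2013:10:00:00 +0200] "GET /index.html HTTP/1.1" 200 2326'
NOT_FOUND_LINE = '192.0.2.1 - - [02/Apr/2013:10:00:01 +0200] "GET /missing HTTP/1.1" 404 200'
POST_LINE = '192.0.2.1 - - [02/Apr/2013:10:00:02 +0200] "POST /form HTTP/1.1" 200 512'


def matches(regex, line):
    """True if the (unanchored) pattern occurs anywhere in the line."""
    return regex.search(line) is not None


for label, line in [("ok", OK_LINE), ("404", NOT_FOUND_LINE), ("post", POST_LINE)]:
    print(label, matches(UNQUOTED, line), matches(QUOTED, line))
```

In this sketch the quoted variant matches none of the sample lines, because the quotes become part of the pattern. The unquoted pattern also matches the 404 line, whose response size happens to be 200, so a stricter expression may be worth considering if that matters for your figures.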

Now you can walk that logMatch table:

# snmpwalk mywebserver logMatch
UCD-SNMP-MIB::logMatchMaxEntries.0 = INTEGER: 50
UCD-SNMP-MIB::logMatchIndex.1 = INTEGER: 1
UCD-SNMP-MIB::logMatchName.1 = STRING: apacheGet
UCD-SNMP-MIB::logMatchFilename.1 = STRING: /var/log/apache2/access.log
UCD-SNMP-MIB::logMatchRegEx.1 = STRING: GET /.* 200
UCD-SNMP-MIB::logMatchGlobalCounter.1 = Counter32: 15380
UCD-SNMP-MIB::logMatchGlobalCount.1 = INTEGER: 15380
UCD-SNMP-MIB::logMatchCurrentCounter.1 = Counter32: 15380
UCD-SNMP-MIB::logMatchCurrentCount.1 = INTEGER: 15380
UCD-SNMP-MIB::logMatchCounter.1 = Counter32: 15380
UCD-SNMP-MIB::logMatchCount.1 = INTEGER: 0
UCD-SNMP-MIB::logMatchCycle.1 = INTEGER: 300
UCD-SNMP-MIB::logMatchErrorFlag.1 = INTEGER: noError(0)
UCD-SNMP-MIB::logMatchRegExCompilation.1 = STRING: Success

Tip

If your SNMP agent does not display the values, check whether its effective uid/gid is allowed to read the log files. Debian, for example, runs snmpd under its own snmpd gid. You could change this to adm in your /etc/default/snmpd.

Now your monitoring system can easily retrieve the performance data of any application that does not come with its own SNMP agent.

I prefer to fetch logMatchGlobalCounter and calculate the difference between two measurement points. That way you will not lose any information if the application stops for any reason.
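
Calculating that difference is simple arithmetic, but a Counter32 wraps around at 2^32, which is worth handling. A minimal Python sketch, assuming at most one wrap between two readings; the function names are made up for illustration:

```python
COUNTER32_MODULUS = 2 ** 32  # a Counter32 wraps back to 0 after 2^32 - 1


def counter_delta(previous, current, modulus=COUNTER32_MODULUS):
    """Increase between two counter readings, allowing for one wrap-around."""
    return (current - previous) % modulus


def rate_per_second(previous, current, interval_seconds):
    """Average number of matching log lines per second between two readings."""
    return counter_delta(previous, current) / interval_seconds


# Two hypothetical readings of logMatchGlobalCounter taken 300 seconds apart.
print(counter_delta(15380, 15680))
print(rate_per_second(15380, 15680, 300))
```

Most monitoring systems perform this calculation themselves when a data source is declared as a counter, so the sketch mainly shows what happens behind the scenes.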

Sample with postfix MTA

The postfix MTA comes without an SNMP agent.

Important

Anyone to sponsor the development of an SNMP agent for postfix?

But with the following logmatch entries in the configuration of the SNMP agent it is very simple to measure the throughput of an MTA:

logmatch mailSent /var/log/mail.log 300 postfix/smtp.*status=sent
logmatch mailBounce /var/log/mail.log 300 postfix/smtp.*status=bounced
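
To see which log lines these two entries would count, the patterns can be run over a handful of sample lines. The Python sketch below uses invented mail.log lines (queue IDs, hosts and addresses are made up), and Python's re module only approximates the agent's regex engine.

```python
import re

# The two patterns from the logmatch lines above.
PATTERNS = {
    "mailSent": re.compile(r"postfix/smtp.*status=sent"),
    "mailBounce": re.compile(r"postfix/smtp.*status=bounced"),
}

# Invented log lines, for illustration only.
SAMPLE_LINES = [
    "Apr  2 10:00:01 mail postfix/smtp[1234]: 4F1A22C0D1: to=<a@example.org>, "
    "relay=mx.example.org[192.0.2.10]:25, dsn=2.0.0, status=sent (250 2.0.0 Ok)",
    "Apr  2 10:00:02 mail postfix/smtp[1235]: 5A2B33D1E2: to=<b@example.org>, "
    "relay=mx.example.org[192.0.2.10]:25, dsn=5.1.1, status=bounced (550 5.1.1 User unknown)",
    "Apr  2 10:00:03 mail postfix/local[1236]: 6B3C44E2F3: to=<root@localhost>, "
    "relay=local, dsn=2.0.0, status=sent (delivered to mailbox)",
    "Apr  2 10:00:04 mail postfix/smtp[1237]: 7C4D55F3A4: to=<c@example.org>, "
    "relay=mx.example.org[192.0.2.10]:25, dsn=4.4.1, status=deferred (connection timed out)",
]


def count_matches(lines):
    """Count, per logmatch name, how many lines contain the pattern."""
    return {
        name: sum(1 for line in lines if regex.search(line))
        for name, regex in PATTERNS.items()
    }


counts = count_matches(SAMPLE_LINES)
print(counts)
```

With these sample lines each pattern matches exactly one line. The local delivery and the deferred message are not counted, so the two counters describe outgoing SMTP deliveries only.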

If you have several instances of postfix on your mail server you get nice pictures like the one in figure 1.

Figure 1: MTA throughput for 9 different instances of postfix on one server.

Michael Schwartzkopff, 02. April 2013