Monitoring logs for abnormal events is daily business, and so is tracking normal events: they give good insight into how a system performs. But what makes an event normal, and when does it turn into something abnormal?
To define what is normal, you need to measure and define your system's baseline. The baseline represents performance under normal conditions. Any deviation from it may indicate a problem: an abnormal event.
But how can you measure performance figures? How would one extract these figures from an application and how do they get into the monitoring system? In this blog entry I want to describe the logmatch feature of the net-snmp suite.
A general problem in system monitoring is how to collect an application's performance data and, once you have it, how to transport it over the network to the monitoring station. The Internet standard for transporting such data is the Simple Network Management Protocol (SNMP).
What remains is how to collect the data from the application. Many applications come with their own SNMP (sub-)agent. If your application doesn't come with SNMP support, you can parse its logfiles and count the lines that match specific regular expressions.
net-snmp, the standard SNMP agent on UNIX systems, offers log line counting as a built-in feature. It has to be configured in the snmpd.conf configuration file.
The option is described in the man pages as follows:
logmatch NAME FILE CYCLETIME REGEX
    monitors the specified file for occurrences of the specified pattern REGEX. The file position is stored internally so the entire file is only read initially, every subsequent pass will only read the new lines added to the file since the last read.

    NAME
        name of the logmatch instance (will appear as logMatchName under logMatch/logMatchTable/logMatchEntry/logMatchName in the ucd-snmp MIB tree)
    FILE
        absolute path to the logfile to be monitored. Note that this path can contain date/time directives (like in the UNIX 'date' command). See the manual page for 'strftime' for the various directives accepted.
    CYCLETIME
        time interval for each logfile read and internal variable update in seconds. Note: an SNMPGET* operation will also trigger an immediate logfile read and variable update.
    REGEX
        the regular expression to be used. Note: DO NOT enclose the regular expression in quotes even if there are spaces in the expression as the quotes will also become part of the pattern to be matched!
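To make the "only read the new lines" idea from the description more tangible, here is a minimal Python sketch of such a counter. It is not how net-snmp implements logmatch; the class name LogMatchSketch, its attributes, and the handling of a shrunken file are my own assumptions for illustration.

```python
import os
import re


class LogMatchSketch:
    """Hypothetical helper: counts matching lines, reading only newly appended data."""

    def __init__(self, path, pattern):
        self.path = path
        self.regex = re.compile(pattern)
        self.position = 0      # byte offset where the previous pass stopped
        self.global_count = 0  # matches seen since this object was created

    def poll(self):
        """Read lines added since the last pass and return how many of them matched."""
        matches = 0
        with open(self.path, "rb") as handle:
            handle.seek(0, os.SEEK_END)
            if handle.tell() < self.position:
                # Assumption: a shorter file means rotation or truncation, so start over.
                self.position = 0
            handle.seek(self.position)
            for raw in handle:
                if not raw.endswith(b"\n"):
                    break  # incomplete last line, pick it up on the next pass
                if self.regex.search(raw.decode("utf-8", errors="replace")):
                    matches += 1
                self.position += len(raw)
        self.global_count += matches
        return matches
```

Calling poll() once per cycle gives the matches of that cycle, while global_count keeps the running total, roughly the distinction between the per-cycle and the global values in the logMatch table.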
The man pages also list an example that counts accesses to an Apache web server:
logmatch apacheGet /var/log/apache2/access.log 300 GET /.* 200
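Before putting a pattern into snmpd.conf, it can help to try it against a few log lines. The sketch below uses Python's re module as a stand-in for the regex library the agent uses, so treat the result as an approximation. The sample lines and the function name count_matches are made up for illustration.

```python
import re

# Pattern taken from the logmatch example.
PATTERN = re.compile(r"GET /.* 200")

# Made-up lines in Apache's common log format, for illustration only.
SAMPLE_LINES = [
    '192.0.2.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '192.0.2.1 - - [10/Oct/2023:13:55:37 +0000] "POST /form HTTP/1.1" 200 512',
    '192.0.2.1 - - [10/Oct/2023:13:55:38 +0000] "GET /missing HTTP/1.1" 404 200',
]


def count_matches(lines, regex=PATTERN):
    """Count the lines in which the pattern occurs anywhere."""
    return sum(1 for line in lines if regex.search(line))
```

With these sample lines the pattern matches the first and the third line. The third is a 404 response whose size happens to be 200 bytes, so the simple pattern may overcount slightly because it does not tie the 200 to the status field.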
Now you can walk that logMatch table:
# snmpwalk mywewbserver logMatch
UCD-SNMP-MIB::logMatchMaxEntries.0 = INTEGER: 50
UCD-SNMP-MIB::logMatchIndex.1 = INTEGER: 1
UCD-SNMP-MIB::logMatchName.1 = STRING: apacheGet
UCD-SNMP-MIB::logMatchFilename.1 = STRING: /var/log/apache2/access.log
UCD-SNMP-MIB::logMatchRegEx.1 = STRING: GET /.* 200
UCD-SNMP-MIB::logMatchGlobalCounter.1 = Counter32: 15380
UCD-SNMP-MIB::logMatchGlobalCount.1 = INTEGER: 15380
UCD-SNMP-MIB::logMatchCurrentCounter.1 = Counter32: 15380
UCD-SNMP-MIB::logMatchCurrentCount.1 = INTEGER: 15380
UCD-SNMP-MIB::logMatchCounter.1 = Counter32: 15380
UCD-SNMP-MIB::logMatchCount.1 = INTEGER: 0
UCD-SNMP-MIB::logMatchCycle.1 = INTEGER: 300
UCD-SNMP-MIB::logMatchErrorFlag.1 = INTEGER: noError(0)
UCD-SNMP-MIB::logMatchRegExCompilation.1 = STRING: Success
If your SNMP agent does not display the values, check whether its effective uid/gid is allowed to read the log files. Debian, for example, runs snmpd under its own snmpd gid. You could change this to adm in your /etc/default/snmpd.
Now your monitoring system can easily retrieve the performance data of any application that does not come with its own SNMP agent.
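If you want to fetch a single value from a script instead of a full monitoring system, a small wrapper around the snmpget command from the net-snmp suite may be enough. The following is only a sketch: it assumes SNMP v2c, the community string "public", and that snmpget is installed on the polling host. The function names are hypothetical.

```python
import subprocess


def build_snmpget_command(host, oid, community="public"):
    """Build the snmpget call; -Oqv asks for the bare value without OID and type."""
    # Assumption: SNMP v2c with a read community; adjust for your agent.
    return ["snmpget", "-v", "2c", "-c", community, "-Oqv", host, oid]


def parse_counter(output):
    """Convert the bare value printed by snmpget into an int."""
    return int(output.strip())


def fetch_counter(host, oid, community="public", timeout=10):
    """Run snmpget and return the counter value; raises if the command fails."""
    result = subprocess.run(
        build_snmpget_command(host, oid, community),
        capture_output=True,
        text=True,
        check=True,
        timeout=timeout,
    )
    return parse_counter(result.stdout)
```

A call such as fetch_counter("mywewbserver", "UCD-SNMP-MIB::logMatchGlobalCounter.1") would then return the counter of the first logmatch entry, provided the agent is reachable and the community string fits.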
I prefer to fetch logMatchGlobalCounter and calculate the difference between two measurement points. That way you will not lose any information if the application stops for any reason.
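Calculating the difference between two measurement points needs a little care, because a Counter32 wraps around to zero after 2^32 - 1. A minimal sketch of the arithmetic, with function names of my own choosing:

```python
COUNTER32_MODULUS = 2 ** 32


def counter_delta(previous, current):
    """Difference between two Counter32 readings, allowing for one wraparound."""
    return (current - previous) % COUNTER32_MODULUS


def rate_per_second(previous, current, seconds):
    """Average number of matched lines per second between two readings."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return counter_delta(previous, current) / seconds
```

For example, readings of 15380 and 15680 taken 300 seconds apart give a delta of 300, that is one matching line per second. Note that this simple version cannot tell a wraparound from a counter reset after an agent restart, which would show up as one implausibly large delta.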
Sample with postfix MTA
The postfix MTA comes without an SNMP agent.
Anyone to sponsor the development of an SNMP agent for postfix?
But with the following logmatch entries in the configuration of the SNMP agent it is very simple to measure the throughput of an MTA:
logmatch mailSent /var/log/mail.log 300 postfix/smtp.*status=sent
logmatch mailBounce /var/log/mail.log 300 postfix/smtp.*status=bounced
If you have several instances of postfix on your mail server you get nice pictures like the one in figure 1.