I use SNMP to monitor ssCpuIdle on CentOS 5.2. Today, after 248 days of uptime, it started returning 0% idle:
$ snmpwalk -v 1 localhost -c public .1.3.6.1.4.1.2021.11.11
UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0
I logged in to the CentOS command line and ran top
and vmstat 5
(credit). Both showed id (idle) of 97% or more, so the CPU is fine; SNMP is returning an incorrect value.
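The comparison above can be scripted so the mismatch is spotted without logging in. This is only a rough sketch: the helper names `snmp_idle_value` and `idle_mismatch` and the 10-point tolerance are made up for illustration, and the commented usage lines assume that `id` is the 15th column of `vmstat` output on this system.

```shell
#!/bin/bash
# Hypothetical helper: read snmpwalk output on stdin and print the integer
# from a line such as "UCD-SNMP-MIB::ssCpuIdle.0 = INTEGER: 0".
snmp_idle_value() {
    awk '/INTEGER:/ { print $NF; exit }'
}

# Hypothetical helper: return 0 (true) when the SNMP idle figure and the
# locally measured idle figure differ by more than the tolerance
# (percentage points; the default of 10 is an arbitrary assumption).
idle_mismatch() {
    local snmp=$1 local_idle=$2 tolerance=${3:-10}
    local diff=$(( snmp - local_idle ))
    if [ "$diff" -lt 0 ]; then
        diff=$(( -diff ))
    fi
    [ "$diff" -gt "$tolerance" ]
}

# Usage against a live agent (assumes snmpd is running and that "id" is
# column 15 of vmstat output here):
#   snmp=$(snmpwalk -v 1 -c public localhost .1.3.6.1.4.1.2021.11.11 | snmp_idle_value)
#   local_idle=$(vmstat 5 2 | tail -1 | awk '{ print $15 }')
#   idle_mismatch "$snmp" "$local_idle" && echo "SNMP idle value disagrees with vmstat"
```

With the numbers from this incident (SNMP says 0, vmstat says 97), `idle_mismatch 0 97` succeeds and the warning would be printed.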
I stopped and started the SNMP daemon:
service snmpd stop
service snmpd start
After that, SNMP again returned correct values (96 or more), but only for about a minute. Then it dropped back to 0.
Turns out this is a RHEL 5.2 bug that may or may not be fixed in 5.3: "wrong CPU idle (ssCpuIdle.0) reported after certain uptime". Interestingly, the bug submitter noticed the problem after 62 days on a 4-CPU machine, and I got the error after 248 days on a 1-CPU machine. 62 x 4 = 248.
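One possible reading of the 62 x 4 = 248 pattern, which is my guess and not something the bug report is quoted as saying: if CPU time is counted in ticks of 1/100 second, summed across all CPUs, and held in a signed 32-bit integer, the counter overflows after 2^31 ticks. The sketch below works out how many whole days that takes. The 100 ticks per second figure and the function name `days_until_wrap` are assumptions.

```shell
#!/bin/bash
# Whole days of uptime before a signed 32-bit tick counter overflows.
# Assumption: each CPU adds 100 ticks per second to one shared counter.
days_until_wrap() {
    local cpus=$1
    local ticks_per_second=100
    local seconds_per_day=86400
    local max_ticks=2147483648   # 2^31
    echo $(( max_ticks / (ticks_per_second * cpus) / seconds_per_day ))
}

days_until_wrap 1   # prints 248
days_until_wrap 4   # prints 62
```

Both results match the uptimes seen on the 1-CPU and 4-CPU machines, which fits the overflow theory but does not prove it.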
Fortunately a reboot solves the problem:
shutdown -r now
Remind me to check this again in 248 days!