Error Log Monitoring AIX

This article looks in detail at errpt monitoring and at how error logging works in the AIX operating system.

The error log lets the operating system record software and hardware problems, such as warnings, failures and events, in a log file. We can read these entries with the errpt command and take the necessary action to prevent hardware or software components from becoming unavailable.

Errpt components:-

There are two major components

  1. Kernel components
  2. User Components

Error logging process:-

When the operating system detects an error, it passes the error information to errsave (a kernel component), which adds an entry to the /dev/error file. A timestamp and other details are added to each new entry in /dev/error, so creating an error log entry involves a sequence of operations. The error logging daemon (errdemon) checks /dev/error for new entries and matches them against the error descriptions stored in the /var/adm/ras/errtmplt file; the description used depends on the error category. Once this sequence completes, the error message is stored in the /var/adm/ras/errlog file. When we run the errpt command, it reads the information from the errlog file and displays it on the terminal.

File location details

  1. The error file is located at /dev/error

Errpt_Image-0

  2. Errtmplt is located in /var/adm/ras/

Errpt_Image-1

  3. Errlog is located in /var/adm/ras/

Errpt_Image-2

If you run the errpt command, it displays the output in six columns: error identifier, timestamp, type, class, resource name and description.

Errpt_Image-3

Errpt_Image-4

The TIMESTAMP is displayed as in the image above, in MMDDhhmmYY format (month, day, hour, minute and two-digit year).
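As a small illustration, the ten-digit summary timestamp can be split into its parts with a shell function. This is only a sketch: decode_errpt_ts is a made-up helper name, the sample value is invented, and the "20" century prefix is an assumption, since errpt prints only two year digits.

```shell
# Decode an errpt summary timestamp (MMDDhhmmYY) into a readable form.
# decode_errpt_ts is a hypothetical helper name, not an AIX command.
decode_errpt_ts() {
    ts=$1
    # Accept exactly ten digits, reject anything else.
    case $ts in
        [0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]) ;;
        *) echo "invalid timestamp: $ts" >&2; return 1 ;;
    esac
    mm=$(printf '%s' "$ts" | cut -c1-2)   # month
    dd=$(printf '%s' "$ts" | cut -c3-4)   # day
    hh=$(printf '%s' "$ts" | cut -c5-6)   # hour
    mi=$(printf '%s' "$ts" | cut -c7-8)   # minute
    yy=$(printf '%s' "$ts" | cut -c9-10)  # two-digit year
    # Assumption: the entry was logged in the 2000s.
    echo "20$yy-$mm-$dd $hh:$mi"
}

decode_errpt_ts 0612143024   # prints 2024-06-12 14:30
```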

Type of Error:-

The type of error is displayed under the “T” column, where T denotes Temporary, P denotes Permanent, Performance or Pending, I denotes Informational and U denotes Unknown.

Class of Error:-

The class of error is displayed under the “C” column, where H denotes Hardware, S denotes Software, O denotes Operator and U denotes Undetermined.
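The type and class letters can be expanded with a pair of small shell functions, which may help when scanning a long summary. This is a sketch only: describe_type, describe_class and summarize_line are made-up helper names, and the sample summary line (identifier, timestamp and resource) is invented for illustration.

```shell
# Expand the one-letter T (type) code from an errpt summary line.
describe_type() {
    case $1 in
        T) echo "Temporary" ;;
        P) echo "Permanent, Performance or Pending" ;;
        I) echo "Informational" ;;
        U) echo "Unknown" ;;
        *) echo "unrecognized type: $1" ;;
    esac
}

# Expand the one-letter C (class) code from an errpt summary line.
describe_class() {
    case $1 in
        H) echo "Hardware" ;;
        S) echo "Software" ;;
        O) echo "Operator" ;;
        U) echo "Undetermined" ;;
        *) echo "unrecognized class: $1" ;;
    esac
}

# Split one summary line into fields: identifier, timestamp, T, C, resource, description.
summarize_line() {
    set -- $1
    echo "type: $(describe_type "$3"); class: $(describe_class "$4"); resource: $5"
}

# Hypothetical summary line, not real errpt output.
summarize_line 'A1B2C3D4 0612143024 P H hdisk0 DISK OPERATION ERROR'
```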

Resource Name:-

This information is collected from the ODM; depending on the error classification, it is taken from CuDv, CuAt and CuVPD.

To see detailed information about the errors in errpt, run the errpt -a command, piped through pg or more, which gives details as shown below.

Errpt_Image-5
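The detailed report can get long, so it often helps to narrow it down. The sketch below wraps a few errpt flags (-a, -j, -d, -T and -N) in shell functions; the function names are made up for illustration, and the flags should be confirmed against the errpt documentation for your AIX level.

```shell
# Hypothetical wrapper functions around errpt; they only run on an AIX host.

# Detailed report for a single error identifier (first column of the summary).
errpt_detail_for_id() {
    errpt -a -j "$1"
}

# Summary report limited to permanent hardware errors.
errpt_permanent_hardware() {
    errpt -d H -T PERM
}

# Summary report for one resource name, for example hdisk0.
errpt_for_resource() {
    errpt -N "$1"
}
```

On an AIX host, something like errpt_detail_for_id followed by an identifier from the summary, piped through more, would page through the details of just that error.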

Error log management:-

The error logging daemon (errdemon) is started automatically at system startup by the rc.boot script and is stopped automatically at system shutdown.

This can be verified with ps -ef | grep -i errdemon. If errdemon terminates unexpectedly, it can be started again with /usr/lib/errdemon. To stop errdemon, issue /usr/lib/errstop, but stopping errdemon is not advisable.

Errpt_Image-6
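The same check can be scripted, for example from a monitoring job. This is a minimal sketch: errdemon_running is a made-up function name, and the message text is only a suggestion.

```shell
# Return success if an errdemon process appears in the process list.
# The bracket in the pattern keeps grep from matching its own entry.
errdemon_running() {
    ps -ef | grep '[e]rrdemon' >/dev/null
}

if errdemon_running; then
    echo "errdemon is running"
else
    echo "errdemon is not running; on AIX it can be started with /usr/lib/errdemon"
fi
```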

The errdemon attributes can be checked with the errdemon -l command, and the log file size and buffer size can be adjusted with the -s and -B switches respectively. To clear the errlog, use the errclear command, which takes the age in days of the entries to delete (errclear 0 deletes all entries).

Errpt_Image-7
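These settings can be wrapped in a few shell functions, sketched below. The function names and the ERRDEMON variable are made up for illustration; the sketch assumes that -s sets the log file size and -B the buffer size, both in bytes, and that errclear takes a number of days. The commands need root authority on AIX.

```shell
# Path to the daemon binary; overridable so the wrappers can be tried elsewhere.
ERRDEMON=${ERRDEMON:-/usr/lib/errdemon}

# Succeed only if the argument is a non-empty string of digits.
is_whole_number() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;
        *) return 0 ;;
    esac
}

# Print the current errdemon attributes.
show_errdemon_settings() {
    "$ERRDEMON" -l
}

# Set the maximum size of the error log file, in bytes.
set_errlog_size() {
    is_whole_number "$1" || { echo "size must be a whole number of bytes" >&2; return 1; }
    "$ERRDEMON" -s "$1"
}

# Set the size of the error log buffer, in bytes.
set_errbuf_size() {
    is_whole_number "$1" || { echo "size must be a whole number of bytes" >&2; return 1; }
    "$ERRDEMON" -B "$1"
}

# Delete error log entries older than the given number of days (0 deletes all).
clear_errlog_older_than() {
    is_whole_number "$1" || { echo "days must be a whole number" >&2; return 1; }
    errclear "$1"
}
```

For example, clear_errlog_older_than 30 would keep only the last 30 days of entries. Checking the argument first avoids passing a mistyped value to a command that changes system settings.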
