Posts Tagged ‘zabbix’

Monitoring chronyd time service is synchronized, get additional time server values with Zabbix userparameter script

Monday, March 21st, 2022

monitoring-chronyc-time-server-synchronization-zabbix-logo

If you''re running a server infrastructure and your main monitoring system is Zabbix. Then a vital check you might want to setup is to monitor the server time synchronization to a central server. In newer Linux OS-es ntpd time server is started to be used lesser and many modern Linux distributions used in the corporate realm are starting to recommend using chrony as a time synchronization client / server.

In this article, I'll show you how you can quickly setup monitoring of chronyd process and monitoring whether the time is successfully synchronizing with remote Chronyd time server. This will be done with a tiny one liner shell script setup as userparameter It is relatively easy then to setup an Action Alert


1. Create userparameter script to send parsed chronyd time synchronization to Zabbix Server

chronyc tracking provides plenty of useful data which can give many details about info such as offset, skew, root delay, stratum, update interval.

[root@server: ~]# chronyc tracking
Reference ID    : 0A32EF0B (fkf-intp01.intcs.meshcore.net)
Stratum         : 3
Ref time (UTC)  : Fri Mar 18 12:42:31 2022
System time     : 0.000032544 seconds fast of NTP time
Last offset     : +0.000031102 seconds
RMS offset      : 0.000039914 seconds
Frequency       : 3.037 ppm slow
Residual freq   : +0.000 ppm
Skew            : 0.023 ppm
Root delay      : 0.017352410 seconds
Root dispersion : 0.004285847 seconds
Update interval : 1041.6 seconds
Leap status     : Normal

[root@server zabbix_agentd.d]# cat userparameter_chrony.conf 
UserParameter=chrony.json,chronyc -c tracking | sed -e s/'^'/'{"chrony":[“‘/g -e s/’$’/'”]}'/g -e s/','/'","'/g
[root@server zabbix_agentd.d]#

The -c option passed to chronyc is printing the chronyc tracking command ouput data in comma-separated values ( CSV ) format.

2. Create Necessery Item key to get chronyd processes and catch the userparameter data

 

  • First lets create a an Item key to calculate the chronyd daemon proc.num
    proc.num – simply returns the number of processes in the process list just like a simple
    pgrep servicename command does.


monitroing-chronyc_zabbix_item_report_to-zabbix

Second lets create the Item for the userparameter script, the chrony.json key should be the same as the key given in the userparameter script.

obtain-chronyc-statistic-variables-from-remote-chronyd-to-zabbix-windows

Create Chrony Zabbix Triggers 
 

Expression 

{server-host:proc.num[chronyd].last()}<1


will be triggered if the process of chronyd on the server is less than 1

 

chronyc_monitoring_process-is-not-running-screenshot

Next configure

{server-host:chrony[Leap status].iregexp[Not synchronised) ]=1


to trigger Alert Chronyd is Not synchronized if the Expression check occurs.

chronyd-is-not-synchronized-trigger-iregexp

Reload the zabbix-agent on the server
 

To make zabbix-agent locally installed on the machine read the userparameter into memory  (in my case this is zabbix-agent-4.0.28-1.el8.x86_64) installed on Redhat 8.3 (Ootpa), you have to restart it.

[root@server: ~]# systemctl restart zabbix-agent
[root@server: ~]# systemctl status zabbix-agent

● zabbix-agent.service – Zabbix Agent
   Loaded: loaded (/usr/lib/systemd/system/zabbix-agent.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2021-12-16 16:41:02 CET; 3 months 0 days ago
 Main PID: 862165 (zabbix_agentd)
    Tasks: 6 (limit: 23662)
   Memory: 20.6M
   CGroup: /system.slice/zabbix-agent.service
           ├─862165 /usr/sbin/zabbix_agentd -c /etc/zabbix/zabbix_agentd.conf
           ├─862166 /usr/sbin/zabbix_agentd: collector [idle 1 sec]
           ├─862167 /usr/sbin/zabbix_agentd: listener #1 [waiting for connection]
           ├─862168 /usr/sbin/zabbix_agentd: listener #2 [waiting for connection]
           ├─862169 /usr/sbin/zabbix_agentd: listener #3 [waiting for connection]
           └─862170 /usr/sbin/zabbix_agentd: active checks #1 [idle 1 sec]

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.


In a short while you should be seeing in the chrony.json key History data fed by the userparameter Script.

In Zabbix Latest data, you will see plenty of interesting time synchronization data get reported such as Skew, Stratum, Root Delay, Update Interval, Frequency etc.

chronyd-zabbix-reported-time-synchronization-offset-leap-residual-freq-root-delay-screenshot

To have an email Alerting further, go and setup a new Zabbix Action based on the Trigger with your likings and you're done. 
The tracked machine will be in zabbix to make sure your OS clock is not afar from the time server. Repeat the same steps if you need to track chronyd is up running and synchronized on few machines, or if you have to make it for dozens setup a Zabbix template.

Monitor service log is continously growing with Zabbix on Windows with batch userparameter script and trigger Alert if log is unchanged

Thursday, March 17th, 2022

monitor-if-log-file-is-growing-with-zabbix-zabbix-userparameter-script-howto

Recently we had an inteteresting Monitoring work task to achieve. We have an Application that is constantly simulating encrypted connections traffic to a remote side machine and sending specific data on TCP/IP ports.
Communiucation between App Server A -> App Server B should be continous and if all is working as expected App Server A messages output are logged in the Application log file on the machine which by the way Runs
Windows Server 2020.

Sometimes due to Network issues this constant reconnections from the Application S. A to the remote checked machine TCP/IP ports gets interrupted due to LAN issues or a burned Network Switch equipment, misconfiguration on the network due to some Network admin making stoopid stuff etc..

Thus it was important to Monitor somehow whether the log is growing or not and feed the output of whether Application log file is growing or it stuck to a Central Zabbix Server. 
To be able to better understand the task, lets divide the desired outcome in few parts on required:

1. Find The latest file inside a folder C:\Path-to-Service\Monitoring\Log\
2. Open the and check it is current logged records and log the time
3. Re-open file in a short while and check whether in few seconds new records are written
4. Report the status out to Zabbix
5. Make Zabbix Item / Trigger / Action in case if monitored file is not growing

In below article I'll briefly explain how Monitoring a Log on a Machine for growing was implemented using a pure good old WIN .BAT (.batch) script and Zabbix Userparameter key

 

1. Enable userparameter script for Local Zabbix-Agent on the Windows 10 Server Host


Edit Zabbix config file usually on Windows Zabbix installs file is named:

zabbix_agentd.win ]


Uncomment the following lines to enable userparameter support for zabbix-agentd:

 

# Include=c:\zabbix\zabbix_agentd.userparams.conf

Include=c:\zabbix\zabbix_agentd.conf.d\

# Include=c:\zabbix\zabbix_agentd.conf.d\*.conf


2. Create folders for userparameter script and for the userparameter.conf

Before creating userparameter you can to create the folder and grant permissions

Folder name under C:\Zabbix -> zabbix_agentd.conf.d

If you don't want to use Windows Explorer) but do it via cmd line:

C:\Users\LOGUser> mkdir \Zabbix\zabbix_agentd.conf\
C:\User\LOGUser> mkdir \Zabbix\zabbix_scripts\


3. Create Userparameter with some name file ( Userparameter-Monitor-Grow.conf )

In the directory C:\Zabbix\zabbix_agentd.conf.d you should create a config file like:
Userparameter-Monitor-Grow.conf and in it you should have a standard userparameter key and script so file content is:

UserParameter=service.check,C:\Zabbix\zabbix_scripts\GROW_LOG_MONITOR-USERPARAMETER.BAT


4. Create the Batch script that will read the latest file in the service log folder and will periodically check and report to zabbix that file is changing

notepad C:\Zabbix\zabbix_scripts\GROW_LOG_MONITOR-USERPARAMETER.BAT

REM "SCRIPT MONITOR IF FILE IS GROWING OR NOT"
@echo off

set work_dir=C:\Path-to-Service\Monitoring\Log\

set client=client Name

set YYYYMMDD=%DATE:~10,4%%DATE:~4,2%%DATE:~7,2%

set name=csv%YYYYMMDD%.csv

set mytime=%TIME:~0,8%

for %%I in (..) do set CurrDirName=%%~nxI

 

setlocal EnableDelayedExpansion

set "line1=findstr /R /N "^^" %work_dir%\output.csv | find /C ":""


for /f %%a in ('!line1!') do set number1=%%a

set "line2=findstr /R /N "^^" %work_dir%\%name% | find /C ":""


for /f %%a in ('!line2!') do set number2=%%a

 

IF  %number1% == %number2% (

echo %YYYYMMDD% %mytime% MAJOR the log is not incrementing for %client%

echo %YYYYMMDD% %mytime% MAJOR the log is not incrementing for %client% >> monitor-grow_err.log

) ELSE (

echo %YYYYMMDD% %mytime% NORMAL the log is incrementing for %client%

SETLOCAL DisableDelayedExpansion

del %work_dir%\output.csv

FOR /F "usebackq delims=" %%a in (`"findstr /n ^^ %work_dir%\%name%"`) do (

    set "var=%%a"

    SETLOCAL EnableDelayedExpansion

    set "var=!var:*:=!"

    echo(!var! >> %work_dir%\output.csv

    ENDLOCAL

)

)
 

 

To download GROW_LOG_MONITOR-USERPARAMETER.BAT click here.
The script needs to have configured the path to directory containing multiple logs produced by the Monitored Application.
As prior said it will, list the latest created file based on DATE timestamp in the folder will output a simple messages:

If the log file is being fed with data the script will output to output.csv messages continuously, either:

%%mytime%% NORMAL the log is incrementing for %%client%%

Or if the Monitored application log is not writting anything for a period it will start writting messages

%%mytime%%mytime MAJOR the log is not incrementing for %client%

The messages will also be sent in Zabbix.

Before saving the script make sure you modify the Full Path location to the Monitored file for growing, i.e.:

set work_dir=C:\Path-to-Service\Monitoring\Log\


5. Create The Zabbix Item

Set whatever service.check name you would like and a check interval to fetch the info from the userparameter (if you're dealing with very large log files produced by Monitored log of application, then 10 minutes might be too frequent, in most cases 10 minutes should be fine)
monitor-if-log-grows-windows-zabbix-item-service-check-screenshot
 

6. Create Zabbix Trigger


You will need a Trigger something similar to below:

Now considering that zabbix server receives correctly data from the client and the monitored log is growing you should in Zabbix:

%%mytime%% NORMAL the log is incrementing for %%client%%


7. Lastly create an Action to send Email Alert if log is not growing

Linux: Howto Fix “N: Repository ‘http://deb.debian.org/debian buster InRelease’ changed its ‘Version’ value from ‘10.9’ to ‘10.10’” error to resolve apt-get release update issue

Friday, August 13th, 2021

Linux's surprises and disorganization is continuously growing day by day and I start to realize it is becoming mostly impossible to support easily this piece of hackware bundled together.
Usually so far during the last 5 – 7 years, I rarely had any general issues with using:

 apt-get update && apt-get upgrade && apt-get dist-upgrade 

to raise a server's working stable Debian Linux version packages e.g. version X.Y to verzion X.Z (for example up the release from Debian Jessie from 8.1 to 8.2). 

Today I just tried to follow this well known and established procedure that, of course nowdays is better to be done with the newer "apt" command instead with the legacy "apt-get"
And the set of 

 

# apt-get update && apt-get upgrade && apt-get dist-upgrade

 

has triggered below shitty error:
 

root@zabbix:~# apt-get update && apt-get upgrade
Get:1 http://security.debian.org buster/updates InRelease [65.4 kB]
Get:2 http://deb.debian.org/debian buster InRelease [122 kB]
Get:3 http://security.debian.org buster/updates/non-free Sources [688 B]
Get:4 http://repo.zabbix.com/zabbix/5.0/debian buster InRelease [7096 B]
Get:5 http://security.debian.org buster/updates/main Sources [198 kB]
Get:6 http://security.debian.org buster/updates/main amd64 Packages [300 kB]
Get:7 http://security.debian.org buster/updates/main Translation-en [157 kB]
Get:8 http://security.debian.org buster/updates/non-free amd64 Packages [556 B]
Get:9 http://deb.debian.org/debian buster/main Sources [7836 kB]
Get:10 http://repo.zabbix.com/zabbix/5.0/debian buster/main Sources [1192 B]
Get:11 http://repo.zabbix.com/zabbix/5.0/debian buster/main amd64 Packages [4785 B]
Get:12 http://deb.debian.org/debian buster/non-free Sources [85.7 kB]
Get:13 http://deb.debian.org/debian buster/contrib Sources [42.5 kB]
Get:14 http://deb.debian.org/debian buster/main amd64 Packages [7907 kB]
Get:15 http://deb.debian.org/debian buster/main Translation-en [5968 kB]
Get:16 http://deb.debian.org/debian buster/main amd64 Contents (deb) [37.3 MB]
Get:17 http://deb.debian.org/debian buster/contrib amd64 Packages [50.1 kB]
Get:18 http://deb.debian.org/debian buster/non-free amd64 Packages [87.7 kB]
Get:19 http://deb.debian.org/debian buster/non-free Translation-en [88.9 kB]
Get:20 http://deb.debian.org/debian buster/non-free amd64 Contents (deb) [861 kB]
Fetched 61.1 MB in 22s (2774 kB/s)
Reading package lists… Done
N: Repository 'http://deb.debian.org/debian buster InRelease' changed its 'Version' value from '10.9' to '10.10'


As I used to realize nowdays, as Linux started originally as 'Hackers' operating system, its legacy is just one big hack and everything from simple maintenance up to the higher and more sophisticated things requires a workaround 'hack''.

 

This time the hack to resolve error:
 

N: Repository 'http://deb.debian.org/debian buster InRelease' changed its 'Version' value from '10.9' to '10.10'


is up to running cmd:
 

debian-server:~# apt-get update –allow-releaseinfo-change
Поп:1 http://ftp.de.debian.org/debian buster-backports InRelease
Поп:2 http://ftp.debian.org/debian stable InRelease
Поп:3 http://security.debian.org stable/updates InRelease
Изт:5 https://packages.sury.org/php buster InRelease [6837 B]
Изт:6 https://download.docker.com/linux/debian stretch InRelease [44,8 kB]
Изт:7 https://packages.sury.org/php buster/main amd64 Packages [317 kB]
Игн:4 https://attic.owncloud.org/download/repositories/production/Debian_10  InRelease
Изт:8 https://download.owncloud.org/download/repositories/production/Debian_10  Release [964 B]
Изт:9 https://packages.sury.org/php buster/main i386 Packages [314 kB]
Изт:10 https://download.owncloud.org/download/repositories/production/Debian_10  Release.gpg [481 B]
Грш:10 https://download.owncloud.org/download/repositories/production/Debian_10  Release.gpg
  Следните подписи са невалидни: DDA2C105C4B73A6649AD2BBD47AE7F72479BC94B
Грш:11 https://ookla.bintray.com/debian generic InRelease
  403  Forbidden [IP: 52.39.193.126 443]
Четене на списъците с пакети… Готово
N: Repository 'https://packages.sury.org/php buster InRelease' changed its 'Suite' value from '' to 'buster'
W: An error occurred during the signature verification. The repository is not updated and the previous index files will be used. GPG error: https://download.owncloud.org/download/repositories/production/Debian_10  Release: 

apt-get-update-allow-releaseinfo-change-debian-linux-screenshot

Onwards to upgrade the system up to the latest .deb packages, as usual run:

# apt-get -y update && apt-get upgrade -y

 

and updates should be applied as usual with some prompts on whether you prefer to keep or replace existing service configuration and some information on some general changes that might affect your installed services. In a few minutes and few prompts hopefully your Debian OS should be up to the latest stable.

How to calculate connections from IP address with shell script and log to Zabbix graphic

Thursday, March 11th, 2021

We had to test the number of connections incoming IP sorted by its TCP / IP connection state.

For example:

TIME_WAIT, ESTABLISHED, LISTEN etc.


The reason behind is sometimes the IP address '192.168.0.1' does create more than 200 connections, a Cisco firewall gets triggered and the connection for that IP is filtered out. To be able to know in advance that this problem is upcoming. a Small userparameter script is set on the Linux servers, that does print out all connections from IP by its STATES sorted out.

 

The script is calc_total_ip_match_zabbix.sh is below:

#!/bin/bash
#  check ESTIMATED / FIN_WAIT etc. netstat output for IPs and calculate total
# UserParameter=count.connections,(/usr/local/bin/calc_total_ip_match_zabbix.sh)
CHECK_IP='192.168.0.1';
f=0; 

 

for i in $(netstat -nat | grep "$CHECK_IP" | awk '{print $6}' | sort | uniq -c | sort -n); do

echo -n "$i ";
f=$((f+i));
done;
echo
echo "Total: $f"

 

root@pcfreak:/bashscripts# ./calc_total_ip_match_zabbix.sh 
1 TIME_WAIT 2 ESTABLISHED 3 LISTEN 

Total: 6

 

root@pcfreak:/bashscripts# ./calc_total_ip_match_zabbix.sh 
2 ESTABLISHED 3 LISTEN 
Total: 5


To make process with Zabbix it is necessery to have an Item created and a Depedent Item.

images/zabbix-webgui-connection-check1

 

 

 

 

webguiconnection-check1

webguiconnection-check1
 

webgui-connection-check2-item

images/webguiconnection-check1

Finally create a trigger to trigger alarm if you have more than or eqaul to 100 Total overall connections.


images/zabbix-webgui-connection-check-trigger

The Zabbix userparameter script should be as this:
cat /etc/zabbix/zabbix_agentd.d/userparameter_webgui_conn.conf
UserParameter=count.connections,(/usr/local/bin/webgui_conn_track.sh)
 

Some collleagues suggested more efficient shell script solution for suming the overall number of connections, below is less time consuming version of script, that can be used for the calculation.
 

#!/bin/bash -x
# show FIN_WAIT2 / ESTIMATED etc. and calcuate total
count=$(netstat -n | grep "192.168.0.1" | awk ' { print $6 } ' | sort -n | uniq -c | sort -nr)
total=$((${count// /+}))
echo "$count"
echo "Total:" "$total"

 

      2 ESTABLISHED
      1 TIME_WAIT
Total: 3

 


Below is the graph built with Zabbix showing all the fluctuations from connections from monitored IP.
ebgui-check_ip_graph

Monitoring Linux hardware Hard Drives / Temperature and Disk with lm_sensors / smartd / hddtemp and Zabbix Userparameter lm_sensors report script

Thursday, April 30th, 2020

monitoring-linux-hardware-with-software-temperature-disk-cpu-health-zabbix-userparameter-script

I'm part of a  SysAdmin Team that is partially doing some minor Zabbix imrovements on a custom corporate installed Zabbix in an ongoing project to substitute the previous HP OpenView monitoring for a bunch of Legacy Linux hosts.
As one of the necessery checks to have is regarding system Hardware, the task was to invent some simplistic way to monitor hardware with the Zabbix Monitoring tool.  Monitoring Bare Metal servers hardware of HP / Dell / Fujituse etc. servers  in Linux usually is done with a third party software provided by the Hardware vendor. But as this requires an additional services to run and sometimes is not desired. It was interesting to find out some alternative Linux native ways to do the System hardware monitoring.
Monitoring statistics from the system hardware components can be obtained directly from the server components with ipmi / ipmitool (for more info on it check my previous article Reset and Manage intelligent  Platform Management remote board article).
With ipmi
 hardware health info could be received straight from the ILO / IDRAC / HPMI of the server. However as often the Admin-Lan of the server is in a seperate DMZ secured network and available via only a certain set of routed IPs, ipmitool can't be used.

So what are the other options to use to implement Linux Server Hardware Monitoring?

The tools to use are perhaps many but I know of two which gives you most of the information you ever need to have a prelimitary hardware damage warning system before the crash, these are:
 

1. smartmontools (smartd)

Smartd is part of smartmontools package which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology system (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks

Disk monitoring is handled by a special service the package provides called smartd that does query the Hard Drives periodically aiming to find a warning signs of hardware failures.
The downside of smartd use is that it implies a little bit of extra load on Hard Drive read / writes and if misconfigured could reduce the the Hard disk life time.

 

linux:~#  /usr/sbin/smartctl -a /dev/sdb2
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SA400S37240G
Serial Number:    50026B768340AA31
LU WWN Device Id: 5 0026b7 68340aa31
Firmware Version: S1Z40102
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Apr 30 14:05:01 2020 EEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       –       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       –       2820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       –       21
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       –       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      –       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       –       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      –       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       –       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       –       16
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       –       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       –       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       –       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
231 Temperature_Celsius     0x0000   097   097   000    Old_age   Offline      –       97
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       –       2104
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       –       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       –       1141
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       32
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       107
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       15940

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

 

2. hddtemp

 

Usually if smartd is used it is useful to also use hddtemp which relies on smartd data.
 The hddtemp program monitors and reports the temperature of PATA, SATA
 or SCSI hard drives by reading Self-Monitoring Analysis and Reporting
 Technology (S.M.A.R.T.)
information on drives that support this feature.
 

linux:~# /usr/sbin/hddtemp /dev/sda1
/dev/sda1: Hitachi HDS721050CLA360: 31°C
linux:~# /usr/sbin/hddtemp /dev/sdc6
/dev/sdc6: KINGSTON SV300S37A120G: 25°C
linux:~# /usr/sbin/hddtemp /dev/sdb2
/dev/sdb2: KINGSTON SA400S37240G: 34°C
linux:~# /usr/sbin/hddtemp /dev/sdd1
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available

 

 

3. lm-sensors / i2c-tools 

 Lm-sensors is a hardware health monitoring package for Linux. It allows you
 to access information from temperature, voltage, and fan speed sensors.
i2c-tools
was historically bundled in the same package as lm_sensors but has been seperated cause not all hardware monitoring chips are I2C devices, and not all I2C devices are hardware monitoring chips.

The most basic use of lm-sensors is with the sensors command

 

linux:~# sensors
i350bb-pci-0600
Adapter: PCI adapter
loc1:         +55.0 C  (high = +120.0 C, crit = +110.0 C)

 

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 0:         +26.0 C  (high = +78.0 C, crit = +88.0 C)
Core 1:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 2:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 3:         +28.0 C  (high = +78.0 C, crit = +88.0 C)

 


On CentOS Linux useful tool is also  lm_sensors-sensord.x86_64 – A Daemon that periodically logs sensor readings to syslog or a round-robin database, and warns of sensor alarms.

In Debian Linux there is also the psensors-server (an HTTP server providing JSON Web service which can be used by GTK+ Application to remotely monitor sensors) useful for developers
psesors-server

psensor-linux-graphical-tool-to-check-cpu-hard-disk-temperature-unix

If you have a Xserver installed on the Server accessed with Xclient or via VNC though quite rare,
You can use xsensors or Psensora GTK+ (Widget Toolkit for creating Graphical User Interface) application software.

With this 3 tools it is pretty easy to script one liners and use the Zabbix UserParameters functionality to send hardware report data to a Company's Zabbix Sserver, though Zabbix has already some templates to do so in my case, I couldn't import this templates cause I don't have Zabbix Super-Admin credentials, thus to work around that a sample work around is use script to monitor for higher and critical considered temperature.
Here is a tiny sample script I came up in 1 min time it can be used to used as 1 liner UserParameter and built upon something more complex.

SENSORS_HIGH=`sensors | awk '{ print $6 }'| grep '^+' | uniq`;
SENSORS_CRIT=`sensors | awk '{ print $9 }'| grep '^+' | uniq`; ;SENSORS_STAT=`sensors|grep -E 'Core\s' | awk '{ print $1" "$2" "$3 }' | grep "$SENSORS_HIGH|$SENSORS_CRIT"`;
if [ ! -z $SENSORS_STAT ]; then
echo 'Temperature HIGH';
else 
echo 'Sensors OK';
fi 

Of course there is much more sophisticated stuff to use for monitoring out there


Below script can be easily adapted and use on other Monitoring Platforms such as Nagios / Munin / Cacti / Icinga and there are plenty of paid solutions, but for anyone that wants to develop something from scratch just like me I hope this
article will be a good short introduction.
If you know some other Linux hardware monitoring tools, please share.