Posts Tagged ‘script’

How to calculate connections from IP address with shell script and log to Zabbix graphic

Thursday, March 11th, 2021

We had to test the number of connections incoming IP sorted by its TCP / IP connection state.

For example:

TIME_WAIT, ESTABLISHED, LISTEN etc.


The reason behind is sometimes the IP address '192.168.0.1' does create more than 200 connections, a Cisco firewall gets triggered and the connection for that IP is filtered out. To be able to know in advance that this problem is upcoming. a Small userparameter script is set on the Linux servers, that does print out all connections from IP by its STATES sorted out.

 

The script is calc_total_ip_match_zabbix.sh is below:

#!/bin/bash
#  check ESTIMATED / FIN_WAIT etc. netstat output for IPs and calculate total
# UserParameter=count.connections,(/usr/local/bin/calc_total_ip_match_zabbix.sh)
CHECK_IP='192.168.0.1';
f=0; 

 

for i in $(netstat -nat | grep "$CHECK_IP" | awk '{print $6}' | sort | uniq -c | sort -n); do

echo -n "$i ";
f=$((f+i));
done;
echo
echo "Total: $f"

 

root@pcfreak:/bashscripts# ./calc_total_ip_match_zabbix.sh 
1 TIME_WAIT 2 ESTABLISHED 3 LISTEN 

Total: 6

 

root@pcfreak:/bashscripts# ./calc_total_ip_match_zabbix.sh 
2 ESTABLISHED 3 LISTEN 
Total: 5


To make process with Zabbix it is necessery to have an Item created and a Depedent Item.

images/zabbix-webgui-connection-check1

 

 

 

 

webguiconnection-check1

webguiconnection-check1
 

webgui-connection-check2-item

images/webguiconnection-check1

Finally create a trigger to trigger alarm if you have more than or eqaul to 100 Total overall connections.


images/zabbix-webgui-connection-check-trigger

The Zabbix userparameter script should be as this:
cat /etc/zabbix/zabbix_agentd.d/userparameter_webgui_conn.conf
UserParameter=count.connections,(/usr/local/bin/webgui_conn_track.sh)
 

Some collleagues suggested more efficient shell script solution for suming the overall number of connections, below is less time consuming version of script, that can be used for the calculation.
 

#!/bin/bash -x
# show FIN_WAIT2 / ESTIMATED etc. and calcuate total
count=$(netstat -n | grep "192.168.0.1" | awk ' { print $6 } ' | sort -n | uniq -c | sort -nr)
total=$((${count// /+}))
echo "$count"
echo "Total:" "$total"

 

      2 ESTABLISHED
      1 TIME_WAIT
Total: 3

 


Below is the graph built with Zabbix showing all the fluctuations from connections from monitored IP.
ebgui-check_ip_graph

List all existing local admin users belonging to admin group and mail them to monitoring mail box

Monday, February 8th, 2021

local-user-account-creation-deletion-change-monitor-accounts-and-send-them-to-central-monitoring-mail

If you have a bunch of servers that needs to have a tight security with multiple Local users superuser accounts that change over time and you need to frequently keep an have a long over time if some new system UNIX local users in /etc/passwd /etc/group has been added deleted e.g. the /etc/passwd /etc/group then you might have the task to have some primitive monitoring set and the most primitive I can think of is simply routinely log users list for historical purposes to a common mailbox over time (lets say 4 times a month or even more frequently) you might send with a simple cron job a list of all existing admin authorized users to a logging sysadmin mailbox like lets say:
 

Local-unix-users@yourcompanydomain.com


A remark to make here is the common sysadmin practice scenario to have local existing non-ldap admin users group members of whom are authorized to use sudo su – root via /etc/sudoers  is described in my previous article how to add local users to admin group superuser access via sudo I thus have been managing already a number of servers that have user setup using the above explained admin group.

Thus to have the monitoring at place I've developed a tiny shell script that does check all users belonging to the predefined user group dumps it to .csv format that starts with a simple timestamp on when user admin list was made and sends it to a predefined email address as well as logs sent mail content for further reference in a local directory.

The task is a relatively easy but since nowadays the level of competency of system administration across youngsters is declinging -that's of course in my humble opinion (just like it happens in every other profession), below is the developed list-admin-users.sh
 

 

#!/bin/bash
# dump all users belonging to a predefined admin user / group in csv format 
# with a day / month year timestamp and mail it to a predefined admin
# monitoring address
TO_ADDRESS="Local-unix-users@yourcompanydomain.com";
HOSTN=$(hostname);
# root@server:/# grep -i 1000 /etc/passwd
# username:x:username:1000:username,,,:/home/username:/bin/bash
# username1:x:username1:1000:username1,,,:/home/username1:/bin/bash
# username5:x:username1:1000:username5,,,:/home/username5:/bin/bash

ADMINS_ID='4355';
#
# root@server # group_id_ID='4355'; grep -i group_id_ID /etc/passwd
# …
# username1:x:1005:4355:username1,,,:/home/username1:/bin/bash
# username5:x:1005:4355,,,:/home/username5:/bin/bash


group_id_ID='215';
group_id='group_id';
FIL="/var/log/userlist-log-dir/userlist_$(date +"%d_%m_%Y_%H_%M").txt";
CUR_D="$HOSTN: Current admin users $(date)"; >> $FIL; echo -e "##### $CUR_D #####" >> $FIL;
for i in $(cat /etc/passwd | grep -i /home|grep /bin/bash|grep -e "$ADMINS_ID" -e "$group_id_ID" | cut -d : -f1); do \
if [[ $(grep $i /etc/group|grep $group_id) ]]; then
f=$(echo $i); echo $i,group_id,$(id -g $i); else  echo $i,admin,$(id -g $i);
fi
done >> $FIL; mail -s "$CUR_D" $TO_ADDRESS < $FIL


list-admin-users.sh is ready for download also here

To make the script report you will have to place it somewhere for example in /usr/local/bin/list-admin-users.sh ,  create its log dir location /var/log/userlist-log-dir/ and set proper executable and user/group script and directory permissions to it to be only readable for root user.

root@server: # mkdir /var/log/userlist-log-dir/
root@server: # chmod +x /usr/local/bin/list-admin-users.sh
root@server: # chmod -R 700 /var/log/userlist-log-dir/


To make the script generate its admin user reports and send it to the central mailbox  a couple of times in the month early in the morning (assuming you have a properly running postfix / qmail / sendmail … smtp), as a last step you need to set a cron job to routinely invoke the script as root user.

root@server: # crontab -u root -e
12 06 5,10,15,20,25,1 /usr/local/bin/list-admin-users.sh


That's all folks now on 5th 10th, 15th, 20th 25th and 1st at 06:12 you'll get the admin user list reports done. Enjoy 🙂

Add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers

Tuesday, December 8th, 2020

Zabbix-logo-how-to-make-ntpd-time-server-monitoring-article

 

How to add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers?

We needed to set on some servers at my work an elementary check with Zabbix monitoring to check whether servers time is correctly synchronized with ntpd time service as well report if the ntp daemon is correctly running on the machine. For that a userparameter script was developed called userparameter_ntp.conf the script is simplistic and few a lines of bash shell scripting 
stuff is based on gresping information required from ntpq and ntpstat common ntp client commands to get information about the status of time synchronization on the servers.
 

[root@linuxserver ]# ntpstat
synchronised to NTP server (10.80.200.30) at stratum 3
   time correct to within 47 ms
   polling server every 1024 s

 

[root@linuxserver ]# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+timeserver1 10.26.239.41     2 u  319 1024  377   15.864    1.270   0.262
+timeserver2 10.82.239.41     2 u  591 1024  377   16.287   -0.334   1.748
*timeserver3 10.82.239.43     2 u   47 1024  377   15.613   -0.553   0.251
 timeserver4 .INIT.          16 u    – 1024    0    0.000    0.000   0.000


Below is Zabbix UserParameter script that does report us 3 important values we monitor to make sure time server synchronization works as expected the zabbix keys we set are ntp.offset, ntp.sync, ntp.exact in attempt to describe what we're fetching from ntp client:

[root@linuxserver ]# cat /etc/zabbix/zabbix-agent.d/userparameter_ntp.conf

UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }')
#UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'FNR==4{print $9}')
UserParameter=ntp.sync,(/usr/bin/ntpstat | cut -f 1 -d " " | tr -d ' \t\n\r\f')
UserParameter=ntp.exact,(/usr/bin/ntpstat | /usr/bin/awk 'FNR==2{print $5,$6}')

In Zabbix the monitored ntpd parameters set-upped looks like this:

 

ntp_time_synchronization_check-zabbix-screenshot.

 

!Note that in above userparameter example, the commented userparameter script is a just another way to do an ntpd offset returned value which was developed before the more sophisticated with more regular expression checks from the /usr/sbin/ntpd via ntpq, perhaps if you want to extend it you can also use another script to report more verbose information to Zabbix if that is required like ouput from ntpq -c peers command:
 

UserParameter=ntp.verbose,(/usr/sbin/ntpq -c peers)

Of course to make the Zabbix fetch necessery data from monitored hosts, we need to set-up further new Zabbix Template with the respective Trigger and Items.

Below are few screenshots including the triggers used.

ntpd_server-time_synchronization_check-zabbix-screenshot-triggers

  • ntpd.trigger

{NTP:net.udp.service[ntp].last(0)}<1

  • NTP Synchronization trigger

{NTP:ntp.sync.iregexp(unsynchronised)}=1

 

 

As you can see from history we have setup our items to Store history of reported data to Zabbix from parameter script for 90 days and update our monitor check, every 30 seconds from the monitored hosts to which Tempate is applied.

Well that's all folks, time synchronization issues we'll be promptly triggering a new Alarm in Zabbix !

Vodka! :)

Wednesday, September 12th, 2007

Yesterday I drinked 200 gr. of Vodka yesterday Night, it was pretty refreshing for me but I got drunk a little.I'm smoking again … Things are going bad in my life recently. I have health issues. And I intend to go to doctor today.Yesterday I went to the polyclinic but my personal Dr. Nikolay  was not there (I was angry, I went to doctor once in years and he is not there) so I'll try again today. I had pains somewhere around the stomach. At least at work things are going smoothly at least God hears my prayers about this. I'm very confused and I have completely no idea what to do with my life. Yesterday I was out with Lily and Kiril on the fountain. The previous day Nomen, I, Yavor, Kiro and Bino went to the "Kobaklyka" (a woody place which is close to Dobrich.) Well that's most of what's happening lately with my life. I wrote a little script to make that nautilus to get restarted if it starts burning the cpu. It's a dumb script (the bad thing is that I'm loosing form scripting, Well I don't script much lately). Here is the script http://pcfreak.d-bg.net/bshscr/restart_nautilus.sh https://www.pc-freak.net/bshscr/restart_nautilus.sh. The days before the 4 days weekend, I hat to spend a lot of time on one of the servers fighting with Spammers. Hate spammers really! I ended removing bounce messages at all for one of the domains, which fixed the bounce spam method spammers use (btw qmail's chkuser seems to not work properly for some reason) … Also I started watching Stargate – SG1. First I thought it's a stupid sci-fi serial. But after the first serie I now think it has it's good moments :]. Also I had something like a Mortification Day going on during Monday. The whole day I listened to Mortification (The first Christian Death Metal Band). I Liked much the "Hammer of God" album. In the evening Sabin (Bino) came home and we watched some Mortification videos at Youtube. Right now I listen again to "Ever – Idyll" a pretty great song. And yeah I keep listening to ChristianIndustrial.net a lot, a great radio. Try it if you haven't!END—–

Check server Internet connectivity Speedtest from Linux terminal CLI

Friday, August 7th, 2020

check-server-console-speedtest

If you are a system administrator of a dedicated server and you have no access to Xserver Graphical GNOME / KDE etc. environment and you wonder how you can track the bandwidth connectivity speed of remote system to the internet and you happen to have a modern Linux distribution, here is few ways to do a speedtest.
 

1. Use speedtest-cli command line tool to test connectivity

 


speedtest-cli is a tiny tool written in python, to use it hence you need to have python installed on the server.
It is available both for Redhat Linux distros and Debians / Ubuntus etc. in the list of standard installable packages.

a) Install speedtest-cli on Fedora / CentOS / RHEL
 

On CentOS / RHEL / Scientific Linux lower than ver 8:

 

 

$ sudo yum install python

On CentOS 8 / RHEL 8 user type the following command to install Python 3 or 2:

 

 

$sudo yum install python3
$ sudo yum install python2

 

 

 


On Fedora Linux version 22+

 

 

$ sudo dnf install python
$ sudo dnf install pytho3

 


Once python is at place download speedtest.py or in case if link is not reachable download mirrored version of speedtest.py on www.pc-freak.net here
 

 

 

$ wget -O speedtest-cli https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py
$ chmod +x speedtest-cli

 


Then it is time to run script speedtest-screenshot-linux-terminal-console-cli-cmd
To test enabled Bandwidth on the server

 

 

$ python speedtest-cli


b) Install speedtest-cli on Debian

On Latest Debian 10 Buster speedtest is available out of the box in regular .deb repositories, so fetch it with apt
 

 

# apt install –yes speedtest-cli

 


You can give now speedtest-cli a try with –bytes arguments to get speed values in bytes instead of bits or if you want to generate an image with test results in picture just like it will appear if you use speedtest.net inside a gui browser, use the –share option

speedtest-screenshot-linux-terminal-console-cli-cmd-options

 

 

 

2. Getting connectivity results of all defined speedtest test City Locations


Speedtest has a list of servers through which a Upload and Download speed is tested, to run speedtest-cli to test with each and every server and get a better picture on what kind of connectivity to expect from your server towards the closest region capital cities, fetch speedtest-servers.php list and use a small shell loop below is how:

 

 

 

 

 

root@pcfreak:~#  wget http://www.speedtest.net/speedtest-servers.php
–2020-08-07 16:31:34–  http://www.speedtest.net/speedtest-servers.php
Преобразувам www.speedtest.net (www.speedtest.net)… 151.101.2.219, 151.101.66.219, 151.101.130.219, …
Connecting to www.speedtest.net (www.speedtest.net)|151.101.2.219|:80… успешно свързване.
HTTP изпратено искане, чакам отговор… 301 Moved Permanently
Адрес: https://www.speedtest.net/speedtest-servers.php [следва]
–2020-08-07 16:31:34–  https://www.speedtest.net/speedtest-servers.php
Connecting to www.speedtest.net (www.speedtest.net)|151.101.2.219|:443… успешно свързване.
HTTP изпратено искане, чакам отговор… 307 Temporary Redirect
Адрес: https://c.speedtest.net/speedtest-servers-static.php [следва]
–2020-08-07 16:31:35–  https://c.speedtest.net/speedtest-servers-static.php
Преобразувам c.speedtest.net (c.speedtest.net)… 151.101.242.219
Connecting to c.speedtest.net (c.speedtest.net)|151.101.242.219|:443… успешно свързване.
HTTP изпратено искане, чакам отговор… 200 OK
Дължина: 211695 (207K) [text/xml]
Saving to: ‘speedtest-servers.php’
speedtest-servers.php                  100%[==========================================================================>] 206,73K  –.-KB/s    in 0,1s
2020-08-07 16:31:35 (1,75 MB/s) – ‘speedtest-servers.php’ saved [211695/211695]

Once file is there with below loop we extract all file defined servers id="" 's 
 

root@pcfreak:~# for i in $(cat speedtest-servers.php | egrep -Eo 'id="[0-9]{4}"' |sed -e 's#id="##' -e 's#"##g'); do speedtest-cli  –server $i; done
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
Retrieving speedtest.net server list…
Retrieving information for the selected server…
Hosted by Telecoms Ltd. (Varna) [38.88 km]: 25.947 ms
Testing download speed……………………………………………………………………..
Download: 57.71 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 93.85 Mbit/s
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
Retrieving speedtest.net server list…
Retrieving information for the selected server…
Hosted by GMB Computers (Constanta) [94.03 km]: 80.247 ms
Testing download speed……………………………………………………………………..
Download: 35.86 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 80.15 Mbit/s
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…

…..

 


etc.

For better readability you might want to add the ouput to a file or even put it to run periodically on a cron if you have some suspcion that your server Internet dedicated lines dies out to some general locations sometimes.
 

3. Testing UPlink speed with Download some big file from source location


In the past a classical way to test the bandwidth connectivity of your Internet Service Provider was to fetch some big file, Linux guys should remember it was almost a standard to roll a download of Linux kernel source .tar file with some test browser as elinks / lynx / w3c.
speedtest-screenshot-kernel-org-shot1 speedtest-screenshot-kernel-org-shot2
or if those are not at hand test connectivity on remote free shell servers whatever file downloader as wget or curl was used.
Analogical method is still possible, for example to use wget to get an idea about bandwidtch connectivity, let it roll below 500 mb from speedtest.wdc01.softlayer.com to /dev/null few times:

 

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

 

# wget -O /dev/null –progress=dot:mega http://cachefly.cachefly.net/10mb.test ; date
–2020-08-07 13:56:49–  http://cachefly.cachefly.net/10mb.test
Resolving cachefly.cachefly.net (cachefly.cachefly.net)… 205.234.175.175
Connecting to cachefly.cachefly.net (cachefly.cachefly.net)|205.234.175.175|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: ‘/dev/null’

     0K …….. …….. …….. …….. …….. …….. 30%  142M 0s
  3072K …….. …….. …….. …….. …….. …….. 60%  179M 0s
  6144K …….. …….. …….. …….. …….. …….. 90%  204M 0s
  9216K …….. ……..                                    100%  197M=0.06s

2020-08-07 13:56:50 (173 MB/s) – ‘/dev/null’ saved [10485760/10485760]

Fri 07 Aug 2020 01:56:50 PM UTC


To be sure you have a real picture on remote machine Internet speed it is always a good idea to run download of random big files on a certain locations that are well known to have a very stable Internet bandwidth to the Internet backbone routers.

4. Using Simple shell script to test Internet speed


Fetch and use speedtest.sh

 


wget https://raw.github.com/blackdotsh/curl-speedtest/master/speedtest.sh && chmod u+x speedtest.sh && bash speedtest.sh

 

 

5. Using iperf to test connectivity between two servers 

 

iperf is another good tool worthy to mention that can be used to test the speed between client and server.

To use iperf install it with apt and do on the server machine to which bandwidth will be tested:

 

# iperf -s 

 

On the client machine do:

 

# iperf -c 192.168.1.1 

 

where 192.168.1.1 is the IP of the server where iperf was spawned to listen.

6. Using Netflix fast to determine Internet connection speed on host


Fast

fast is a service provided by Netflix. Its web interface is located at Fast.com and it has a command-line interface available through npm (npm is a package manager for nodejs) so if you don't have it you will have to install it first with:

# apt install –yes npm

 

Note that if you run on Debian this will install you some 249 new nodejs packages which you might not want to have on the system, so this is useful only for machines that has already use of nodejs.

 

$ fast

 

     82 Mbps ↓


The command returns your Internet download speed. To get your upload speed, use the -u flag:

 

$ fast -u

 

   ⠧ 80 Mbps ↓ / 8.2 Mbps ↑

 

7. Use speedometer / iftop to measure incoming and outgoing traffic on interface


If you're measuring connectivity on a live production server system, then you might consider that the measurement output might not be exactly correct especially if you're measuring the Uplink / Downlink on a Heavy loaded webserver / Mail Server / Samba or DNS server.
If this is the case a very useful tools to consider to extract the already taken traffic used on your Incoming and Outgoing ( TX / RX ) Network interfaces
are speedometer and iftop, they're present and installable depending on the OS via yum / apt or the respective package manager.

 


To install on Debian server:

 

 

 

# apt install –yes iftop speedometer

 


The most basic use to check the live received traffic in a nice Ncurses like text graphic is with: 

 

 

 

 

# speedometer -r 


speedometer-check-received-transmitted-network-traffic-on-linux1

To generate real time ASCII art graph on RX / TX traffic do:

 

 

# speedometer -r eth0 -t eth0


speedometer-check-received-transmitted-network-traffic-on-linux

 

 

 

 

# iftop -P -i eth0

 

 


iftop-show-statistics-on-connections-screenshot-pcfreak

 

 

 

 

 

Monitoring Linux hardware Hard Drives / Temperature and Disk with lm_sensors / smartd / hddtemp and Zabbix Userparameter lm_sensors report script

Thursday, April 30th, 2020

monitoring-linux-hardware-with-software-temperature-disk-cpu-health-zabbix-userparameter-script

I'm part of a  SysAdmin Team that is partially doing some minor Zabbix imrovements on a custom corporate installed Zabbix in an ongoing project to substitute the previous HP OpenView monitoring for a bunch of Legacy Linux hosts.
As one of the necessery checks to have is regarding system Hardware, the task was to invent some simplistic way to monitor hardware with the Zabbix Monitoring tool.  Monitoring Bare Metal servers hardware of HP / Dell / Fujituse etc. servers  in Linux usually is done with a third party software provided by the Hardware vendor. But as this requires an additional services to run and sometimes is not desired. It was interesting to find out some alternative Linux native ways to do the System hardware monitoring.
Monitoring statistics from the system hardware components can be obtained directly from the server components with ipmi / ipmitool (for more info on it check my previous article Reset and Manage intelligent  Platform Management remote board article).
With ipmi
 hardware health info could be received straight from the ILO / IDRAC / HPMI of the server. However as often the Admin-Lan of the server is in a seperate DMZ secured network and available via only a certain set of routed IPs, ipmitool can't be used.

So what are the other options to use to implement Linux Server Hardware Monitoring?

The tools to use are perhaps many but I know of two which gives you most of the information you ever need to have a prelimitary hardware damage warning system before the crash, these are:
 

1. smartmontools (smartd)

Smartd is part of smartmontools package which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology system (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks

Disk monitoring is handled by a special service the package provides called smartd that does query the Hard Drives periodically aiming to find a warning signs of hardware failures.
The downside of smartd use is that it implies a little bit of extra load on Hard Drive read / writes and if misconfigured could reduce the the Hard disk life time.

 

linux:~#  /usr/sbin/smartctl -a /dev/sdb2
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SA400S37240G
Serial Number:    50026B768340AA31
LU WWN Device Id: 5 0026b7 68340aa31
Firmware Version: S1Z40102
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Apr 30 14:05:01 2020 EEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       –       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       –       2820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       –       21
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       –       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      –       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       –       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      –       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       –       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       –       16
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       –       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       –       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       –       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
231 Temperature_Celsius     0x0000   097   097   000    Old_age   Offline      –       97
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       –       2104
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       –       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       –       1141
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       32
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       107
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       15940

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

 

2. hddtemp

 

Usually if smartd is used it is useful to also use hddtemp which relies on smartd data.
 The hddtemp program monitors and reports the temperature of PATA, SATA
 or SCSI hard drives by reading Self-Monitoring Analysis and Reporting
 Technology (S.M.A.R.T.)
information on drives that support this feature.
 

linux:~# /usr/sbin/hddtemp /dev/sda1
/dev/sda1: Hitachi HDS721050CLA360: 31°C
linux:~# /usr/sbin/hddtemp /dev/sdc6
/dev/sdc6: KINGSTON SV300S37A120G: 25°C
linux:~# /usr/sbin/hddtemp /dev/sdb2
/dev/sdb2: KINGSTON SA400S37240G: 34°C
linux:~# /usr/sbin/hddtemp /dev/sdd1
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available

 

 

3. lm-sensors / i2c-tools 

 Lm-sensors is a hardware health monitoring package for Linux. It allows you
 to access information from temperature, voltage, and fan speed sensors.
i2c-tools
was historically bundled in the same package as lm_sensors but has been seperated cause not all hardware monitoring chips are I2C devices, and not all I2C devices are hardware monitoring chips.

The most basic use of lm-sensors is with the sensors command

 

linux:~# sensors
i350bb-pci-0600
Adapter: PCI adapter
loc1:         +55.0 C  (high = +120.0 C, crit = +110.0 C)

 

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 0:         +26.0 C  (high = +78.0 C, crit = +88.0 C)
Core 1:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 2:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 3:         +28.0 C  (high = +78.0 C, crit = +88.0 C)

 


On CentOS Linux useful tool is also  lm_sensors-sensord.x86_64 – A Daemon that periodically logs sensor readings to syslog or a round-robin database, and warns of sensor alarms.

In Debian Linux there is also the psensors-server (an HTTP server providing JSON Web service which can be used by GTK+ Application to remotely monitor sensors) useful for developers
psesors-server

psensor-linux-graphical-tool-to-check-cpu-hard-disk-temperature-unix

If you have a Xserver installed on the Server accessed with Xclient or via VNC though quite rare,
You can use xsensors or Psensora GTK+ (Widget Toolkit for creating Graphical User Interface) application software.

With this 3 tools it is pretty easy to script one liners and use the Zabbix UserParameters functionality to send hardware report data to a Company's Zabbix Sserver, though Zabbix has already some templates to do so in my case, I couldn't import this templates cause I don't have Zabbix Super-Admin credentials, thus to work around that a sample work around is use script to monitor for higher and critical considered temperature.
Here is a tiny sample script I came up in 1 min time it can be used to used as 1 liner UserParameter and built upon something more complex.

SENSORS_HIGH=`sensors | awk '{ print $6 }'| grep '^+' | uniq`;
SENSORS_CRIT=`sensors | awk '{ print $9 }'| grep '^+' | uniq`; ;SENSORS_STAT=`sensors|grep -E 'Core\s' | awk '{ print $1" "$2" "$3 }' | grep "$SENSORS_HIGH|$SENSORS_CRIT"`;
if [ ! -z $SENSORS_STAT ]; then
echo 'Temperature HIGH';
else 
echo 'Sensors OK';
fi 

Of course there is much more sophisticated stuff to use for monitoring out there


Below script can be easily adapted and use on other Monitoring Platforms such as Nagios / Munin / Cacti / Icinga and there are plenty of paid solutions, but for anyone that wants to develop something from scratch just like me I hope this
article will be a good short introduction.
If you know some other Linux hardware monitoring tools, please share.

Automatic network restart and reboot Linux server script if ping timeout to gateway is not responding as a way to reduce connectivity downtimes

Monday, December 10th, 2018

automatic-server-network-restart-and-reboot-script-if-connection-to-server-gateway-inavailable-tux-penguing-ascii-art-bin-bash

Inability of server to come back online server automaticallyafter electricity / network outage

These days my home server  is experiencing a lot of issues due to Electricity Power Outages, a construction dig operations to fix / change waterpipe tubes near my home are in action and perhaps the power cables got ruptered by the digger machine.
The effect of all this was that my server networking accessability was affected and as I didn't have network I couldn't access it remotely anymore at a certain point the electricity was restored (and the UPS charge could keep the server up), however the server accessibility did not due restore until I asked a relative to restart it or under a more complicated cases where Tech aquanted guy has to help – Alexander (Alex) a close friend from school years check his old site here – alex.www.pc-freak.net helps a lot.to restart the machine physically either run a quick restoration commands on root TTY terminal or generally do check whether default router is reachable.

This kind of Pc-Freak.net downtime issues over the last month become too frequent (the machine was down about 5 times for 2 to 5 hours and this was too much (and weirdly enough it was not accessible from the internet even after electricity network was restored and the only solution to that was a physical server restart (from the Power Button).

To decrease the number of cases in which known relatives or friends has to  physically go to the server and restart it, each time after network or electricity outage I wrote a small script to check accessibility towards Default defined Network Gateway for my server with few ICMP packages sent with good old PING command
and trigger a network restart and system reboot
(in case if the network restart does fail) in a row.

1. Create reboot-if-nwork-is-downsh script under /usr/sbin or other dir

Here is the script itself:

 

#!/bin/sh
# Script checks with ping 5 ICMP pings 10 times to DEF GW and if so
# triggers networking restart /etc/inid.d/networking restart
# Then does another 5 x 10 PINGS and if ping command returns errors,
# Reboots machine
# This script is useful if you run home router with Linux and you have
# electricity outages and machine doesn't go up if not rebooted in that case

GATEWAY_HOST='192.168.0.1';

run_ping () {
for i in $(seq 1 10); do
    ping -c 5 $GATEWAY_HOST
done

}

reboot_f () {
if [ $? -eq 0 ]; then
        echo "$(date "+%Y-%m-%d %H:%M:%S") Ping to $GATEWAY_HOST OK" >> /var/log/reboot.log
    else
    /etc/init.d/networking restart
        echo "$(date "+%Y-%m-%d %H:%M:%S") Restarted Network Interfaces:" >> /tmp/rebooted.txt
    for i in $(seq 1 10); do ping -c 5 $GATEWAY_HOST; done
    if [ $? -eq 0 ] && [ $(cat /tmp/rebooted.txt) -lt ‘5’ ]; then
         echo "$(date "+%Y-%m-%d %H:%M:%S") Ping to $GATEWAY_HOST FAILED !!! REBOOTING." >> /var/log/reboot.log
        /sbin/reboot

    # increment 5 times until stop
    [[ -f /tmp/rebooted.txt ]] || echo 0 > /tmp/rebooted.txt
    n=$(< /tmp/rebooted.txt)
        echo $(( n + 1 )) > /tmp/rebooted.txt
    fi
    # if 5 times rebooted sleep 30 mins and reset counter
    if [ $(cat /tmprebooted.txt) -eq ‘5’ ]; then
    sleep 1800
        cat /dev/null > /tmp/rebooted.txt
    fi
fi

}
run_ping;
reboot_f;

You can download a copy of reboot-if-nwork-is-down.sh script here.

As you see in script successful runs  as well as its failures are logged on server in /var/log/reboot.log with respective timestamp.
Also a counter to 5 is kept in /tmp/rebooted.txt, incremented on each and every script run (rebooting) if, the 5 times increment is matched

a sleep is executed for 30 minutes and the counter is being restarted.
The counter check to 5 guarantees the server will not get restarted if access to Gateway is not continuing for a long time to prevent the system is not being restarted like crazy all time.
 

2. Create a cron job to run reboot-if-nwork-is-down.sh every 15 minutes or so 

I've set the script to re-run in a scheduled (root user) cron job every 15 minutes with following  job:

To add the script to the existing cron rules without rewriting my old cron jobs and without tempering to use cronta -u root -e (e.g. do the cron job add in a non-interactive mode with a single bash script one liner had to run following command:

 

{ crontab -l; echo "*/15 * * * * /usr/sbin/reboot-if-nwork-is-down.sh 2>&1 >/dev/null; } | crontab –


I know restarting a server to restore accessibility is a stupid practice but for home-use or small client servers with unguaranteed networks with a cheap Uninterruptable Power Supply (UPS) devices it is useful.

Summary

Time will show how efficient such a  "self-healing script practice is.
Even though I'm pretty sure that even in a Corporate businesses and large Public / Private Hybrid Clouds where access to remote mounted NFS / XFS / ZFS filesystems are failing a modifications of the script could save you a lot of nerves and troubles and unhappy customers / managers screaming at you on the phone 🙂


I'll be interested to hear from others who have a better  ideas to restore ( resurrect ) access to inessible Linux server after an outage.?
 

xorg on Toshiba Satellite L40 14B with Intel GM965 video hangs up after boot and the worst fix ever / How to reinstall Ubuntu by keeping the old personal data and programs

Wednesday, April 27th, 2011

black screen ubuntu troubles

I have updated Ubuntu version 9.04 (Jaunty) to 9.10 and followed the my previous post update ubuntu from 9.04 to Latest Ubuntu

I expected that a step by step upgrade from a release to release will work like a charm and though it does on many notebooks it doesn't on Toshiba Satellite L40

The update itself went fine, whether I used the update-manager -d and followed the above pointed tutorial, however after a system restart the PC failed to boot the X server properly, a completely blank screen with blinking cursor appeared and that was all.

I restarted the system into the 2.6.35-28-generic kernel rescue-mode recovery kernel in order to be able to enter into physical console.

Logically the first thing I did is to check /var/log/messages and /var/log/Xorg.0.log but I couldn't find nothing unusual or wrong there.

I suspected something might be wrong with /etc/X11/xorg.conf so I deleted it:

ubuntu:~# rm -f /etc/X11/xorg.conf

and attempted to re-create the xorg.conf X configuration with command:

ubuntu:~# dpkg-reconfigure xserver-xorg

This command was reported to be the usual way to reconfigure the X server settings from console, but in my case (for unknown reasons) it did nothing.

Next the command which was able to re-generate the xorg.conf file was:

ubuntu:~# X -configure

The command generates a xorg.conf sample file in /root/xorg.conf.* so I used the conf to put it in /etc/X11/xorg.conf X's default location and restarted in hope that this would fix the non-booting issue.

Very sadly again the black screen of death appeared on the notebook toshiba screen.
I further thought of completely wipe out the xorg.conf in hope that at least it might boot without the conf file but this worked out neither.

I attempted to run the Xserver with a xorg.conf configured to work with vesa as it's well known vesa X server driver is supposed to work on 99% of the video cards, as almost all of them nowdays are compatible with the vesa standard, but guess what in my case vesa worked not!

The only version of X I can boot in was the failsafe X screen mode which is available through the grub's boot menu recovery mode.

Further on I decided to try few xorg.conf which I found online and were reported to work fine with Intel GM965 internal video , and yes this was also unsucessful.

Some of my other futile attempts were: to re-install the xorg server with apt-get, reinstall the xserver-xorg-video-intel driver e.g.:

ubuntu:~# apt-get install --reinstall xserver-xorg xserver-xorg-video-intel

As nothing worked out I was completely pissed off and decided to take an alternative approach which will take a lot of time but at least will probably be succesful, I decided to completely re-install the Ubuntu from a CD after backing up the /home directory and making a list of available packages on the system, so I can further easily run a tiny bash one-liner script to install all the packages which were previously existing on the laptop before the re-install:

Here is how I did it:

First I archived the /home directory:

ubuntu:/# tar -czvf home.tar.gz home/
....

For 12GB of data with some few thousands of files archiving it took about 40 minutes.

The tar spit archive became like 9GB and I hence used sftp to upload it to a remote FTP server as I was missing a flash drive or an external HDD where I can place the just archived data.

Uploading with sftp can be achieved with a command similar to:

sftp user@yourhost.com
Password:
Connected to yourhost.com.
sftp> put home.tar.gz

As a next step to backup in a file the list of all current installed packages, before I can further proceed to boot-up with the Ubuntu Maverich 10.10 CD and prooceed with the fresh install I used command:

for i in $(dpkg -l| awk '{ print $2 }'); do
echo $i; done >> my_current_ubuntu_packages.txt

Once again I used sftp as in above example to upload my_current_update_packages.txt file to my FTP host.

After backing up all the stuff necessery, I restarted the system and booted from the CD-rom with Ubuntu.
The Ubuntu installation as usual is more than a piece of cake and even if you don't have a brain you can succeed with it, so I wouldn't comment on it 😉

Right after the installation I used the sftp client once again to fetch the home.tar.gz and my_current_ubuntu_packages.txt

I placed the home.tar.gz in /home/ and untarred it inside the fresh /home dir:

ubuntu:/home# tar -zxvf home.tar.gz

Eventually the old home directory was located in /home/home so thereon I used Midnight Commander ( the good old mc text file explorer and manager ) to restore the important user files to their respective places.

As a last step I used the my_current_ubuntu_packages.txt in combination with a tiny shell script to install all the listed packages inside the file with command:

ubuntu:~# for i in $(cat my_current_ubuntu_packagespackages.txt); do
apt-get install --yes $i; sleep 1;
done

You will have to stay in front of the computer and manually answer a ncurses interface questions concerning some packages configuration and to be honest this is really annoying and time consuming.

Summing up the overall time I spend with this stupid Toshiba Satellite L40 with the shitty Intel GM965 was 4 days, where each day I tried numerous ways to fix up the X and did my best to get through the blank screen xserver non-bootable issue, without a complete re-install of the old Ubuntu system.
This is a lesson for me that if I stumble such a shitty issues I will straight proceed to the re-install option and not loose my time with non-sense fixes which would never work.

Hope the article might be helpful to somebody else who experience some problems with Linux similar to mine.

After all at least the Ubuntu Maverick 10.10 is really good looking in general from a design perspective.
What really striked me was the placement of the close, minimize and maximize window buttons , it seems in newer Ubuntus the ubuntu guys decided to place the buttons on the left, here is a screenshot:

Left button positioning of navigation Buttons in Ubuntu 10.10

I believe the solution I explain, though very radical and slow is a solution that would always work and hence worthy 😉
Let me hear from you if the article was helpful.