Posts Tagged ‘script’

How to automate open xen Hypervisor Virtual Machines backups shell script

Tuesday, June 22nd, 2021

openxen-backup-logo As a sysadmin that have my own Open Xen Debian Hypervisor running on a Lenovo ThinkServer few months ago due to a human error I managed to mess up one of my virtual machines and rebuild the Operating System from scratch and restore files service and MySQl data from backup that really pissed me of and this brought the need for having a decent Virtual Machine OpenXen backup solution I can implement on the Debian ( Buster) 10.10 running the free community Open Xen version 4.11.4+107-gef32c7afa2-1. The Hypervisor is a relative small one holding just 7 VM s:

HypervisorHost:~#  xl list
Name                                        ID   Mem VCPUs      State   Time(s)
Domain-0                                     0 11102    24     r—–  214176.4
pcfrxenweb                                  11 12288     4     -b—-  247425.5
pcfrxen                                     12 16384    10     -b—-  1371621.4
windows7                                    20  4096     2     -b—-   97887.2
haproxy2                                    21  4096     2     -b—-   11806.9
jitsi-meet                                  22  2048     2     -b—-   12843.9
zabbix                                      23  2048     2     -b—-   20275.1
centos7                                     24  2040     2     -b—-   10898.2

HypervisorHost:~# xl list|grep -v 'Name ' |grep  -v 'Domain-0'  |wc -l
7


The backup strategy of the script is very simple to shutdown the running VM machine, make a copy with rsync to a backup location the image of each of the Virtual Machines in a bash shell loop for each virtual machine shown in output of xl command and backup to a preset local directory in my case this is /backups/ the backup of each virtual machine is produced within a separate backup directory with a respective timestamp. Backup VM .img files are produced in my case to mounted 2x external attached hard drives each of which is a 4 Terabyte Seagate Plus Backup (Storage). The original version of the script was made to be a slightly different by Zhiqiang Ma whose script I used for a template to come up with my xen VM backup solution. To prevent the Hypervisor's load the script is made to do it with a nice of (nice -n 10) this might be not required or you might want to modify it to better suit your needs. Below is the script itself you can fetch a copy of it /usr/sbin/xen_vm_backups.sh :

#!/bin/bash

# Author: Zhiqiang Ma (http://www.ericzma.com/)
# Modified to work with xl and OpenXen by Georgi Georgiev – https://pc-freak.net
# Original creation dateDec. 27, 2010
# Script takes all defined vms under xen_name_list and prepares backup of each
# after shutting down the machine prepares archive and copies archive in externally attached mounted /backup/disk1 HDD
# Latest update: 08.06.2021 G. Georgiev – hipo@pc-freak.net

mark_file=/backups/disk1/tmp/xen-bak-marker
log_file=/var/log/xen/backups/bak-$(date +%Y_%m_%d).log
err_log_file=/var/log/xen/backups/bak_err-$(date +%H_%M_%Y_%m_%d).log
xen_dir=/xen/domains
xen_vmconfig_dir=/etc/xen/
local_bak_dir=/backups/disk1/tmp
#bak_dir=xenbak@backup_host1:/lhome/xenbak
bak_dir=/backups/disk1/xen-backups/xen_images/$(date +%Y_%m_%d)/xen/domains
#xen_name_list="haproxy2 pcfrxenweb jitsi-meet zabbix windows7 centos7 pcfrxenweb pcfrxen"
xen_name_list="windows7 haproxy2 jitsi-meet zabbix centos7"

if [ ! -d /var/log/xen/backups ]; then
echo mkdir -p /var/log/xen/backups
 mkdir -p /var/log/xen/backups
fi

if [ ! -d $bak_dir ]; then
echo mkdir -p $bak_dir
 mkdir -p $bak_dir

fi


# check whether bak runned last week
if [ -e $mark_file ] ; then
        echo  rm -f $mark_file
 rm -f $mark_file
else
        echo  touch $mark_file
 touch $mark_file
  # exit 0
fi

# set std and stderr to log file
        echo mv $log_file $log_file.old
       mv $log_file $log_file.old
        echo mv $err_log_file $err_log_file.old
       mv $err_log_file $err_log_file.old
        echo "exec 2> $err_log_file"
       exec 2> $err_log_file
        echo "exec > $log_file"
       exec > $log_file


# check whether the VM is running
# We only backup running VMs

echo "*** Check alive VMs"

xen_name_list_tmp=""

for i in $xen_name_list
do
        echo "/usr/sbin/xl list > /tmp/tmp-xen-list"
        /usr/sbin/xl list > /tmp/tmp-xen-list
  grepinlist=`grep $i" " /tmp/tmp-xen-list`;
  if [[ “$grepinlist” == “” ]]
  then
    echo $i is not alive.
  else
    echo $i is alive.
    xen_name_list_tmp=$xen_name_list_tmp" "$i
  fi
done

xen_name_list=$xen_name_list_tmp

echo "Alive VM list:"

for i in $xen_name_list
do
   echo $i
done

echo "End alive VM list."

###############################
date
echo "*** Backup starts"

###############################
date
echo "*** Copy VMs to local disk"

for i in $xen_name_list
do
  date
  echo "Shutdown $i"
        echo  /usr/sbin/xl shutdown $i
        /usr/sbin/xl shutdown $i
        if [ $? != ‘0’ ]; then
                echo 'Not Xen Disk image destroying …';
                /usr/sbin/xl destroy $i
        fi
  sleep 30

  echo "Copy $i"
  echo "Copy to local_bak_dir: $local_bak_dir"
      echo /usr/bin/rsync -avhW –no-compress –progress $xen_dir/$i/{disk.img,swap.img} $local_bak_dir/$i/
     time /usr/bin/rsync -avhW –no-compress –progress $xen_dir/$i/{disk.img,swap.img} $local_bak_dir/$i/
      echo /usr/bin/rsync -avhW –no-compress –progress $xen_vmconfig_dir/$i.cfg $local_bak_dir/$i.cfg
     time /usr/bin/rsync -avhW –no-compress –progress $xen_vmconfig_dir/$i.cfg $local_bak_dir/$i.cfg
  date
  echo "Create $i"
  # with vmmem=1024"
  # /usr/sbin/xm create $xen_dir/vm.run vmid=$i vmmem=1024
          echo /usr/sbin/xl create $xen_vmconfig_dir/$i.cfg
          /usr/sbin/xl create $xen_vmconfig_dir/$i.cfg
## Uncomment if you need to copy with scp somewhere
###       echo scp $log_file $bak_dir/xen-bak-111.log
###      echo  /usr/bin/rsync -avhW –no-compress –progress $log_file $local_bak_dir/xen-bak-111.log
done

####################
date
echo "*** Compress local bak vmdisks"

for i in $xen_name_list
do
  date
  echo "Compress $i"
      echo tar -z -cfv $bak_dir/$i-$(date +%Y_%m_%d).tar.gz $local_bak_dir/$i-$(date +%Y_%m_%d) $local_bak_dir/$i.cfg
     time nice -n 10 tar -z -cvf $bak_dir/$i-$(date +%Y_%m_%d).tar.gz $local_bak_dir/$i/ $local_bak_dir/$i.cfg
    echo rm -vf $local_bak_dir/$i/ $local_bak_dir/$i.cfg
    rm -vrf $local_bak_dir/$i/{disk.img,swap.img}  $local_bak_dir/$i.cfg
done

####################
date
echo "*** Copy local bak vmdisks to remote machines"

copy_remote () {
for i in $xen_name_list
do
  date
  echo "Copy to remote: vm$i"
        echo  scp $local_bak_dir/vmdisk0-$i.tar.gz $bak_dir/vmdisk0-$i.tar.gz
done

#####################
date
echo "Backup finishes"
        echo scp $log_file $bak_dir/bak-111.log

}

date
echo "Backup finished"

 

Things to configure before start using using the script to prepare backups for you is the xen_name_list variable

#  directory skele where to store already prepared backups
bak_dir=/backups/disk1/xen-backups/xen_images/$(date +%Y_%m_%d)/xen/domains

# The configurations of the running Xen Virtual Machines
xen_vmconfig_dir=/etc/xen/
# a local directory that will be used for backup creation ( I prefer this directory to be on the backup storage location )
local_bak_dir=/backups/disk1/tmp
#bak_dir=xenbak@backup_host1:/lhome/xenbak
# the structure on the backup location where daily .img backups with be produced with rsync and tar archived with bzip2
bak_dir=/backups/disk1/xen-backups/xen_images/$(date +%Y_%m_%d)/xen/domains

# list here all the Virtual Machines you want the script to create backups of
xen_name_list="windows7 haproxy2 jitsi-meet zabbix centos7"

If you need the script to copy the backup of Virtual Machine images to external Backup server via Local Lan or to a remote Internet located encrypted connection with a passwordless ssh authentication (once you have prepared the Machines to automatically login without pass over ssh with specific user), you can uncomment the script commented section to adapt it to copy to remote host.

Once you have placed at place /usr/sbin/xen_vm_backups.sh use a cronjob to prepare backups on a regular basis, for example I use the following cron to produce a working copy of the Virtual Machine backups everyday.
 

# crontab -u root -l 

# create windows7 haproxy2 jitsi-meet centos7 zabbix VMs backup once a month
00 06 1,2,3,4,5,6,7,8,9,10,11,12 * * /usr/sbin/xen_vm_backups.sh 2>&1 >/dev/null


I do clean up virtual machines Images that are older than 95 days with another cron job

# crontab -u root -l

# Delete xen image files older than 95 days to clear up space from backup HDD
45 06 17 * * find /backups/disk1/xen-backups/xen_images* -type f -mtime +95 -exec rm {} \; 2>&1 >/dev/null

#### Delete xen config backups older than 1 year+3 days (368 days)
45 06 17 * * find /backups/disk1/xen-backups/xen_config* -type f -mtime +368 -exec rm {} \; 2>&1 >/dev/null

 

# Delete xen image files older than 95 days to clear up space from backup HDD
45 06 17 * * find /backups/disk1/xen-backups/xen_images* -type f -mtime +95 -exec rm {} \; 2>&1 >/dev/null

#### Delete xen config backups older than 1 year+3 days (368 days)
45 06 17 * * find /backups/disk1/xen-backups/xen_config* -type f -mtime +368 -exec rm {} \; 2>&1 >/dev/null

How to calculate connections from IP address with shell script and log to Zabbix graphic

Thursday, March 11th, 2021

We had to test the number of connections incoming IP sorted by its TCP / IP connection state.

For example:

TIME_WAIT, ESTABLISHED, LISTEN etc.


The reason behind is sometimes the IP address '192.168.0.1' does create more than 200 connections, a Cisco firewall gets triggered and the connection for that IP is filtered out. To be able to know in advance that this problem is upcoming. a Small userparameter script is set on the Linux servers, that does print out all connections from IP by its STATES sorted out.

 

The script is calc_total_ip_match_zabbix.sh is below:

#!/bin/bash
#  check ESTIMATED / FIN_WAIT etc. netstat output for IPs and calculate total
# UserParameter=count.connections,(/usr/local/bin/calc_total_ip_match_zabbix.sh)
CHECK_IP='192.168.0.1';
f=0; 

 

for i in $(netstat -nat | grep "$CHECK_IP" | awk '{print $6}' | sort | uniq -c | sort -n); do

echo -n "$i ";
f=$((f+i));
done;
echo
echo "Total: $f"

 

root@pcfreak:/bashscripts# ./calc_total_ip_match_zabbix.sh 
1 TIME_WAIT 2 ESTABLISHED 3 LISTEN 

Total: 6

 

root@pcfreak:/bashscripts# ./calc_total_ip_match_zabbix.sh 
2 ESTABLISHED 3 LISTEN 
Total: 5


To make process with Zabbix it is necessery to have an Item created and a Depedent Item.

images/zabbix-webgui-connection-check1

 

 

 

 

webguiconnection-check1

webguiconnection-check1
 

webgui-connection-check2-item

images/webguiconnection-check1

Finally create a trigger to trigger alarm if you have more than or eqaul to 100 Total overall connections.


images/zabbix-webgui-connection-check-trigger

The Zabbix userparameter script should be as this:
cat /etc/zabbix/zabbix_agentd.d/userparameter_webgui_conn.conf
UserParameter=count.connections,(/usr/local/bin/webgui_conn_track.sh)
 

Some collleagues suggested more efficient shell script solution for suming the overall number of connections, below is less time consuming version of script, that can be used for the calculation.
 

#!/bin/bash -x
# show FIN_WAIT2 / ESTIMATED etc. and calcuate total
count=$(netstat -n | grep "192.168.0.1" | awk ' { print $6 } ' | sort -n | uniq -c | sort -nr)
total=$((${count// /+}))
echo "$count"
echo "Total:" "$total"

 

      2 ESTABLISHED
      1 TIME_WAIT
Total: 3

 


Below is the graph built with Zabbix showing all the fluctuations from connections from monitored IP.
ebgui-check_ip_graph

List all existing local admin users belonging to admin group and mail them to monitoring mail box

Monday, February 8th, 2021

local-user-account-creation-deletion-change-monitor-accounts-and-send-them-to-central-monitoring-mail

If you have a bunch of servers that needs to have a tight security with multiple Local users superuser accounts that change over time and you need to frequently keep an have a long over time if some new system UNIX local users in /etc/passwd /etc/group has been added deleted e.g. the /etc/passwd /etc/group then you might have the task to have some primitive monitoring set and the most primitive I can think of is simply routinely log users list for historical purposes to a common mailbox over time (lets say 4 times a month or even more frequently) you might send with a simple cron job a list of all existing admin authorized users to a logging sysadmin mailbox like lets say:
 

Local-unix-users@yourcompanydomain.com


A remark to make here is the common sysadmin practice scenario to have local existing non-ldap admin users group members of whom are authorized to use sudo su – root via /etc/sudoers  is described in my previous article how to add local users to admin group superuser access via sudo I thus have been managing already a number of servers that have user setup using the above explained admin group.

Thus to have the monitoring at place I've developed a tiny shell script that does check all users belonging to the predefined user group dumps it to .csv format that starts with a simple timestamp on when user admin list was made and sends it to a predefined email address as well as logs sent mail content for further reference in a local directory.

The task is a relatively easy but since nowadays the level of competency of system administration across youngsters is declinging -that's of course in my humble opinion (just like it happens in every other profession), below is the developed list-admin-users.sh
 

 

#!/bin/bash
# dump all users belonging to a predefined admin user / group in csv format 
# with a day / month year timestamp and mail it to a predefined admin
# monitoring address
TO_ADDRESS="Local-unix-users@yourcompanydomain.com";
HOSTN=$(hostname);
# root@server:/# grep -i 1000 /etc/passwd
# username:x:username:1000:username,,,:/home/username:/bin/bash
# username1:x:username1:1000:username1,,,:/home/username1:/bin/bash
# username5:x:username1:1000:username5,,,:/home/username5:/bin/bash

ADMINS_ID='4355';
#
# root@server # group_id_ID='4355'; grep -i group_id_ID /etc/passwd
# …
# username1:x:1005:4355:username1,,,:/home/username1:/bin/bash
# username5:x:1005:4355,,,:/home/username5:/bin/bash


group_id_ID='215';
group_id='group_id';
FIL="/var/log/userlist-log-dir/userlist_$(date +"%d_%m_%Y_%H_%M").txt";
CUR_D="$HOSTN: Current admin users $(date)"; >> $FIL; echo -e "##### $CUR_D #####" >> $FIL;
for i in $(cat /etc/passwd | grep -i /home|grep /bin/bash|grep -e "$ADMINS_ID" -e "$group_id_ID" | cut -d : -f1); do \
if [[ $(grep $i /etc/group|grep $group_id) ]]; then
f=$(echo $i); echo $i,group_id,$(id -g $i); else  echo $i,admin,$(id -g $i);
fi
done >> $FIL; mail -s "$CUR_D" $TO_ADDRESS < $FIL


list-admin-users.sh is ready for download also here

To make the script report you will have to place it somewhere for example in /usr/local/bin/list-admin-users.sh ,  create its log dir location /var/log/userlist-log-dir/ and set proper executable and user/group script and directory permissions to it to be only readable for root user.

root@server: # mkdir /var/log/userlist-log-dir/
root@server: # chmod +x /usr/local/bin/list-admin-users.sh
root@server: # chmod -R 700 /var/log/userlist-log-dir/


To make the script generate its admin user reports and send it to the central mailbox  a couple of times in the month early in the morning (assuming you have a properly running postfix / qmail / sendmail … smtp), as a last step you need to set a cron job to routinely invoke the script as root user.

root@server: # crontab -u root -e
12 06 5,10,15,20,25,1 /usr/local/bin/list-admin-users.sh


That's all folks now on 5th 10th, 15th, 20th 25th and 1st at 06:12 you'll get the admin user list reports done. Enjoy 🙂

Add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers

Tuesday, December 8th, 2020

Zabbix-logo-how-to-make-ntpd-time-server-monitoring-article

 

How to add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers?

We needed to set on some servers at my work an elementary check with Zabbix monitoring to check whether servers time is correctly synchronized with ntpd time service as well report if the ntp daemon is correctly running on the machine. For that a userparameter script was developed called userparameter_ntp.conf the script is simplistic and few a lines of bash shell scripting 
stuff is based on gresping information required from ntpq and ntpstat common ntp client commands to get information about the status of time synchronization on the servers.
 

[root@linuxserver ]# ntpstat
synchronised to NTP server (10.80.200.30) at stratum 3
   time correct to within 47 ms
   polling server every 1024 s

 

[root@linuxserver ]# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
+timeserver1 10.26.239.41     2 u  319 1024  377   15.864    1.270   0.262
+timeserver2 10.82.239.41     2 u  591 1024  377   16.287   -0.334   1.748
*timeserver3 10.82.239.43     2 u   47 1024  377   15.613   -0.553   0.251
 timeserver4 .INIT.          16 u    – 1024    0    0.000    0.000   0.000


Below is Zabbix UserParameter script that does report us 3 important values we monitor to make sure time server synchronization works as expected the zabbix keys we set are ntp.offset, ntp.sync, ntp.exact in attempt to describe what we're fetching from ntp client:

[root@linuxserver ]# cat /etc/zabbix/zabbix-agent.d/userparameter_ntp.conf

UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }')
#UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'FNR==4{print $9}')
UserParameter=ntp.sync,(/usr/bin/ntpstat | cut -f 1 -d " " | tr -d ' \t\n\r\f')
UserParameter=ntp.exact,(/usr/bin/ntpstat | /usr/bin/awk 'FNR==2{print $5,$6}')

In Zabbix the monitored ntpd parameters set-upped looks like this:

 

ntp_time_synchronization_check-zabbix-screenshot.

 

!Note that in above userparameter example, the commented userparameter script is a just another way to do an ntpd offset returned value which was developed before the more sophisticated with more regular expression checks from the /usr/sbin/ntpd via ntpq, perhaps if you want to extend it you can also use another script to report more verbose information to Zabbix if that is required like ouput from ntpq -c peers command:
 

UserParameter=ntp.verbose,(/usr/sbin/ntpq -c peers)

Of course to make the Zabbix fetch necessery data from monitored hosts, we need to set-up further new Zabbix Template with the respective Trigger and Items.

Below are few screenshots including the triggers used.

ntpd_server-time_synchronization_check-zabbix-screenshot-triggers

  • ntpd.trigger

{NTP:net.udp.service[ntp].last(0)}<1

  • NTP Synchronization trigger

{NTP:ntp.sync.iregexp(unsynchronised)}=1

 

 

As you can see from history we have setup our items to Store history of reported data to Zabbix from parameter script for 90 days and update our monitor check, every 30 seconds from the monitored hosts to which Tempate is applied.

Well that's all folks, time synchronization issues we'll be promptly triggering a new Alarm in Zabbix !

Vodka! :)

Wednesday, September 12th, 2007

Yesterday I drinked 200 gr. of Vodka yesterday Night, it was pretty refreshing for me but I got drunk a little.I'm smoking again … Things are going bad in my life recently. I have health issues. And I intend to go to doctor today.Yesterday I went to the polyclinic but my personal Dr. Nikolay  was not there (I was angry, I went to doctor once in years and he is not there) so I'll try again today. I had pains somewhere around the stomach. At least at work things are going smoothly at least God hears my prayers about this. I'm very confused and I have completely no idea what to do with my life. Yesterday I was out with Lily and Kiril on the fountain. The previous day Nomen, I, Yavor, Kiro and Bino went to the "Kobaklyka" (a woody place which is close to Dobrich.) Well that's most of what's happening lately with my life. I wrote a little script to make that nautilus to get restarted if it starts burning the cpu. It's a dumb script (the bad thing is that I'm loosing form scripting, Well I don't script much lately). Here is the script http://pcfreak.d-bg.net/bshscr/restart_nautilus.sh https://www.pc-freak.net/bshscr/restart_nautilus.sh. The days before the 4 days weekend, I hat to spend a lot of time on one of the servers fighting with Spammers. Hate spammers really! I ended removing bounce messages at all for one of the domains, which fixed the bounce spam method spammers use (btw qmail's chkuser seems to not work properly for some reason) … Also I started watching Stargate – SG1. First I thought it's a stupid sci-fi serial. But after the first serie I now think it has it's good moments :]. Also I had something like a Mortification Day going on during Monday. The whole day I listened to Mortification (The first Christian Death Metal Band). I Liked much the "Hammer of God" album. In the evening Sabin (Bino) came home and we watched some Mortification videos at Youtube. Right now I listen again to "Ever – Idyll" a pretty great song. And yeah I keep listening to ChristianIndustrial.net a lot, a great radio. Try it if you haven't!END—–

Check server Internet connectivity Speedtest from Linux terminal CLI

Friday, August 7th, 2020

check-server-console-speedtest

If you are a system administrator of a dedicated server and you have no access to Xserver Graphical GNOME / KDE etc. environment and you wonder how you can track the bandwidth connectivity speed of remote system to the internet and you happen to have a modern Linux distribution, here is few ways to do a speedtest.
 

1. Use speedtest-cli command line tool to test connectivity

 


speedtest-cli is a tiny tool written in python, to use it hence you need to have python installed on the server.
It is available both for Redhat Linux distros and Debians / Ubuntus etc. in the list of standard installable packages.

a) Install speedtest-cli on Fedora / CentOS / RHEL
 

On CentOS / RHEL / Scientific Linux lower than ver 8:

 

 

$ sudo yum install python

On CentOS 8 / RHEL 8 user type the following command to install Python 3 or 2:

 

 

$sudo yum install python3
$ sudo yum install python2

 

 

 


On Fedora Linux version 22+

 

 

$ sudo dnf install python
$ sudo dnf install pytho3

 


Once python is at place download speedtest.py or in case if link is not reachable download mirrored version of speedtest.py on www.pc-freak.net here
 

 

 

$ wget -O speedtest-cli https://raw.githubusercontent.com/sivel/speedtest-cli/master/speedtest.py
$ chmod +x speedtest-cli

 


Then it is time to run script speedtest-screenshot-linux-terminal-console-cli-cmd
To test enabled Bandwidth on the server

 

 

$ python speedtest-cli


b) Install speedtest-cli on Debian

On Latest Debian 10 Buster speedtest is available out of the box in regular .deb repositories, so fetch it with apt
 

 

# apt install –yes speedtest-cli

 


You can give now speedtest-cli a try with –bytes arguments to get speed values in bytes instead of bits or if you want to generate an image with test results in picture just like it will appear if you use speedtest.net inside a gui browser, use the –share option

speedtest-screenshot-linux-terminal-console-cli-cmd-options

 

 

 

2. Getting connectivity results of all defined speedtest test City Locations


Speedtest has a list of servers through which a Upload and Download speed is tested, to run speedtest-cli to test with each and every server and get a better picture on what kind of connectivity to expect from your server towards the closest region capital cities, fetch speedtest-servers.php list and use a small shell loop below is how:

 

 

 

 

 

root@pcfreak:~#  wget http://www.speedtest.net/speedtest-servers.php
–2020-08-07 16:31:34–  http://www.speedtest.net/speedtest-servers.php
Преобразувам www.speedtest.net (www.speedtest.net)… 151.101.2.219, 151.101.66.219, 151.101.130.219, …
Connecting to www.speedtest.net (www.speedtest.net)|151.101.2.219|:80… успешно свързване.
HTTP изпратено искане, чакам отговор… 301 Moved Permanently
Адрес: https://www.speedtest.net/speedtest-servers.php [следва]
–2020-08-07 16:31:34–  https://www.speedtest.net/speedtest-servers.php
Connecting to www.speedtest.net (www.speedtest.net)|151.101.2.219|:443… успешно свързване.
HTTP изпратено искане, чакам отговор… 307 Temporary Redirect
Адрес: https://c.speedtest.net/speedtest-servers-static.php [следва]
–2020-08-07 16:31:35–  https://c.speedtest.net/speedtest-servers-static.php
Преобразувам c.speedtest.net (c.speedtest.net)… 151.101.242.219
Connecting to c.speedtest.net (c.speedtest.net)|151.101.242.219|:443… успешно свързване.
HTTP изпратено искане, чакам отговор… 200 OK
Дължина: 211695 (207K) [text/xml]
Saving to: ‘speedtest-servers.php’
speedtest-servers.php                  100%[==========================================================================>] 206,73K  –.-KB/s    in 0,1s
2020-08-07 16:31:35 (1,75 MB/s) – ‘speedtest-servers.php’ saved [211695/211695]

Once file is there with below loop we extract all file defined servers id="" 's 
 

root@pcfreak:~# for i in $(cat speedtest-servers.php | egrep -Eo 'id="[0-9]{4}"' |sed -e 's#id="##' -e 's#"##g'); do speedtest-cli  –server $i; done
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
Retrieving speedtest.net server list…
Retrieving information for the selected server…
Hosted by Telecoms Ltd. (Varna) [38.88 km]: 25.947 ms
Testing download speed……………………………………………………………………..
Download: 57.71 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 93.85 Mbit/s
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…
Retrieving speedtest.net server list…
Retrieving information for the selected server…
Hosted by GMB Computers (Constanta) [94.03 km]: 80.247 ms
Testing download speed……………………………………………………………………..
Download: 35.86 Mbit/s
Testing upload speed…………………………………………………………………………………………
Upload: 80.15 Mbit/s
Retrieving speedtest.net configuration…
Testing from Vivacom (83.228.93.76)…

…..

 


etc.

For better readability you might want to add the ouput to a file or even put it to run periodically on a cron if you have some suspcion that your server Internet dedicated lines dies out to some general locations sometimes.
 

3. Testing UPlink speed with Download some big file from source location


In the past a classical way to test the bandwidth connectivity of your Internet Service Provider was to fetch some big file, Linux guys should remember it was almost a standard to roll a download of Linux kernel source .tar file with some test browser as elinks / lynx / w3c.
speedtest-screenshot-kernel-org-shot1 speedtest-screenshot-kernel-org-shot2
or if those are not at hand test connectivity on remote free shell servers whatever file downloader as wget or curl was used.
Analogical method is still possible, for example to use wget to get an idea about bandwidtch connectivity, let it roll below 500 mb from speedtest.wdc01.softlayer.com to /dev/null few times:

 

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

$ wget –output-document=/dev/null http://speedtest.wdc01.softlayer.com/downloads/test500.zip

 

# wget -O /dev/null –progress=dot:mega http://cachefly.cachefly.net/10mb.test ; date
–2020-08-07 13:56:49–  http://cachefly.cachefly.net/10mb.test
Resolving cachefly.cachefly.net (cachefly.cachefly.net)… 205.234.175.175
Connecting to cachefly.cachefly.net (cachefly.cachefly.net)|205.234.175.175|:80… connected.
HTTP request sent, awaiting response… 200 OK
Length: 10485760 (10M) [application/octet-stream]
Saving to: ‘/dev/null’

     0K …….. …….. …….. …….. …….. …….. 30%  142M 0s
  3072K …….. …….. …….. …….. …….. …….. 60%  179M 0s
  6144K …….. …….. …….. …….. …….. …….. 90%  204M 0s
  9216K …….. ……..                                    100%  197M=0.06s

2020-08-07 13:56:50 (173 MB/s) – ‘/dev/null’ saved [10485760/10485760]

Fri 07 Aug 2020 01:56:50 PM UTC


To be sure you have a real picture on remote machine Internet speed it is always a good idea to run download of random big files on a certain locations that are well known to have a very stable Internet bandwidth to the Internet backbone routers.

4. Using Simple shell script to test Internet speed


Fetch and use speedtest.sh

 


wget https://raw.github.com/blackdotsh/curl-speedtest/master/speedtest.sh && chmod u+x speedtest.sh && bash speedtest.sh

 

 

5. Using iperf to test connectivity between two servers 

 

iperf is another good tool worthy to mention that can be used to test the speed between client and server.

To use iperf install it with apt and do on the server machine to which bandwidth will be tested:

 

# iperf -s 

 

On the client machine do:

 

# iperf -c 192.168.1.1 

 

where 192.168.1.1 is the IP of the server where iperf was spawned to listen.

6. Using Netflix fast to determine Internet connection speed on host


Fast

fast is a service provided by Netflix. Its web interface is located at Fast.com and it has a command-line interface available through npm (npm is a package manager for nodejs) so if you don't have it you will have to install it first with:

# apt install –yes npm

 

Note that if you run on Debian this will install you some 249 new nodejs packages which you might not want to have on the system, so this is useful only for machines that has already use of nodejs.

 

$ fast

 

     82 Mbps ↓


The command returns your Internet download speed. To get your upload speed, use the -u flag:

 

$ fast -u

 

   ⠧ 80 Mbps ↓ / 8.2 Mbps ↑

 

7. Use speedometer / iftop to measure incoming and outgoing traffic on interface


If you're measuring connectivity on a live production server system, then you might consider that the measurement output might not be exactly correct especially if you're measuring the Uplink / Downlink on a Heavy loaded webserver / Mail Server / Samba or DNS server.
If this is the case a very useful tools to consider to extract the already taken traffic used on your Incoming and Outgoing ( TX / RX ) Network interfaces
are speedometer and iftop, they're present and installable depending on the OS via yum / apt or the respective package manager.

 


To install on Debian server:

 

 

 

# apt install –yes iftop speedometer

 


The most basic use to check the live received traffic in a nice Ncurses like text graphic is with: 

 

 

 

 

# speedometer -r 


speedometer-check-received-transmitted-network-traffic-on-linux1

To generate real time ASCII art graph on RX / TX traffic do:

 

 

# speedometer -r eth0 -t eth0


speedometer-check-received-transmitted-network-traffic-on-linux

 

 

 

 

# iftop -P -i eth0

 

 


iftop-show-statistics-on-connections-screenshot-pcfreak

 

 

 

 

 

Monitoring Linux hardware Hard Drives / Temperature and Disk with lm_sensors / smartd / hddtemp and Zabbix Userparameter lm_sensors report script

Thursday, April 30th, 2020

monitoring-linux-hardware-with-software-temperature-disk-cpu-health-zabbix-userparameter-script

I'm part of a  SysAdmin Team that is partially doing some minor Zabbix imrovements on a custom corporate installed Zabbix in an ongoing project to substitute the previous HP OpenView monitoring for a bunch of Legacy Linux hosts.
As one of the necessery checks to have is regarding system Hardware, the task was to invent some simplistic way to monitor hardware with the Zabbix Monitoring tool.  Monitoring Bare Metal servers hardware of HP / Dell / Fujituse etc. servers  in Linux usually is done with a third party software provided by the Hardware vendor. But as this requires an additional services to run and sometimes is not desired. It was interesting to find out some alternative Linux native ways to do the System hardware monitoring.
Monitoring statistics from the system hardware components can be obtained directly from the server components with ipmi / ipmitool (for more info on it check my previous article Reset and Manage intelligent  Platform Management remote board article).
With ipmi
 hardware health info could be received straight from the ILO / IDRAC / HPMI of the server. However as often the Admin-Lan of the server is in a seperate DMZ secured network and available via only a certain set of routed IPs, ipmitool can't be used.

So what are the other options to use to implement Linux Server Hardware Monitoring?

The tools to use are perhaps many but I know of two which gives you most of the information you ever need to have a prelimitary hardware damage warning system before the crash, these are:
 

1. smartmontools (smartd)

Smartd is part of smartmontools package which contains two utility programs (smartctl and smartd) to control and monitor storage systems using the Self-Monitoring, Analysis and Reporting Technology system (SMART) built into most modern ATA/SATA, SCSI/SAS and NVMe disks

Disk monitoring is handled by a special service the package provides called smartd that does query the Hard Drives periodically aiming to find a warning signs of hardware failures.
The downside of smartd use is that it implies a little bit of extra load on Hard Drive read / writes and if misconfigured could reduce the the Hard disk life time.

 

linux:~#  /usr/sbin/smartctl -a /dev/sdb2
smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.19.0-5-amd64] (local build)
Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org

=== START OF INFORMATION SECTION ===
Device Model:     KINGSTON SA400S37240G
Serial Number:    50026B768340AA31
LU WWN Device Id: 5 0026b7 68340aa31
Firmware Version: S1Z40102
User Capacity:    240,057,409,536 bytes [240 GB]
Sector Size:      512 bytes logical/physical
Rotation Rate:    Solid State Device
Device is:        Not in smartctl database [for details use: -P showall]
ATA Version is:   ACS-3 T13/2161-D revision 4
SATA Version is:  SATA 3.2, 6.0 Gb/s (current: 3.0 Gb/s)
Local Time is:    Thu Apr 30 14:05:01 2020 EEST
SMART support is: Available – device has SMART capability.
SMART support is: Enabled

=== START OF READ SMART DATA SECTION ===
SMART overall-health self-assessment test result: PASSED

General SMART Values:
Offline data collection status:  (0x00) Offline data collection activity
                                        was never started.
                                        Auto Offline Data Collection: Disabled.
Self-test execution status:      (   0) The previous self-test routine completed
                                        without error or no self-test has ever
                                        been run.
Total time to complete Offline
data collection:                (  120) seconds.
Offline data collection
capabilities:                    (0x11) SMART execute Offline immediate.
                                        No Auto Offline data collection support.
                                        Suspend Offline collection upon new
                                        command.
                                        No Offline surface scan supported.
                                        Self-test supported.
                                        No Conveyance Self-test supported.
                                        No Selective Self-test supported.
SMART capabilities:            (0x0002) Does not save SMART data before
                                        entering power-saving mode.
                                        Supports SMART auto save timer.
Error logging capability:        (0x01) Error logging supported.
                                        General Purpose Logging supported.
Short self-test routine
recommended polling time:        (   2) minutes.
Extended self-test routine
recommended polling time:        (  10) minutes.

SMART Attributes Data Structure revision number: 1
Vendor Specific SMART Attributes with Thresholds:
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x0032   100   100   000    Old_age   Always       –       100
  9 Power_On_Hours          0x0032   100   100   000    Old_age   Always       –       2820
 12 Power_Cycle_Count       0x0032   100   100   000    Old_age   Always       –       21
148 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
149 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
167 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
168 Unknown_Attribute       0x0012   100   100   000    Old_age   Always       –       0
169 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
170 Unknown_Attribute       0x0000   100   100   010    Old_age   Offline      –       0
172 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
173 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       0
181 Program_Fail_Cnt_Total  0x0032   100   100   000    Old_age   Always       –       0
182 Erase_Fail_Count_Total  0x0000   100   100   000    Old_age   Offline      –       0
187 Reported_Uncorrect      0x0032   100   100   000    Old_age   Always       –       0
192 Power-Off_Retract_Count 0x0012   100   100   000    Old_age   Always       –       16
194 Temperature_Celsius     0x0022   034   052   000    Old_age   Always       –       34 (Min/Max 19/52)
196 Reallocated_Event_Count 0x0032   100   100   000    Old_age   Always       –       0
199 UDMA_CRC_Error_Count    0x0032   100   100   000    Old_age   Always       –       0
218 Unknown_Attribute       0x0032   100   100   000    Old_age   Always       –       0
231 Temperature_Celsius     0x0000   097   097   000    Old_age   Offline      –       97
233 Media_Wearout_Indicator 0x0032   100   100   000    Old_age   Always       –       2104
241 Total_LBAs_Written      0x0032   100   100   000    Old_age   Always       –       1857
242 Total_LBAs_Read         0x0032   100   100   000    Old_age   Always       –       1141
244 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       32
245 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       107
246 Unknown_Attribute       0x0000   100   100   000    Old_age   Offline      –       15940

SMART Error Log Version: 1
No Errors Logged

SMART Self-test log structure revision number 1
No self-tests have been logged.  [To run self-tests, use: smartctl -t]

Selective Self-tests/Logging not supported

 

2. hddtemp

 

Usually if smartd is used it is useful to also use hddtemp which relies on smartd data.
 The hddtemp program monitors and reports the temperature of PATA, SATA
 or SCSI hard drives by reading Self-Monitoring Analysis and Reporting
 Technology (S.M.A.R.T.)
information on drives that support this feature.
 

linux:~# /usr/sbin/hddtemp /dev/sda1
/dev/sda1: Hitachi HDS721050CLA360: 31°C
linux:~# /usr/sbin/hddtemp /dev/sdc6
/dev/sdc6: KINGSTON SV300S37A120G: 25°C
linux:~# /usr/sbin/hddtemp /dev/sdb2
/dev/sdb2: KINGSTON SA400S37240G: 34°C
linux:~# /usr/sbin/hddtemp /dev/sdd1
/dev/sdd1: WD Elements 10B8: S.M.A.R.T. not available

 

 

3. lm-sensors / i2c-tools 

 Lm-sensors is a hardware health monitoring package for Linux. It allows you
 to access information from temperature, voltage, and fan speed sensors.
i2c-tools
was historically bundled in the same package as lm_sensors but has been seperated cause not all hardware monitoring chips are I2C devices, and not all I2C devices are hardware monitoring chips.

The most basic use of lm-sensors is with the sensors command

 

linux:~# sensors
i350bb-pci-0600
Adapter: PCI adapter
loc1:         +55.0 C  (high = +120.0 C, crit = +110.0 C)

 

coretemp-isa-0000
Adapter: ISA adapter
Physical id 0:  +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 0:         +26.0 C  (high = +78.0 C, crit = +88.0 C)
Core 1:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 2:         +28.0 C  (high = +78.0 C, crit = +88.0 C)
Core 3:         +28.0 C  (high = +78.0 C, crit = +88.0 C)

 


On CentOS Linux useful tool is also  lm_sensors-sensord.x86_64 – A Daemon that periodically logs sensor readings to syslog or a round-robin database, and warns of sensor alarms.

In Debian Linux there is also the psensors-server (an HTTP server providing JSON Web service which can be used by GTK+ Application to remotely monitor sensors) useful for developers
psesors-server

psensor-linux-graphical-tool-to-check-cpu-hard-disk-temperature-unix

If you have a Xserver installed on the Server accessed with Xclient or via VNC though quite rare,
You can use xsensors or Psensora GTK+ (Widget Toolkit for creating Graphical User Interface) application software.

With this 3 tools it is pretty easy to script one liners and use the Zabbix UserParameters functionality to send hardware report data to a Company's Zabbix Sserver, though Zabbix has already some templates to do so in my case, I couldn't import this templates cause I don't have Zabbix Super-Admin credentials, thus to work around that a sample work around is use script to monitor for higher and critical considered temperature.
Here is a tiny sample script I came up in 1 min time it can be used to used as 1 liner UserParameter and built upon something more complex.

SENSORS_HIGH=`sensors | awk '{ print $6 }'| grep '^+' | uniq`;
SENSORS_CRIT=`sensors | awk '{ print $9 }'| grep '^+' | uniq`; ;SENSORS_STAT=`sensors|grep -E 'Core\s' | awk '{ print $1" "$2" "$3 }' | grep "$SENSORS_HIGH|$SENSORS_CRIT"`;
if [ ! -z $SENSORS_STAT ]; then
echo 'Temperature HIGH';
else 
echo 'Sensors OK';
fi 

Of course there is much more sophisticated stuff to use for monitoring out there


Below script can be easily adapted and use on other Monitoring Platforms such as Nagios / Munin / Cacti / Icinga and there are plenty of paid solutions, but for anyone that wants to develop something from scratch just like me I hope this
article will be a good short introduction.
If you know some other Linux hardware monitoring tools, please share.

Automatic network restart and reboot Linux server script if ping timeout to gateway is not responding as a way to reduce connectivity downtimes

Monday, December 10th, 2018

automatic-server-network-restart-and-reboot-script-if-connection-to-server-gateway-inavailable-tux-penguing-ascii-art-bin-bash

Inability of server to come back online server automaticallyafter electricity / network outage

These days my home server  is experiencing a lot of issues due to Electricity Power Outages, a construction dig operations to fix / change waterpipe tubes near my home are in action and perhaps the power cables got ruptered by the digger machine.
The effect of all this was that my server networking accessability was affected and as I didn't have network I couldn't access it remotely anymore at a certain point the electricity was restored (and the UPS charge could keep the server up), however the server accessibility did not due restore until I asked a relative to restart it or under a more complicated cases where Tech aquanted guy has to help – Alexander (Alex) a close friend from school years check his old site here – alex.www.pc-freak.net helps a lot.to restart the machine physically either run a quick restoration commands on root TTY terminal or generally do check whether default router is reachable.

This kind of Pc-Freak.net downtime issues over the last month become too frequent (the machine was down about 5 times for 2 to 5 hours and this was too much (and weirdly enough it was not accessible from the internet even after electricity network was restored and the only solution to that was a physical server restart (from the Power Button).

To decrease the number of cases in which known relatives or friends has to  physically go to the server and restart it, each time after network or electricity outage I wrote a small script to check accessibility towards Default defined Network Gateway for my server with few ICMP packages sent with good old PING command
and trigger a network restart and system reboot
(in case if the network restart does fail) in a row.

1. Create reboot-if-nwork-is-downsh script under /usr/sbin or other dir

Here is the script itself:

 

#!/bin/sh
# Script checks with ping 5 ICMP pings 10 times to DEF GW and if so
# triggers networking restart /etc/inid.d/networking restart
# Then does another 5 x 10 PINGS and if ping command returns errors,
# Reboots machine
# This script is useful if you run home router with Linux and you have
# electricity outages and machine doesn't go up if not rebooted in that case

GATEWAY_HOST='192.168.0.1';

run_ping () {
for i in $(seq 1 10); do
    ping -c 5 $GATEWAY_HOST
done

}

reboot_f () {
if [ $? -eq 0 ]; then
        echo "$(date "+%Y-%m-%d %H:%M:%S") Ping to $GATEWAY_HOST OK" >> /var/log/reboot.log
    else
    /etc/init.d/networking restart
        echo "$(date "+%Y-%m-%d %H:%M:%S") Restarted Network Interfaces:" >> /tmp/rebooted.txt
    for i in $(seq 1 10); do ping -c 5 $GATEWAY_HOST; done
    if [ $? -eq 0 ] && [ $(cat /tmp/rebooted.txt) -lt ‘5’ ]; then
         echo "$(date "+%Y-%m-%d %H:%M:%S") Ping to $GATEWAY_HOST FAILED !!! REBOOTING." >> /var/log/reboot.log
        /sbin/reboot

    # increment 5 times until stop
    [[ -f /tmp/rebooted.txt ]] || echo 0 > /tmp/rebooted.txt
    n=$(< /tmp/rebooted.txt)
        echo $(( n + 1 )) > /tmp/rebooted.txt
    fi
    # if 5 times rebooted sleep 30 mins and reset counter
    if [ $(cat /tmprebooted.txt) -eq ‘5’ ]; then
    sleep 1800
        cat /dev/null > /tmp/rebooted.txt
    fi
fi

}
run_ping;
reboot_f;

You can download a copy of reboot-if-nwork-is-down.sh script here.

As you see in script successful runs  as well as its failures are logged on server in /var/log/reboot.log with respective timestamp.
Also a counter to 5 is kept in /tmp/rebooted.txt, incremented on each and every script run (rebooting) if, the 5 times increment is matched

a sleep is executed for 30 minutes and the counter is being restarted.
The counter check to 5 guarantees the server will not get restarted if access to Gateway is not continuing for a long time to prevent the system is not being restarted like crazy all time.
 

2. Create a cron job to run reboot-if-nwork-is-down.sh every 15 minutes or so 

I've set the script to re-run in a scheduled (root user) cron job every 15 minutes with following  job:

To add the script to the existing cron rules without rewriting my old cron jobs and without tempering to use cronta -u root -e (e.g. do the cron job add in a non-interactive mode with a single bash script one liner had to run following command:

 

{ crontab -l; echo "*/15 * * * * /usr/sbin/reboot-if-nwork-is-down.sh 2>&1 >/dev/null; } | crontab –


I know restarting a server to restore accessibility is a stupid practice but for home-use or small client servers with unguaranteed networks with a cheap Uninterruptable Power Supply (UPS) devices it is useful.

Summary

Time will show how efficient such a  "self-healing script practice is.
Even though I'm pretty sure that even in a Corporate businesses and large Public / Private Hybrid Clouds where access to remote mounted NFS / XFS / ZFS filesystems are failing a modifications of the script could save you a lot of nerves and troubles and unhappy customers / managers screaming at you on the phone 🙂


I'll be interested to hear from others who have a better  ideas to restore ( resurrect ) access to inessible Linux server after an outage.?