servers Archives - ☩ Walking in Light with Christ - Faith, Computing, Diary ☩ Walking in Light with Christ

Posts Tagged ‘servers’

Configure aide file integrity check server monitoring in Zabbix to track for file changes on servers

Tuesday, March 28th, 2023

Earlier I've written a small article on how to setup AIDE monitoring for Server File integrity check on Linux, which put the basics on how this handy software to improve your server overall Security software can be installed and setup without much hassle.

Once AIDE is setup and a preset custom configuration is prepared for AIDE it is pretty useful to configure AIDE to monitor its critical file changes for better server security by monitoring the AIDE log output for new record occurs with Zabbix. Usually if no files monitored by AIDE are modified on the machine, the log size will not grow, but if some file is modified once Advanced Linux Intrusion Detecting (aide) binary runs via the scheduled Cron job, the /var/log/app_aide.log file will grow zabbix-agentd will continuously check the file for size increases and will react.

Before setting up the Zabbix required Template, you will have to set few small scripts that will be reading a preconfigured list of binaries or application files etc. that aide will monitor lets say via /etc/aide-custom.conf

1. Configure aide to monitor files for changes

Before running aide, it is a good idea to prepare a file with custom defined directories and files that you plan to monitor for integrity checking e.g. future changes with aide, for example to capture bad intruders who breaks into server which runs aide and modifies critical files such as /etc/passwd /etc/shadow /etc/group or / /usr/local/etc/* or /var/* / /usr/* critical files that shouldn't be allowed to change without the admin to soon find out.

# cat /etc/aide-custom.conf

# Example configuration file for AIDE.
@@define DBDIR /var/lib/aide
@@define LOGDIR /var/log/aide
# The location of the database to be read.
database=file:@@{DBDIR}/app_custom.db.gz
database_out=file:@@{DBDIR}/app_aide.db.new.gz
gzip_dbout=yes
verbose=5

report_url=file:@@{LOGDIR}/app_custom.log
#report_url=syslog:LOG_LOCAL2
#report_url=stderr
#NOT IMPLEMENTED report_url=mailto:root@foo.com
#NOT IMPLEMENTED report_url=syslog:LOG_AUTH

# These are the default rules.
#
#p:      permissions
#i:      inode:
#n:      number of links
#u:      user
#g:      group
#s:      size
#b:      block count
#m:      mtime
#a:      atime
#c:      ctime
#S:      check for growing size
#acl:           Access Control Lists
#selinux        SELinux security context
#xattrs:        Extended file attributes
#md5:    md5 checksum
#sha1:   sha1 checksum
#sha256:        sha256 checksum
#sha512:        sha512 checksum
#rmd160: rmd160 checksum
#tiger: tiger checksum

#haval: haval checksum (MHASH only)
#gost:   gost checksum (MHASH only)
#crc32: crc32 checksum (MHASH only)
#whirlpool:     whirlpool checksum (MHASH only)

FIPSR = p+i+n+u+g+s+m+c+acl+selinux+xattrs+sha256

#R:             p+i+n+u+g+s+m+c+acl+selinux+xattrs+md5
#L:             p+i+n+u+g+acl+selinux+xattrs
#E:             Empty group
#>:             Growing logfile p+u+g+i+n+S+acl+selinux+xattrs

# You can create custom rules like this.
# With MHASH…
# ALLXTRAHASHES = sha1+rmd160+sha256+sha512+whirlpool+tiger+haval+gost+crc32
ALLXTRAHASHES = sha1+rmd160+sha256+sha512+tiger
# Everything but access time (Ie. all changes)
EVERYTHING = R+ALLXTRAHASHES

# Sane, with multiple hashes
# NORMAL = R+rmd160+sha256+whirlpool
NORMAL = FIPSR+sha512

# For directories, don't bother doing hashes
DIR = p+i+n+u+g+acl+selinux+xattrs

# Access control only
PERMS = p+i+u+g+acl+selinux

# Logfile are special, in that they often change
LOG = >

# Just do sha256 and sha512 hashes
LSPP = FIPSR+sha512

# Some files get updated automatically, so the inode/ctime/mtime change
# but we want to know when the data inside them changes
DATAONLY = p+n+u+g+s+acl+selinux+xattrs+sha256

##############TOUPDATE
#To delegate to app team create a file like /app/aide.conf
#and uncomment the following line
#@@include /app/aide.conf
#Then remove all the following lines
/etc/zabbix/scripts/check.sh FIPSR
/etc/zabbix/zabbix_agentd.conf FIPSR
/etc/sudoers FIPSR
/etc/hosts FIPSR
/etc/keepalived/keepalived.conf FIPSR
# monitor haproxy.cfg
/etc/haproxy/haproxy.cfg FIPSR
# monitor keepalived
/home/keepalived/.ssh/id_rsa FIPSR
/home/keepalived/.ssh/id_rsa.pub FIPSR
/home/keepalived/.ssh/authorized_keys FIPSR
/usr/local/bin/script_to_run.sh FIPSR
/usr/local/bin/another_script_to_monitor_for_changes FIPSR

# cat /usr/local/bin/aide-config-check.sh
#!/bin/bash
/sbin/aide -c /etc/aide-custom.conf -D

# cat /usr/local/bin/aide-init.sh
#!/bin/bash
/sbin/aide -c /etc/custom-aide.conf -B database_out=file:/var/lib/aide/custom-aide.db.gz -i

# cat /usr/local/bin/aide-check.sh

#!/bin/bash
/sbin/aide -c /etc/custom-aide.conf -Breport_url=stdout -B database=file:/var/lib/aide/custom-aide.db.gz -C|/bin/tee -a /var/log/aide/custom-aide-check.log|/bin/logger -t custom-aide-check-report
/usr/local/bin/aide-init.sh

# cat /usr/local/bin/aide_app_cron_daily.txt

#!/bin/bash
#If first time, we need to init the DB
if [ ! -f /var/lib/aide/app_aide.db.gz ]
then
logger -p local2.info -t app-aide-check-report "Generating NEW AIDE DATABASE for APPLICATION"
nice -n 18 /sbin/aide –init -c /etc/aide_custom.conf
mv /var/lib/aide/app_aide.db.new.gz /var/lib/aide/app_aide.db.gz
fi

nice -n 18 /sbin/aide –update -c /etc/aide_app.conf
#since the option for syslog seems not fully implemented we need to push logs via logger
/bin/logger -f /var/log/aide/app_aide.log -p local2.info -t app-aide-check-report
#Acknoledge the new database as the primary (every results are sended to syslog anyway)
mv /var/lib/aide/app_aide.db.new.gz /var/lib/aide/app_aide.db.gz

What above cron job does is pretty simple, as you can read it yourself. If the configuration predefined aide database store file /var/lib/aide/app_aide.db.gz, does not
exists aide will create its fresh empty database and generate a report for all predefined files with respective checksums to be stored as a comparison baseline for file changes.

Next there is a line to write aide file changes via rsyslog through the logger and local2.info handler

2. Setup Zabbix Template with Items, Triggers and set Action

2.1 Create new Template and name it YourAppName APP-LB File integrity Check

aide-itengrity-check-zabbix_ Configuration of templates

Then setup the required Items, that will be using zabbix's Skip embedded function to scan file in a predefined period of file, this is done by the zabbix-agent that is
supposed to run on the server.

2.2 Configure Item like

aide-zabbix-triggers-screenshot

*Name: check aide log file

Type: zabbix (active)

log[/var/log/aide/app_aide.log,^File.*,,,skip]

Type of information: Log

Update Interval: 30s

Applications: File Integrity Check

Configure Trigger like

Enabled: Tick On

images/aide-zabbix-screenshots/check-aide-log-item

2.3 Create Triggers with the respective regular expressions, that would check the aide generated log file for file modifications

Configure Trigger like

Enabled: Tick On

*Name: Someone modified {{ITEM.VALUE}.regsub("(.*)", \1)}

*Expression: {PROD APP-LB File Integrity Check:log[/var/log/aide/app_aide.log,^File.*,,,skip].strlen()}>=1

Allow manual close: yes tick

*Description: Someone modified {{ITEM.VALUE}.regsub("(.*)", \1)} on {HOST.NAME}

2.4 Configure Action

aide-zabbix-file-monitoring-action-screensho

Now assuming the Zabbix server has a properly set media for communication and you set Alerting rules zabbix-server can be easily set tosend mails to a Support email to get Notifications Alerts, everytime a monitored file by aide gets changed.

That's all folks ! Enjoy being notified on every file change on your servers !

Tags: acl, ALLXTRAHASHES, application, Applications File Integrity Check, bash, cat, check, checksum, configure, Enabled, gz, information, integrity, lib, logs, reading, sbin, selinux, Server File, servers, usr, var
Posted in Computer Security, Linux, System Administration, Zabbix | 4 Comments »

Linux extending life time for a damaged hard drive server tricks on a live server. Force fcsk on next reboot.Read-only file system error solutions

Friday, February 17th, 2023

linux-extending-life-time-for-a-damaged-hard-drive-server-tricks-can-not-read-superblock-linux-force-fsck-on-next-reboot

In our daily work as system administrators we have some very old Legacy systems running Clustered High Availability proxies using CRM (Cluster Resource Manager) and some legacy systems still using Heartbeat to manage the cluster instead of the newer and modern Corosync variant.

The HA cluster is only 2 nodes Linux machine and running the obscure already long time unsupported version of Redhat 5.11 (Ootpa) who was officially became stable distant year 1998 (yeath the years were good) and whose EOL (End of Life) has been reached long time ago and the OS is no longer supported, however for about 14 years the machines has been running perfectly fine until one of the Cluster nodes managed by ocf::heartbeat:IPAddr2 , that is /etc/ha.d/resource.d/IPAddr2 shell script. Yeah for the newbies Heartbeat Application Cluster in Linux does work like that it uses a number of extendable pair of shell scripts written for different kind of Network / Web / Mail / SQL or whatever services HA management.

The first node configured however, started failing due to some errors like:

EXT3-fs error (device dm-1): ext3_journal_start_sb: Detected aborted journal
sd 0:2:0:0: rejecting I/O to offline device
Aborting journal on device sda1.
sd 0:2:0:0: rejecting I/O to offline device
printk: 159 messages suppressed.
Buffer I/O error on device sda1, logical block 526
lost page write due to I/O error on sda1
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
ext3_abort called.
EXT3-fs error (device sda1): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
sd 0:2:0:0: rejecting I/O to offline device
megaraid_sas: FW was restarted successfully, initiating next stage…
megaraid_sas: HBA recovery state machine, state 2 starting…
megasas: Waiting for FW to come to ready state
megasas: FW in FAULT state!!
FW state [-268435456] hasn't changed in 180 secs
megaraid_sas: out: controller is not in ready state
megasas: waiting_for_outstanding: after issue OCR.
megasas: waiting_for_outstanding: before issue OCR. FW state = f0000000
megaraid_sas: pending commands remain even after reset handling. megasas[0]: Dumping Frame Phys Address of all pending cmds in FW
megasas[0]: Total OS Pending cmds : 0 megasas[0]: 64 bit SGLs were sent to FW
megasas[0]: Pending OS cmds in FW :

The result out of that was a frequently the filesystem of the machine got re-mounted as Read Only and of course that is
quite bad if you have a running processess of haproxy that should be able to be living their and take up some Web traffic
for high availability and you run all the traffic only on the 2nd pair of machine.

This of course was a clear sign for a failing disks or some hit bad blocks regions or as the messages indicates, some
problem with system hardware or Raid SAS Array.

The physical raid on the system, just like rest of the hardware is very old stuff as well.

[root@haproxy_lb_node1 ~]# lspci |grep -i RAI
01:00.0 RAID bus controller: LSI Logic / Symbios Logic MegaRAID SAS 2108 [Liberator] (rev 05)

The produced errors not only made the machine to auto-mount its root / filesystem in Read-Only mode but besides has most
likely made the machine to automatically reboot every few days or few times every day in a raw.

The second Load Balancer node2 did operated perfectly, and we thought that we might just keep the broken machine in that half running
and inconsistent state for few weeks until we have built the new machines with Pre-Installed new haproxy cluster with modern
RedHat Linux 8.6 distribution, but since we have to follow SLAs (Service Line Agreements) with Customers and the end services behind the
High Availability (HA) Haproxy cluster were at danger …

We as sysadmins had the task to make our best to try to stabilize the unstable node with disk errors for the system to servive
and be able to normally serve traffic (if node2 that is in a separate Data center fails due to a hardware or electricity issues etc.).

Here is few steps we took, that has hopefully improved the situation.

1. Make backups of most important files of high importance

Always before doing anything with a broken system, prepare backup of the most important files, if that is a cluster that should be a backup of the cluster configurations (if you don't have already ones) backup of /etc/hosts / backup of any important services configs /etc/haproxy/haproxy.cfg /etc/postfix/postfix.cfg (like it was my case), preferrably backup of whole /etc/ any important files from /root/ or /home/users* directories backup of at leasts latest logs from /var/log etc.

2. Clear up all unnecessery services scripts from the server

Any additional Softwares / Services and integrity checking tools (daemons) / scripts and cron jobs, were immediately stopped and wheter unused removed.

E.g. we had moved through /etc/cron* to check what's there,

# ls -ld /etc/cron.*
drwx—— 2 root root 4096 Feb 7 18:13 /etc/cron.d
drwxr-xr-x 2 root root 4096 Feb 7 17:59 /etc/cron.daily
-rw-r–r– 1 root root 0 Jul 20 2010 /etc/cron.deny
drwxr-xr-x 2 root root 4096 Jan 9 2013 /etc/cron.hourly
drwxr-xr-x 2 root root 4096 Jan 9 2013 /etc/cron.monthly
drwxr-xr-x 2 root root 4096 Aug 26 2015 /etc/cron.weekly

And like well professional butchers removed everything unnecessery that could trigger any extra unnecessery disk read / writes to HDD.

E.g. just create

# mkdir -p /root/etc_old/{/etc/cron.d,\
/etc/cron.daily,/etc/cron.hourly,/etc/cron.monthly\
,/etc/cron.weekly}

And moved all unnecessery cron job scripts like:

1. nmon (old school network / memory / hard disk console tool for monitoring and tuning server parameters)
2. clamscan / freshclam crons
3. mlocate (the script that is taking care for periodic run of updatedb command to keep the locate command to easily search
for files inside the DB to put less read operations on disk in case if you need to find file (e.g. prevent yourself to everytime
run cmd like: find / . -iname '*whatever_you_look_for*'
4. cups cron jobs
5. logwatch cron
6. rkhunter stuff
7. logrotate (yes we stopped even logrotation trigger job as we found the server was crashing sometimes at the same time when
the lograte job to rotate logs inside /var/log/* was running perhaps leading to a hit of the I/O read error (bad blocks).

Also inspected the Administrator user root cron job for any unwated scripts and stopped two report bash scripts that were part of the PCI tightened Security procedures.
Therein found script responsible to periodically report the list of installed packages and if they have not changed, as well a script to periodically report via email the list of
/etc/{passwd,/etc/shadow} created users, used to historically keep an eye on the list of users and easily see if someone
has created new users on the machine. Those were enabled via /var/spool/cron/root cron jobs, in other cases, on other machines if it happens for you
it is a good idea to check out all the existing user cron jobs and stop anything that might be putting Read / Write extra heat pressure on machine attached the Hard drives.

# ls -al /var/spool/cron/
total 20
drwx—— 2 root root 4096 Nov 13 2015 .
drwxr-xr-x 12 root root 4096 May 11 2011 ..
-rw——- 1 root root 133 Nov 13 2015 root

3. Clear up old log files and any files unnecessery

Under /var/log and /home /var/tmp /var/spool/tmp immediately try to clear up the old log files.
From my past experience this has many times made the FS file inodes that are storing on a unbroken part (good blocks) of the hard drive and
ready to be reused by newly written rsyslog / syslogd services spitted files.

!!! Note that during the removal of some files you might hit a files stored on a bad blocks that might lead to a unexpected system reboot.

But that's okay, don't worry most likely after a hard reset by a technician in the Datacenter the machine will boot again and you can enjoy
removing remaining still files to send them to the heaven for old files.

4. Trigger an automatic system file system check with fsck on next boot

The standard way to force a Linux to aumatically recheck its Root filesystem is to simply create the /forcefsck to root partition or any other secondary disk partition you would like to check.

# touch /forcefsck

# reboot

However at some occasions you might be unable to do it because, the / (root fs) has been remounted in ReadOnly mode, yackes …

Luckily old Linux distibutions like this RHEL 5.1, has a way to force a filesystem check after reboot fsck and identify any
unknown bad-blocks and hopefully succceed in isolating them, so you don't hit into the same auto-reboots if the hard drive or Software / Hardware RAID
is not in terrible state, you can use an option built in in /sbin/shutdown command the '-F'

-F Force fsck on reboot.

Hence to make the machine reboot and trigger immediately fsck:

# shutdown -rF now

Just In case you wonder why to reboot before check the Filesystem. Well simply because you need to have them unmounted before you check.

In that specific case this produced so far a good result and the machine booted just fine and we crossed the fingers and prayed that the machine would work flawlessly in the coming few weeks, before we finalize the configuration of the substitute machines, where this old infrastructure will be migrated to a new built cluster with new Haproxy and Corosync / Pacemaker Cluster on a brand new RHEL.

NB! On newer machines this won't work however as shutdown command has been stripped off this option because no SystemV (SystemInit) or Upstart and not on SystemD newer services architecture.

5. Hints on checking the hard drives with fsck

If you happen to be able to have physical access to the remote Hardare machine via a TTY[1-9] Console, that's even better and is the standard way to do it but with this specific case we had no easy way to get access to the Physical server console.

It is even better to go there and via either via connected Monitor (Display) or KVM Switch (Those who hear KVM switch first time this is a great device in server rooms to connect multiple monitors to same Monitor Display), it is better to use a some of the multitude of options to choose from for USB Distro Linux recovery OS versions or a CDROM / DVD on older machines like this with the Redhat's recovery mode rolled on.
After mounting the partition simply check each of the disks
e.g. :

# fsck -y /dev/sdb
# fsck -y /dev/sdc

Or if you want to not waste time and look for each hard drive but directly check all the ones that are attached and known by Linux distro via /etc/fstab definition run:

# fsck -AR

If necessery and you have a mixture of filesystems for example EXT3 , EXT4 , REISERFS you can tell it to omit some filesystem, for example ext3, like that:

# fsck -AR -t noext3 -y

To skip fsck on mounted partitions with fsck:

# fsck -M /dev/sdb

One remark to make here on fsck is usually fsck to complete its job on various filesystem it uses other external component binaries usually stored in /sbin/fsck*

# ls -al /sbin/fsck*
-rwxr-xr-x 1 root root 55576 20 яну 2022 /sbin/fsck*
-rwxr-xr-x 1 root root 43272 20 яну 2022 /sbin/fsck.cramfs*
lrwxrwxrwx 1 root root 9 4 юли 2020 /sbin/fsck.exfat -> exfatfsck*
lrwxrwxrwx 1 root root 6 7 юни 2021 /sbin/fsck.ext2 -> e2fsck*
lrwxrwxrwx 1 root root 6 7 юни 2021 /sbin/fsck.ext3 -> e2fsck*
lrwxrwxrwx 1 root root 6 7 юни 2021 /sbin/fsck.ext4 -> e2fsck*
-rwxr-xr-x 1 root root 84208 8 фев 2021 /sbin/fsck.fat*
-rwxr-xr-x 2 root root 393040 30 ное 2009 /sbin/fsck.jfs*
-rwxr-xr-x 1 root root 125184 20 яну 2022 /sbin/fsck.minix*
lrwxrwxrwx 1 root root 8 8 фев 2021 /sbin/fsck.msdos -> fsck.fat*
-rwxr-xr-x 1 root root 333 16 дек 2021 /sbin/fsck.nfs*
lrwxrwxrwx 1 root root 8 8 фев 2021 /sbin/fsck.vfat -> fsck.fat*

6. Using tune2fs to adjust tunable filesystem parameters on ext2/ext3/ext4 filesystems (few examples)

a) To check whether really the filesystem was checked on boot time or check a random filesystem on the server for its last check up date with fsck:

# tune2fs -l /dev/sda1 | grep checked
Last checked: Wed Apr 17 11:04:44 2019

On some distributions like old Debian and Ubuntu, it is even possible to enable fsck to log its operations during check on reboot via changing the verbosity from NO to YES:

# sed -i "s/#VERBOSE=no/VERBOSE=yes/" /etc/default/rcS

If you're having the issues on old Debian Linuxes and not on RHEL it is possible to;

b) Enable all fsck repairs automatic on boot

by running via:

# sed -i "s/FSCKFIX=no/FSCKFIX=yes/" /etc/default/rcS

c) Forcing fcsk check on for server attached Hard Drive Partitions with tune2fs

# tune2fs -c 1 /dev/sdXY

Note that:
tune2fs can force a fsck on each reboot for EXT4, EXT3 and EXT2 filesystems only.

tune2fs can trigger a forced fsck on every reboot using the -c (max-mount-counts) option.
This option sets the number of mounts after which the filesystem will be checked, so setting it to 1 will run fsck each time the computer boots.
Setting it to -1 or 0 resets this (the number of times the filesystem is mounted will be disregarded by e2fsck and the kernel).

For example you could:

d) Set fsck to run a filesystem check every 30 boots, by using -c 30

# tune2fs -c 30 /dev/sdXY

e) Checking whether a Hard Drive has been really checked on the boot

# tune2fs -l /dev/sda1 | grep checked
Last checked: Wed Apr 17 11:04:44 2019

e) Check when was the last time the file system /dev/sdX was checked:

# tune2fs -l /dev/sdX | grep Last\ c
Last checked: Thu Jan 12 20:28:34 2017

f) Check how many times our /dev/sdX filesystem was mounted

# tune2fs -l /dev/sdX | grep Mount
Mount count: 157

g) Check how many mounts are allowed to pass before filesystem check is forced

# tune2fs -l /dev/sdX | grep Max
Maximum mount count: -1

7. Repairing disk / partitions via GRUB fsck.mode and fsck.repair kernel module options

It is also possible to force a fsck.repair on boot via GRUB, but that usually is not an option someone would like as the machine might fail too boot if it hards to repair hardly, however in difficult situations with failing disks temporary enabling it is good idea.

This can be done by including for grub initial config

GRUB_CMDLINE_LINUX_DEFAULT="quiet splash fsck.mode=force fsck.repair=yes"

fsck.mode=force – will force a fsck each time a system boot and keeping that value enabled for a long time inside GRUB is stupid for servers as

sometimes booting could be severely prolonged because of the checks especially with servers with many or slow old hard drives.

fsck.repair=yes – will make the fsck try to repair if it finds bad blocks when checking (be absolutely sure you know, what you're doing if passing this options)

The options can be also set via editing the GRUB boot screen, if you have physical access to the server and don't want to reload the grub loader and possibly make the machine unbootable on next boot.

8. Few more details on how /etc/fstab disk fsck check parameters values for Systemd Linux machines works

The "proper" way on systemd (if we can talk about proper way on Linux) to runs fsck for each filesystem that has a fsck is to pass number greater than 0 set in
/etc/fstab (last column in /etc/fstab), so make sure you edit your /etc/fstab if that's not the case.

The root partition should be set to 1 (first to be checked), while other partitions you want to be checked should be set to 2.

Example /etc/fstab:

# /etc/fstab: static file system information.

/dev/sda1 / ext4 errors=remount-ro 0 1
/dev/sda5 /home ext4 defaults 0 2

The values you can put here as a second number meaning is as follows:
0 – disabled, that is do not check filesystem
1 – partition with this PASS value has a higher priority and is checked first. This value is usually set to the root / partition
2 – partitions with this PASS value will be checked last

a) Check the produced log out of fsck

Unfortunately on the older versions of Linux distros with SystemV fsck log output might be not generated except on the physical console so if you have a kind of duplicator device physical tty on the display port of the server, you might capture some bad block reports or fixed errors messages, but if you don't you might just cross the fingers and hope that anything found FS irregularities was recovered.

On systemd Linux machines the fsck log should be produced either in /run/initramfs/fsck.log or some other location depending on the Linux distro and you should be able to see something from fsck inside /var/log/* logs:

# grep -rli fsck /var/log/*

Close it up

Having a system with failing disk is a really one of the worst sysadmin nightmares to get. The good news is that most of the cases we're prepared with some working backup or some work around stuff like the few steps explained to mitigate the amount of Read / Writes to hard disks on the failing machine HDDs. If the failing disk is a primary Linux filesystem all becomes even worse as every next reboot, you have no guarantee, whether the kernel / initrd or some of the other system components required to run the Core Linux system won't break up the normal boot. Thus one side changes on the hard drives is a risky business on ther other side, if you're in a situation where you have a mirror system or the failing system is just a Linux server installed without a Cluster pair, then this is not a big deal as you can guarantee at least one of the nodes still up, unning and serving. Still doing too much of operations with HDD is always a danger so the steps described, though in most cases leading to improvement on how the system behaves, the system should be considered totally unreliable and closely monitored not only by some monitoring stuff like Zabbix / Prometheus whatever but regularly check the systems state via normal SSH logins. It is important if you have some important datas or logs on the system that are not synchronized to a system node to copy them before doing any of the described operations. After all minimal is backuped, proceed to clear up everything that might be cleared up and still the machine to continue providing most of its functionalities, trigger fsck automatic HDD check on next reboot, reboot, check what is going on and monitor the machine from there on.

Hopefully the few described steps, has helped some sysadmin. There is plenty of things which I've described that might go wrong, even following the described steps, might not help if the machines Storage Drives / SAS / SSD has too much of a damage. But as said in most cases following this few steps would improve the machine state.

Wish you the best of luck!

Tags: backups, default, disk partitions, etc passwd, filesystems, grub, hard drive, HBA, life, linux?, Maximum, older versions, sdb, server, servers, system, time, tricks
Posted in Linux, Performance Tuning, Remote System Administration, System Administration, System Optimization, Various | No Comments »

How to monitor Haproxy Application server backends with Zabbix userparameter autodiscovery scripts

Friday, May 13th, 2022

Haproxy is doing quite a good job in High Availability tasks where traffic towards multiple backend servers has to be redirected based on the available one to sent data from the proxy to.

Lets say haproxy is configured to proxy traffic for App backend machine1 and App backend machine2.

Usually in companies people configure a monitoring like with Icinga or Zabbix / Grafana to keep track on the Application server is always up and running. Sometimes however due to network problems (like burned Network Switch / router or firewall misconfiguration) or even an IP duplicate it might happen that Application server seems to be reporting reachable from some monotoring tool on it but unreachable from Haproxy server -> App backend machine2 but reachable from App backend machine1. And even though haproxy will automatically switch on the traffic from backend machine2 to App machine1. It is a good idea to monitor and be aware that one of the backends is offline from the Haproxy host.
In this article I'll show you how this is possible by using 2 shell scripts and userparameter keys config through the autodiscovery zabbix legacy feature.
Assumably for the setup to work you will need to have as a minimum a Zabbix server installation of version 5.0 or higher.

1. Create the required haproxy_discovery.sh and haproxy_stats.sh scripts

You will have to install the two scripts under some location for example we can put it for more clearness under /etc/zabbix/scripts

[root@haproxy-server1 ]# mkdir /etc/zabbix/scripts

[root@haproxy-server1 scripts]# vim haproxy_discovery.sh
#!/bin/bash
#
# Get list of Frontends and Backends from HAPROXY
# Example: ./haproxy_discovery.sh [/var/lib/haproxy/stats] FRONTEND|BACKEND|SERVERS
# First argument is optional and should be used to set location of your HAPROXY socket
# Second argument is should be either FRONTEND, BACKEND or SERVERS, will default to FRONTEND if not set
#
# !! Make sure the user running this script has Read/Write permissions to that socket !!
#
## haproxy.cfg snippet
# global
# stats socket /var/lib/haproxy/stats mode 666 level admin

HAPROXY_SOCK=""/var/run/haproxy/haproxy.sock
[ -n “$1” ] && echo $1 | grep -q ^/ && HAPROXY_SOCK="$(echo $1 | tr -d '\040\011\012\015')"

if [[ “$1” =~ (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):[0-9]{1,5} ]];
then
HAPROXY_STATS_IP="$1"
QUERYING_METHOD="TCP"
fi

QUERYING_METHOD="${QUERYING_METHOD:-SOCKET}"

query_stats() {
if [[ ${QUERYING_METHOD} == “SOCKET” ]]; then
echo "show stat" | socat ${HAPROXY_SOCK} stdio 2>/dev/null
elif [[ ${QUERYING_METHOD} == “TCP” ]]; then
echo "show stat" | nc ${HAPROXY_STATS_IP//:/ } 2>/dev/null
fi
}

get_stats() {
echo "$(query_stats)" | grep -v "^#"
}

[ -n “$2” ] && shift 1
case $1 in
B*) END="BACKEND" ;;
F*) END="FRONTEND" ;;
S*)
for backend in $(get_stats | grep BACKEND | cut -d, -f1 | uniq); do
for server in $(get_stats | grep "^${backend}," | grep -v BACKEND | grep -v FRONTEND | cut -d, -f2); do
serverlist="$serverlist,\n"'\t\t{\n\t\t\t"{#BACKEND_NAME}":"'$backend'",\n\t\t\t"{#SERVER_NAME}":"'$server'"}'
done
done
echo -e '{\n\t"data":[\n’${serverlist#,}’]}'
exit 0
;;
*) END="FRONTEND" ;;
esac

for frontend in $(get_stats | grep "$END" | cut -d, -f1 | uniq); do
felist="$felist,\n"'\t\t{\n\t\t\t"{#'${END}'_NAME}":"'$frontend'"}'
done
echo -e '{\n\t"data":[\n’${felist#,}’]}'

[root@haproxy-server1 scripts]# vim haproxy_stats.sh
#!/bin/bash
set -o pipefail

if [[ “$1” = /* ]]
then
HAPROXY_SOCKET="$1"
shift 0
else
if [[ “$1” =~ (25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.(25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?):[0-9]{1,5} ]];
then
HAPROXY_STATS_IP="$1"
QUERYING_METHOD="TCP"
shift 1
fi
fi

pxname="$1"
svname="$2"
stat="$3"

DEBUG=${DEBUG:-0}
HAPROXY_SOCKET="${HAPROXY_SOCKET:-/var/run/haproxy/haproxy.sock}"
QUERYING_METHOD="${QUERYING_METHOD:-SOCKET}"
CACHE_STATS_FILEPATH="${CACHE_STATS_FILEPATH:-/var/tmp/haproxy_stats.cache}"
CACHE_STATS_EXPIRATION="${CACHE_STATS_EXPIRATION:-1}" # in minutes
CACHE_INFO_FILEPATH="${CACHE_INFO_FILEPATH:-/var/tmp/haproxy_info.cache}" ## unused
CACHE_INFO_EXPIRATION="${CACHE_INFO_EXPIRATION:-1}" # in minutes ## unused
GET_STATS=${GET_STATS:-1} # when you update stats cache outsise of the script
SOCAT_BIN="$(which socat)"
NC_BIN="$(which nc)"
FLOCK_BIN="$(which flock)"
FLOCK_WAIT=15 # maximum number of seconds that "flock" waits for acquiring a lock
FLOCK_SUFFIX='.lock'
CUR_TIMESTAMP="$(date '+%s')"

debug() {
[ “${DEBUG}” -eq 1 ] && echo "DEBUG: $@" >&2 || true
}

debug "SOCAT_BIN => $SOCAT_BIN"
debug "NC_BIN => $NC_BIN"
debug "FLOCK_BIN => $FLOCK_BIN"
debug "FLOCK_WAIT => $FLOCK_WAIT seconds"
debug "CACHE_FILEPATH => $CACHE_FILEPATH"
debug "CACHE_EXPIRATION => $CACHE_EXPIRATION minutes"
debug "HAPROXY_SOCKET => $HAPROXY_SOCKET"
debug "pxname => $pxname"
debug "svname => $svname"
debug "stat => $stat"

# check if socat is available in path
if [ “$GET_STATS” -eq 1 ] && [[ $QUERYING_METHOD == “SOCKET” && -z “$SOCAT_BIN” ]] || [[ $QUERYING_METHOD == “TCP” && -z “$NC_BIN” ]]
then
echo 'ERROR: cannot find socat binary'
exit 126
fi

# if we are getting stats:
# check if we can write to stats cache file, if it exists
# or cache file path, if it does not exist
# check if HAPROXY socket is writable
# if we are NOT getting stats:
# check if we can read the stats cache file
if [ “$GET_STATS” -eq 1 ]
then
if [ -e “$CACHE_FILEPATH” ] && [ ! -w “$CACHE_FILEPATH” ]
then
echo 'ERROR: stats cache file exists, but is not writable'
exit 126
elif [ ! -w ${CACHE_FILEPATH%/*} ]
then
echo 'ERROR: stats cache file path is not writable'
exit 126
fi
if [[ $QUERYING_METHOD == “SOCKET” && ! -w $HAPROXY_SOCKET ]]
then
echo "ERROR: haproxy socket is not writable"
exit 126
fi
elif [ ! -r “$CACHE_FILEPATH” ]
then
echo 'ERROR: cannot read stats cache file'
exit 126
fi

# index:name:default
MAP="
1:pxname:@
2:svname:@
3:qcur:9999999999
4:qmax:0
5:scur:9999999999
6:smax:0
7:slim:0
8:stot:@
9:bin:9999999999
10:bout:9999999999
11:dreq:9999999999
12:dresp:9999999999
13:ereq:9999999999
14:econ:9999999999
15:eresp:9999999999
16:wretr:9999999999
17:wredis:9999999999
18:status:UNK
19:weight:9999999999
20:act:9999999999
21:bck:9999999999
22:chkfail:9999999999
23:chkdown:9999999999
24:lastchg:9999999999
25:downtime:0
26:qlimit:0
27:pid:@
28:iid:@
29:sid:@
30:throttle:9999999999
31:lbtot:9999999999
32:tracked:9999999999
33:type:9999999999
34:rate:9999999999
35:rate_lim:@
36:rate_max:@
37:check_status:@
38:check_code:@
39:check_duration:9999999999
40:hrsp_1xx:@
41:hrsp_2xx:@
42:hrsp_3xx:@
43:hrsp_4xx:@
44:hrsp_5xx:@
45:hrsp_other:@
46:hanafail:@
47:req_rate:9999999999
48:req_rate_max:@
49:req_tot:9999999999
50:cli_abrt:9999999999
51:srv_abrt:9999999999
52:comp_in:0
53:comp_out:0
54:comp_byp:0
55:comp_rsp:0
56:lastsess:9999999999
57:last_chk:@
58:last_agt:@
59:qtime:0
60:ctime:0
61:rtime:0
62:ttime:0
"

_STAT=$(echo -e "$MAP" | grep :${stat}:)
_INDEX=${_STAT%%:*}
_DEFAULT=${_STAT##*:}

debug "_STAT => $_STAT"
debug "_INDEX => $_INDEX"
debug "_DEFAULT => $_DEFAULT"

# check if requested stat is supported
if [ -z “${_STAT}” ]
then
echo "ERROR: $stat is unsupported"
exit 127
fi

# method to retrieve data from haproxy stats
# usage:
# query_stats "show stat"
query_stats() {
if [[ ${QUERYING_METHOD} == “SOCKET” ]]; then
echo $1 | socat ${HAPROXY_SOCKET} stdio 2>/dev/null
elif [[ ${QUERYING_METHOD} == “TCP” ]]; then
echo $1 | nc ${HAPROXY_STATS_IP//:/ } 2>/dev/null
fi
}

# a generic cache management function, that relies on 'flock'
check_cache() {
local cache_type="${1}"
local cache_filepath="${2}"
local cache_expiration="${3}"
local cache_filemtime
cache_filemtime=$(stat -c '%Y' "${cache_filepath}" 2> /dev/null)
if [ $((cache_filemtime+60*cache_expiration)) -ge ${CUR_TIMESTAMP} ]
then
debug "${cache_type} file found, results are at most ${cache_expiration} minutes stale.."
elif "${FLOCK_BIN}" –exclusive –wait "${FLOCK_WAIT}" 200
then
cache_filemtime=$(stat -c '%Y' "${cache_filepath}" 2> /dev/null)
if [ $((cache_filemtime+60*cache_expiration)) -ge ${CUR_TIMESTAMP} ]
then
debug "${cache_type} file found, results have just been updated by another process.."
else
debug "no ${cache_type} file found, querying haproxy"
query_stats "show ${cache_type}" > "${cache_filepath}"
fi
fi 200> "${cache_filepath}${FLOCK_SUFFIX}"
}

# generate stats cache file if needed
get_stats() {
check_cache 'stat' "${CACHE_STATS_FILEPATH}" ${CACHE_STATS_EXPIRATION}
}

# generate info cache file
## unused at the moment
get_info() {
check_cache 'info' "${CACHE_INFO_FILEPATH}" ${CACHE_INFO_EXPIRATION}
}

# get requested stat from cache file using INDEX offset defined in MAP
# return default value if stat is ""
get() {
# $1: pxname/svname
local _res="$("${FLOCK_BIN}" –shared –wait "${FLOCK_WAIT}" "${CACHE_STATS_FILEPATH}${FLOCK_SUFFIX}" grep $1 "${CACHE_STATS_FILEPATH}")"
if [ -z “${_res}” ]
then
echo "ERROR: bad $pxname/$svname"
exit 127
fi
_res="$(echo $_res | cut -d, -f ${_INDEX})"
if [ -z “${_res}” ] && [[ “${_DEFAULT}” != “@” ]]
then
echo "${_DEFAULT}"
else
echo "${_res}"
fi
}

# not sure why we'd need to split on backslash
# left commented out as an example to override default get() method
# status() {
# get "^${pxname},${svnamem}," $stat | cut -d\ -f1
# }

# this allows for overriding default method of getting stats
# name a function by stat name for additional processing, custom returns, etc.
if type get_${stat} >/dev/null 2>&1
then
debug "found custom query function"
get_stats && get_${stat}
else
debug "using default get() method"
get_stats && get "^${pxname},${svname}," ${stat}
fi

! NB ! Substitute in the script /var/run/haproxy/haproxy.sock with your haproxy socket location

You can download the haproxy_stats.sh here and haproxy_discovery.sh here

2. Create the userparameter_haproxy_backend.conf

[root@haproxy-server1 zabbix_agentd.d]# cat userparameter_haproxy_backend.conf
#
# Discovery Rule
#

# HAProxy Frontend, Backend and Server Discovery rules
UserParameter=haproxy.list.discovery[*],sudo /etc/zabbix/scripts/haproxy_discovery.sh SERVER
UserParameter=haproxy.stats[*],sudo /etc/zabbix/scripts/haproxy_stats.sh $2 $3 $4

# support legacy way

UserParameter=haproxy.stat.downtime[*],sudo /etc/zabbix/scripts/haproxy_stats.sh $2 $3 downtime

UserParameter=haproxy.stat.status[*],sudo /etc/zabbix/scripts/haproxy_stats.sh $2 $3 status

UserParameter=haproxy.stat.last_chk[*],sudo /etc/zabbix/scripts/haproxy_stats.sh $2 $3 last_chk

3. Create new simple template for the Application backend Monitoring and link it to monitored host

create-configuration-template-backend-monitoring

create-template-backend-monitoring-macros

Go to Configuration -> Hosts (find the host) and Link the template to it

4. Restart Zabbix-agent, in while check autodiscovery data is in Zabbix Server

[root@haproxy-server1 ]# systemctl restart zabbix-agent

Check in zabbix the userparameter data arrives, it should not be required to add any Items or Triggers as autodiscovery zabbix feature should automatically create in the server what is required for the data regarding backends to be in.

To view data arrives go to Zabbix config menus:

Configuration -> Hosts -> Hosts: (lookup for the haproxy-server1 hostname)

The autodiscovery should have automatically created the following prototypes

Now if you look inside Latest Data for the Host you should find some information like:

HAProxy Backend [backend1] (3 Items)

HAProxy Server [backend-name_APP/server1]: Connection Response
2022-05-13 14:15:04           History

HAProxy Server [backend-name/server2]: Downtime (hh:mm:ss)
2022-05-13 14:13:57   20:30:42       History

HAProxy Server [bk_name-APP/server1]: Status
2022-05-13 14:14:25   Up (1)       Graph
       ccnrlb01   HAProxy Backend [bk_CCNR_QA_ZVT] (3 Items)

HAProxy Server [bk_name-APP3/server1]: Connection Response
2022-05-13 14:15:05           History

HAProxy Server [bk_name-APP3/server1]: Downtime (hh:mm:ss)
2022-05-13 14:14:00   20:55:20       History

HAProxy Server [bk_name-APP3/server2]: Status
2022-05-13 14:15:08   Up (1)

To make alerting in case if a backend is down which usually you would like only left thing is to configure an Action to deliver alerts to some email address.

Tags: admin, configured, esac, file, frontend, Haproxy Application, How to, index, information, Network Switch, scripts, servers, shell scripts, sid, Zabbix Server
Posted in Bash Scripting, System Optimization, Zabbix | No Comments »

Create Linux High Availability Load Balancer Cluster with Keepalived and Haproxy on Linux

Tuesday, March 15th, 2022

Configuring a Linux HA (High Availibiltiy) for an Application with Haproxy is already used across many Websites on the Internet and serious corporations that has a crucial infrastructure has long time
adopted and used keepalived to provide High Availability Application level Clustering.
Usually companies choose to use HA Clusters with Haproxy with Pacemaker and Corosync cluster tools.
However one common used alternative solution if you don't have the oportunity to bring up a High availability cluster with Pacemaker / Corosync / pcs (Pacemaker Configuration System) due to fact machines you need to configure the cluster on are not Physical but VMWare Virtual Machines which couldn't not have configured a separate Admin Lans and Heartbeat Lan as we usually do on a Pacemaker Cluster due to the fact the 5 Ethernet LAN Card Interfaces of the VMWare Hypervisor hosts are configured as a BOND (e.g. all the incoming traffic to the VMWare vSphere HV is received on one Virtual Bond interface).

I assume you have 2 separate vSphere Hypervisor Physical Machines in separate Racks and separate switches hosting the two VMs.
For the article, I'll call the two brand new brought Virtual Machines with some installation automation software such as Terraform or Ansible – vm-server1 and vm-server2 which would have configured some recent version of Linux.

In that scenario to have a High Avaiability for the VMs on Application level and assure at least one of the two is available at a time if one gets broken due toe malfunction of the HV, a Network connectivity issue, or because the VM OS has crashed.
Then one relatively easily solution is to use keepalived and configurea single High Availability Virtual IP (VIP) Address, i.e. 10.10.10.1, which would float among two VMs using keepalived so at a time at least one of the two VMs would be reachable on the Network.

haproxy_keepalived-vip-ip-diagram-linux

Having a VIP IP is quite a common solution in corporate world, as it makes it pretty easy to add F5 Load Balancer in front of the keepalived cluster setup to have a 3 Level of security isolation, which usually consists of:

1. Physical (access to the hardware or Virtualization hosts)
2. System Access (The mechanism to access the system login credetials users / passes, proxies, entry servers leading to DMZ-ed network)
3. Application Level (access to different programs behind L2 and data based on the specific identity of the individual user,
special Secondary UserID, Factor authentication, biometrics etc.)

1. Install keepalived and haproxy on machines

Depending on the type of Linux OS:

On both machines

[root@server1:~]# yum install -y keepalived haproxy
…

If you have to install keepalived / haproxy on Debian / Ubuntu and other Deb based Linux distros

[root@server1:~]# apt install keepalived haproxy –yes
…

2. Configure haproxy (haproxy.cfg) on both server1 and server2

Create some /etc/haproxy/haproxy.cfg configuration

[root@server1:~]# vim /etc/haproxy/haproxy.cfg

#———————————————————————
# Global settings
#———————————————————————
global
log 127.0.0.1 local6 debug
chroot /var/lib/haproxy
pidfile /run/haproxy.pid
stats socket /var/lib/haproxy/haproxy.sock mode 0600 level admin
maxconn 4000
user haproxy
group haproxy
daemon
#debug
#quiet

#———————————————————————
# common defaults that all the 'listen' and 'backend' sections will
# use if not designated in their block
#———————————————————————
defaults
mode tcp
log global
# option dontlognull
# option httpclose
# option httplog
# option forwardfor
option redispatch
option log-health-checks
timeout connect 10000 # default 10 second time out if a backend is not found
timeout client 300000
timeout server 300000
maxconn 60000
retries 3

#———————————————————————
# round robin balancing between the various backends
#———————————————————————

listen FRONTEND_APPNAME1
bind 10.10.10.1:15000
mode tcp
option tcplog
# #log global
log-format [%t]\ %ci:%cp\ %bi:%bp\ %b/%s:%sp\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq
balance roundrobin
timeout client 350000
timeout server 350000
timeout connect 35000
server app-server1 10.10.10.55:30000 weight 1 check port 68888
server app-server2 10.10.10.55:30000 weight 2 check port 68888

listen FRONTEND_APPNAME2
bind 10.10.10.1:15000
mode tcp
option tcplog
#log global
log-format [%t]\ %ci:%cp\ %bi:%bp\ %b/%s:%sp\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq
balance roundrobin
timeout client 350000
timeout server 350000
timeout connect 35000
server app-server1 10.10.10.55:30000 weight 5
server app-server2 10.10.10.55:30000 weight 5

You can get a copy of above haproxy.cfg configuration here.
Once configured roll it on.

[root@server1:~]# systemctl start haproxy
[root@server1:~]# ps -ef|grep -i hapro
root 285047 1 0 Mar07 ? 00:00:00 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid
haproxy 285050 285047 0 Mar07 ? 00:00:26 /usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p /run/haproxy.pid

Bring up the haproxy also on server2 machine, by placing same configuration and starting up the proxy.

[root@server1:~]# vim /etc/haproxy/haproxy.cfg
…

…

3. Configure keepalived on both servers

We'll be configuring 2 nodes with keepalived even though if necessery this can be easily extended and you can add more nodes.
First we make a copy of the original or existing server configuration keepalived.conf (just in case we need it later on or if you already had something other configured manually by someone – that could be so on inherited servers by other sysadmin)

[root@server1:~]# mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.orig
[root@server2:~]# mv /etc/keepalived/keepalived.conf /etc/keepalived/keepalived.conf.orig

a. Configure keepalived to serve as a MASTER Node

[root@server1:~]# vim /etc/keepalived/keepalived.conf

Master Node
global_defs {
router_id server1-fqdn # The hostname of this host.

enable_script_security
# Synchro of the state of the connections between the LBs on the eth0 interface
lvs_sync_daemon eth0

notification_email {
linuxadmin@notify-domain.com # Email address for notifications
}
notification_email_from keepalived@server1-fqdn # The from address for the notifications
smtp_server 127.0.0.1 # SMTP server address
smtp_connect_timeout 15
}

vrrp_script haproxy {
script "killall -0 haproxy"
interval 2
weight 2
user root
}

vrrp_instance LB_VIP_QA {
virtual_router_id 50
advert_int 1
priority 51

state MASTER
interface eth0
smtp_alert # Enable Notifications Via Email

authentication {
auth_type PASS
auth_pass testp141

}
### Commented because running on VM on VMWare
## unicast_src_ip 10.44.192.134 # Private IP address of master
## unicast_peer {
## 10.44.192.135 # Private IP address of the backup haproxy
## }

# }
# master node with higher priority preferred node for Virtual IP if both keepalived up
### priority 51
### state MASTER
### interface eth0
virtual_ipaddress {
10.10.10.1 dev eth0 # The virtual IP address that will be shared between MASTER and BACKUP
}
track_script {
haproxy
}
}

To dowload a copy of the Master keepalived.conf configuration click here

Below are few interesting configuration variables, worthy to mention few words on, most of them are obvious by their names but for more clarity I'll also give a list here with short description of each:

vrrp_instance – defines an individual instance of the VRRP protocol running on an interface.
state – defines the initial state that the instance should start in (i.e. MASTER / SLAVE )state –
interface – defines the interface that VRRP runs on.
virtual_router_id – should be unique value per Keepalived Node (otherwise slave master won't function properly)
priority – the advertised priority, the higher the priority the more important the respective configured keepalived node is.
advert_int – specifies the frequency that advertisements are sent at (1 second, in this case).
authentication – specifies the information necessary for servers participating in VRRP to authenticate with each other. In this case, a simple password is defined.
only the first eight (8) characters will be used as described in to note is Important thing
man keepalived.conf – keepalived.conf variables documentation !!! Nota Bene !!! – Password set on each node should match for nodes to be able to authenticate !
virtual_ipaddress – defines the IP addresses (there can be multiple) that VRRP is responsible for.
notification_email – the notification email to which Alerts will be send in case if keepalived on 1 node is stopped (e.g. the MASTER node switches from host 1 to 2)
notification_email_from – email address sender from where email will originte
! NB ! In order for notification_email to be working you need to have configured MTA or Mail Relay (set to local MTA) to another SMTP – e.g. have configured something like Postfix, Qmail or Postfix

b. Configure keepalived to serve as a SLAVE Node

[root@server1:~]# vim /etc/keepalived/keepalived.conf

#Slave keepalived
global_defs {
router_id server2-fqdn # The hostname of this host!

enable_script_security
# Synchro of the state of the connections between the LBs on the eth0 interface
lvs_sync_daemon eth0

notification_email {
linuxadmin@notify-host.com # Email address for notifications
}
notification_email_from keepalived@server2-fqdn # The from address for the notifications
smtp_server 127.0.0.1 # SMTP server address
smtp_connect_timeout 15
}

vrrp_script haproxy {
script "killall -0 haproxy"
interval 2
weight 2
user root
}

vrrp_instance LB_VIP_QA {
virtual_router_id 50
advert_int 1
priority 50

state BACKUP
interface eth0
smtp_alert # Enable Notifications Via Email

authentication {
auth_type PASS
auth_pass testp141
}
### Commented because running on VM on VMWare
## unicast_src_ip 10.10.192.135 # Private IP address of master
## unicast_peer {
## 10.10.192.134 # Private IP address of the backup haproxy
## }

### priority 50
### state BACKUP
### interface eth0
virtual_ipaddress {
10.10.10.1 dev eth0 # The virtual IP address that will be shared betwee MASTER and BACKUP.
}
track_script {
haproxy
}
}

Download the keepalived.conf slave config here

c. Set required sysctl parameters for haproxy to work as expected

[root@server1:~]# vim /etc/sysctl.conf
#Haproxy config
# haproxy
net.core.somaxconn=65535
net.ipv4.ip_local_port_range = 1024 65000
net.ipv4.ip_nonlocal_bind = 1
net.ipv4.tcp_tw_reuse = 1
net.ipv4.tcp_max_syn_backlog = 10240
net.ipv4.tcp_max_tw_buckets = 400000
net.ipv4.tcp_max_orphans = 60000
net.ipv4.tcp_synack_retries = 3

4. Test Keepalived keepalived.conf configuration syntax is OK

[root@server1:~]# keepalived –config-test
(/etc/keepalived/keepalived.conf: Line 7) Unknown keyword 'lvs_sync_daemon_interface'
(/etc/keepalived/keepalived.conf: Line 21) Unable to set default user for vrrp script haproxy – removing
(/etc/keepalived/keepalived.conf: Line 31) (LB_VIP_QA) Specifying lvs_sync_daemon_interface against a vrrp is deprecated.
(/etc/keepalived/keepalived.conf: Line 31) Please use global lvs_sync_daemon
(/etc/keepalived/keepalived.conf: Line 35) Truncating auth_pass to 8 characters
(/etc/keepalived/keepalived.conf: Line 50) (LB_VIP_QA) track script haproxy not found, ignoring…

I've experienced this error because first time I've configured keepalived, I did not mention the user with which the vrrp script haproxy should run,
in prior versions of keepalived, leaving the field empty did automatically assumed you have the user with which the vrrp script runs to be set to root
as of RHELs keepalived-2.1.5-6.el8.x86_64, i've been using however this is no longer so and thus in prior configuration as you can see I've
set the user in respective section to root.
The error Unknown keyword 'lvs_sync_daemon_interface' is also easily fixable by just substituting the lvs_sync_daemon_interface and lvs_sync_daemon and reloading
keepalived etc.

Once keepalived is started and you can see the process on both machines running in process list.

[root@server1:~]# ps -ef |grep -i keepalived
root 1190884 1 0 18:50 ? 00:00:00 /usr/sbin/keepalived -D
root 1190885 1190884 0 18:50 ? 00:00:00 /usr/sbin/keepalived -D

Next step is to check the keepalived statuses as well as /var/log/keepalived.log

If everything is configured as expected on both keepalived on first node you should see one is master and one is slave either in the status or the log

[root@server1:~]#systemctl restart keepalived

[root@server1:~]# systemctl status keepalived|grep -i state
Mar 14 18:59:02 server1-fqdn Keepalived_vrrp[1192003]: (LB_VIP_QA) Entering MASTER STATE

[root@server1:~]# systemctl status keepalived

● keepalived.service – LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Mon 2022-03-14 18:15:51 CET; 32min ago
Process: 1187587 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 1187589 (code=exited, status=0/SUCCESS)

Mar 14 18:15:04 server1lb-fqdn Keepalived_vrrp[1187590]: Sending gratuitous ARP on eth0 for 10.44.192.142
Mar 14 18:15:50 server1lb-fqdn systemd[1]: Stopping LVS and VRRP High Availability Monitor…
Mar 14 18:15:50 server1lb-fqdn Keepalived[1187589]: Stopping
Mar 14 18:15:50 server1lb-fqdn Keepalived_vrrp[1187590]: (LB_VIP_QA) sent 0 priority
Mar 14 18:15:50 server1lb-fqdn Keepalived_vrrp[1187590]: (LB_VIP_QA) removing VIPs.
Mar 14 18:15:51 server1lb-fqdn Keepalived_vrrp[1187590]: Stopped – used 0.002007 user time, 0.016303 system time
Mar 14 18:15:51 server1lb-fqdn Keepalived[1187589]: CPU usage (self/children) user: 0.000000/0.038715 system: 0.001061/0.166434
Mar 14 18:15:51 server1lb-fqdn Keepalived[1187589]: Stopped Keepalived v2.1.5 (07/13,2020)
Mar 14 18:15:51 server1lb-fqdn systemd[1]: keepalived.service: Succeeded.
Mar 14 18:15:51 server1lb-fqdn systemd[1]: Stopped LVS and VRRP High Availability Monitor

[root@server2:~]# systemctl status keepalived|grep -i state
Mar 14 18:59:02 server2-fqdn Keepalived_vrrp[297368]: (LB_VIP_QA) Entering BACKUP STATE

[root@server1:~]# grep -i state /var/log/keepalived.log
Mar 14 18:59:02 server1lb-fqdn Keepalived_vrrp[297368]: (LB_VIP_QA) Entering MASTER STATE

a. Fix Keepalived SECURITY VIOLATION – scripts are being executed but script_security not enabled.

When configurating keepalived for a first time we have faced the following strange error inside keepalived status inside keepalived.log

Feb 23 14:28:41 server1 Keepalived_vrrp[945478]: SECURITY VIOLATION – scripts are being executed but script_security not enabled.

To fix keepalived SECURITY VIOLATION error:

Add to /etc/keepalived/keepalived.conf on the keepalived node hosts
inside

global_defs {}

After chunk

enable_script_security

include

# Synchro of the state of the connections between the LBs on the eth0 interface
lvs_sync_daemon_interface eth0

5. Prepare rsyslog configuration and Inlcude additional keepalived options
to force keepalived log into /var/log/keepalived.log

To force keepalived log into /var/log/keepalived.log on RHEL 8 / CentOS and other Redhat Package Manager (RPM) Linux distributions

[root@server1:~]# vim /etc/rsyslog.d/48_keepalived.conf

#2022/02/02: HAProxy logs to local6, save the messages
local7.* /var/log/keepalived.log
if ($programname == 'Keepalived') then -/var/log/keepalived.log
if ($programname == 'Keepalived_vrrp') then -/var/log/keepalived.log
& stop

[root@server:~]# touch /var/log/keepalived.log

Reload rsyslog to load new config

[root@server:~]# systemctl restart rsyslog
[root@server:~]# systemctl status rsyslog

● rsyslog.service – System Logging Service
Loaded: loaded (/usr/lib/systemd/system/rsyslog.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/rsyslog.service.d
└─rsyslog-service.conf
Active: active (running) since Mon 2022-03-07 13:34:38 CET; 1 weeks 0 days ago
Docs: man:rsyslogd(8)
https://www.rsyslog.com/doc/
Main PID: 269574 (rsyslogd)
Tasks: 6 (limit: 100914)
Memory: 5.1M
CGroup: /system.slice/rsyslog.service
└─269574 /usr/sbin/rsyslogd -n

Mar 15 08:15:16 server1lb-fqdn rsyslogd[269574]: — MARK —
Mar 15 08:35:16 server1lb-fqdn rsyslogd[269574]: — MARK —
Mar 15 08:55:16 server1lb-fqdn rsyslogd[269574]: — MARK —

If once keepalived is loaded but you still have no log written inside /var/log/keepalived.log

[root@server1:~]# vim /etc/sysconfig/keepalived
KEEPALIVED_OPTIONS="-D -S 7"

[root@server2:~]# vim /etc/sysconfig/keepalived
KEEPALIVED_OPTIONS="-D -S 7"

[root@server1:~]# systemctl restart keepalived.service
[root@server1:~]# systemctl status keepalived

● keepalived.service – LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Thu 2022-02-24 12:12:20 CET; 2 weeks 4 days ago
Main PID: 1030501 (keepalived)
Tasks: 2 (limit: 100914)
Memory: 1.8M
CGroup: /system.slice/keepalived.service
├─1030501 /usr/sbin/keepalived -D
└─1030502 /usr/sbin/keepalived -D

Warning: Journal has been rotated since unit was started. Log output is incomplete or unavailable.

[root@server2:~]# systemctl restart keepalived.service
[root@server2:~]# systemctl status keepalived
…

6. Monitoring VRRP traffic of the two keepaliveds with tcpdump

Once both keepalived are up and running a good thing is to check the VRRP protocol traffic keeps fluently on both machines.
Keepalived VRRP keeps communicating over the TCP / IP Port 112 thus you can simply snoop TCP tracffic on its protocol.

[root@server1:~]# tcpdump proto 112

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:08:07.356187 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:08.356297 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:09.356408 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:10.356511 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:11.356655 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20

[root@server2:~]# tcpdump proto 112

tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:08:07.356187 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:08.356297 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:09.356408 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:10.356511 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20
11:08:11.356655 IP server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20

As you can see the VRRP traffic on the network is originating only from server1lb-fqdn, this is so because host server1lb-fqdn is the keepalived configured master node.

It is possible to spoof the password configured to authenticate between two nodes, thus if you're bringing up keepalived service cluster make sure your security is tight at best the machines should be in a special local LAN DMZ, do not configure DMZ on the internet !!! 🙂 Or if you eventually decide to configure keepalived in between remote hosts, make sure you somehow use encrypted VPN or SSH tunnels to tunnel the VRRP traffic.

[root@server1:~]# tcpdump proto 112 -vv
tcpdump: listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
11:36:25.530772 IP (tos 0xc0, ttl 255, id 59838, offset 0, flags [none], proto VRRP (112), length 40)
server1lb-fqdn > vrrp.mcast.net: vrrp server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20, addrs: VIPIP_QA auth "testp431"
11:36:26.530874 IP (tos 0xc0, ttl 255, id 59839, offset 0, flags [none], proto VRRP (112), length 40)
server1lb-fqdn > vrrp.mcast.net: vrrp server1lb-fqdn > vrrp.mcast.net: VRRPv2, Advertisement, vrid 50, prio 53, authtype simple, intvl 1s, length 20, addrs: VIPIP_QA auth "testp431"

Lets also check what floating IP is configured on the machines:

[root@server1:~]# ip -brief address show
lo UNKNOWN 127.0.0.1/8
eth0 UP 10.10.10.5/26 10.10.10.1/32

The 10.10.10.5 IP is the main IP set on LAN interface eth0, 10.10.10.1 is the floating IP which as you can see is currently set by keepalived to listen on first node.

[root@server2:~]# ip -brief address show |grep -i 10.10.10.1

An empty output is returned as floating IP is currently configured on server1

To double assure ourselves the IP is assigned on correct machine, lets ping it and check the IP assigned MAC currently belongs to which machine.

[root@server2:~]# ping 10.10.10.1
PING 10.10.10.1 (10.10.10.1) 56(84) bytes of data.
64 bytes from 10.10.10.1: icmp_seq=1 ttl=64 time=0.526 ms
^C
— 10.10.10.1 ping statistics —
1 packets transmitted, 1 received, 0% packet loss, time 0ms
rtt min/avg/max/mdev = 0.526/0.526/0.526/0.000 ms
[root@server2:~]# arp -an |grep -i 10.44.192.142
? (10.10.10.1) at 00:48:54:91:83:7d [ether] on eth0
[root@server2:~]# ip a s|grep -i 00:48:54:91:83:7d
[root@server2:~]#

As you can see from below output MAC is not found in configured IPs on server2.

[root@server1-fqdn:~]# /sbin/ip a s|grep -i 00:48:54:91:83:7d -B1 -A1
eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:48:54:91:83:7d brd ff:ff:ff:ff:ff:ff
inet 10.10.10.1/26 brd 10.10.1.191 scope global noprefixroute eth0

Pretty much expected MAC is on keepalived node server1.

7. Testing keepalived on server1 and server2 maachines VIP floating IP really works

To test the overall configuration just created, you should stop keeaplived on the Master node and in meantime keep an eye on Slave node (server2), whether it can figure out the Master node is gone and switch its
state BACKUP to save MASTER. By changing the secondary (Slave) keepalived to master the floating IP: 10.10.10.1 will be brought up by the scripts on server2.

Lets assume that something went wrong with server1 VM host, for example the machine crashed due to service overload, DDoS or simply a kernel bug or whatever reason.
To simulate that we simply have to stop keepalived, then the broadcasted information on VRRP TCP/IP proto port 112 will be no longer available and keepalived on node server2, once
unable to communicate to server1 should chnage itself to state MASTER.

[root@server1:~]# systemctl stop keepalived
[root@server1:~]# systemctl status keepalived
● keepalived.service – LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: inactive (dead) since Tue 2022-03-15 12:11:33 CET; 3s ago
Process: 1192001 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 1192002 (code=exited, status=0/SUCCESS)

Mar 14 18:59:07 server1lb-fqdn Keepalived_vrrp[1192003]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:32 server1lb-fqdn systemd[1]: Stopping LVS and VRRP High Availability Monitor…
Mar 15 12:11:32 server1lb-fqdn Keepalived[1192002]: Stopping
Mar 15 12:11:32 server1lb-fqdn Keepalived_vrrp[1192003]: (LB_VIP_QA) sent 0 priority
Mar 15 12:11:32 server1lb-fqdn Keepalived_vrrp[1192003]: (LB_VIP_QA) removing VIPs.
Mar 15 12:11:33 server1lb-fqdn Keepalived_vrrp[1192003]: Stopped – used 2.145252 user time, 15.513454 system time
Mar 15 12:11:33 server1lb-fqdn Keepalived[1192002]: CPU usage (self/children) user: 0.000000/44.555362 system: 0.001151/170.118126
Mar 15 12:11:33 server1lb-fqdn Keepalived[1192002]: Stopped Keepalived v2.1.5 (07/13,2020)
Mar 15 12:11:33 server1lb-fqdn systemd[1]: keepalived.service: Succeeded.
Mar 15 12:11:33 server1lb-fqdn systemd[1]: Stopped LVS and VRRP High Availability Monitor.

On keepalived off, you will get also a notification Email on the Receipt Email configured from keepalived.conf from the working keepalived node with a simple message like:

=> VRRP Instance is no longer owning VRRP VIPs <=

Once keepalived is back up you will get another notification like:

=> VRRP Instance is now owning VRRP VIPs <=

[root@server2:~]# systemctl status keepalived
● keepalived.service – LVS and VRRP High Availability Monitor
Loaded: loaded (/usr/lib/systemd/system/keepalived.service; enabled; vendor preset: disabled)
Active: active (running) since Mon 2022-03-14 18:13:52 CET; 17h ago
Process: 297366 ExecStart=/usr/sbin/keepalived $KEEPALIVED_OPTIONS (code=exited, status=0/SUCCESS)
Main PID: 297367 (keepalived)
Tasks: 2 (limit: 100914)
Memory: 2.1M
CGroup: /system.slice/keepalived.service
├─297367 /usr/sbin/keepalived -D -S 7
└─297368 /usr/sbin/keepalived -D -S 7

Mar 15 12:11:33 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:33 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:33 server2lb-fqdn Keepalived_vrrp[297368]: Remote SMTP server [127.0.0.1]:25 connected.
Mar 15 12:11:33 server2lb-fqdn Keepalived_vrrp[297368]: SMTP alert successfully sent.
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: (LB_VIP_QA) Sending/queueing gratuitous ARPs on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1
Mar 15 12:11:38 server2lb-fqdn Keepalived_vrrp[297368]: Sending gratuitous ARP on eth0 for 10.10.10.1

[root@server2:~]# ip addr show|grep -i 10.10.10.1
inet 10.10.10.1/32 scope global eth0

As you see the VIP is now set on server2, just like expected – that's OK, everything works as expected. If the IP did not move double check the keepalived.conf on both nodes for errors or misconfigurations.

To recover the initial order of things so server1 is MASTER and server2 SLAVE host, we just have to switch on the keepalived on server1 machine.

[root@server1:~]# systemctl start keepalived

The automatic change of server1 to MASTER node and respective move of the VIP IP is done because of the higher priority (of importance we previously configured on server1 in keepalived.conf).

What we learned?

So what we learned in this article?
We have seen how to easily install and configure a High Availability Load balancer with Keepalived with single floating VIP IP address with 1 MASTER and 1 SLAVE host and a Haproxy example config with few frontends / App backends. We have seen how the config can be tested for potential errors and how we can monitor whether the VRRP2 network traffic flows between nodes and how to potentially debug it further if necessery.
Further on rawly explained some of the keepalived configurations but as keepalived can do pretty much more,for anyone seriously willing to deal with keepalived on a daily basis or just fine tune some already existing ones, you better read closely its manual page "man keepalived.conf" as well as the official Redhat Linux documentation page on setting up a Linux cluster with Keepalived (Be prepare for a small nightmare as the documentation of it seems to be a bit chaotic, and even I would say partly missing or opening questions on what does the developers did meant – not strange considering the havoc that is pretty much as everywhere these days.)

Finally once keepalived hosts are prepared, it was shown how to test the keepalived application cluster and Floating IP does move between nodes in case if one of the 2 keepalived nodes is inaccessible.

The same logic can be repeated multiple times and if necessery you can set multiple VIPs to expand the HA reachable IPs solution.

high-availability-with-two-vips-example-diagram

The presented idea is with haproxy forward Proxy server to proxy requests towards Application backend (servince machines), however if you need to set another set of server on the flow to process HTML / XHTML / PHP / Perl / Python programming code, with some common Webserver setup ( Nginx / Apache / Tomcat / JBOSS) and enable SSL Secure certificate with lets say Letsencrypt, this can be relatively easily done. If you want to implement letsencrypt and a webserver check this redundant SSL Load Balancing with haproxy & keepalived article.

That's all folks, hope you enjoyed.
If you need to configure keepalived Cluster or a consultancy write your query here 🙂

Tags: Ansible, application cluster, backup, Below, cluster config, common, configured, Configuring, CPU, Easy, everything, haproxy, howto, incoming traffic, inside, installation, keepalived, linux?, long time, Mar, master node, network, nightmare, Sending, server2, servers, Set, Slave, smtp, something, Stopping, test keepalived, var, vip address, virtual machines, Warning Journal
Posted in Cloud services, Educational, Haproxy, Keepalived, System Administration | No Comments »

How to move transfer binary files encoded with base64 on Linux with Copy Paste of text ASCII encoded string

Monday, October 25th, 2021

If you have to work on servers in a protected environments that are accessed via multiple VPNs, Jump hosts or Web Citrix and you have no mean to copy binary files to your computer or from your computer because you have all kind of FTP / SFTP or whatever Data Copy clients disabled on remote jump host side or CITRIX server and you still are looking for a way to copy files between your PC and the Remote server Side.
Or for example if you have 2 or more servers that are in a special Demilitarized Network Zones ( DMZ ) and the machines does not have SFTP / FTP / WebServer or other kind of copy protocol service that can be used to copy files between the hosts and you still need to copy some files between the 2 or more machines in a slow but still functional way, then you might not know of one old school hackers trick you can employee to complete the copy of files between DMZ-ed Server Host A lets say with IP address (192.168.50.5) -> Server Host B (192.168.30.7). The way to complete the binary file copy is to Encode the binary on Server Host A and then, use cat command to display the encoded string and copy whole encoded cat command output to your (local PC buffer from where you access the remote side via SSH via the CITRIX or Jump host.). Then decode the encoded file with an encoding tool such as base64 or uuencode. In this article, I'll show how this is done with base64 and uuencode. Base64 binary is pretty standard in most Linux / Unix OS-es today on most Linux distributions it is part of the coreutils package.
The main use of base64 encoding to encode non-text Attachment files to Electronic Mail, but for our case it fits perfectly.
Keep in mind, that this hack to copy the binary from Machine A to Machine B of course depends on the Copy / Paste buffer being enabled both on remote Jump host or Citrix from where you reach the servers as well as your own PC laptop from where you access the remote side.

Base64 Encoding and Decoding text strings legend

The file copy process to the highly secured PCI host goes like this:

1. On Server Host A encode with md5sum command

[root@serverA ~]:# md5sum -b /tmp/inputbinfile-to-encode
66c4d7b03ed6df9df5305ae535e40b7d *inputbinfile-to-encode

As you see one good location to encode the file would be /tmp as this is a temporary home or you can use alternatively your HOME dir

but you have to be quite careful to not run out of space if you produce it anywhere 🙂

2. Encode the binary file with base64 encoding

[root@serverB ~]:# base64 -w0 inputbinfile-to-encode > outputbin-file.base64

The -w0 option is given to disable line wrapping. Line wrapping is perhaps not needed if you will copy paste the data.

base64-encoded-binary-file-text-string-linux-screenshot

Base64 Encoded string chunk with line wrapping

For a complete list of possible accepted arguments check here.

3. Cat the inputbinfile-to-encode just generated to display the text encoded file in your SecureCRT / Putty / SuperPutty etc. remote ssh access client

[root@serverA ~]:# cat /tmp/inputbinfile-to-encode
f0VMRgIBAQAAAAAAAAAAAAMAPgABAAAAMGEAAAAAAABAAAAAAAAAACgXAgAAAAAAAAAAA
EAAOAALAEAAHQAcAAYAAAAEAAA ……………………………………………………………… cTD6lC+ViQfUCPn9bs

4. Select the cat-ted string and copy it to your PC Copy / Paste buffer

If the bin file is not few kilobytes, but few megabytes copying the file might be tricky as the string produced from cat command would be really long, so make sure the SSH client you're using is configured to have a large buffer to scroll up enough and be able to select the whole encoded string until the end of the cat command and copy it to Copy / Paste buffer.

5. On Server Host B paste the bas64 encoded binary inside a newly created file

Open with a text editor vim / mc or whatever is available

[root@serverB ~]:# vi inputbinfile-to-encode

Some very paranoid Linux / UNIX systems might not have even a normal text editor like 'vi' if you happen to need to copy files on such one a useful thing is to use a simple cat on the remote side to open a new File Descriptor buffer, like this:

[root@server2 ~]:# cat >> inputbinfile-to-encode <<'EOF'
Paste the string here

6. Decode the encoded binary with base64 cmd again

[root@serverB ~]:# base64 –decode outputbin-file.base64 > inputbinfile-to-encode

7. Set proper file permissions (the same as on Host A)

[root@serverB ~]:# chmod +x inputbinfile-to-encode

…

8. Check again the binary file checksum on Host B is identical as on Host A

[root@serverB ~]:# md5sum -b inputbinfile-to-encode
66c4d7b03ed6df9df5305ae535e40b7d *inputbinfile-to-encode

As you can md5sum match on both sides so file should be OK.

9. Encoding and decoding files with uuencode

If you are lucky and you have uuencode installed (sharutils) package is present on remote machine to encode lets say an archived set of binary files in .tar.gz format do:

Prepare the archive of all the files you want to copy with tar on Host A:

[root@Machine1 ~]:# tar -czvf /bin/whatever /usr/local/bin/htop /usr/local/bin/samhain /etc/hosts archived-binaries-and-configs.tar.gz

[root@Machine1 ~]:# uuencode archived-binaries-and-configs.tar.gz archived-binaries-and-configs.uu

Cat / Copy / paste the encoded content as usual to a file on Host B:

Then on Machine 2 decode:

[root@Machine2 ~]:# uuencode -c < archived-binaries-and-configs.tar.gz.uu

Conclusion

In this short method I've shown you a hack that is used often by script kiddies to copy over files between pwn3d machines, a method which however is very precious and useful for sysadmins like me who has to admin a paranoid secured servers that are placed in a very hard to access environments.

With the same method you can encode or decode not only binary file but also any standard input/output file content. base64 encoding is quite useful stuff to use also in bash scripts or perl where you want to have the script copy file in a plain text format . Datas are encoded and decoded to make the data transmission and storing process easier. You have to keep in mind always that Encoding and Decoding are not similar to encryption and decryption as encr. deprytion gives a special security layers to the encoded that. Encoded data can be easily revealed by decoding, so if you need to copy between the servers very sensitive data like SSL certificates Private RSA / DSA key, this command line utility tool better to be not used for sesitive data copying.

Tags: access, Ascii, binary files, chmod, com, copy paste, home, How to, linux?, match, move, plain, Select, servers, Set, tar, text, text ascii, transfer, unix, use
Posted in Curious Facts, Hacks, Linux, System Administration, Various | No Comments »

Install and enable Sysstats IO / DIsk / CPU / Network monitoring console suite on Redhat 8.3, Few sar useful command examples

Tuesday, September 28th, 2021

Why to monitoring CPU, Memory, Hard Disk, Network usage etc. with sysstats tools?

Using system monitoring tools such as Zabbix, Nagios Monit is a good approach, however sometimes due to zabbix server interruptions you might not be able to track certain aspects of system performance on time. Thus it is always a good idea to
Gain more insights on system peroformance from command line. Of course there is cmd tools such as iostat and top, free, vnstat that provides plenty of useful info on system performance issues or bottlenecks. However from my experience to have a better historical data that is systimized and all the time accessible from console it is a great thing to have sysstat package at place. Since many years mostly on every server I administer, I've been using sysstats to monitor what is going on servers over a short time frames and I'm quite happy with it. In current company we're using Redhats and CentOS-es and I had to install sysstats on Redhat 8.3. I've earlier done it multiple times on Debian / Ubuntu Linux and while I've faced on some .deb distributions complications of making sysstat collect statistics I've come with an article on Howto fix sysstat Cannot open /var/log/sysstat/sa no such file or directory” on Debian / Ubuntu Linux

Sysstat contains the following tools related to collecting I/O and CPU statistics:
iostat
Displays an overview of CPU utilization, along with I/O statistics for one or more disk drives.
mpstat
Displays more in-depth CPU statistics.
Sysstat also contains tools that collect system resource utilization data and create daily reports based on that data. These tools are:
sadc
Known as the system activity data collector, sadc collects system resource utilization information and writes it to a file.
sar
Producing reports from the files created by sadc, sar reports can be generated interactively or written to a file for more intensive analysis.

My experience with CentOS 7 and Fedora to install sysstat it was pretty straight forward, I just had to install it via yum install sysstat wait for some time and use sar (System Activity Reporter) tool to report collected system activity info stats over time.
Unfortunately it seems on RedHat 8.3 as well as on CentOS 8.XX instaling sysstats does not work out of the box.

To complete a successful installation of it on RHEL 8.3, I had to:

[root@server ~]# yum install -y sysstat

To make sysstat enabled on the system and make it run, I've enabled it in sysstat

[root@server ~]# systemctl enable sysstat

Running immediately sar command, I've faced the shitty error:

“Cannot open /var/log/sysstat/sa18:
No such file or directory. Please check if data collecting is enabled”

Once installed I've waited for about 5 minutes hoping, that somehow automatically sysstat would manage it but it didn't.

To solve it, I've had to create additionally file /etc/cron.d/sysstat (weirdly RPM's post install instructions does not tell it to automatically create it)

[root@server ~]# vim /etc/cron.d/sysstat

# run system activity accounting tool every 10 minutes
0 * * * * root /usr/lib64/sa/sa1 60 59 &
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A &

/usr/local/lib/sa1 is a shell script that we can use for scheduling cron which will create daily binary log file.
/usr/local/lib/sa2 is a shell script will change binary log file to human-readable form.

[root@server ~]# chmod 600 /etc/cron.d/sysstat

[root@server ~]# systemctl restart sysstat

In a while if sysstat is working correctly you should get produced its data history logs inside /var/log/sa

[root@server ~]# ls -al /var/log/sa

Note that the standard sysstat history files on Debian and other modern .deb based distros such as Debian 10 (in y.2021) is stored under /var/log/sysstat

Here is few useful uses of sysstat cmds

1. Check with sysstat machine history SWAP and RAM Memory use

To lets say check last 10 minutes SWAP memory use:

[hipo@server yum.repos.d] $ sar -W |last -n 10

Linux 4.18.0-240.el8.x86_64 (server) 09/28/2021 _x86_64_ (8 CPU)

12:00:00 AM pswpin/s pswpout/s
12:00:01 AM 0.00 0.00
12:01:01 AM 0.00 0.00
12:02:01 AM 0.00 0.00
12:03:01 AM 0.00 0.00
12:04:01 AM 0.00 0.00
12:05:01 AM 0.00 0.00
12:06:01 AM 0.00 0.00

[root@ccnrlb01 ~]# sar -r | tail -n 10
14:00:01 93008 1788832 95.06 0 1357700 725740 9.02 795168 683484 32
14:10:01 78756 1803084 95.81 0 1358780 725740 9.02 827660 652248 16
14:20:01 92844 1788996 95.07 0 1344332 725740 9.02 813912 651620 28
14:30:01 92408 1789432 95.09 0 1344612 725740 9.02 816392 649544 24
14:40:01 91740 1790100 95.12 0 1344876 725740 9.02 816948 649436 36
14:50:01 91688 1790152 95.13 0 1345144 725740 9.02 817136 649448 36
15:00:02 91544 1790296 95.14 0 1345448 725740 9.02 817472 649448 36
15:10:01 91108 1790732 95.16 0 1345724 725740 9.02 817732 649340 36
15:20:01 90844 1790996 95.17 0 1346000 725740 9.02 818016 649332 28
Average: 93473 1788367 95.03 0 1369583 725074 9.02 800965 671266 29

2. Check system load? Are my processes waiting too long to run on the CPU?

[root@server ~ ]# sar -q |head -n 10
Linux 4.18.0-240.el8.x86_64 (server) 09/28/2021 _x86_64_ (8 CPU)

12:00:00 AM runq-sz plist-sz ldavg-1 ldavg-5 ldavg-15 blocked
12:00:01 AM 0 272 0.00 0.02 0.00 0
12:01:01 AM 1 271 0.00 0.02 0.00 0
12:02:01 AM 0 268 0.00 0.01 0.00 0
12:03:01 AM 0 268 0.00 0.00 0.00 0
12:04:01 AM 1 271 0.00 0.00 0.00 0
12:05:01 AM 1 271 0.00 0.00 0.00 0
12:06:01 AM 1 265 0.00 0.00 0.00 0

3. Show various CPU statistics per CPU use

On a multiprocessor, multi core server sometimes for scripting it is useful to fetch processor per use historic data,
this can be attained with:

[hipo@server ~ ] $ mpstat -P ALL
Linux 4.18.0-240.el8.x86_64 (server) 09/28/2021 _x86_64_ (8 CPU)

06:08:38 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %gnice %idle
06:08:38 PM all 0.17 0.02 0.25 0.00 0.05 0.02 0.00 0.00 0.00 99.49
06:08:38 PM 0 0.22 0.02 0.28 0.00 0.06 0.03 0.00 0.00 0.00 99.39
06:08:38 PM 1 0.28 0.02 0.36 0.00 0.08 0.02 0.00 0.00 0.00 99.23
06:08:38 PM 2 0.27 0.02 0.31 0.00 0.06 0.01 0.00 0.00 0.00 99.33
06:08:38 PM 3 0.15 0.02 0.22 0.00 0.03 0.01 0.00 0.00 0.00 99.57
06:08:38 PM 4 0.13 0.02 0.20 0.01 0.03 0.01 0.00 0.00 0.00 99.60
06:08:38 PM 5 0.14 0.02 0.27 0.00 0.04 0.06 0.01 0.00 0.00 99.47
06:08:38 PM 6 0.10 0.02 0.17 0.00 0.04 0.02 0.00 0.00 0.00 99.65
06:08:38 PM 7 0.09 0.02 0.15 0.00 0.02 0.01 0.00 0.00 0.00 99.70

sar-sysstat-cpu-statistics-screenshot

Monitor processes and threads currently being managed by the Linux kernel.

[hipo@server ~ ] $ pidstat

[hipo@server ~ ] $ pidstat -d 2

pidstat-show-processes-with-most-io-activities-linux-screenshot

This report tells us that there is few processes with heave I/O use Filesystem system journalling daemon jbd2, apache, mysqld and supervise, in 3rd column you see their respective PID IDs.

To show threads used inside a process (like if you press SHIFT + H) inside Linux top command:

[hipo@server ~ ] $ pidstat -t -p 10765 1 3

Linux 4.19.0-14-amd64 (server) 28.09.2021 _x86_64_ (10 CPU)

21:41:22 UID TGID TID %usr %system %guest %wait %CPU CPU Command
21:41:23 108 10765 – 1,98 0,99 0,00 0,00 2,97 1 mysqld
21:41:23 108 – 10765 0,00 0,00 0,00 0,00 0,00 1 |__mysqld
21:41:23 108 – 10768 0,00 0,00 0,00 0,00 0,00 0 |__mysqld
21:41:23 108 – 10771 0,00 0,00 0,00 0,00 0,00 5 |__mysqld
21:41:23 108 – 10784 0,00 0,00 0,00 0,00 0,00 7 |__mysqld
21:41:23 108 – 10785 0,00 0,00 0,00 0,00 0,00 6 |__mysqld
21:41:23 108 – 10786 0,00 0,00 0,00 0,00 0,00 2 |__mysqld
…

10765 – is the Process ID whose threads you would like to list

With pidstat, you can further monitor processes for memory leaks with:

[hipo@server ~ ] $ pidstat -r 2

4. Report paging statistics for some old period

[root@server ~ ]# sar -B -f /var/log/sa/sa27 |head -n 10
Linux 4.18.0-240.el8.x86_64 (server) 09/27/2021 _x86_64_ (8 CPU)

15:42:26 LINUX RESTART (8 CPU)

15:55:30 LINUX RESTART (8 CPU)

04:00:01 PM pgpgin/s pgpgout/s fault/s majflt/s pgfree/s pgscank/s pgscand/s pgsteal/s %vmeff
04:01:01 PM 0.00 14.47 629.17 0.00 502.53 0.00 0.00 0.00 0.00
04:02:01 PM 0.00 13.07 553.75 0.00 419.98 0.00 0.00 0.00 0.00
04:03:01 PM 0.00 11.67 548.13 0.00 411.80 0.00 0.00 0.00 0.00

5. Monitor Received RX and Transmitted TX network traffic perl Network interface real time

To print out Received and Send traffic per network interface 4 times in a raw

sar-sysstats-network-traffic-statistics-screenshot

[hipo@server ~ ] $ sar -n DEV 1 4

To continusly monitor all network interfaces I/O traffic

[hipo@server ~ ] $ sar -n DEV 1

To only monitor a certain network interface lets say loopback interface (127.0.0.1) received / transmitted bytes

[hipo@server yum.repos.d] $ sar -n DEV 1 2|grep -i lo
06:29:53 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
06:29:54 PM lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: lo 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00

6. Monitor block devices use

To check block devices use 3 times in a raw

[hipo@server yum.repos.d] $ sar -d 1 3

sar-sysstats-blockdevice-statistics-screenshot

7. Output server monitoring data in CSV database structured format

For preparing a nice graphs with Excel from CSV strucuted file format, you can dump the collected data as so:

[root@server yum.repos.d]# sadf -d /var/log/sa/sa27 — -n DEV | grep -v lo|head -n 10
server-name-fqdn;-1;2021-09-27 13:42:26 UTC;LINUX-RESTART (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;-1;2021-09-27 13:55:30 UTC;LINUX-RESTART (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth1;19.42;16.12;1.94;1.68;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth0;7.18;9.65;0.55;0.78;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth2;5.65;5.13;0.42;0.39;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth1;18.90;15.55;1.89;1.60;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth0;7.15;9.63;0.55;0.74;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth2;5.67;5.15;0.42;0.39;0.00;0.00;0.00;0.00
…

To graph the output data you can use Excel / LibreOffice's Excel equivalent Calc or if you need to dump a CSV sar output and generate it on the fly from a script use gnuplot

What we've learned?

How to install and enable on cron sysstats on Redhat and CentOS 8 Linux ?
How to continuously monitor CPU / Disk and Network, block devices, paging use and processes and threads used by the kernel per process ?
As well as how to export previously collected data to CSV to import to database or for later use inrder to generate graphic presentation of data.
Cheers ! 🙂

Tags: access, cmds, com, command, console, cron, data, Debian Ubuntu Linux, eth0, eth1, eth2, How to, Install, installation, iowait, Output, period, Redhat, root, servers, sysstat, systemctl, timestamp, use, usr
Posted in Linux, Monitoring, System Administration, Various | No Comments »

Adding proxy to yum repository on Redhat / Fedora / CentOS and other RPM based Linux distributions, Listing and enabling new RPM repositories

Tuesday, September 7th, 2021

yum-add-proxy-host-for-redhat-linux-centos-list-rpm-repositories-enable-disable-repositories

Sometimes if you work in a company that is following PCI standards with very tight security you might need to use a custom company prepared RPM repositories that are accessible only via a specific custom maintained repositories or alternatively you might need the proxy node to access an external internet repository from the DMZ-ed firewalled zone where the servers lays .
Hence to still be able to maintain the RPM based servers up2date to the latest security patches and install software with yumone very useful feature of yum package manager is to use a proxy host through which you will reach your Redhat Package Manager files files.

1. The http_proxy and https_proxy shell variables

To set a proxy host you need to define there the IP / Hostname or the Fully Qualified Domain Name (FQDN).

By default "http_proxy and https_proxy are empty. As you can guess https_proxy is used if you have a Secure Socket Layer (SSL) certificate for encrypting the communication channel (e.g. you have https:// URL).

[root@rhel: ~]# echo $http_proxy
[root@rhel: ~]#

2. Setting passwordless or password protected proxy host via http_proxy, https_proxy variables

There is a one time very straight forward to configure proxying of traffic via a specific remote configured server with server bourne again shell (BASH)'s understood variables:

a.) Set password free open proxy to shell environment.

[root@centos: ~]# export https_proxy="https://remote-proxy-server:8080"

Now use yum as usual to update the available installabe package list or simply upgrade to the latest packages with lets say:

[root@rhel: ~]# yum check-update && yum update

b.) Configuring password protected proxy for yum

If your proxy is password protected for even tigher security you can provide the password on the command line as well.

[root@centos: ~]# export http_proxy="http://username:pAssW0rd@server:port/"

Note that if you have some special characters you will have to pass the string inside single quotes or escape them to make sure the password will properly handled to server, before trying out the proxy with yum, echo the variable.

[root@centos: ~]# export http_proxy='http://username:p@s#w:E@192.168.0.1:3128/'
[root@centos: ~]# echo $http_proxy
http://username:p@s#w:E@server:port/

Then do whatever with yum:

[root@centos: ~]# yum check-update && yum search sharutils

If something is wrong and proxy is not properly connected try to reach for the repository manually with curl or wget

[root@centos: ~]# curl -ilk http://download.fedoraproject.org/pub/epel/7/SRPMS/ /epel/7/SRPMS/
HTTP/1.1 302 Found
Date: Tue, 07 Sep 2021 16:49:59 GMT
Server: Apache
X-Frame-Options: SAMEORIGIN
X-Xss-Protection: 1; mode=block
X-Content-Type-Options: nosniff
Referrer-Policy: same-origin
Location: http://mirror.telepoint.bg/epel/7/SRPMS/
Content-Type: text/plain
Content-Length: 0
AppTime: D=2264
X-Fedora-ProxyServer: proxy01.iad2.fedoraproject.org
X-Fedora-RequestID: YTeYOE3mQPHH_rxD0sdlGAAAA80
X-Cache: MISS from pcfreak
X-Cache-Lookup: MISS from pcfreak:3128
Via: 1.1 pcfreak (squid/4.6)
Connection: keep-alive

Or if you need, you can test the user, password protected proxy with wget as so:

[root@centos: ~]# wget –proxy-user=USERNAME –proxy-password=PASSWORD http://your-proxy-domain.com/optional-rpms/

If you have lynx installed on the machine you can do the remote proxy successful authentication check with it with less typing:

[root@centos: ~]# lynx -pauth=USER:PASSWORD http://proxy-domain.com/optional-rpm/

3. Making yum proxy connection permanent via /etc/yum.conf

Perhaps the easiest and quickest way to add the http_proxy / https_proxy configured is to store it to automatically load on each server ssh login in your admin user (root) in /root/.bashrc or /root/.bash_profile or in the global /etc/profile or /etc/profile.d/custom.sh etc.

However if you don't want to have hacks and have more cleanness on the systems, the recommended "Redhat way" so to say is to store the configuration inside /etc/yum.conf

To do it via /etc/yum.conf you have to have some records there like:

# The proxy server – proxy server:port number
proxy=http://mycache.mydomain.com:3128
# The account details for yum connections
proxy_username=yum-user
proxy_password=qwerty-secret-pass

4. Listing RPM repositories and their state

As I had to install sharutils RPM package to the server which contains the file /bin/uuencode (that is provided on CentOS 7.9 Linux from Repo: base/7/x86_64 I had to check whether the repository was installed on the server.

To get a list of all yum repositories avaiable

[root@centos:/etc/yum.repos.d]# yum repolist all
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
* base: centos.telecoms.bg
* epel: mirrors.netix.net
* extras: centos.telecoms.bg
* remi: mirrors.netix.net
* remi-php74: mirrors.netix.net
* remi-safe: mirrors.netix.net
* updates: centos.telecoms.bg
repo id repo name status
base/7/x86_64 CentOS-7 – Base enabled: 10,072
base-debuginfo/x86_64 CentOS-7 – Debuginfo disabled
base-source/7 CentOS-7 – Base Sources disabled
c7-media CentOS-7 – Media disabled
centos-kernel/7/x86_64 CentOS LTS Kernels for x86_64 disabled
centos-kernel-experimental/7/x86_64 CentOS Experimental Kernels for x86_64 disabled
centosplus/7/x86_64 CentOS-7 – Plus disabled
centosplus-source/7 CentOS-7 – Plus Sources disabled
cr/7/x86_64 CentOS-7 – cr disabled
epel/x86_64 Extra Packages for Enterprise Linux 7 – x86_64 enabled: 13,667
epel-debuginfo/x86_64 Extra Packages for Enterprise Linux 7 – x86_64 – Debug disabled
epel-source/x86_64 Extra Packages for Enterprise Linux 7 – x86_64 – Source disabled
epel-testing/x86_64 Extra Packages for Enterprise Linux 7 – Testing – x86_64 disabled
epel-testing-debuginfo/x86_64 Extra Packages for Enterprise Linux 7 – Testing – x86_64 – Debug disabled
epel-testing-source/x86_64 Extra Packages for Enterprise Linux 7 – Testing – x86_64 – Source disabled
extras/7/x86_64 CentOS-7 – Extras enabled: 500
extras-source/7 CentOS-7 – Extras Sources disabled
fasttrack/7/x86_64 CentOS-7 – fasttrack disabled
remi Remi's RPM repository for Enterprise Linux 7 – x86_64 enabled: 7,229
remi-debuginfo/x86_64 Remi's RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-glpi91 Remi's GLPI 9.1 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-glpi92 Remi's GLPI 9.2 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-glpi93 Remi's GLPI 9.3 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-glpi94 Remi's GLPI 9.4 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-modular Remi's Modular repository for Enterprise Linux 7 – x86_64 disabled
remi-modular-test Remi's Modular testing repository for Enterprise Linux 7 – x86_64 disabled
remi-php54 Remi's PHP 5.4 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php55 Remi's PHP 5.5 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php55-debuginfo/x86_64 Remi's PHP 5.5 RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
!remi-php56 Remi's PHP 5.6 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php56-debuginfo/x86_64 Remi's PHP 5.6 RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php70 Remi's PHP 7.0 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php70-debuginfo/x86_64 Remi's PHP 7.0 RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php70-test Remi's PHP 7.0 test RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php70-test-debuginfo/x86_64 Remi's PHP 7.0 test RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php71 Remi's PHP 7.1 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php71-debuginfo/x86_64 Remi's PHP 7.1 RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php71-test Remi's PHP 7.1 test RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php71-test-debuginfo/x86_64 Remi's PHP 7.1 test RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
!remi-php72 Remi's PHP 7.2 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php72-debuginfo/x86_64 Remi's PHP 7.2 RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php72-test Remi's PHP 7.2 test RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php72-test-debuginfo/x86_64 Remi's PHP 7.2 test RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php73 Remi's PHP 7.3 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php73-debuginfo/x86_64 Remi's PHP 7.3 RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php73-test Remi's PHP 7.3 test RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php73-test-debuginfo/x86_64 Remi's PHP 7.3 test RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php74 Remi's PHP 7.4 RPM repository for Enterprise Linux 7 – x86_64 enabled: 423
remi-php74-debuginfo/x86_64 Remi's PHP 7.4 RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php74-test Remi's PHP 7.4 test RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php74-test-debuginfo/x86_64 Remi's PHP 7.4 test RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php80 Remi's PHP 8.0 RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php80-debuginfo/x86_64 Remi's PHP 8.0 RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-php80-test Remi's PHP 8.0 test RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-php80-test-debuginfo/x86_64 Remi's PHP 8.0 test RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-safe Safe Remi's RPM repository for Enterprise Linux 7 – x86_64 enabled: 4,549
remi-safe-debuginfo/x86_64 Remi's RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
remi-test Remi's test RPM repository for Enterprise Linux 7 – x86_64 disabled
remi-test-debuginfo/x86_64 Remi's test RPM repository for Enterprise Linux 7 – x86_64 – debuginfo disabled
updates/7/x86_64 CentOS-7 – Updates enabled: 2,741
updates-source/7 CentOS-7 – Updates Sources disabled
zabbix/x86_64 Zabbix Official Repository – x86_64 enabled: 178
zabbix-debuginfo/x86_64 Zabbix Official Repository debuginfo – x86_64 disabled
zabbix-frontend/x86_64 Zabbix Official Repository frontend – x86_64 disabled
zabbix-non-supported/x86_64 Zabbix Official Repository non-supported – x86_64 enabled: 5
repolist: 39,364

[root@centos:/etc/yum.repos.d]# yum repolist all|grep -i 'base/7/x86_64'
base/7/x86_64 CentOS-7 – Base enabled: 10,072

As you can see in CentOS 7 sharutils is enabled from default repositories, however this is not the case on Redhat 7.9, hence to install sharutils there you can one time enable RPM repository to install sharutils

[root@centos:/etc/yum.repos.d]# yum –enablerepo=rhel-7-server-optional-rpms install sharutils

To install zabbix-agent on the same Redhat server, without caring that I need precisely know the RPM repository that is providing zabbix agent that in that was (Repo: 3party/7Server/x86_64) I had to:

[root@centos:/etc/yum.repos.d]# yum –enablerepo \* install zabbix-agent zabbix-sender

Permanently enabling repositories of course is possible via editting or creating fresh new file configuration manually on CentOS / Fedora under directory /etc/yum.repos.d/.
On Redhat Enterprise Linux servers it is easier to use the subscription-manager command instead, like this:

[root@rhel:/root]# subscription-manager repos –disable=epel/7Server/x86_64

[root@rhel:/root]# subscription-manager repos –enable=rhel-6-server-optional-rpms

Tags: admin user, configured, enable, Enterprise Linux, external internet, linux?, Listing, MISS, passwordless, proxy, repository, rpm, SAMEORIGIN, servers, SRPMS, variables, yum
Posted in Linux, Linux Package Management | No Comments »

Disable NetworkManager automatic Ethernet Interface Management on Redhat Linux , CentOS 6 / 7 / 8

Friday, February 5th, 2021

rhel-centos-fedora-network-manager-disable-automatic-lan-interface-management

Most of Linux distributions had introduced the NetworkManager service and are slowly trying to push out the old ways and use entirely it to manage network configs. Though at times this is very helpful stuff especially if you have Linux running on Laptop on servers is a guarantee for troubles.

If you are a system administrator like me and you need that needs to configure a New server with lets say 8 (Ethernet interface) LAN cards each to be configured with different IPs and you have a mixture of configuration where some eth1,eth2 etc. (4 of the interface IPs has to be static IPs and others has to be taken from a DHCP lease. NetworkManager is not something that you will want as usually you don't expect soon a network IP topology change. Below is example from a Living Hypervisor server machine that has 8 Network Interfaces configured together with few Virtual Interfaces used by the running KVM Virtual Machines.

[root@redhat :~ ]# ip address show |grep ": <"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
4: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br2 state UP group default qlen 1000
5: ens1f2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
6: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br1 state UP group default qlen 1000
7: ens1f3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
8: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
9: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
10: venet0: <BROADCAST,POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
11: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
12: br2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
13: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
14: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
15: host-routed: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
16: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
17: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
18: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
19: virbr1-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr1 state DOWN group default qlen 1000
26: vme52540019e701: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN group default qlen 1000
27: vme52540081868b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br1 state UNKNOWN group default qlen 1000
28: vme525400a13f03: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br2 state UNKNOWN group default qlen 1000

Having a NM managing so many LAN connected Ethernets can create you A LOT of surprises even if your servers are in a Highly Secured data center where chance of sudden IP change or network misbehaves are minimal. Even minimal some in Housing might do something wrong on the Rack mixing up with another server or switch andyour server might end up easily with unexplainable Network problems because of this NM service which is trying 'to balance' any network issues according to some algorithms …

Thus to save yourlself the troubles and completely disable NetworkManager (NM) Ethernets handling.

As a hint some of the troubles you might get especially if the System Hardware has issues with the Integrated Motherboard LAN Controllers such as of Dell PowerEdge R640 Rack Server.
I've recently observed one such Dell Rack mounted machine I had to configure from scratch which has out of the box NM preinstalled by a colleague and was doing strange stuff with the routings causing it to become remotely inacessible after reboot.
Even though I have started configuring the IPs and have double and triple check the configuration and machine had proper set of /etc/sysconfig/network-scripts/ifcfg-* configuration it still failed to boot with a network properly brought up and become unreachable via remote SSH connection immediately after sending machine to init 6 with /usr/sbin/init 6 (alias for shutdown -r now or reboot -f now :)

On Redhat 8 / CentOS 8 to Disabling permanently NM you have to disable NM systemd services permanently and add NM_CONTROLLED=no to each of the Ethernet configurations listed in network-scripts/ifcfg-eno3 eno4 eno1np0 etc. ifaces.

1. Disable completely Network Manager service and mask it

[root@redhat :~ ]# systemctl mask NetworkManager.service
[root@redhat :~ ]# systemctl stop NetworkManager.service
[root@redhat :~ ]# systemctl disable NetworkManager.service

2. Check if all systemd networkmanager components scripts are really disabled

# systemctl list-unit-files | grep NetworkManager

NetworkManager-dispatcher.service disabled
NetworkManager-wait-online.service enabled
NetworkManager.service disabled

NetworkManager-wait-online.service seems to be also enabled so we have to disable it.

[root@redhat :~ ]# systemctl mask NetworkManager-wait-online.service
[root@redhat :~ ]# systemctl disable NetworkManager-wait-online.service

Double check NM services

[root@redhat :~ ]# systemctl list-unit-files | grep NetworkManager
…

3. Install / Enable old (legacy) network-scripts

network-scripts is disabled by default due to it doesn't play well with NM.
Install the rpm package to enable it back

[root@redhat :~ ]# yum install -y network-scripts

4. Test if network-scripts is really enabled

Use Redhat's nmcli command for controlling network manager if it reports NM not running then you're fine

[root@redhat :~ ]# nmcli device
Error: NetworkManager is not running.

5. Disable legacy use network-scripts print outs

Bring down some interface with ifdown Redhat script frontend to ifconfig and bring it up with ifup iface-name

# ifup eno4
WARN : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
WARN : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.
WARN : [ifup] It is advised to switch to 'NetworkManager' instead – it provides 'ifup/ifdown' scripts as well.

Notice the warnings they're harmless and safe to ignore however it is pretty annoying to see them, to disable them:

[root@redhat :~ ]# touch /etc/sysconfig/disable-deprecation-warnings

6. Use network.service old-fashioned systemd service

From now on you can start using the good old well known and properly working network.service

[root@redhat :~ ]# systemctl status network

To enable the network service to start after boot:

[root@redhat :~ ]# systemctl enable network

**7. Disable NetworkManager use from Network configuration scripts ifcfg-* for all server available configured ethernet cards**

Open with text editor every network script and append NM_CONTROLLED="no" to the end of the file.

[root@redhat :~ ]# vi /etc/sysconfig/network-scripts/ifcfg-ethernetX
NM_CONTROLLED="no"

To save yourself the time if you want to disable NetworkManager use for all /etc/sysconfig/network-scripts/ifcfg-* use a simple shell loop:

[root@redhat :~ ]# cd /etc/sysconfig/network-scripts/
[root@redhat :/etc/sysconfig/network-scripts ]# for i in *ifcfg*; do echo NM_CONTROLLED="no" >> $i; done

To load the new network settings do another network reload / restart

[root@redhat :~ ]# systemctl restart network

To disable NetworkManager on older CentOS 6 / Redhat 6 / SuSE / Fedora Linux where the OS still not systemd enabled instead of using systemctl you can straight do it with old and well known chkconfig redhat script.

[root@centos6 :~ ]# service NetworkManager stop
[root@centos6 :~ ]# chkconfig NetworkManager off

Tags: broadcast, default, down, Install Enable, MULTICAST, NM, servers, systemctl, unknown, Use Redhat
Posted in Linux, System Administration | 1 Comment »

Hack: Using ssh / curl or wget to test TCP port connection state to remote SSH, DNS, SMTP, MySQL or any other listening service in PCI environment servers

Wednesday, December 30th, 2020

If you work on PCI high security environment servers in isolated local networks where each package installed on the Linux / Unix system is of importance it is pretty common that some basic stuff are not there in most cases it is considered a security hole to even have a simple telnet installed on the system. I do have experience with such environments myself and thus it is pretty daunting stuff so in best case you can use something like a simple ssh client if you're lucky and the CentOS / Redhat / Suse Linux whatever distro has openssh-client package installed.
If you're lucky to have the ssh onboard you can use telnet in same manner as netcat or the swiss army knife (nmap) network mapper tool to test whether remote service TCP / port is opened or not. As often this is useful, if you don't have access to the CISCO / Juniper or other (networ) / firewall equipment which is setting the boundaries and security port restrictions between networks and servers.

Below is example on how to use ssh client to test port connectivity to lets say the Internet, i.e. Google / Yahoo search engines.

[root@pciserver: /home ]# ssh -oConnectTimeout=3 -v google.com -p 23
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1g 21 Apr 2020
debug1: Connecting to google.com [172.217.169.206] port 23.
debug1: connect to address 172.217.169.206 port 23: Connection timed out
debug1: Connecting to google.com [2a00:1450:4017:80b::200e] port 23.
debug1: connect to address 2a00:1450:4017:80b::200e port 23: Cannot assign requested address
ssh: connect to host google.com port 23: Cannot assign requested address
root@pcfreak:/var/www/images# ssh -oConnectTimeout=3 -v google.com -p 80
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1g 21 Apr 2020
debug1: Connecting to google.com [172.217.169.206] port 80.
debug1: connect to address 172.217.169.206 port 80: Connection timed out
debug1: Connecting to google.com [2a00:1450:4017:807::200e] port 80.
debug1: connect to address 2a00:1450:4017:807::200e port 80: Cannot assign requested address
ssh: connect to host google.com port 80: Cannot assign requested address
root@pcfreak:/var/www/images# ssh google.com -p 80
ssh_exchange_identification: Connection closed by remote host
root@pcfreak:/var/www/images# ssh google.com -p 80 -v -oConnectTimeout=3
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1g 21 Apr 2020
debug1: Connecting to google.com [172.217.169.206] port 80.
debug1: connect to address 172.217.169.206 port 80: Connection timed out
debug1: Connecting to google.com [2a00:1450:4017:80b::200e] port 80.
debug1: connect to address 2a00:1450:4017:80b::200e port 80: Cannot assign requested address
ssh: connect to host google.com port 80: Cannot assign requested address
root@pcfreak:/var/www/images# ssh google.com -p 80 -v -oConnectTimeout=5
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1g 21 Apr 2020
debug1: Connecting to google.com [142.250.184.142] port 80.
debug1: connect to address 142.250.184.142 port 80: Connection timed out
debug1: Connecting to google.com [2a00:1450:4017:80c::200e] port 80.
debug1: connect to address 2a00:1450:4017:80c::200e port 80: Cannot assign requested address
ssh: connect to host google.com port 80: Cannot assign requested address
root@pcfreak:/var/www/images# ssh google.com -p 80 -v
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1g 21 Apr 2020
debug1: Connecting to google.com [172.217.169.206] port 80.
debug1: Connection established.
debug1: identity file /root/.ssh/id_rsa type 0
debug1: identity file /root/.ssh/id_rsa-cert type -1
debug1: identity file /root/.ssh/id_dsa type -1
debug1: identity file /root/.ssh/id_dsa-cert type -1
debug1: identity file /root/.ssh/id_ecdsa type -1
debug1: identity file /root/.ssh/id_ecdsa-cert type -1
debug1: identity file /root/.ssh/id_ed25519 type -1
debug1: identity file /root/.ssh/id_ed25519-cert type -1
debug1: identity file /root/.ssh/id_xmss type -1
debug1: identity file /root/.ssh/id_xmss-cert type -1
debug1: Local version string SSH-2.0-OpenSSH_7.9p1 Debian-10+deb10u2
debug1: ssh_exchange_identification: HTTP/1.0 400 Bad Request

debug1: ssh_exchange_identification: Content-Type: text/html; charset=UTF-8

debug1: ssh_exchange_identification: Referrer-Policy: no-referrer

debug1: ssh_exchange_identification: Content-Length: 1555

debug1: ssh_exchange_identification: Date: Wed, 30 Dec 2020 14:13:25 GMT

debug1: ssh_exchange_identification:

debug1: ssh_exchange_identification: <!DOCTYPE html>

debug1: ssh_exchange_identification: <html lang=en>

debug1: ssh_exchange_identification:   <meta charset=utf-8>

debug1: ssh_exchange_identification:   <meta name=viewport content="initial-scale=1, minimum-scale=1, width=device-width">

debug1: ssh_exchange_identification:   <title>Error 400 (Bad Request)!!1</title>

debug1: ssh_exchange_identification:   <style>

debug1: ssh_exchange_identification:     *{margin:0;padding:0}html,code{font:15px/22px arial,sans-serif}html{background:#fff;color:#222;padding:15px}body{margin:7% auto 0;max-width:390px;min-height:180px;padding:30px 0 15px}* > body{background:url(//www.google.com/images/errors/robot.png) 10
debug1: ssh_exchange_identification: 0% 5px no-repeat;padding-right:205px}p{margin:11px 0 22px;overflow:hidden}ins{color:#777;text-decoration:none}a img{border:0}@media screen and (max-width:772px){body{background:none;margin-top:0;max-width:none;padding-right:0}}#logo{background:url(//www.g
debug1: ssh_exchange_identification: oogle.com/images/branding/googlelogo/1x/googlelogo_color_150x54dp.png) no-repeat;margin-left:-5px}@media only screen and (min-resolution:192dpi){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) no-repeat 0
debug1: ssh_exchange_identification: % 0%/100% 100%;-moz-border-image:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_color_150x54dp.png) 0}}@media only screen and (-webkit-min-device-pixel-ratio:2){#logo{background:url(//www.google.com/images/branding/googlelogo/2x/googlelogo_
debug1: ssh_exchange_identification: color_150x54dp.png) no-repeat;-webkit-background-size:100% 100%}}#logo{display:inline-block;height:54px;width:150px}

debug1: ssh_exchange_identification:   </style>

debug1: ssh_exchange_identification:   <a href=//www.google.com/><span id=logo aria-label=Google></span></a>

debug1: ssh_exchange_identification:   <p><b>400.</b> <ins>That\342\200\231s an error.</ins>

debug1: ssh_exchange_identification:   <p>Your client has issued a malformed or illegal request. <ins>That\342\200\231s all we know.</ins>

ssh_exchange_identification: Connection closed by remote host

Here is another example on how to test remote host whether a certain service such as DNS (bind) or telnetd is enabled and listening on remote local network IP with ssh

[root@pciserver: /home ]# ssh 192.168.1.200 -p 53 -v -oConnectTimeout=5
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1g 21 Apr 2020
debug1: Connecting to 192.168.1.200 [192.168.1.200] port 53.
debug1: connect to address 192.168.1.200 port 53: Connection timed out
ssh: connect to host 192.168.1.200 port 53: Connection timed out

[root@server: /home ]# ssh 192.168.1.200 -p 23 -v -oConnectTimeout=5
OpenSSH_7.9p1 Debian-10+deb10u2, OpenSSL 1.1.1g 21 Apr 2020
debug1: Connecting to 192.168.1.200 [192.168.1.200] port 23.
debug1: connect to address 192.168.1.200 port 23: Connection timed out
ssh: connect to host 192.168.1.200 port 23: Connection timed out

But what if Linux server you have tow work on is so paranoid that you even the ssh client is absent? Well you can use anything else that is capable of doing a connectivity to remote port such as wget or curl. Some web servers or application servers usually have wget or curl as it is integral part for some local shell scripts doing various operation needed for proper services functioning or simply to test locally a local or remote listener services, if that's the case we can use curl to connect and get output of a remote service simulating a normal telnet connection like this:

host:~# curl -vv 'telnet://remote-server-host5:22'
* About to connect() to remote-server-host5 port 22 (#0)
* Trying 10.52.67.21… connected
* Connected to aflpvz625 (10.52.67.21) port 22 (#0)
SSH-2.0-OpenSSH_5.3

Now lets test whether we can connect remotely to a local net remote IP's Qmail mail server with curls telnet simulation mode:

host:~# curl -vv 'telnet://192.168.0.200:25'
* Expire in 0 ms for 6 (transfer 0x56066e5ab900)
* Trying 192.168.0.200…
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x56066e5ab900)
* Connected to 192.168.0.200 (192.168.0.200) port 25 (#0)
220 This is Mail Pc-Freak.NET ESMTP

Fine it works, lets now test whether a remote server who has MySQL listener service on standard MySQL port TCP 3306 is reachable with curl

host:~# curl -vv 'telnet://192.168.0.200:3306'
* Expire in 0 ms for 6 (transfer 0x5601fafae900)
* Trying 192.168.0.200…
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5601fafae900)
* Connected to 192.168.0.200 (192.168.0.200) port 3306 (#0)
Warning: Binary output can mess up your terminal. Use "–output -" to tell
Warning: curl to output it to your terminal anyway, or consider "–output
Warning: <FILE>" to save to a file.
* Failed writing body (0 != 107)
* Closing connection 0
root@pcfreak:/var/www/images# curl -vv 'telnet://192.168.0.200:3306'
* Expire in 0 ms for 6 (transfer 0x5598ad008900)
* Trying 192.168.0.200…
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x5598ad008900)
* Connected to 192.168.0.200 (192.168.0.200) port 3306 (#0)
Warning: Binary output can mess up your terminal. Use "–output -" to tell
Warning: curl to output it to your terminal anyway, or consider "–output
Warning: <FILE>" to save to a file.
* Failed writing body (0 != 107)
* Closing connection 0

As you can see the remote connection is returning binary data which is unknown to a standard telnet terminal thus to get the output received we need to pass curl suggested arguments.

host:~# curl -vv 'telnet://192.168.0.200:3306' –output –
* Expire in 0 ms for 6 (transfer 0x55b205c02900)
* Trying 192.168.0.200…
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x55b205c02900)
* Connected to 192.168.0.200 (192.168.0.200) port 3306 (#0)
g

The curl trick used to troubleshoot remote port to remote host from a Windows OS host which does not have telnet installed by default but have curl instead.

Also When troubleshooting vSphere Replication, it is often necessary to troubleshoot port connectivity as common Windows utilities are not available.
As Curl is available in the VMware vCenter Server Appliance command line interface.

On servers where curl is not there but you have wget is installed you can use it also to test a remote port

# wget -vv -O /dev/null http://google.com:554 –timeout=5
–2020-12-30 16:54:22– http://google.com:554/
Resolving google.com (google.com)… 172.217.169.206, 2a00:1450:4017:80b::200e
Connecting to google.com (google.com)|172.217.169.206|:554… failed: Connection timed out.
Connecting to google.com (google.com)|2a00:1450:4017:80b::200e|:554… failed: Cannot assign requested address.
Retrying.

–2020-12-30 16:54:28– (try: 2) http://google.com:554/
Connecting to google.com (google.com)|172.217.169.206|:554… ^C

As evident from output the port 554 is filtered in google which is pretty normal.

If curl or wget is not there either as a final alternative you can either install some perl, ruby, python or bash script etc. that can opens a remote socket to the remote IP.

Tags: address, common, connection, connectivity, curl, DNS, file, Google Yahoo, Hack Using, host, identity, importance, pci, port, servers, smtp, ssh, state, tcp, telnet connection, test, www
Posted in Curious Facts, Hacks, System Administration, Various | No Comments »

Add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers

Tuesday, December 8th, 2020

How to add Zabbix time synchronization ntp userparameter check script to Monitor Linux servers?

We needed to set on some servers at my work an elementary check with Zabbix monitoring to check whether servers time is correctly synchronized with ntpd time service as well report if the ntp daemon is correctly running on the machine. For that a userparameter script was developed called userparameter_ntp.conf the script is simplistic and few a lines of bash shell scripting
stuff is based on gresping information required from ntpq and ntpstat common ntp client commands to get information about the status of time synchronization on the servers.

[root@linuxserver ]# ntpstat
synchronised to NTP server (10.80.200.30) at stratum 3
time correct to within 47 ms
polling server every 1024 s

[root@linuxserver ]# ntpq -c peers
     remote           refid      st t when poll reach   delay   offset jitter
==============================================================================
+timeserver1 10.26.239.41     2 u 319 1024 377   15.864    1.270   0.262
+timeserver2 10.82.239.41     2 u 591 1024 377   16.287   -0.334   1.748
*timeserver3 10.82.239.43     2 u   47 1024 377   15.613   -0.553   0.251
timeserver4 .INIT.          16 u    – 1024    0    0.000    0.000   0.000

Below is Zabbix UserParameter script that does report us 3 important values we monitor to make sure time server synchronization works as expected the zabbix keys we set are ntp.offset, ntp.sync, ntp.exact in attempt to describe what we're fetching from ntp client:

[root@linuxserver ]# cat /etc/zabbix/zabbix-agent.d/userparameter_ntp.conf

UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'BEGIN { offset=1000 } $1 ~ /\*/ { offset=$9 } END { print offset }')
#UserParameter=ntp.offset,(/usr/sbin/ntpq -pn | /usr/bin/awk 'FNR==4{print $9}')
UserParameter=ntp.sync,(/usr/bin/ntpstat | cut -f 1 -d " " | tr -d ' \t\n\r\f')
UserParameter=ntp.exact,(/usr/bin/ntpstat | /usr/bin/awk 'FNR==2{print $5,$6}')

In Zabbix the monitored ntpd parameters set-upped looks like this:

ntp_time_synchronization_check-zabbix-screenshot.

!Note that in above userparameter example, the commented userparameter script is a just another way to do an ntpd offset returned value which was developed before the more sophisticated with more regular expression checks from the /usr/sbin/ntpd via ntpq, perhaps if you want to extend it you can also use another script to report more verbose information to Zabbix if that is required like ouput from ntpq -c peers command:

UserParameter=ntp.verbose,(/usr/sbin/ntpq -c peers)

Of course to make the Zabbix fetch necessery data from monitored hosts, we need to set-up further new Zabbix Template with the respective Trigger and Items.

Below are few screenshots including the triggers used.

ntpd_server-time_synchronization_check-zabbix-screenshot-triggers

ntpd.trigger

{NTP:net.udp.service[ntp].last(0)}<1

NTP Synchronization trigger

{NTP:ntp.sync.iregexp(unsynchronised)}=1

As you can see from history we have setup our items to Store history of reported data to Zabbix from parameter script for 90 days and update our monitor check, every 30 seconds from the monitored hosts to which Tempate is applied.

Well that's all folks, time synchronization issues we'll be promptly triggering a new Alarm in Zabbix !

Tags: Add Zabbix, checks, command, How to, net, ntpd, sbin, script, servers, time server
Posted in Linux, Monitoring, System Administration, Zabbix | No Comments »

☩ Walking in Light with Christ – Faith, Computing, Diary

Posts Tagged ‘servers’

How to monitor Haproxy Application server backends with Zabbix userparameter autodiscovery scripts

1. Create the required haproxy_discovery.sh and haproxy_stats.sh scripts

2. Create the userparameter_haproxy_backend.conf

3. Create new simple template for the Application backend Monitoring and link it to monitored host

4. Restart Zabbix-agent, in while check autodiscovery data is in Zabbix Server

Create Linux High Availability Load Balancer Cluster with Keepalived and Haproxy on Linux

1. Install keepalived and haproxy on machines

2. Configure haproxy (haproxy.cfg) on both server1 and server2

…

3. Configure keepalived on both servers

4. Test Keepalived keepalived.conf configuration syntax is OK

5. Prepare rsyslog configuration and Inlcude additional keepalived options
to force keepalived log into /var/log/keepalived.log

6. Monitoring VRRP traffic of the two keepaliveds with tcpdump

7. Testing keepalived on server1 and server2 maachines VIP floating IP really works

What we learned?

Hack: Using ssh / curl or wget to test TCP port connection state to remote SSH, DNS, SMTP, MySQL or any other listening service in PCI environment servers

Daily Bible quote

GET ARTICLE UPDATES

Useful blog? Help it:

Links to Other Places

Recent Posts

Ads

Categories

About Myself

Recent Comments

Top Post Views

blogtopsites

Posts Tagged ‘servers’

1. Configure aide to monitor files for changes

2. Setup Zabbix Template with Items, Triggers and set Action

2.2 Configure Item like

2.3 Create Triggers with the respective regular expressions, that would check the aide generated log file for file modifications

2.4 Configure Action

1. Make backups of most important files of high importance

2. Clear up all unnecessery services scripts from the server

3. Clear up old log files and any files unnecessery

4. Trigger an automatic system file system check with fsck on next boot

5. Hints on checking the hard drives with fsck

6. Using tune2fs to adjust tunable filesystem parameters on ext2/ext3/ext4 filesystems (few examples)

7. Repairing disk / partitions via GRUB fsck.mode and fsck.repair kernel module options

8. Few more details on how /etc/fstab disk fsck check parameters values for Systemd Linux machines works

Close it up

1. Create the required haproxy_discovery.sh and haproxy_stats.sh scripts

2. Create the userparameter_haproxy_backend.conf

3. Create new simple template for the Application backend Monitoring and link it to monitored host

4. Restart Zabbix-agent, in while check autodiscovery data is in Zabbix Server

1. Install keepalived and haproxy on machines

2. Configure haproxy (haproxy.cfg) on both server1 and server2

…

3. Configure keepalived on both servers

4. Test Keepalived keepalived.conf configuration syntax is OK

5. Prepare rsyslog configuration and Inlcude additional keepalived options to force keepalived log into /var/log/keepalived.log

6. Monitoring VRRP traffic of the two keepaliveds with tcpdump

7. Testing keepalived on server1 and server2 maachines VIP floating IP really works

What we learned?

1. On Server Host A encode with md5sum command

2. Encode the binary file with base64 encoding

3. Cat the inputbinfile-to-encode just generated to display the text encoded file in your SecureCRT / Putty / SuperPutty etc. remote ssh access client

4. Select the cat-ted string and copy it to your PC Copy / Paste buffer

5. On Server Host B paste the bas64 encoded binary inside a newly created file

6. Decode the encoded binary with base64 cmd again

7. Set proper file permissions (the same as on Host A)

8. Check again the binary file checksum on Host B is identical as on Host A

9. Encoding and decoding files with uuencode

Conclusion

“Cannot open /var/log/sysstat/sa18: No such file or directory. Please check if data collecting is enabled”

1. Check with sysstat machine history SWAP and RAM Memory use

2. Check system load? Are my processes waiting too long to run on the CPU?

3. Show various CPU statistics per CPU use

4. Report paging statistics for some old period

5. Monitor Received RX and Transmitted TX network traffic perl Network interface real time

6. Monitor block devices use

7. Output server monitoring data in CSV database structured format

What we've learned?

1. The http_proxy and https_proxy shell variables

2. Setting passwordless or password protected proxy host via http_proxy, https_proxy variables

3. Making yum proxy connection permanent via /etc/yum.conf

4. Listing RPM repositories and their state

1. Disable completely Network Manager service and mask it

2. Check if all systemd networkmanager components scripts are really disabled

3. Install / Enable old (legacy) network-scripts

4. Test if network-scripts is really enabled

5. Disable legacy use network-scripts print outs

6. Use network.service old-fashioned systemd service

7. Disable NetworkManager use from Network configuration scripts ifcfg-* for all server available configured ethernet cards

Daily Bible quote

GET ARTICLE UPDATES

Useful blog? Help it:

Links to Other Places

Recent Posts

Ads

Categories

About Myself

Recent Comments

Tags

Top Post Views

blogtopsites

5. Prepare rsyslog configuration and Inlcude additional keepalived options
to force keepalived log into /var/log/keepalived.log

“Cannot open /var/log/sysstat/sa18:
No such file or directory. Please check if data collecting is enabled”

**7. Disable NetworkManager use from Network configuration scripts ifcfg-* for all server available configured ethernet cards**