Posts Tagged ‘log’

Zabbix script to track arp address cache loss (arp incomplete) from Linux server to gateway IP

Tuesday, January 30th, 2024

Zabbix_arp-network-incomplete-check-logo.svg

Some of the Linux servers recently, I'm responsible had a very annoying issue recently. The problem is ARP address to default configured server gateway is being lost, every now and then and it takes up time, fot the remote CISCO router to realize the problem and resolve it. We have debugged with the Network expert colleague, while he was checking the Cisco router and we were checking the arp table on the Linux server with arp command. And we came to conclusion this behavior is due to some network mess because of too many NAT address configurations on the network or due to a Cisco bug. The colleagues asked Cisco but cisco does not have any solution to the issue and the only close work around for the gateway loosing the mac is to set a network rule on the Cisco router to flush its arp record for the server it was loosing the MAC address for.
This does not really solve completely the problem but at least, once we run into the issue, it gets resolved as quick as 5 minutes time. }

As we run a cluster environment it is useful to Monitor and know immediately once we hit into the MAC gateway disappear issue and if the issue persists, exclude the Linux node from the Cluster so we don't loose connection traffic.
For the purpose of Monitoring MAC state from the Linux haproxy machine towards the Network router GW, I have developed a small userparameter script, that is periodically checking the state of the MAC address of the IP address of remote gateway host and log to a external file for any problems with incomplete MAC address of the Remote configured default router.

In case if you happen to need the same MAC address state monitoring for your servers, I though that might be of a help to anyone out there.
To monitor MAC address incomplete state with Zabbix, do the following:
 

1. Create  userparamater_arp_gw_check.conf Zabbix script
 

# cat userparameter_arp_gw_check.conf 
UserParameter=arp.check,/usr/local/bin/check_gw_arp.sh

 

2. Create the following shell script /usr/local/bin/check_gw_arp.sh

 

#!/bin/bash
# simple script to run on cron peridically or via zabbix userparameter
# to track arp loss issues to gateway IP
#gw_ip='192.168.0.55';
gw_ip=$(ip route show|grep -i default|awk '{ print $3 }');
log_f='/var/log/arp_incomplete.log';
grep_word='incomplete';
inactive_status=$(arp -n "$gw_ip" |grep -i $grep_word);
# if GW incomplete record empty all is ok
if [[ $inactive_status == ” ]]; then 
echo $gw_ip OK 1; 
else 
# log inactive MAC to gw_ip
echo "$(date '+%Y-%m-%d %H:%M:%S')" "ARP_ERROR $inactive_status 0" | tee -a $log_f 2>&1 >/dev/null;
# printout to zabbix
echo "1 ARP FAILED: $inactive_status"; 
fi

You can download the check_gw_arp.sh here.

The script is supposed to automatically grep for the Default Gateway router IP, however before setting it up. Run it and make sure this corresponds correctly to the default Gateway IP MAC you would like to monitor.
 

3. Create New Zabbix Template for ARP incomplete monitoring
 

arp-machine-to-default-gateway-failure-monitoring-template-screenshot

Create Application 

*Name
Default Gateway ARP state

4. Create Item and Dependent Item 
 

Create Zabbix Item and Dependent Item like this

arp-machine-to-default-gateway-failure-monitoring-item-screenshot

 

arp-machine-to-default-gateway-failure-monitoring-item1-screenshot

arp-machine-to-default-gateway-failure-monitoring-item2-screenshot


5. Create Trigger to trigger WARNING or whatever you like
 

arp-machine-to-default-gateway-failure-monitoring-trigger-screenshot


arp-machine-to-default-gateway-failure-monitoring-trigger1-screenshot

arp-machine-to-default-gateway-failure-monitoring-trigger2-screenshot


6. Create Zabbix Action to notify via Email etc.
 

arp-machine-to-default-gateway-failure-monitoring-action1-screenshot

 

arp-machine-to-default-gateway-failure-monitoring-action2-screenshot

That's all. Once you set up this few little things, you can enjoy having monitoring Alerts for your ARP state incomplete on your Linux / Unix servers.
Enjoy !

Install Zabbix Proxy configure and connect to Zabbix server on CentOS Linux

Thursday, May 4th, 2023

Install Zabbix Proxy configure and connect to Zabbix server on CentOS Linux

1. Why use Zabbix-Proxy hidden advantages of using Zabbix-Proxy ?
 

Proxy can be used for many purposes and can provide many hidden benefits, just to name few of them:

  • Offload Zabbix Server when monitoring thousands of devices
  • Monitor remote locations
  • Monitor locations having unreliable communications
  • Simplify maintenance of distributed monitoring
  • Improved Security (Zabbix server can be restricted to be connectable only by the set of connected Zabbix Proxy / Proxies


advantages-of-using-zabbix-proxy-instead-of-direct-connect-monitored-hosts-to-zabbix-server-diagram

 

A Zabbix proxy is the ideal solution if you have numerous hosts with multiple slow items that are affecting the performance of the server simply because processes are spending most of the time simply waiting for a response. A proxy can collect information from all hosts using its internal processes and then send raw historical data to the server. The time needed to connect and receive the host response will be on the proxy site, and the server performance will not be affected at all. A proxy just sends raw values to the server, and the server itself does not have to connect to the host to get the data.
 

2. Install zabbix-proxy-sqlite3 rpm package from Zabbix Official Repositories download page

Zabbix repository provides choice of 3 packages named as follows:

zabbix-proxy-mysql
zabbix-proxy-pgsql
zabbix-proxy-sqlite3

where the last value of the name (after zabbix-proxy) represents database type of the package — MySQL, PostgreSQL and SQLite respectively.

To not bother installing MySQL / PostgreSQL separate database servers, a lightweight choice is to use the sqlite3 db version. 
As I prefer zabbix-proxy data to be stored inside a flat database, thus I choose to use zabbix-proxy-sqlite3.

[root@sysadminshelp:/root ]# yum info zabbix-proxy-sqlite3-5.0.31-1.el7.x86_64
Заредени плъгини: fastestmirror
Loading mirror speeds from cached hostfile
 * base: mirrors.netix.net
 * epel: fedora.ipacct.com
 * extras: mirrors.netix.net
 * remi: remi.mirror.karneval.cz
 * remi-php74: remi.mirror.karneval.cz
 * remi-safe: remi.mirror.karneval.cz
 * updates: mirrors.netix.net
Инсталирани пакети
Име         : zabbix-proxy-sqlite3
Архитект.   : x86_64
Версия      : 5.0.31
Издание     : 1.el7
Обем        : 4.4 M
Хранилище   : installed
Обобщение   : Zabbix proxy for SQLite3 database
URL         : http://www.zabbix.com/
Лиценз      : GPLv2+
Описание    : Zabbix proxy with SQLite3 database support.

My experience to try to install thethe default CentOS RPM package for zabbix-proxy-sqlite3 provided by default
RPM package that came with CentOS did not work as expected and trying to install / configure and use it via

[root@sysadminshelp:/root ]# yum install zabbix-proxy-sqlite3.x86_64 -y

[root@sysadminshelp:/root ]# vi /etc/zabbix/zabbix_proxy.conf


Led me to a nasty errors seen in /var/log/zabbixsrv/zabbix_proxy.log like:

May 1st 2023, 08:42:45.020 zabbix_server cannot set list of PSK ciphersuites: file ssl_lib.c line 1314: error:1410D0B9:SSL routines:SSL_CTX_set_cipher_list:no cipher match
May 1st 2023, 08:42:45.018 zabbix_server cannot set list of PSK ciphersuites: file ssl_lib.c line 1314: error:1410D0B9:SSL routines:SSL_CTX_set_cipher_list:no cipher match
May 1st 2023, 08:42:45.013 zabbix_server cannot set list of PSK ciphersuites: file ssl_lib.c line 1314: error:1410D0B9:SSL routines:SSL_CTX_set_cipher_list:no cipher match
May 1st 2023, 08:42:45.013 zabbix_server cannot set list of PSK ciphersuites: file ssl_lib.c line 1314: error:1410D0B9:SSL routines:SSL_CTX_set_cipher_list:no cipher match
May 1st 2023, 08:42:45.011 zabbix_server cannot set list of PSK ciphersuites: file ssl_lib.c line 1314: error:1410D0B9:SSL routines:SSL_CTX_set_cipher_list:no cipher match


After some googling and reading some threads came upon this one https://support.zabbix.com/browse/ZBXNEXT-3604, there is exmplaed errors preventing the configured zabbix-proxy
to start are caused by the zabbix-proxy-sqlite3 package provided by Redhat (due to openssl incompitability bug or something ).

As one of people in the discussion pointed out the quickest workaround suggested is simply to use the official Zabbix Repository packages for zabbix-proxy-sqlite3, in order to not waste anymore time on this
trivial stuff to install it, simply run:

[root@sysadminshelp:/root ]# rpm -Uvh \
https://repo.zabbix.com/zabbix/5.0/rhel/7/x86_64/zabbix-proxy-sqlite3-5.0.31-1.el7.x86_64.rpm

Alternative way if you seem to not have the machine connected to the internet is simply download the package with wget / lynx / curl / w3m from another machine 
that can reach the Internet upload the package via the local LAN or VPN and install it:

# wget https://repo.zabbix.com/zabbix/5.0/rhel/7/x86_64/zabbix-proxy-sqlite3-5.0.31-1.el7.x86_64.rpm

[root@sysadminshelp:/root ]# rpm -ivh zabbix-proxy-sqlite3-5.0.31-1.el7.x86_64.rpm

NOTE ! Before you install proxy, keep in mind that your proxy version must match the Zabbix server version !

3. Generate a PSK random secret key and set proper permissions for zabbix-proxy directories


[root@sysadminshelp:/root ]# cd /etc/zabbix/
    
[root@sysadminshelp:/root ]# openssl rand -hex 32 >> /etc/zabbix/zabbix_proxy.psk     
[root@sysadminshelp:/root ]# chown root:zabbix zabbix_proxy.psk [root@sysadminshelp:/root ]# vi /etc/zabbix/zabbix_proxy.conf [root@sysadminshelp:/root ]# mkdir -p /var/lib/zabbix-proxy/sqlite3db
[root@sysadminshelp:/root ]# chown -R zabbix:zabbix /var/lib/zabbix-proxy
[root@sysadminshelp:/var/lib/zabbixsrv/sqlite3db]# sqlite3 zabbix_proxy
SQLite version 3.7.17 2013-05-20 00:56:22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> .databases
seq  name             file
—  —————  ———————————————————-
0    main             /var/lib/zabbixsrv/sqlite3db/zabbix_proxy
sqlite>
[root@sysadminshelp:/root ]# vi /etc/zabbix_proxy.conf
#DBName=zabbix_proxy
DBName=/var/lib/zabbixsrv/sqlite3db/zabbix_proxy

4. Configure zabbix proxy to be able to connect to Zabbix Server

[root@sysadminshelp:/root ]#  vi /etc/zabbix/zabbix_proxy.conf     ############ GENERAL PARAMETERS #################
    ProxyMode=0
    Server=192.168.1.28
    ServerPort=10051
    Hostname=zabbix-proxy
    ListenPort=10051
    SourceIP=10.168.1.55
    LogFile=/var/log/zabbix/zabbix_proxy.log
    LogFileSize=1
    DebugLevel=2
    PidFile=/var/run/zabbix/zabbix_proxy.pid
    DBName=/var/lib/zabbix-proxy/sqlite3db/zabbix_proxy.db
    DBUser=zabbix
    
    ######### PROXY SPECIFIC PARAMETERS #############
    ProxyOfflineBuffer=24
    HeartbeatFrequency=60
    ConfigFrequency=120
    
    ############ ADVANCED PARAMETERS ################
    StartPollersUnreachable=3
    StartHTTPPollers=3
    JavaGateway=127.0.0.1
    JavaGatewayPort=10052
    StartJavaPollers=5
    SNMPTrapperFile=/var/log/snmptrap/snmptrap.log
    StartSNMPTrapper=1
    CacheSize=32M
    Timeout=4
    ExternalScripts=/usr/lib/zabbix/externalscripts
    LogSlowQueries=3000
    
    ####### TLS-RELATED PARAMETERS #######
    TLSConnect=psk
    TLSAccept=psk
    TLSPSKIdentity=PSK zabbix-proxy-fqdn-hostname
    TLSPSKFile=/etc/zabbix/zabbix_proxy.psk

5. Check and make sure the installed zabbix proxy as well as the zabbix_proxy server zabbix_agentd client and zabbix_server are at the same major version release

a) Check zabbix proxy version

[root@sysadminshelp:/etc/zabbix]# zabbix_proxy -V
zabbix_proxy (Zabbix) 5.0.31
Revision f64a07aefca 30 January 2023, compilation time: Jan 30 2023 09:55:10

Copyright (C) 2023 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <https://www.gnu.org/licenses/>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.

This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).

Compiled with OpenSSL 1.0.1e-fips 11 Feb 2013
Running with OpenSSL 1.0.1e-fips 11 Feb 2013

[root@sysadminshelp:/etc/zabbix]#

b) check zabbix_agentd version

[root@sysadminshelp:/etc/zabbix]# zabbix_agentd -V
zabbix_agentd (daemon) (Zabbix) 5.0.30
Revision 2c96c38fb4b 28 November 2022, compilation time: Nov 28 2022 11:27:43

Copyright (C) 2022 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <https://www.gnu.org/licenses/>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.

This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).

Compiled with OpenSSL 1.0.1e-fips 11 Feb 2013
Running with OpenSSL 1.0.1e-fips 11 Feb 2013

c) Check zabbix server version

[root@zabbix:~]# zabbix_server -V
zabbix_server (Zabbix) 5.0.30
Revision 2c96c38fb4b 28 November 2022, compilation time: Nov 28 2022 09:19:03

Copyright (C) 2022 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <https://www.gnu.org/licenses/>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.

This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).

Compiled with OpenSSL 1.1.1d  10 Sep 2019
Running with OpenSSL 1.1.1n  15 Mar 2022

6. Starting the zabbix-proxy for a first time

Before beginning with installation make sure selinux is disabled, as it might cause some issues with Zabbix

[root@sysadminshelp:/etc/zabbix]# sestatus
SELinux status:                 disabled

If you need to have the selinux enabled you will have to allow the zabbix-proxy into selinux as well:

cd /tmp
# grep zabbix_proxy /var/log/audit/audit.log | grep denied | audit2allow -m zabbix_proxy > zabbix_proxy.te
grep zabbix_proxy /var/log/audit/audit.log | grep denied | audit2allow -M zabbix_proxy
semodule -i zabbix_proxy.pp


[root@sysadminshelp:/etc/zabbix]# systemctl start zabbix-proxy

Also lets enable zabbix-proxy to automatically start it on next server reboot / boot.

root@sysadminshelp:/etc/zabbix]# systemctl enable zabbix-proxy

Normally running zabbix-proxy should provide a status messages like:

[root@sysadminshelp:/etc/zabbix]# systemctl status zabbix-proxy
● zabbix-proxy.service – Zabbix Proxy
   Loaded: loaded (/usr/lib/systemd/system/zabbix-proxy.service; disabled; vendor preset: disabled)
   Active: active (running) since чт 2023-05-04 14:58:36 CEST; 2h 59min ago
  Process: 8500 ExecStop=/bin/kill -SIGTERM $MAINPID (code=exited, status=0/SUCCESS)
  Process: 8504 ExecStart=/usr/sbin/zabbix_proxy -c $CONFFILE (code=exited, status=0/SUCCESS)
 Main PID: 8506 (zabbix_proxy)
   CGroup: /system.slice/zabbix-proxy.service
           ├─8506 /usr/sbin/zabbix_proxy -c /etc/zabbix/zabbix_proxy.conf
           ├─8507 /usr/sbin/zabbix_proxy: configuration syncer [synced config 40521 bytes in 0.0…
           ├─8508 /usr/sbin/zabbix_proxy: trapper #1 [processed data in 0.000808 sec, waiting fo…
           ├─8509 /usr/sbin/zabbix_proxy: trapper #2 [processed data in 0.005028 sec, waiting fo…
           ├─8510 /usr/sbin/zabbix_proxy: trapper #3 [processed data in 0.001240 sec, waiting fo…
           ├─8511 /usr/sbin/zabbix_proxy: trapper #4 [processed data in 0.004378 sec, waiting fo…
           ├─8512 /usr/sbin/zabbix_proxy: trapper #5 [processed data in 0.004991 sec, waiting fo…
           ├─8513 /usr/sbin/zabbix_proxy: preprocessing manager #1 [queued 0, processed 3 values…
           ├─8514 /usr/sbin/zabbix_proxy: preprocessing worker #1 started
           ├─8515 /usr/sbin/zabbix_proxy: preprocessing worker #2 started
           ├─8516 /usr/sbin/zabbix_proxy: preprocessing worker #3 started
           ├─8517 /usr/sbin/zabbix_proxy: heartbeat sender [sending heartbeat message success in…
           ├─8518 /usr/sbin/zabbix_proxy: data sender [sent 0 values in 0.005241 sec, idle 1 sec…
           ├─8519 /usr/sbin/zabbix_proxy: housekeeper [deleted 4501 records in 0.011462 sec, idl…
           ├─8520 /usr/sbin/zabbix_proxy: http poller #1 [got 0 values in 0.000248 sec, idle 5 s…
           ├─8521 /usr/sbin/zabbix_proxy: http poller #2 [got 0 values in 0.000239 sec, idle 5 s…
           ├─8522 /usr/sbin/zabbix_proxy: http poller #3 [got 0 values in 0.000328 sec, idle 5 s…
           ├─8523 /usr/sbin/zabbix_proxy: discoverer #1 [processed 0 rules in 0.000261 sec, idle…
           ├─8524 /usr/sbin/zabbix_proxy: history syncer #1 [processed 0 values in 0.000009 sec,…
           ├─8525 /usr/sbin/zabbix_proxy: history syncer #2 [processed 0 values in 0.000007 sec,…
           ├─8526 /usr/sbin/zabbix_proxy: history syncer #3 [processed 0 values in 0.000014 sec,…
           ├─8527 /usr/sbin/zabbix_proxy: history syncer #4 [processed 0 values in 0.000021 sec,…
           ├─8528 /usr/sbin/zabbix_proxy: java poller #1 [got 0 values in 0.000017 sec, idle 5 s…
           ├─8529 /usr/sbin/zabbix_proxy: java poller #2 [got 0 values in 0.000019 sec, idle 5 s…
           ├─8530 /usr/sbin/zabbix_proxy: java poller #3 [got 0 values in 0.000019 sec, idle 5 s…
           ├─8531 /usr/sbin/zabbix_proxy: java poller #4 [got 0 values in 0.000018 sec, idle 5 s…
           ├─8532 /usr/sbin/zabbix_proxy: java poller #5 [got 0 values in 0.000013 sec, idle 5 s…
           ├─8533 /usr/sbin/zabbix_proxy: snmp trapper [processed data in 0.000026 sec, idle 1 s…
           ├─8534 /usr/sbin/zabbix_proxy: self-monitoring [processed data in 0.000034 sec, idle …
           ├─8535 /usr/sbin/zabbix_proxy: task manager [processed 0 task(s) in 0.000169 sec, idl…
           ├─8536 /usr/sbin/zabbix_proxy: poller #1 [got 0 values in 0.000012 sec, idle 5 sec]
           ├─8537 /usr/sbin/zabbix_proxy: poller #2 [got 0 values in 0.000021 sec, idle 5 sec]
           ├─8538 /usr/sbin/zabbix_proxy: poller #3 [got 0 values in 0.000039 sec, idle 5 sec]
           ├─8539 /usr/sbin/zabbix_proxy: poller #4 [got 0 values in 0.000024 sec, idle 5 sec]
           ├─8540 /usr/sbin/zabbix_proxy: poller #5 [got 0 values in 0.000019 sec, idle 5 sec]
           ├─8541 /usr/sbin/zabbix_proxy: unreachable poller #1 [got 0 values in 0.000011 sec, i…
           ├─8542 /usr/sbin/zabbix_proxy: unreachable poller #2 [got 0 values in 0.000018 sec, i…
           ├─8543 /usr/sbin/zabbix_proxy: unreachable poller #3 [got 0 values in 0.000041 sec, i…
           └─8544 /usr/sbin/zabbix_proxy: icmp pinger #1 [got 0 values in 0.000022 sec, idle 5 s…

май 04 14:58:36 sysadminshelp systemd[1]: Stopped Zabbix Proxy.
май 04 14:58:36 sysadminshelp systemd[1]: Starting Zabbix Proxy…
май 04 14:58:36 sysadminshelp systemd[1]: Started Zabbix Proxy.

zabbix-server-zabbix-proxy-and-zabbix-clients-overview-diagram

7. Configure zabbix-agentd to use your just new brand new zabbix-proxy

Here is my sample configuration file:

[root@sysadminshelp:/etc/zabbix]# grep -v \# /etc/zabbix/zabbix_agentd.conf | sed '/^$/d'
PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=zabbix-proxy
ServerActive=zabbix-proxy:10051
ListenIP
Hostname=sysadminshelp
BufferSend=30
BufferSize=100
Include=/etc/zabbix/zabbix_agentd.d/*.conf


Note that the ServerActive given "zabbix-proxy" should be resolvable from the host, or even better you might want to put the IP of the Proxy if
you don't have at least a pseudo Hostname already configured inside /etc/hosts or actual DNS 'A' Active record configured inside a properly resolving
DNS server configured on the host via /etc/resolv.conf.


8. Create and Configure new proxy into the Zabbix-server host
 

Go to the zabbix server web interface URL into menus:

zabbix-administration-proxy-config
 

Administration -> Proxies (Proxy) 


Click on ;

Create Proxy button (uppper right corner)

*Proxy name: usually-your-host-pingable-fqdn
Proxy mode: Active
Proxy address: 192.168.1.50
Description: pcfreak zabbix proxy


Administration -> Proxies -> Encryption


From "Connection to proxy"

Untick "No encryption"

and

Tick "PSK"


zabbix-administration-proxy-config-encryption

*PSK Identity: PSK proxy
*PSK: Put the key here (copy from /etc/zabbix/zabbix_proxy.psk generated steps earlier with openssl)

[root@sysadminshelp:/etc/zabbix]# cat zabbix_proxy.psk
faddbd96be00ac42c892fda5201634df25d51f3ndbbbf6cee9d354b2817092a28

Press the "Update" Button

zabbix-administration-proxy-config-encryption1

and go again to Proxies and check the zabbix-proxy is connected to the server and hosts configured to use the zabbix proxy reporting frequently.

To make sure that the configured new hosts to use the Zabbix Proxy instead of direct connection to Zabbix Server, go to Latest Data and check whether the configured Hostnames to connect to the Zabbix-Proxy continues to sent Data still.

9. Debugging problems with zabix-proxy and zabbix-agentd connectivity to proxy

In case of troubles check out what is going on inside the Zabbix Proxy / Agent and Server log files
 

[root@sysadminshelp:/etc/zabbix]# tail -n 50 /var/log/zabbix/zabbix_proxy.log

 6832:20230504:134032.281 Starting Zabbix Proxy (active) [zabbix-proxy]. Zabbix 5.0.31 (revision f
64a07aefca).
  6832:20230504:134032.281 **** Enabled features ****
  6832:20230504:134032.281 SNMP monitoring:       YES
  6832:20230504:134032.281 IPMI monitoring:       YES
  6832:20230504:134032.281 Web monitoring:        YES
  6832:20230504:134032.281 VMware monitoring:     YES
  6832:20230504:134032.281 ODBC:                  YES
  6832:20230504:134032.281 SSH support:           YES
  6832:20230504:134032.281 IPv6 support:          YES
  6832:20230504:134032.281 TLS support:           YES
  6832:20230504:134032.281 **************************
  6832:20230504:134032.281 using configuration file: /etc/zabbix/zabbix_proxy.conf
  6832:20230504:134032.291 current database version (mandatory/optional): 05000000/05000005
  6832:20230504:134032.291 required mandatory version: 05000000
  6832:20230504:134032.292 proxy #0 started [main process]
  6833:20230504:134032.292 proxy #1 started [configuration syncer #1]
  6833:20230504:134032.329 received configuration data from server at "192.168.1.28", datalen 40521
  6834:20230504:134032.392 proxy #2 started [trapper #1]
  6835:20230504:134032.401 proxy #3 started [trapper #2]
  6836:20230504:134032.402 proxy #4 started [trapper #3]
  6838:20230504:134032.405 proxy #6 started [trapper #5]
  6837:20230504:134032.409 proxy #5 started [trapper #4]
  6843:20230504:134032.409 proxy #11 started [heartbeat sender #1]
  6845:20230504:134032.412 proxy #13 started [housekeeper #1]
  6847:20230504:134032.412 proxy #15 started [discoverer #1]
  8526:20230504:145836.512 proxy #20 started [history syncer #3]
  8517:20230504:145836.512 proxy #11 started [heartbeat sender #1]
  8530:20230504:145836.515 proxy #24 started [java poller #3]
  8531:20230504:145836.517 proxy #25 started [java poller #4]
  8532:20230504:145836.520 proxy #26 started [java poller #5]
  8536:20230504:145836.522 proxy #30 started [poller #1]
  8527:20230504:145836.525 proxy #21 started [history syncer #4]
  8535:20230504:145836.525 proxy #29 started [task manager #1]
  8533:20230504:145836.528 proxy #27 started [snmp trapper #1]
  8539:20230504:145836.528 proxy #33 started [poller #4]
  8538:20230504:145836.529 proxy #32 started [poller #3]
  8534:20230504:145836.532 proxy #28 started [self-monitoring #1]
  8544:20230504:145836.532 proxy #38 started [icmp pinger #1]
  8543:20230504:145836.532 proxy #37 started [unreachable poller #3]
  8542:20230504:145836.535 proxy #36 started [unreachable poller #2]
  8541:20230504:145836.537 proxy #35 started [unreachable poller #1]
  8540:20230504:145836.540 proxy #34 started [poller #5]
  8507:20230504:150036.453 received configuration data from server at "192.168.1.28", datalen 40521
  8507:20230504:150236.503 received configuration data from server at "192.168.1.28", datalen 40521
  8507:20230504:150436.556 received configuration data from server at "192.168.1.28", datalen 40521
  8507:20230504:150636.608 received configuration data from server at "192.168.1.28", datalen 40521
  8507:20230504:150836.662 received configuration data from server at "192.168.1.28", datalen 40521

 

[root@sysadminshelp:/etc/zabbix]# tail -n 10  /var/log/zabbix-agent/zabbix_agentd.log
3096166:20230504:182840.461 agent #1 started [collector]
3096167:20230504:182840.462 agent #2 started [listener #1]
3096168:20230504:182840.463 agent #3 started [listener #2]
3096169:20230504:182840.464 agent #4 started [listener #3]
3096170:20230504:182840.464 agent #5 started [active checks #1]

If necessery to Debug further and track some strange errors, you might want to increase the DebugLevel to lets say DebugLevel=5

5 – extended debugging (produces even more information)

If checking both zabbix_agentd.log and zabbix_proxy.log cannot give you enough of a hint on what might be the issues you face with your userparameter scripts or missing Monitored data etc. and hopefully you have access to the zabbix-server machine, check out the zabbix server log as well

[root@zabbix:~]# tail -n 100 /var/log/zabbix/zabbix_server.log

3145027:20230504:182641.556 sending configuration data to proxy "zabbix-proxy" at "192.168.1.50", datalen 40521, bytes 6120 with compression ratio 6.6
3145029:20230504:182716.529 cannot send list of active checks to "192.168.1.30": host [pcfrxenweb] not found
3145028:20230504:182731.959 cannot send list of active checks to "192.168.1.30": host [pcfrxenweb] not found
3145029:20230504:182756.634 cannot send list of active checks to "192.168.1.30": host [pcfrxenweb] not found

Wrapping it up

In this article, we have learned how to install and configure a zabbix-proxy server and prepare a PSK encryption secret key for it.
We learned also  how to connect this server to the central zabbix monitoring host machine in Active mode, so both Zabbix proxy and server can communicate in a secure crypted form,
as well as how to set zabbix_agentd clients to connect to the zabbix proxy
which will from itself send its data to the Central Zabbix server host as well as how to Debug and hopefully solve issues with communication between Zabbix client -> Zabbix Proxy -> Zabbix server.

I know this article, does not say anything revolutionary and there is plenty of posts online talking about how to run yourself a zabbix proxy and make in your home or corporate network,
but I thought to write it down as by writting it and reading a bit more on the topic of Zabbix Server / Proxy / Agent, that give myself a better overview on how this technologies work and such an article will give myself an easier step by step guide to follow,
in future when I have to configure Zabbix Environments for personal hobby or professionally for customers.
Hope you enjoyed. Cheers ! 🙂

How to disable haproxy log for certain frontend / backend or stop haproxy logging completely

Wednesday, September 14th, 2022

haproxy-disable-logging-for-single-frontend-or-backend-or-stop-message-logging-completely-globally

In my previous article I've shortly explained on how it is possible to configure multiple haproxy instances to log in separate log files as well as how to configure a specific frontend to log inside a separate file. Sometimes it is simply unnecessery to keep any kind of log file for haproxy to spare disk space or even for anonymity of traffic. Hence in this tiny article will explain how to disable globally logging for haproxy and how logging for a certain frontend or backend could be stopped.

1. Disable globally logging of haproxy service
 

Disabling globally logging for haproxy in case if you don't need the log is being achieved by redirecting the log variable to /dev/null handler and to also mute the reoccurring alert, notice and info messages, that are produced in case of some extra ordinary events during start / stop of haproxy or during mising backends etc. you can send those messages to local0 and loca1 handlers which will be discarded later by rsyslogd configuration, for example thsi can be achieved with a configuration like:
 

global     log /dev/log    local0 info alert     log /dev/log    local1 notice alert  defaults log global mode http option httplog option dontlognull

 

<level>    is optional and can be specified to filter outgoing messages. By
           default, all messages are sent. If a level is specified, only
           messages with a severity at least as important as this level
           will be sent. An optional minimum level can be specified. If it
           is set, logs emitted with a more severe level than this one will
           be capped to this level. This is used to avoid sending "emerg"
           messages on all terminals on some default syslog configurations.
           Eight levels are known :
             emerg  alert  crit   err    warning notice info  debug

         

By using the log level you can also tell haproxy to omit from logging errors from log if for some reasons haproxy receives a lot of errors and this is flooding your logs, like this:

    backend Backend_Interface
  http-request set-log-level err
  no log


But sometimes you might need to disable it for a single frontend only and comes the question.


2. How to disable logging for a single frontend interface?

I thought that might be more complex but it was pretty easy with the option dontlog-normal haproxy.cfg variable:

Here is sample configuration with frontend and backend on how to instrucruct the haproxy frontend to disable all logging for the frontend
 

frontend ft_Frontend_Interface
#        log  127.0.0.1 local4 debug
        bind 10.44.192.142:12345
       
option dontlog-normal
        mode tcp
        option tcplog

              timeout client 350000
        log-format [%t]\ %ci:%cp\ %fi:%fp\ %b/%s:%sp\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq
        default_backend bk_WLP_echo_port_service

backend bk_Backend_Interface
                        timeout server 350000
                        timeout connect 35000
        server serverhost1 10.10.192.12:12345 weight 1 check port 12345
        server serverhost2 10.10.192.13:12345 weight 3 check port 12345

 


As you can see from those config, we have also enabled as a check port 12345 which is the application port service if something goes wrong with the application and 12345 is not anymore responding the respective server will get excluded automatically by haproxy and only one of machines will serve, the weight tells it which server will have the preference to serve the traffic the weight ratio will be 1 request will end up on one machine and 3 requests on the other machine.


3. How to disable single configured backend to not log anything but still have a log for the frontend
 

Omit the use of option dontlog normal from frontend inside the backend just set  no log:

backend bk_Backend_Interface
                       
 no log
                        timeout server 350000
                        timeout connect 35000
        server serverhost1 10.10.192.12:12345 weight 1 check port 12345
        server serverhost2 10.10.192.13:12345 weight 3 check port 12345

That's all reload haproxy service on the machine and backend will no longer log to your default configured log file via the respective local0 – local6 handler.

List and fix failed systemd failed services after Linux OS upgrade and how to get full info about systemd service from jorunal log

Friday, February 25th, 2022

systemd-logo-unix-linux-list-failed-systemd-services

I have recently upgraded a number of machines from Debian 10 Buster to Debian 11 Bullseye. The update as always has some issues on some machines, such as problem with package dependencies, changing a number of external package repositories etc. to match che Bullseye deb packages. On some machines the update was less painful on others but the overall line was that most of the machines after the update ended up with one or more failed systemd services. It could be that some of the machines has already had this failed services present and I never checked them from the previous time update from Debian 9 -> Debian 10 or just some mess I've left behind in the hurry when doing software installation in the past. This doesn't matter anyways the fact was that I had to deal to a number of systemctl services which I managed to track by the Failed service mesage on system boot on one of the physical machines and on the OpenXen VTY Console the rest of Virtual Machines after update had some Failed messages. Thus I've spend some good amount of time like an overall of a day or two fixing strange failed services. This is how this small article was born in attempt to help sysadmins or any home Linux desktop users, who has updated his Debian Linux / Ubuntu or any other deb based distribution but due to the chaotic nature of Linux has ended with same strange Failed services and look for a way to find the source of the failures and get rid of the problems. 
Systemd is a very complicated system and in my many sysadmin opinion it makes more problems than it solves, but okay for today's people's megalomania mindset it matches well.

Systemd_components-systemd-journalctl-cgroups-loginctl-nspawn-analyze.svg

 

1. Check the journal for errors, running service irregularities and so on
 

First thing to do to track for errors, right after the update is to take some minutes and closely check,, the journalctl for any strange errors, even on well maintained Unix machines, this journal log would bring you to a problem that is not fatal but still some process or stuff is malfunctioning in the background that you would like to solve:
 

root@pcfreak:~# journalctl -x
Jan 10 10:10:01 pcfreak CRON[17887]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17887]: USER_END pid=17887 uid=0 auid=0 ses=340858 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17888]: CRED_DISP pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17888]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17888]: USER_END pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17884]: CRED_DISP pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17884]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17884]: USER_END pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17886]: CRED_DISP pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="www-data" exe="/usr/sbin/c>
Jan 10 10:10:01 pcfreak CRON[17886]: pam_unix(cron:session): session closed for user www-data
Jan 10 10:10:01 pcfreak audit[17886]: USER_END pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permi>
Jan 10 10:10:08 pcfreak NetworkManager[696]:  [1641802208.0899] device (eth1): carrier: link connected
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:19 pcfreak NetworkManager[696]:
 [1641802219.7920] device (eth1): carrier: link connected
Jan 10 10:10:19 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx
Jan 10 10:10:20 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:22 pcfreak NetworkManager[696]:
 [1641802222.2772] device (eth1): carrier: link connected
Jan 10 10:10:22 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx
Jan 10 10:10:23 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:33 pcfreak sshd[18142]: Unable to negotiate with 66.212.17.162 port 19255: no matching key exchange method found. Their offer: diffie-hellman-group14-sha1,diff>
Jan 10 10:10:41 pcfreak NetworkManager[696]:
 [1641802241.0186] device (eth1): carrier: link connected
Jan 10 10:10:41 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up – 100Mbps/Full – flow control rx/tx

If you want to only check latest journal log messages use the -x -e (pager catalog) opts

root@pcfreak;~# journalctl -xe

Feb 25 13:08:29 pcfreak audit[2284920]: USER_LOGIN pid=2284920 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='op=login acct=28696E76616C>
Feb 25 13:08:29 pcfreak sshd[2284920]: Received disconnect from 177.87.57.145 port 40927:11: Bye Bye [preauth]
Feb 25 13:08:29 pcfreak sshd[2284920]: Disconnected from invalid user ubuntuuser 177.87.57.145 port 40927 [preauth]

Next thing to after the update was to get a list of failed service only.


2. List all systemd failed check services which was supposed to be running

root@pcfreak:/root # systemctl list-units | grep -i failed
● certbot.service                                                                                                       loaded failed failed    Certbot
● logrotate.service                                                                                                     loaded failed failed    Rotate log files
● maldet.service                                                                                                        loaded failed failed    LSB: Start/stop maldet in monitor mode
● named.service                                                                                                         loaded failed failed    BIND Domain Name Server


Alternative way is with the –failed option

hipo@jeremiah:~$ systemctl list-units –failed
  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
● haproxy.service             loaded failed failed HAProxy Load Balancer
● libvirt-guests.service      loaded failed failed Suspend/Resume Running libvirt Guests
● libvirtd.service            loaded failed failed Virtualization daemon
● nvidia-persistenced.service loaded failed failed NVIDIA Persistence Daemon
● sqwebmail.service           masked failed failed sqwebmail.service
● tpm2-abrmd.service          loaded failed failed TPM2 Access Broker and Resource Management Daemon
● wd_keepalive.service        loaded failed failed LSB: Start watchdog keepalive daemon

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
7 loaded units listed.

 

root@jeremiah:/etc/apt/sources.list.d#  systemctl list-units –failed
  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
● haproxy.service             loaded failed failed HAProxy Load Balancer
● libvirt-guests.service      loaded failed failed Suspend/Resume Running libvirt Guests
● libvirtd.service            loaded failed failed Virtualization daemon
● nvidia-persistenced.service loaded failed failed NVIDIA Persistence Daemon
● sqwebmail.service           masked failed failed sqwebmail.service
● tpm2-abrmd.service          loaded failed failed TPM2 Access Broker and Resource Management Daemon
● wd_keepalive.service        loaded failed failed LSB: Start watchdog keepalive daemon


To get a full list of objects of systemctl you can pass as state:
 

# systemctl –state=help
Full list of possible load states to pass is here
Show service properties


Check whether a service is failed or has other status and check default set systemd variables for it.

root@jeremiah~:# systemctl is-failed vboxweb.service
inactive

# systemctl show haproxy
Type=notify
Restart=always
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
SuccessExitStatus=143
MainPID=304858
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success

Full output of the above command is dumped in show_systemctl_properties.txt


3. List all running systemd services for a better overview on what's going on on machine
 

To get a list of all properly systemd loaded services you can use –state running.

hipo@jeremiah:~$ systemctl list-units –state running|head -n 10
  UNIT                              LOAD   ACTIVE SUB     DESCRIPTION
  proc-sys-fs-binfmt_misc.automount loaded active running Arbitrary Executable File Formats File System Automount Point
  cups.path                         loaded active running CUPS Scheduler
  init.scope                        loaded active running System and Service Manager
  session-2.scope                   loaded active running Session 2 of user hipo
  accounts-daemon.service           loaded active running Accounts Service
  anydesk.service                   loaded active running AnyDesk
  apache-htcacheclean.service       loaded active running Disk Cache Cleaning Daemon for Apache HTTP Server
  apache2.service                   loaded active running The Apache HTTP Server
  avahi-daemon.service              loaded active running Avahi mDNS/DNS-SD Stack

 

It is useful thing is to list all unit-files configured in systemd and their state, you can do it with:

 


root@pcfreak:~# systemctl list-unit-files
UNIT FILE                                                                 STATE           VENDOR PRESET
proc-sys-fs-binfmt_misc.automount                                         static          –            
-.mount                                                                   generated       –            
backups.mount                                                             generated       –            
dev-hugepages.mount                                                       static          –            
dev-mqueue.mount                                                          static          –            
media-cdrom0.mount                                                        generated       –            
mnt-sda1.mount                                                            generated       –            
proc-fs-nfsd.mount                                                        static          –            
proc-sys-fs-binfmt_misc.mount                                             disabled        disabled     
run-rpc_pipefs.mount                                                      static          –            
sys-fs-fuse-connections.mount                                             static          –            
sys-kernel-config.mount                                                   static          –            
sys-kernel-debug.mount                                                    static          –            
sys-kernel-tracing.mount                                                  static          –            
var-www.mount                                                             generated       –            
acpid.path                                                                masked          enabled      
cups.path                                                                 enabled         enabled      

 

 


root@pcfreak:~# systemctl list-units –type service –all
  UNIT                                   LOAD      ACTIVE   SUB     DESCRIPTION
  accounts-daemon.service                loaded    inactive dead    Accounts Service
  acct.service                           loaded    active   exited  Kernel process accounting
● alsa-restore.service                   not-found inactive dead    alsa-restore.service
● alsa-state.service                     not-found inactive dead    alsa-state.service
  apache2.service                        loaded    active   running The Apache HTTP Server
● apparmor.service                       not-found inactive dead    apparmor.service
  apt-daily-upgrade.service              loaded    inactive dead    Daily apt upgrade and clean activities
 apt-daily.service                      loaded    inactive dead    Daily apt download activities
  atd.service                            loaded    active   running Deferred execution scheduler
  auditd.service                         loaded    active   running Security Auditing Service
  auth-rpcgss-module.service             loaded    inactive dead    Kernel Module supporting RPCSEC_GSS
  avahi-daemon.service                   loaded    active   running Avahi mDNS/DNS-SD Stack
  certbot.service                        loaded    inactive dead    Certbot
  clamav-daemon.service                  loaded    active   running Clam AntiVirus userspace daemon
  clamav-freshclam.service               loaded    active   running ClamAV virus database updater
..

 


linux-systemd-components-diagram-linux-kernel-system-targets-systemd-libraries-daemons

 

4. Finding out more on why a systemd configured service has failed


Usually getting info about failed systemd service is done with systemctl status servicename.service
However, in case of troubles with service unable to start to get more info about why a service has failed with (-l) or (–full) options


root@pcfreak:~# systemctl -l status logrotate.service
● logrotate.service – Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 00:00:06 EET; 13h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
    Process: 2045320 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=1/FAILURE)
   Main PID: 2045320 (code=exited, status=1/FAILURE)
        CPU: 2.479s

Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: For now we will assume you meant to write /32
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| ERROR: '0.0.0.0/0.0.0.0' needs to be replaced by the term 'all'.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| SECURITY NOTICE: Overriding config setting. Using 'all' instead.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: (B) '::/0' is a subnetwork of (A) '::/0'
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: because of this '::/0' is ignored to keep splay tree searching predictable
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: You should probably remove '::/0' from the ACL named 'all'
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Failed with result 'exit-code'.
Feb 25 00:00:06 pcfreak systemd[1]: Failed to start Rotate log files.
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Consumed 2.479s CPU time.


systemctl -l however is providing only the last log from message a started / stopped or whatever status service has generated. Sometimes systemctl -l servicename.service is showing incomplete the splitted error message as there is a limitation of line numbers on the console, see below

 

root@pcfreak:~# systemctl status -l certbot.service
● certbot.service – Certbot
     Loaded: loaded (/lib/systemd/system/certbot.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 09:28:33 EET; 4h 0min ago
TriggeredBy: ● certbot.timer
       Docs: file:///usr/share/doc/python-certbot-doc/html/index.html
             https://certbot.eff.org/docs
    Process: 290017 ExecStart=/usr/bin/certbot -q renew (code=exited, status=1/FAILURE)
   Main PID: 290017 (code=exited, status=1/FAILURE)
        CPU: 9.771s

Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with –manual-auth-hook when using th>
Feb 25 09:28:33 pcfrxen certbot[290017]: All renewals failed. The following certificates could not be renewed:
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/mail.pcfreak.org-0003/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/www.eforia.bg-0005/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/zabbix.pc-freak.net/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]: 3 renew failure(s), 5 parse failure(s)
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Failed with result 'exit-code'.
Feb 25 09:28:33 pcfrxen systemd[1]: Failed to start Certbot.
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Consumed 9.771s CPU time.

 

5. Get a complete log of journal to make sure everything configured on server host runs as it should

Thus to get more complete list of the message and be able to later google and look if has come with a solution on the internet  use:

root@pcfrxen:~#  journalctl –catalog –unit=certbot

— Journal begins at Sat 2022-01-22 21:14:05 EET, ends at Fri 2022-02-25 13:32:01 EET. —
Jan 23 09:58:18 pcfrxen systemd[1]: Starting Certbot…
░░ Subject: A start job for unit certbot.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A start job for unit certbot.service has begun execution.
░░ 
░░ The job identifier is 5754.
Jan 23 09:58:20 pcfrxen certbot[124996]: Traceback (most recent call last):
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/renewal.py", line 71, in _reconstitute
Jan 23 09:58:20 pcfrxen certbot[124996]:     renewal_candidate = storage.RenewableCert(full_path, config)
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 471, in __init__
Jan 23 09:58:20 pcfrxen certbot[124996]:     self._check_symlinks()
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 537, in _check_symlinks

root@server:~# journalctl –catalog –unit=certbot|grep -i pluginerror|tail -1
Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with –manual-auth-hook when using the manual plugin non-interactively.')


Or if you want to list and read only the last messages in the journal log regarding a service

root@server:~# journalctl –catalog –pager-end –unit=certbot


If you have disabled a failed service because you don't need it to run at all on the machine with:

root@rhel:~# systemctl stop rngd.service
root@rhel:~# systemctl disable rngd.service

And you want to clear up any failed service information that is kept in the systemctl service log you can do it with:
 

root@rhel:~# systemctl reset-failed

Another useful systemctl option is cat, you can use it to easily list a service it is useful to quickly check what is a service, an actual shortcut to save you from giving a full path to the service e.g. cat /lib/systemd/system/certbot.service

root@server:~# systemctl cat certbot
# /lib/systemd/system/certbot.service
[Unit]
Description=Certbot
Documentation=file:///usr/share/doc/python-certbot-doc/html/index.html
Documentation=https://certbot.eff.org/docs
[Service]
Type=oneshot
ExecStart=/usr/bin/certbot -q renew
PrivateTmp=true


After failed SystemD services are fixed, it is best to reboot the machine and check put some more time to inspect rawly the complete journal log to make sure, no error  was left behind.


Closure
 

As you can see updating a machine from a major to a major version even if you follow the official documentation and you have plenty of experience is always more or a less a pain in the ass, which can eat up much of your time banging your head solving problems with failed daemons issues with /etc/rc.local (which I have faced becase of #/bin/sh -e (which would make /etc/rc.local) to immediately quit if any error from command $? returns different from 0 etc.. The  logical questions comes then;
1. Is it really worthy to update at all regularly, especially if you don't know of a famous major Vulnerability 🙂 ?
2. Or is it worthy to update from OS major release to OS major release at all?  
3. Or should you only try to patch the service that is exposed to an external reachable computer network or the internet only and still the the same OS release until End of Life (LTS = Long Term Support) as called in Debian or  End Of Life  (EOL) Cycle as called in RPM based distros the period until the OS major release your software distro has official security patches is reached.

Anyone could take any approach but for my own managed systems small network at home my practice was always to try to keep up2date everything every 3 or 6 months maximum. This has caused me multiple days of irritation and stress and perhaps many white hairs and spend nerves on shit.


4. Based on the company where I'm employed the better strategy is to patch to the EOL is still offered and keep the rule First Things First (FTF), once the EOL is reached, just make a copy of all servers data and configuration to external Data storage, bring up a new Physical or VM and migrate the services.
Test after the migration all works as expected if all is as it should be change the DNS records or Leading Infrastructure Proxies whatever to point to the new service and that's it! Yes it is true that migration based on a full OS reinstall is more time consuming and requires much more planning, but usually the result is much more expected, plus it is much less stressful for the guy doing the job.

CentOS 8 / Redhat 8 insert additional guests additions to VM to enable Fullscreen, Copy / Paste and Shared Folder from host OS

Monday, January 10th, 2022

virtualbox-guest-additions-install-on-centos-8.3-linux-oracle-logo

My experience with enabling virtualbox additions guest tools on many of the separate Linux distributions throughout time is pretty bad as it always is a pain in the ass to enable fully functional full screen and copy paste for Virtualbox…
 
For those who installed it for a first time vbox guest addition tools for Virtualbox are additional software components added so the Emulated Operating system
could allow better screen resolution and better mouse integration support.

So far I've installed virtualbox additions tools to CentOS 7 and Debian Linux various releases and faced complications there as well.
Few days ago my colleague Georgi Stoyanov have installed CentOS 8.3 with current version of VirtualBox 6.1 (vesrsion from beginning of 2022) and he has also shared had issues with enabling the CentOS 8.3 Linux to work with guestadditions but eventually found a resolution.

Thus he has shared with me the solution and I share it with you, so hopefully someone else could enable Guesttools on his CentOS 8.3 with less digging online.
The error received is:

# ./VBoxLinuxAdditions.run

Trying to install Guest Additions in RHEL 8.3.

VirtualBox Guest Additions: Starting.
VirtualBox Guest Additions: Building the VirtualBox Guest Additions kernel
modules. This may take a while.
VirtualBox Guest Additions: To build modules for other installed kernels, run
VirtualBox Guest Additions: /sbin/rcvboxadd quicksetup
VirtualBox Guest Additions: or
VirtualBox Guest Additions: /sbin/rcvboxadd quicksetup all
VirtualBox Guest Additions: Building the modules for kernel
4.18.0-193.el8.x86_64.

VirtualBox Guest Additions: Look at /var/log/vboxadd-setup.log to find out what
went wrong
ValueError: File context for /opt/VBoxGuestAdditions-6.0.20/other/mount.vboxsf already defined
VirtualBox Guest Additions: Running kernel modules will not be replaced until
the system is restarted
Press Return to close this window…

No idea what to do next. Been trying for sometime.


To enable guestaddtions in CentOS 8.3, e.g. get arount the error you have to:


1. Install all necessery dependncies RPMs required by GuestAddition tools

 

# dnf install tar bzip2 kernel-devel-$(uname -r) kernel-headers perl gcc make elfutils-libelf-devel

# dnf -y install gcc automake make kernel-headers dkms bzip2 libxcrypt-compat kernel-devel perl

2.  Run below semanage and restorecon commands

 

# semanage fcontext -d /opt/VBoxGuestAdditions-/other/mount.vboxsf
# restorecon /opt/VBoxGuestAdditions-/other/mount.vboxsf

 

3.  Insert Virtualbox guest additions ISO and Run it

 

centos-insert-guest-additions-linux-virtualbox-screenshot
 

Devices -> Insert Guest Additions CD Image

 

Click Run button to exec Vbox_GAs_6.0.18 script or run it manually

Run-Guest-Additions-screenshot-virtualbox-centos-8

or mount it manually with mount command and execute the VBoxLinuxAdditions.run to do so:

 

$ cd /run/media/`whoami`/VB*
$ su
# ./VBoxLinuxAdditions.run
Installing additional modules …
VirtualBox Guest Additions: Building the VirtualBox Guest Additions kernel modules.  This may take a while.
VirtualBox Guest Additions: Running kernel modules will not be replaced until the system is restarted
VirtualBox Guest Additions: Starting.

 

4. Reboot the VM
 

# reboot

5. Check and Confirm Virtualbox guest additions are properly installed and running
 

# lsmod | grep vbox

 

6. Enable Copy / Paste from to Virttual Machine e.g. Shared Clipboard / Shared Folder etc.

 

Share-Clipboard-in-Virtualbox-screenshot-centos-8

 

The three options most useful besides the support for FullScreen OS emulation by Virtualbox to enable right after
guesttools is on are:


1. Devices -> Shared Clipboard -> Bidirectional
2. Devices -> Drag and Drop -> Bidirectional
3. Devices -> Shared Folders -> Shared Folder Settings

 

Zabbix rkhunter monitoring check if rootkits trojans and viruses or suspicious OS activities are detected

Wednesday, December 8th, 2021

monitor-rkhunter-with-zabbix-zabbix-rkhunter-check-logo

If you're using rkhunter to monitor for malicious activities, a binary changes, rootkits, viruses, malware, suspicious stuff and other famous security breach possible or actual issues, perhaps you have configured your machines to report to some Email.
But what if you want to have a scheduled rkhunter running on the machine and you don't want to count too much on email alerting (especially because email alerting) makes possible for emails to be tracked by sysadmin pretty late?

We have been in those situation and in this case me and my dear colleague Georgi Stoyanov developed a small rkhunter Zabbix userparameter check to track and Alert if any traces of "Warning"''s are mateched in the traditional rkhunter log file /var/log/rkhunter/rkhunter.log

To set it up and use it is pretty use you will need to have a recent version of zabbix-agent installed on the machine and connected to a Zabbix server, in my case this is:

[root@centos ~]# rpm -qa |grep -i zabbix-agent
zabbix-agent-4.0.7-1.el7.x86_64

 placed inside /etc/zabbix/zabbix_agentd.d/userparameter_rkhunter_warning_check.conf

[root@centos /etc/zabbix/zabbix_agentd.d ]# cat userparameter_rkhunter_warning_check.conf
# userparameter script to check if any Warning is inside /var/log/rkhunter/rkhunter.log and if found to trigger Zabbix alert
UserParameter=rkhunter.warning, (TODAY=$(date |awk '{ print $1" "$2" "$3 }'); if [ $(cat /var/log/rkhunter/rkhunter.log | awk “/$TODAY/,EOF” | /bin/grep -i ‘\[ Warning \]’ | /usr/bin/wc -l) != ‘0’ ]; then echo 1; else echo 0; fi)
UserParameter=rkhunter.suspected,(/bin/grep -i 'Suspect files: ' /var/log/rkhunter/rkhunter.log|tail -n 1| awk '{ print $4 }')
UserParameter=rkhunter.rootkits,(/bin/grep -i 'Possible rootkits: ' /var/log/rkhunter/rkhunter.log|tail -n 1| awk '{ print $4 }')


2. Prepare Rkhunter Template, Triggers and Items


In Zabbix Server that you access from web control interface, you will have to prepare a new template called lets say Rkhunter with the necessery Triggers and Items


2.1 Create Rkhunter Items
 

On Zabbix Server side, uou will have to configure 3 Items for the 3 configured userparameter above script keys, like so:

rkhunter-items-zabbix-screenshot.png

  • rkhunter.suspected Item configuration


rkhunter-suspected-files

  • rkhunter.warning Zabbix Item config

rkhunter-warning-found-check-zabbix

  • rkhunter.rootkits Zabbix Item config

     

     

    rkhunter-zabbix-possible-rootkits-item1

    2.2 Create Triggers
     

You need to have an overall of 3 triggers like in below shot:

zabbix-rkhunter-all-triggers-screenshot
 

  • rkhunter.rootkits Trigger config

rkhunter-rootkits-trigger-zabbix1

  • rkhunter.suspected Trigger cfg

rkhunter-suspected-trigger-zabbix1

  • rkhunter warning Trigger cfg

rkhunter-warning-trigger1

3. Reload zabbix-agent and test the keys


It is necessery to reload zabbix agent for the new userparameter to start to be sent to remote zabbix server (through a proxy if you have one configured).

[root@centos ~]# systemctl restart zabbix-agent


To make the zabbix-agent send the keys to the server you can use zabbix_sender to have the test tool you will have to have installed (zabbix-sender) on the server.

To trigger a manualTest if you happen to have some problems with the key which shouldn''t be the case you can sent a value to the respectve key with below command:

[root@centos ~ ]# zabbix_sender -vv -c "/etc/zabbix/zabbix_agentd.conf" -k "khunter.warning" -o "1"


Check on Zabbix Server the sent value is received, for any oddities as usual check what is inside  /var/log/zabbix/zabbix_agentd.log for any errors or warnings.

Apache disable requests to not log to access.log Logfile through SetEnvIf and dontlog httpd variables

Monday, October 11th, 2021

apache-disable-certain-strings-from-logging-to-access-log-logo

Logging to Apache access.log is mostly useful as this is a great way to keep log on who visited your website and generate periodic statistics with tools such as Webalizer or Astats to keep track on your visitors and generate various statistics as well as see the number of new visitors as well most visited web pages (the pages which mostly are attracting your web visitors), once the log analysis tool generates its statistics, it can help you understand better which Web spiders visit your website the most (as spiders has a predefined) IP addresses, which can give you insight on various web spider site indexation statistics on Google, Yahoo, Bing etc. . Sometimes however either due to bugs in web spiders algorithms or inconsistencies in your website structure, some of the web pages gets double visited records inside the logs, this could happen for example if your website uses to include iframes.

Having web pages accessed once but logged to be accessed twice hence is erroneous and unwanted, and though that usually have to be fixed by the website programmers, if such approach is not easily doable in the moment and the website is running on critical production system, the double logging of request can be omitted thanks to a small Apache log hack with SetEnvIf Apache config directive. Even if there is no double logging inside Apache log happening it could be that some cron job or automated monitoring scripts or tool such as monit is making periodic requests to Apache and this is garbling your Log Statistics results.

In this short article hence I'll explain how to do remove certain strings to not get logged inside /var/log/httpd/access.log.

1. Check SetEnvIf is Loaded on the Webserver
 

On CentOS / RHEL Linux:

# /sbin/apachectl -M |grep -i setenvif
AH00558: httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain. Set the 'ServerName' directive globally to suppress this message
 setenvif_module (shared)


On Debian / Ubuntu Linux:

/usr/sbin/apache2ctl -M |grep -i setenvif
AH00548: NameVirtualHost has no effect and will be removed in the next release /etc/apache2/sites-enabled/000-default.conf:1
 setenvif_module (shared)


2. Using SetEnvIf to omit certain string to get logged inside apache access.log


SetEnvIf could be used either in some certain domain VirtualHost configuration (if website is configured so), or it can be set as a global Apache rule from the /etc/httpd/conf/httpd.conf 

To use SetEnvIf  you have to place it inside a <Directory …></Directory> configuration block, if it has to be enabled only for a Certain Apache configured directory, otherwise you have to place it in the global apache config section.

To be able to use SetEnvIf, only in a certain directories and subdirectories via .htaccess, you will have defined in <Directory>

AllowOverride FileInfo


The general syntax to omit a certain Apache repeating string from keep logging with SetEnvIf is as follows:
 

SetEnvIf Request_URI "^/WebSiteStructureDirectory/ACCESS_LOG_STRING_TO_REMOVE$" dontlog


General syntax for SetEnvIf is as follows:

SetEnvIf attribute regex env-variable

SetEnvIf attribute regex [!]env-variable[=value] [[!]env-variable[=value]] …

Below is the overall possible attributes to pass as described in mod_setenvif official documentation.
 

  • Host
  • User-Agent
  • Referer
  • Accept-Language
  • Remote_Host: the hostname (if available) of the client making the request.
  • Remote_Addr: the IP address of the client making the request.
  • Server_Addr: the IP address of the server on which the request was received (only with versions later than 2.0.43).
  • Request_Method: the name of the method being used (GET, POST, etc.).
  • Request_Protocol: the name and version of the protocol with which the request was made (e.g., "HTTP/0.9", "HTTP/1.1", etc.).
  • Request_URI: the resource requested on the HTTP request line – generally the portion of the URL following the scheme and host portion without the query string.

Next locate inside the configuration the line:

CustomLog /var/log/apache2/access.log combined


To enable filtering of included strings, you'll have to append env=!dontlog to the end of line.

 

CustomLog /var/log/apache2/access.log combined env=!dontlog

 

You might be using something as cronolog for log rotation to prevent your WebServer logs to become too big in size and hard to manage, you can append env=!dontlog to it in same way.

If you haven't used cronolog is it is perhaps best to show you the package description.

server:~# apt-cache show cronolog|grep -i description -A10 -B5
Version: 1.6.2+rpk-2
Installed-Size: 63
Maintainer: Debian QA Group <packages@qa.debian.org>
Architecture: amd64
Depends: perl:any, libc6 (>= 2.4)
Description-en: Logfile rotator for web servers
 A simple program that reads log messages from its input and writes
 them to a set of output files, the names of which are constructed
 using template and the current date and time.  The template uses the
 same format specifiers as the Unix date command (which are the same
 as the standard C strftime library function).
 .
 It intended to be used in conjunction with a Web server, such as
 Apache, to split the access log into daily or monthly logs:
 .
   TransferLog "|/usr/bin/cronolog /var/log/apache/%Y/access.%Y.%m.%d.log"
 .
 A cronosplit script is also included, to convert existing
 traditionally-rotated logs into this rotation format.

Description-md5: 4d5734e5e38bc768dcbffccd2547922f
Homepage: http://www.cronolog.org/
Tag: admin::logging, devel::lang:perl, devel::library, implemented-in::c,
 implemented-in::perl, interface::commandline, role::devel-lib,
 role::program, scope::utility, suite::apache, use::organizing,
 works-with::logfile
Section: web
Priority: optional
Filename: pool/main/c/cronolog/cronolog_1.6.2+rpk-2_amd64.deb
Size: 27912
MD5sum: 215a86766cc8d4434cd52432fd4f8fe7

If you're using cronolog to daily rotate the access.log and you need to filter out the strings out of the logs, you might use something like in httpd.conf:

 

CustomLog "|/usr/bin/cronolog –symlink=/var/log/httpd/access.log /var/log/httpd/access.log_%Y_%m_%d" combined env=!dontlog


 

3. Disable Apache logging access.log from certain USERAGENT browser
 

You can do much more with SetEnvIf for example you might want to omit logging requests from a UserAgent (browser) to end up in /dev/null (nowhere), e.g. prevent any Website requests originating from Internet Explorer (MSIE) to not be logged.

SetEnvIf User_Agent "(MSIE)" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


4. Disable Apache logging from requests coming from certain FQDN (Fully Qualified Domain Name) localhost 127.0.0.1 or concrete IP / IPv6 address

SetEnvIf Remote_Host "dns.server.com$" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


Of course for this to work, your website should have a functioning DNS servers and Apache should be configured to be able to resolve remote IPs to back resolve to their respective DNS defined Hostnames.

SetEnvIf recognized also perl PCRE Regular Expressions, if you want to filter out of Apache access log requests incoming from multiple subdomains starting with a certain domain hostname.

 

SetEnvIf Remote_Host "^example" dontlog

– To not log anything coming from localhost.localdomain address ( 127.0.0.1 ) as well as from some concrete IP address :

SetEnvIf Remote_Addr "127\.0\.0\.1" dontlog

SetEnvIf Remote_Addr "192\.168\.1\.180" dontlog

– To disable IPv6 requests that be coming at the log even though you don't happen to use IPv6 at all

SetEnvIf Request_Addr "::1" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog


– Note here it is obligatory to escape the dots '.'


5. Disable robots.txt Web Crawlers requests from being logged in access.log

SetEnvIf Request_URI "^/robots\.txt$" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog

Using SetEnvIfNoCase to read incoming useragent / Host / file requests case insensitve

The SetEnvIfNoCase is to be used if you want to threat incoming originators strings as case insensitive, this is useful to omit extraordinary regular expression SetEnvIf rules for lower upper case symbols.

SetEnvIFNoCase User-Agent "Slurp/cat" dontlog
SetEnvIFNoCase User-Agent "Ask Jeeves/Teoma" dontlog
SetEnvIFNoCase User-Agent "Googlebot" dontlog
SetEnvIFNoCase User-Agent "bingbot" dontlog
SetEnvIFNoCase Remote_Host "fastsearch.net$" dontlog

Omit from access.log logging some standard web files .css , .js .ico, .gif , .png and Referrals from own domain

Sometimes your own site scripts do refer to stuff on your own domain that just generates junks in the access.log to keep it off.

SetEnvIfNoCase Request_URI "\.(gif)|(jpg)|(png)|(css)|(js)|(ico)|(eot)$" dontlog

 

SetEnvIfNoCase Referer "www\.myowndomain\.com" dontlog

CustomLog /var/log/apache2/access.log combined env=!dontlog

 

6. Disable Apache requests in access.log and error.log completely


Sometimes at rare cases the produced Apache logs and error log is really big and you already have the requests logged in another F5 Load Balancer or Haproxy in front of Apache WebServer or alternatively the logging is not interesting at all as the Web Application served written in ( Perl / Python / Ruby ) does handle the logging itself. 
I've earlier described how this is done in a good amount of details in previous article Disable Apache access.log and error.log logging on Debian Linux and FreeBSD

To disable it you will have to comment out CustomLog or set it to together with ErrorLog to /dev/null in apache2.conf / httpd.conf (depending on the distro)
 

CustomLog /dev/null
ErrorLog /dev/null


7. Restart Apache WebServer to load settings
 

An important to mention is in case you have Webserver with multiple complex configurations and there is a specific log patterns to omit from logs it might be a very good idea to:

a. Create /etc/httpd/conf/dontlog.conf / etc/apache2/dontlog.conf
add inside all your custom dontlog configurations
b. Include dontlog.conf from /etc/httpd/conf/httpd.conf / /etc/apache2/apache2.conf

Finally to make the changes take affect, of course you will need to restart Apache webserver depending on the distro and if it is with systemd or System V:

For systemd RPM based distro:

systemctl restart httpd

or for Deb based Debian etc.

systemctl apache2 restart

On old System V scripts systems:

On RedHat / CentOS etc. restart Apache with:
 

/etc/init.d/httpd restart


On Deb based SystemV:
 

/etc/init.d/apache2 restart


What we learned ?
 

We have learned about SetEnvIf how it can be used to prevent certain requests strings getting logged into access.log through dontlog, how to completely stop certain browser based on a useragent from logging to the access.log as well as how to omit from logging certain requests incoming from certain IP addresses / IPv6 or FQDNs and how to stop robots.txt from being logged to httpd log.


Finally we have learned how to completely disable Apache logging if logging is handled by other external application.
 

How to calculate connections from IP address with shell script and log to Zabbix graphic

Thursday, March 11th, 2021

We had to test the number of connections incoming IP sorted by its TCP / IP connection state.

For example:

TIME_WAIT, ESTABLISHED, LISTEN etc.


The reason behind is sometimes the IP address '192.168.0.1' does create more than 200 connections, a Cisco firewall gets triggered and the connection for that IP is filtered out. To be able to know in advance that this problem is upcoming. a Small userparameter script is set on the Linux servers, that does print out all connections from IP by its STATES sorted out.

 

The script is calc_total_ip_match_zabbix.sh is below:

#!/bin/bash
#  check ESTIMATED / FIN_WAIT etc. netstat output for IPs and calculate total
# UserParameter=count.connections,(/usr/local/bin/calc_total_ip_match_zabbix.sh)
CHECK_IP='192.168.0.1';
f=0; 

 

for i in $(netstat -nat | grep "$CHECK_IP" | awk '{print $6}' | sort | uniq -c | sort -n); do

echo -n "$i ";
f=$((f+i));
done;
echo
echo "Total: $f"

 

root@pcfreak:/bashscripts# ./calc_total_ip_match_zabbix.sh 
1 TIME_WAIT 2 ESTABLISHED 3 LISTEN 

Total: 6

 

root@pcfreak:/bashscripts# ./calc_total_ip_match_zabbix.sh 
2 ESTABLISHED 3 LISTEN 
Total: 5


images/zabbix-webgui-connection-check1

To make process with Zabbix it is necessery to have an Item created and a Depedent Item.

 

webguiconnection-check1

webguiconnection-check1
 

webgui-connection-check2-item

images/webguiconnection-check1

Finally create a trigger to trigger alarm if you have more than or eqaul to 100 Total overall connections.


images/zabbix-webgui-connection-check-trigger

The Zabbix userparameter script should be as this:

[root@host: ~]# cat /etc/zabbix/zabbix_agentd.d/userparameter_webgui_conn.conf
UserParameter=count.connections,(/usr/local/bin/webgui_conn_track.sh)

 

Some collleagues suggested more efficient shell script solution for suming the overall number of connections, below is less time consuming version of script, that can be used for the calculation.
 

#!/bin/bash -x
# show FIN_WAIT2 / ESTIMATED etc. and calcuate total
count=$(netstat -n | grep "192.168.0.1" | awk ' { print $6 } ' | sort -n | uniq -c | sort -nr)
total=$((${count// /+}))
echo "$count"
echo "Total:" "$total"

      2 ESTABLISHED
      1 TIME_WAIT
Total: 3

 


Below is the graph built with Zabbix showing all the fluctuations from connections from monitored IP. ebgui-check_ip_graph

 

How to enable HaProxy logging to a separate log /var/log/haproxy.log / prevent HAProxy duplicate messages to appear in /var/log/messages

Wednesday, February 19th, 2020

haproxy-logging-basics-how-to-log-to-separate-file-prevent-duplicate-messages-haproxy-haproxy-weblogo-squares
haproxy  logging can be managed in different form the most straight forward way is to directly use /dev/log either you can configure it to use some log management service as syslog or rsyslogd for that.

If you don't use rsyslog yet to install it: 

# apt install -y rsyslog

Then to activate logging via rsyslogd we can should add either to /etc/rsyslogd.conf or create a separte file and include it via /etc/rsyslogd.conf with following content:
 

Enable haproxy logging from rsyslogd


Log haproxy messages to separate log file you can use some of the usual syslog local0 to local7 locally used descriptors inside the conf (be aware that if you try to use some wrong value like local8, local9 as a logging facility you will get with empty haproxy.log, even though the permissions of /var/log/haproxy.log are readable and owned by haproxy user.

When logging to a local Syslog service, writing to a UNIX socket can be faster than targeting the TCP loopback address. Generally, on Linux systems, a UNIX socket listening for Syslog messages is available at /dev/log because this is where the syslog() function of the GNU C library is sending messages by default. To address UNIX socket in haproxy.cfg use:

log /dev/log local2 


If you want to log into separate log each of multiple running haproxy instances with different haproxy*.cfg add to /etc/rsyslog.conf lines like:

local2.* -/var/log/haproxylog2.log
local3.* -/var/log/haproxylog3.log


One important note to make here is since rsyslogd is used for haproxy logging you need to have enabled in rsyslogd imudp and have a UDP port listener on the machine.

E.g. somewhere in rsyslog.conf or via rsyslog include file from /etc/rsyslog.d/*.conf needs to have defined following lines:

$ModLoad imudp
$UDPServerRun 514


I prefer to use external /etc/rsyslog.d/20-haproxy.conf include file that is loaded and enabled rsyslogd via /etc/rsyslog.conf:

# vim /etc/rsyslog.d/20-haproxy.conf

$ModLoad imudp
$UDPServerRun 514​
local2.* -/var/log/haproxy2.log


It is also possible to produce different haproxy log output based on the severiy to differentiate between important and less important messages, to do so you'll need to rsyslog.conf something like:
 

# Creating separate log files based on the severity
local0.* /var/log/haproxy-traffic.log
local0.notice /var/log/haproxy-admin.log

 

Prevent Haproxy duplicate messages to appear in /var/log/messages

If you use local2 and some default rsyslog configuration then you will end up with the messages coming from haproxy towards local2 facility producing doubled simultaneous records to both your pre-defined /var/log/haproxy.log and /var/log/messages on Proxy servers that receive few thousands of simultanous connections per second.
This is a problem since doubling the log will produce too much data and on systems with smaller /var/ partition you will quickly run out of space + this haproxy requests logging to /var/log/messages makes the file quite unreadable for normal system events which are so important to track clearly what is happening on the server daily.

To prevent the haproxy duplicate messages you need to define somewhere in rsyslogd usually /etc/rsyslog.conf local2.none near line of facilities configured to log to file:

*.info;mail.none;authpriv.none;cron.none;local2.none     /var/log/messages

This configuration should work but is more rarely used as most people prefer to have haproxy log being written not directly to /dev/log which is used by other services such as syslogd / rsyslogd.

To use /dev/log to output logs from haproxy configuration in global section use config like:
 

global
        log /dev/log local2 debug
        chroot /var/lib/haproxy
        stats socket /run/haproxy/admin.sock mode 660 level admin
        stats timeout 30s
        user haproxy
        group haproxy
        daemon

The log global directive basically says, use the log line that was set in the global section for whole config till end of file. Putting a log global directive into the defaults section is equivalent to putting it into all of the subsequent proxy sections.

Using global logging rules is the most common HAProxy setup, but you can put them directly into a frontend section instead. It can be useful to have a different logging configuration as a one-off. For example, you might want to point to a different target Syslog server, use a different logging facility, or capture different severity levels depending on the use case of the backend application. 

Insetad of using /dev/log interface that is on many distributions heavily used by systemd to store / manage and distribute logs,  many haproxy server sysadmins nowdays prefer to use rsyslogd as a default logging facility that will manage haproxy logs.
Admins prefer to use some kind of mediator service to manage log writting such as rsyslogd or syslog, the reason behind might vary but perhaps most important reason is  by using rsyslogd it is possible to write logs simultaneously locally on disk and also forward logs  to a remote Logging server  running rsyslogd service.

Logging is defined in /etc/haproxy/haproxy.cfg or the respective configuration through global section but could be also configured to do a separate logging based on each of the defined Frontend Backends or default section. 
A sample exceprt from this section looks something like:

#———————————————————————
# Global settings
#———————————————————————
global
    log         127.0.0.1 local2

    chroot      /var/lib/haproxy
    pidfile     /var/run/haproxy.pid
    maxconn     4000
    user        haproxy
    group       haproxy
    daemon

    # turn on stats unix socket
    stats socket /var/lib/haproxy/stats

#———————————————————————
defaults
    mode                    tcp
    log                     global
    option                  tcplog
    #option                  dontlognull
    #option http-server-close
    #option forwardfor       except 127.0.0.0/8
    option                  redispatch
    retries                 7
    #timeout http-request    10s
    timeout queue           10m
    timeout connect         30s
    timeout client          20m
    timeout server          10m
    #timeout http-keep-alive 10s
    timeout check           30s
    maxconn                 3000

 

 

# HAProxy Monitoring Config
#———————————————————————
listen stats 192.168.0.5:8080                #Haproxy Monitoring run on port 8080
    mode http
    option httplog
    option http-server-close
    stats enable
    stats show-legends
    stats refresh 5s
    stats uri /stats                            #URL for HAProxy monitoring
    stats realm Haproxy\ Statistics
    stats auth hproxyauser:Password___          #User and Password for login to the monitoring dashboard

 

#———————————————————————
# frontend which proxys to the backends
#———————————————————————
frontend ft_DKV_PROD_WLPFO
    mode tcp
    bind 192.168.233.5:30000-31050
    option tcplog
    log-format %ci:%cp\ [%t]\ %ft\ %b/%s\ %Tw/%Tc/%Tt\ %B\ %ts\ %ac/%fc/%bc/%sc/%rc\ %sq/%bq
    default_backend Default_Bakend_Name


#———————————————————————
# round robin balancing between the various backends
#———————————————————————
backend bk_DKV_PROD_WLPFO
    mode tcp
    # (0) Load Balancing Method.
    balance source
    # (4) Peer Sync: a sticky session is a session maintained by persistence
    stick-table type ip size 1m peers hapeers expire 60m
    stick on src
    # (5) Server List
    # (5.1) Backend
    server Backend_Server1 10.10.10.1 check port 18088
    server Backend_Server2 10.10.10.2 check port 18088 backup


The log directive in above config instructs HAProxy to send logs to the Syslog server listening at 127.0.0.1:514. Messages are sent with facility local2, which is one of the standard, user-defined Syslog facilities. It’s also the facility that our rsyslog configuration is expecting. You can add more than one log statement to send output to multiple Syslog servers.

Once rsyslog and haproxy logging is configured as a minumum you need to restart rsyslog (assuming that haproxy config is already properly loaded):

# systemctl restart rsyslogd.service

To make sure rsyslog reloaded successfully:

systemctl status rsyslogd.service


Restarting HAproxy

If the rsyslogd logging to 127.0.0.1 port 514 was recently added a HAProxy restart should also be run, you can do it with:
 

# /usr/sbin/haproxy -f /etc/haproxy/haproxy.cfg -D -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)


Or to restart use systemctl script (if haproxy is not used in a cluster with corosync / heartbeat).

# systemctl restart haproxy.service

You can control how much information is logged by adding a Syslog level by

    log         127.0.0.1 local2 info


The accepted values are the standard syslog security level severity:

Value Severity Keyword Deprecated keywords Description Condition
0 Emergency emerg panic System is unusable A panic condition.
1 Alert alert   Action must be taken immediately A condition that should be corrected immediately, such as a corrupted system database.
2 Critical crit   Critical conditions Hard device errors.
3 Error err error Error conditions  
4 Warning warning warn Warning conditions  
5 Notice notice   Normal but significant conditions Conditions that are not error conditions, but that may require special handling.
6 Informational info   Informational messages  
7 Debug debug   Debug-level messages Messages that contain information normally of use only when debugging a program.

 

Logging only errors / timeouts / retries and errors is done with option:

Note that if the rsyslog is configured to listen on different port for some weird reason you should not forget to set the proper listen port, e.g.:
 

  log         127.0.0.1:514 local2 info

option dontlog-normal

in defaults or frontend section.

You most likely want to enable this only during certain times, such as when performing benchmarking tests.

(or log-format-sd for structured-data syslog) directive in your defaults or frontend
 

Haproxy Logging shortly explained


The type of logging you’ll see is determined by the proxy mode that you set within HAProxy. HAProxy can operate either as a Layer 4 (TCP) proxy or as Layer 7 (HTTP) proxy. TCP mode is the default. In this mode, a full-duplex connection is established between clients and servers, and no layer 7 examination will be performed. When in TCP mode, which is set by adding mode tcp, you should also add option tcplog. With this option, the log format defaults to a structure that provides useful information like Layer 4 connection details, timers, byte count and so on.

Below is example of configured logging with some explanations:

Log-format "%ci:%cp [%t] %ft %b/%s %Tw/%Tc/%Tt %B %ts %ac/%fc/%bc/%sc/%rc %sq/%bq"

haproxy-logged-fields-explained
Example of Log-Format configuration as shown above outputted of haproxy config:

Log-format "%ci:%cp [%tr] %ft %b/%s %TR/%Tw/%Tc/%Tr/%Ta %ST %B %CC %CS %tsc %ac/%fc/%bc/%sc/%rc %sq/%bq %hr %hs %{+Q}r"

haproxy_http_log_format-explained1

To understand meaning of this abbreviations you'll have to closely read  haproxy-log-format.txt. More in depth info is to be found in HTTP Log format documentation


haproxy_logging-explained

Logging HTTP request headers

HTTP request header can be logged via:
 

 http-request capture

frontend website
    bind :80
    http-request capture req.hdr(Host) len 10
    http-request capture req.hdr(User-Agent) len 100
    default_backend webservers


The log will show headers between curly braces and separated by pipe symbols. Here you can see the Host and User-Agent headers for a request:

192.168.150.1:57190 [20/Dec/2018:22:20:00.899] website~ webservers/server1 0/0/1/0/1 200 462 – – —- 1/1/0/0/0 0/0 {mywebsite.com|Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Ubuntu Chromium/71.0.3578.80 } "GET / HTTP/1.1"

 

Haproxy Stats Monitoring Web interface


Haproxy is having a simplistic stats interface which if enabled produces some general useful information like in above screenshot, through which
you can get a very basic in browser statistics and track potential issues with the proxied traffic for all configured backends / frontends incoming outgoing
network packets configured nodes
 experienced downtimes etc.

haproxy-statistics-report-picture

The basic configuration to make the stats interface accessible would be like pointed in above config for example to enable network listener on address
 

https://192.168.0.5:8080/stats


with hproxyuser / password config would be:

# HAProxy Monitoring Config
#———————————————————————
listen stats 192.168.0.5:8080                #Haproxy Monitoring run on port 8080
    mode http
    option httplog
    option http-server-close
    stats enable
    stats show-legends
    stats refresh 5s
    stats uri /stats                            #URL for HAProxy monitoring
    stats realm Haproxy\ Statistics
    stats auth hproxyauser:Password___          #User and Password for login to the monitoring dashboard

 

 

Sessions states and disconnect errors on new application setup

Both TCP and HTTP logs include a termination state code that tells you the way in which the TCP or HTTP session ended. It’s a two-character code. The first character reports the first event that caused the session to terminate, while the second reports the TCP or HTTP session state when it was closed.

Here are some essential termination codes to track in for in the log:
 

Here are some termination code examples most commonly to see on TCP connection establishment errors:

Two-character code    Meaning
—    Normal termination on both sides.
cD    The client did not send nor acknowledge any data and eventually timeout client expired.
SC    The server explicitly refused the TCP connection.
PC    The proxy refused to establish a connection to the server because the process’ socket limit was reached while attempting to connect.


To get all non-properly exited codes the easiest way is to just grep for anything that is different from a termination code –, like that:

tail -f /var/log/haproxy.log | grep -v ' — '


This should output in real time every TCP connection that is exiting improperly.

There’s a wide variety of reasons a connection may have been closed. Detailed information about all possible termination codes can be found in the HAProxy documentation.
To get better understanding a very useful reading to haproxy Debug errors with  is in haproxy-logging.txt in that small file are collected all the cryptic error messages codes you might find in your logs when you're first time configuring the Haproxy frontend / backend and the backend application behind.

Another useful analyze tool which can be used to analyze Layer 7 HTTP traffic is halog for more on it just google around.

Why du and df reporting different on a filesystem / How to fix inconsistency between used space on FS and disk showing full strangeness

Wednesday, July 24th, 2019

linux-why-du-and-df-shows-different-result-inconsincy-explained-filesystem-full-oddity

If you're a sysadmin on a large server environment such as a couple of hundred of Virtual Machines running Linux OS on either physical host or OpenXen / VmWare hosted guest Virtual Machine, you might end up sometimes at an odd case where some mounted partition mount point reports its file use different when checked with
df
cmd than when checked with du command, like for example:
 

root@sqlserver:~# df -hT /var/lib/mysql
Filesystem   Type  Size Used Avail Use% Mounted On
/dev/sdb5      ext4    19G  3,4G    14G  20% /var/lib/mysql

Here the '-T' argument is used to show us the filesystem.

root@sqlserver:~# du -hsc /var/lib/mysql
0K    /var/lib/mysql/
0K    total

 

1. Simple debug on what might be the root cause for df / du inconsistency reporting

 

Of course the basic thing to do when in that weird situation is to be totally shocked how this is possible and to investigate a bit what is the biggest first level sub-directories that eat up the space on the mounted location, with du:

 

# du -hkx –max-depth=1 /var/lib/mysql/|uniq|sort -n
4       /var/lib/mysql/test
8       /var/lib/mysql/ezmlm
8       /var/lib/mysql/micropcfreak
8       /var/lib/mysql/performance_schema
12      /var/lib/mysql/mysqltmp
24      /var/lib/mysql/speedtest
64      /var/lib/mysql/yourls
144     /var/lib/mysql/narf
320     /var/lib/mysql/webchat_plus
424     /var/lib/mysql/goodfaithair
528     /var/lib/mysql/moonman
648     /var/lib/mysql/daniel
852     /var/lib/mysql/lessn
1292    /var/lib/mysql/gallery

The given output is in Kilobytes so it is a little bit hard to read, if you're used to Mbytes instead, do

 

 # du -hmx –max-depth=1 /var/lib/mysql/|uniq|sort -n|less

 

I've also investigated on the complete /var directory contents sorted by size with:

 

 # du -akx ./ | sort -n
5152564    ./cache/rsnapshot/hourly.2/localhost
5255788    ./cache/rsnapshot/hourly.2
5287912    ./cache/rsnapshot
7192152    ./cache


Even after finding out the bottleneck dirs and trying to clear up a bit, continued facing that inconsistently shown in two commands and if you're likely to be stunned like me and try … to move some files to a different filesystem to free up space or assigned inodes with a hope that shown inconsitency output will be fixed as it might be caused  due to some kernel / FS caching ?? and this will eventually make the mounted FS to refresh …

But unfortunately, if you try it you'll figure out clearing up a couple of Megas or Gigas will make no difference in cmd output.

In my exact case /var/lib/mysql is a separate mounted ext4 filesystem, however same issue was present also on a Network Filesystem (NFS) and thus, my first thought that this is caused by a network failure problem or NFS bug turned to be wrong.

After further short investigation on the inodes on the Filesystem, it was clear enough inodes are available:
 

# df -i /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5      1221600  2562 1219038   1% /var/lib/mysql

 

So the filled inodes count assumed issue also has been rejected.
P.S. (if you're not well familiar with them read manual, i.e. – man 7 inode).
 

– Remounting the mounted filesystem

To make sure the filesystem shown inconsistency between du and df is not due to some hanging network mount or bug, first logical thing I did is to remount the filesytem showing different in size, in my case this was done with:
 

# mount -o remount,rw -t ext4 /var/lib/mysql

For machines with NFS remote mounted storage locations, used:

# mount -o remount,rw -t nfs /var/www


FS remount did not solved it so I continued to ponder what oddity and of course I thought of a workaround (in case if this issues are caused by kernel bug or OS lib issue) reboot might be the solution, however unfortunately restarting the VMs was not a wanted easy to do solution, thus I continued investigating what is wrong …

Next check of course was to check, what kind of network connections are opened to the affected hosts with:
 

# netstat -tupanl


Did not found anything that might point me to the reported different Megabytes issue, so next step was to check what is the situation with currently opened files by running processes on the weird df / du reported systems with lsof, and boom there I observed oddity such as multiple files

 

# lsof -nP | grep '(deleted)'

COMMAND   PID   USER   FD   TYPE DEVICE    SIZE NLINK  NODE NAME
mysqld   2588  mysql    4u   REG 253,17      52     0  1495 /var/lib/mysql/tmp/ibY0cXCd (deleted)
mysqld   2588  mysql    5u   REG 253,17    1048     0  1496 /var/lib/mysql/tmp/ibOrELhG (deleted)
mysqld   2588  mysql    6u   REG 253,17       777884290     0  1497 /var/lib/mysql/tmp/ibmDFAW8 (deleted)
mysqld   2588  mysql    7u   REG 253,17       123667875     0 11387 /var/lib/mysql/tmp/ib2CSACB (deleted)
mysqld   2588  mysql   11u   REG 253,17       123852406     0 11388 /var/lib/mysql/tmp/ibQpoZ94 (deleted)

 

Notice that There were plenty of '(deleted)' STATE files shown in memory an overall of 438:

 

# lsof -nP | grep '(deleted)' |wc -l
438


As I've learned a bit online about the problem, I found it is also possible to find deleted unlinked files only without any greps (to list all deleted files in memory files with lsof args only):

 

# lsof +L1|less


The SIZE field (fourth column)  shows a number of files that are really hard in size and that are kept in open on filesystem and in memory, totally messing up with the filesystem. In my case this is temp files created by MYSQLD daemon but depending on the server provided service this might be apache's www-data, some custom perl / bash script executed via a cron job, stalled rsync jobs etc.
 

2. Check all the list open files with the mysql / root user as part of the the server filesystem inconsistency debugging with:

 

– Grep opened files on server by user

# lsof |grep mysql
mysqld    1312                       mysql  cwd       DIR               8,21       4096          2 /var/lib/mysql
mysqld    1312                       mysql  rtd       DIR                8,1       4096          2 /
mysqld    1312                       mysql  txt       REG                8,1   20336792   23805048 /usr/sbin/mysqld
mysqld    1312                       mysql  mem       REG               8,21      24576         20 /var/lib/mysql/tc.log
mysqld    1312                       mysql  DEL       REG               0,16                 29467 /[aio]
mysqld    1312                       mysql  mem       REG                8,1      55792   14886933 /lib/x86_64-linux-gnu/libnss_files-2.28.so

 

# lsof | grep root
COMMAND    PID   TID TASKCMD          USER   FD      TYPE             DEVICE   SIZE/OFF       NODE NAME
systemd      1                        root  cwd       DIR                8,1       4096          2 /
systemd      1                        root  rtd       DIR                8,1       4096          2 /
systemd      1                        root  txt       REG                8,1    1489208   14928891 /lib/systemd/systemd
systemd      1                        root  mem       REG                8,1    1579448   14886924 /lib/x86_64-linux-gnu/libm-2.28.so

Other command that helped to track the discrepancy between df and du different file usage on FS is:
 

# du -hxa  / | egrep '^[[:digit:]]{1,1}G[[:space:]]*'
 

 

3. Fixing large files kept in memory filesystem problem


What is the real reason for ending up with this file handlers opened by running backgrounded programs on the Linux OS?
It could be multiple  but most likely it is due to exceeded server / client interactions or breaking up RAM or HDD drive with writing plenty of logs on the FS without ending keeping space occupied or Programming library bugs used by hanged service leaving the FH opened on storage.

What is the solution to file system files left in memory problem?

The best solution is to first fix custom script or hanged service and then if possible to simply restart the server to make the kernel / services reload or if this is not possible just restart the problem creation processes.

Once the process is identified like in my case this was MySQL on systemd enabled newer OS distros, just do:

 

 

# systemctl restart mysqld.service


or on older init.d system V ones:

# /etc/init.d/service restart


For custom hanged scripts being listed in ps axuwef you can grep the pid and do a kill -HUP (if the script is written in a good way to recognize -HUP and restart the sub-running process properly – BE EXTRA CAREFUL IF YOU'RE RESTARTING BROKEN SCRIPTS as this might cause your running service disruptions …).

# pgrep -l script.sh
7977 script.sh


# kill -HUP PID

 

Now finally this should either mitigate or at best case completely solve the reported disagreement between df and du, after which the calculated / reported disk space should be back to normal and show up approximately the same (note that size changes a bit as mysql service is writting data) constantly extending the size between the two checks.

 

# df -hk /var/lib/mysql; du -hskc /var/lib/mysql
Filesystem       Inodes  IUsed   IFree IUse% Mounted on
/dev/sdb5        19097172 3472744 14631296  20% /var/lib/mysql
3427772    /var/lib/mysql
3427772    total

 

What we learned?

What I've explained in this article is why and how it comes that 'zoombie' files reside on a filesystem
appearing to be eating disk space on a mounted local or network partition, giving strange inconsistent
reports, leading to system service disruptions and impossibility to have correctly shown information on used
disk space on mounted drive.

I went through with some standard logic on debugging service / filesystem / inode issues up explainat, that led me to the finding about deleted files being kept in filesystem and producing the filesystem strange sized / showing not correct / filled even after it was extended with tune2fs and was supposed to have extra 50GBs.

Finally it was explained shortly how to HUP / restart hanging script / service to fix it.

Some few good readings that helped to fix the issue:

What to do when du and df report different usage is here
df in linux not showing correct free space after file removal is here
Why do “df” and “du” commands show different disk usage?