Archive for the ‘OS Update’ Category

Debug and Fix QMAIL Mail Server qmail-inject: fatal: qq temporary problem (#4.3.0) and ‘reformime[1648048]: segfault at 0 ip 00007fea608bef28 sp 00007fff3c8d4bc0 error 4’ errors after update from Debian 11 to Debian 12

Monday, September 2nd, 2024


For legacy reasons, lack of time, and the fact that once Qmail is running on a server it works almost forever (as long as you don't do very major upgrades and stick to the same version), I still have a few Qmail SMTP servers that are kept around nowadays for historical reasons.

After the last major version upgrade from Debian 11 to Debian 12, qmail-smtpd was no longer running completely fine, and I had to follow some of my previous blog notes on how to recover in such situations, as well as some common logic, to resolve it.

After the upgrade, every few minutes I started getting a repeating, really annoying error, caused by reformime crashing, in /var/log/messages as well as in the qmail logs. The exact error was:

Sep  1 22:15:34 pcfr_hware_local_ip kernel: [366799.585663] Code: eb e1 48 8d 15 31 96 13 00 e9 04 00 00 00 0f 1f 40 00 41 55 49 89 d5 41 54 49 89 f4 55 53 48 89 fb 48 83 ec 08 48 85 ff 74 58 <80> 3b 00 74 46 4c 89 e6 48 89 df e8 58 61 f8 ff 48 8d 2c 03 80 7d
Sep  1 22:17:50 pcfr_hware_local_ip kernel: [366935.524185] reformime[1647438]: segfault at 0 ip 00007f0b9beeff28 sp 00007fff4ffd5850 error 4 in libc.so.6[7f0b9be76000+155000] likely on CPU 1 (core 1, socket 0)
Sep  1 22:17:50 pcfr_hware_local_ip kernel: [366935.524207] Code: eb e1 48 8d 15 31 96 13 00 e9 04 00 00 00 0f 1f 40 00 41 55 49 89 d5 41 54 49 89 f4 55 53 48 89 fb 48 83 ec 08 48 85 ff 74 58 <80> 3b 00 74 46 4c 89 e6 48 89 df e8 58 61 f8 ff 48 8d 2c 03 80 7d
Sep  1 22:18:44 pcfr_hware_local_ip kernel: [366989.796532] reformime[1647577]: segfault at 0 ip 00007fe8e14bef28 sp 00007ffc000e9040 error 4 in libc.so.6[7fe8e1445000+155000] likely on CPU 1 (core 1, socket 0)
Sep  1 22:18:44 pcfr_hware_local_ip kernel: [366989.796554] Code: eb e1 48 8d 15 31 96 13 00 e9 04 00 00 00 0f 1f 40 00 41 55 49 89 d5 41 54 49 89 f4 55 53 48 89 fb 48 83 ec 08 48 85 ff 74 58 <80> 3b 00 74 46 4c 89 e6 48 89 df e8 58 61 f8 ff 48 8d 2c 03 80 7d
Sep  1 22:20:08 pcfr_hware_local_ip kernel: [367072.889786] reformime[1647888]: segfault at 0 ip 00007efcaa6bef28 sp 00007ffdfe793560 error 4 in libc.so.6[7efcaa645000+155000] likely on CPU 1 (core 1, socket 0)
Sep  1 22:20:08 pcfr_hware_local_ip kernel: [367072.889809] Code: eb e1 48 8d 15 31 96 13 00 e9 04 00 00 00 0f 1f 40 00 41 55 49 89 d5 41 54 49 89 f4 55 53 48 89 fb 48 83 ec 08 48 85 ff 74 58 <80> 3b 00 74 46 4c 89 e6 48 89 df e8 58 61 f8 ff 48 8d 2c 03 80 7d
Sep  1 22:21:14 pcfr_hware_local_ip kernel: [367139.010116] reformime[1648048]: segfault at 0 ip 00007fea608bef28 sp 00007fff3c8d4bc0 error 4 in libc.so.6[7fea60845000+155000] likely on CPU 1 (core 1, socket 0)
Sep  1 22:21:14 pcfr_hware_local_ip kernel: [367139.010139] Code: eb e1 48 8d 15 31 96 13 00 e9 04 00 00 00 0f 1f 40 00 41 55 49 89 d5 41 54 49 89 f4 55 53 48 89 fb 48 83 ec 08 48 85 ff 74 58 <80> 3b 00 74 46 4c 89 e6 48 89 df e8 58 61 f8 ff 48 8d 2c 03 80 7d
Sep  1 22:22:43 pcfr_hware_local_ip rsyslogd: — MARK —
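To see how often the crash repeats, a small sketch like the following can count the reformime segfault entries and group them per hour. It assumes the kernel messages are in the syslog format shown above (for example in /var/log/messages); the function names are only illustrative.

```shell
#!/bin/sh
# Illustrative helpers (hypothetical names) for the kernel log format shown above.

# Count lines reporting a reformime segfault; log text is read from stdin.
count_reformime_segfaults() {
    # grep -c prints the number of matching lines (0 when nothing matches);
    # "|| true" hides grep's non-zero exit status for "no match".
    grep -cE 'reformime\[[0-9]+]: segfault at' || true
}

# Group the same segfault lines per hour: prints "<count> <Mon> <day> <HH>:00".
reformime_segfaults_per_hour() {
    awk '/reformime\[[0-9]+]: segfault at/ { print $1, $2, substr($3, 1, 2) ":00" }' |
        sort | uniq -c
}
```

Usage: `count_reformime_segfaults < /var/log/messages` or `reformime_segfaults_per_hour < /var/log/messages`. A steady count after each fix attempt is a quick signal that the crash is still there.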

To debug more precisely what was happening with reformime and why it was crashing with the libc segfault, I used the journalctl log with this command:

# journalctl -p 3 -xb

Sep 01 22:10:27 pcfrxen qmail-scanner-queue.pl[2170438]: X-Qmail-Scanner-2.10st:[pcfrxen17252178278122170438] d_m: output spotted from /usr/bin/reformime  -x/var/spool/qscan/tmp/pcfrxen17252178278122170438/ (Segmentation fau>
                                                            ) – that shouldn't happen!
Sep 01 22:11:11 pcfrxen qmail-scanner-queue.pl[2170631]: X-Qmail-Scanner-2.10st:[pcfrxen17252178718122170631] d_m: output spotted from /usr/bin/reformime  -x/var/spool/qscan/tmp/pcfrxen17252178718122170631/ (Segmentation fau>
                                                            ) – that shouldn't happen!
Sep 01 22:15:32 pcfrxen qmail-scanner-queue.pl[2171777]: X-Qmail-Scanner-2.10st:[pcfrxen17252181328122171777] d_m: output spotted from /usr/bin/reformime  -x/var/spool/qscan/tmp/pcfrxen17252181328122171777/ (Segmentation fau>
                                                            ) – that shouldn't happen!
Sep 01 22:15:35 pcfrxen qmail-scanner-queue.pl[2171793]: X-Qmail-Scanner-2.10st:[pcfrxen17252181358122171793] d_m: output spotted from /usr/bin/reformime  -x/var/spool/qscan/tmp/pcfrxen17252181358122171793/ (Segmentation fau>
                                                            ) – that shouldn't happen!
Sep 01 22:21:21 pcfrxen qmail-scanner-queue.pl[2173427]: X-Qmail-Scanner-2.10st:[pcfrxen17252184788122173427] d_m: output spotted from /usr/bin/reformime  -x/var/spool/qscan/tmp/pcfrxen17252184788122173427/ (Segmentation fau>
                                                            ) – that shouldn't happen!


As you can see, this showed that the problem was with reformime being passed the -x argument and a temporary directory. To make sure the crash was not caused by mixed-up permissions, I checked the /var/spool/qscan permissions, the clamd permissions and a few other permissions of the qmail install. The wrong permissions (perhaps caused by the clamav update during the Debian Linux migration) were on /var/lib/clamav, which was incorrectly owned by user clamav, group clamav, instead of the qscand user and group. To resolve it, I ran:

chown qscand:qscand /var/lib/clamav/ -R
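Checking ownership by hand across several directories gets tedious. A sketch like this can compare a path's owner with the expected user and group, and list any entries below a directory that deviate. The qscand:qscand expectation comes from this particular qmail-scanner setup (a stock Debian clamav package owns /var/lib/clamav as clamav:clamav), so adapt it to your own layout. It relies on GNU stat and GNU find, as shipped with Debian; the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers; require GNU coreutils stat and GNU find (Debian default).

# check_owner PATH USER:GROUP -> prints OK/WRONG, returns 0 when ownership matches.
check_owner() {
    actual=$(stat -c '%U:%G' "$1") || return 2
    if [ "$actual" = "$2" ]; then
        echo "OK    $1 ($actual)"
    else
        echo "WRONG $1 is $actual, expected $2"
        return 1
    fi
}

# find_wrong_owner DIR USER GROUP -> prints every entry under DIR not owned by USER:GROUP.
find_wrong_owner() {
    find "$1" \( ! -user "$2" -o ! -group "$3" \) -print
}
```

Usage: `check_owner /var/lib/clamav qscand:qscand`, then `find_wrong_owner /var/lib/clamav qscand qscand` to catch individual files that a later clamav update may have re-owned.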


Another thing I had to correct was the /var/log/qmail permissions, which were too permissive (perhaps due to some hasty mistake made at install time). To correct them:

# chmod 750 /var/log/qmail/
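To confirm that the tightened mode actually applied, GNU stat can print the octal mode directly. A small illustrative sketch (the helper name is made up):

```shell
#!/bin/sh
# check_mode PATH MODE -> returns 0 if PATH has exactly the octal MODE (e.g. 750).
check_mode() {
    actual=$(stat -c '%a' "$1") || return 2
    [ "$actual" = "$2" ] || { echo "$1 has mode $actual, expected $2"; return 1; }
}
```

Usage: `check_mode /var/log/qmail 750 && echo ok`.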


The first thing I tried was, of course, to reinstall the maildrop Debian package, which provides the /usr/bin/reformime binary.

root@pcfreak:/usr/local/bin# dpkg -l |grep -i maildrop
rc  courier-maildrop                      0.68.2-1                                                                   amd64        Courier mail server – mail delivery agent
ii  maildrop                              2.9.3-2.1                                                                  amd64        mail delivery agent with filtering abilities (set-GID=mail)

In an old post of mine on a similar error, Fixing Qmail 451 qq temporary problem (#4.3.0) / @4000000050587780174c60dc status: qmail-todo stop processing asap / status: exiting, part of the solution was to reinstall maildrop, so I tried that:

root@pcfreak:/usr/local/bin# apt install --reinstall maildrop


Of course, to try it out, I restarted qmail with the usual:

# qmailctl restart

Sadly, this didn't solve it, so I had to look for other solutions, and I spent about 3 to 4 hours reading online, only to convince myself that finding anything meaningful the classical human way is becoming a pretty much impossible task. The amount of information on the Internet has grown tremendously over the last years, yet the quality of posts and published data seems to be deteriorating exponentially. So the only ways to solve crashing binaries are either to reach for a debugger such as gdb, or simply to try rebuilding the binary from source and see whether a recompile makes a difference.

After even more digging online, I found some Gentoo forum threads where people described the issue as also connected to reformime failing in libc, with a C patch applied. I also found threads where Ubuntu and Debian users complained about mysterious libc errors with maildrop, and even a bug report suggesting this is some kind of libc bug related to the precompiled maildrop shipped in the default deb-based repositories.

Hence, my approach to resolving it was to recompile maildrop from source code. Even though this looked like a tedious task with plenty of dependencies, I had to install a number of development libraries and tools, compilers etc., as well as the following libs:

# apt install pcre2-utils
# apt install libpcre2-dev
# apt install libidn11
# apt install libidn2-dev
# apt install libcourier-unicode-dev
# apt install libcourier-unicode4

Then I had to download and build from source the latest available versions of courier-authlib and its dependency courier-unicode, and then recompile maildrop against them:

# links https://sourceforge.net/projects/courier/files/courier/1.3.12/courier-1.3.12.tar.bz2/download
# links https://sourceforge.net/projects/courier/files/maildrop/3.1.8/maildrop-3.1.8.tar.bz2/download
# links https://sourceforge.net/projects/courier/files/authlib/0.72.3/courier-authlib-0.72.3.tar.bz2/download
# links https://sourceforge.net/projects/courier/files/courier-unicode/2.3.1/courier-unicode-2.3.1.tar.bz2/download

# tar -jxvf courier-unicode-2.3.1.tar.bz2
# tar -jxvvf courier-authlib-0.72.3.tar.bz2
# tar -jxvvf maildrop-3.1.8.tar.bz2

# cd courier-unicode-2.3.1/
# ./configure && make && make install

# cd ..
# cd courier-authlib-0.72.3
# ./configure && make && make install
# cd ..

# cd maildrop-3.1.8/
# ./configure && make && make install
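The build sequence above can be scripted so that each source directory name is derived from its tarball, which keeps the `cd` targets in step with the versions actually extracted. This is a sketch under assumptions: the three tarballs are already downloaded into the current directory, the commands run as root, and the `run`/`build_all` names and the `DRY_RUN` switch are made up for illustration. Running `ldconfig` after each install is an addition not in the steps above, meant to let the next build and the final binaries find the freshly installed libraries under /usr/local/lib. Since `./configure` defaults to the /usr/local prefix, the rebuilt binary ends up as /usr/local/bin/reformime.

```shell
#!/bin/sh
# Build courier-unicode, courier-authlib and maildrop in dependency order.
# Assumption: the tarballs below are already in the current directory.
# Set DRY_RUN=1 to only print the commands instead of running them.
TARBALLS="courier-unicode-2.3.1.tar.bz2 courier-authlib-0.72.3.tar.bz2 maildrop-3.1.8.tar.bz2"

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

build_all() {
    for tb in $TARBALLS; do
        dir=${tb%.tar.bz2}              # e.g. courier-unicode-2.3.1
        run tar -jxf "$tb" || return 1
        # autoconf's default prefix is /usr/local
        run sh -c "cd '$dir' && ./configure && make && make install" || return 1
        run ldconfig || return 1        # refresh the shared library cache
    done
}
```

Usage: `DRY_RUN=1 build_all` to preview the commands, then `build_all` to run them.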

I also took the time to preinstall a bunch of Perl module deb packages, roughly the ones listed in the file perl-modules-for-qmail-needed.txt, which I put together while building the binaries.

To (re)install them, run a small shell loop:

# for i in $(cat perl-modules-for-qmail-needed.txt); do apt install --reinstall $i --yes; done
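The loop above runs one apt transaction per package. A sketch that feeds the whole list to a single apt call instead, assuming perl-modules-for-qmail-needed.txt holds one Debian package name per line (blank lines and `#` comments are skipped); `read_pkg_list` is a hypothetical helper name.

```shell
#!/bin/sh
# read_pkg_list FILE -> prints one package name per line,
# skipping blank lines and lines starting with '#'.
read_pkg_list() {
    grep -Ev '^[[:space:]]*(#|$)' "$1" | awk '{ print $1 }'
}
```

Usage: `apt install --reinstall --yes $(read_pkg_list perl-modules-for-qmail-needed.txt)`. The `$( )` is left unquoted on purpose so that each name becomes a separate argument, and apt resolves all dependencies in one pass.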


I also have to say that I identified an issue with /var/qmail/bin/qmail-scanner-queue.pl: qmail-inject was failing when I tested the qmail-scanner-queue installation with:

# /downloads/qmail-scanner-2.11st/contrib/test_installation.sh -doit
 

# ./test_installation.sh -doit

Sending standard test message – no viruses… 1/4
qmail-inject: fatal: qq temporary problem (#4.3.0)
Bad error. qmail-inject died


Anyone who has ever administered a Qmail mail server knows the terrible error pretty well:

qmail-inject: fatal: qq temporary problem (#4.3.0)


and that it could be caused by almost anything. So, to find out what the cause might be, I continued debugging:

# ldd qmail-inject
        linux-vdso.so.1 (0x00007ffc43f5a000)
        libc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007fcc2099b000)
        /lib64/ld-linux-x86-64.so.2 (0x00007fcc20ba1000)
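When a binary crashes inside libc, it helps to confirm which libc it actually loads and that no library is unresolved, for example by comparing /usr/bin/reformime with the rebuilt /usr/local/bin/reformime. A small sketch built on the same ldd output as above (helper names are illustrative). As the ldd man page warns, ldd may end up executing the program, so only run it on binaries you trust.

```shell
#!/bin/sh
# missing_libs BINARY -> prints shared libraries the dynamic linker cannot resolve.
missing_libs() {
    ldd "$1" 2>/dev/null | awk '/not found/ { print $1 }'
}

# libc_of BINARY -> prints the path of the libc.so.6 the binary would load.
libc_of() {
    ldd "$1" 2>/dev/null | awk '$1 == "libc.so.6" { print $3 }'
}
```

Usage: `missing_libs /usr/local/bin/reformime` should print nothing, and `libc_of /usr/bin/reformime` vs `libc_of /usr/local/bin/reformime` shows whether both use the same libc.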


To successfully debug the qmail issues, I found the following log-debugging loop very useful. It comes from another old post of mine on debugging qmail errors, Testing Qmail installation for problems: Common reasons for unworking qmail / How to debug Qmail mail server failing to delivery or send emails:

# for i in $(ls -d /var/log/qmail/*qmail*/); do tail -n 10 $i/current|tai64nlocal; sleep 5; done


Finally, the last step to resolve the qmail-inject error was to modify /var/qmail/bin/qmail-scanner-queue.pl and change the reformime path from /usr/bin/reformime (shipped by the default Debian repository) to the newly custom-built /usr/local/bin/reformime.
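The path change can be done with sed. A sketch, assuming the old path appears literally in the script (it does not rely on any particular variable name inside qmail-scanner-queue.pl): it keeps a backup, then writes the result back with `cat >` so the file keeps its inode, owner and mode, which may matter since qmail-scanner-queue.pl is usually installed with specific ownership and permissions. The function name is hypothetical.

```shell
#!/bin/sh
# swap_reformime_path FILE [OLD] [NEW]
# Replaces every literal occurrence of OLD with NEW, keeping FILE's inode, owner and mode.
swap_reformime_path() {
    f=$1
    old=${2:-/usr/bin/reformime}
    new=${3:-/usr/local/bin/reformime}
    cp -p "$f" "$f.bak" || return 1     # backup with original attributes
    tmp=$(mktemp) || return 1
    # '#' as the sed delimiter, so the slashes in the paths need no escaping
    sed "s#$old#$new#g" "$f" > "$tmp" && cat "$tmp" > "$f"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Usage: `swap_reformime_path /var/qmail/bin/qmail-scanner-queue.pl`, then rerun test_installation.sh. Running it twice is harmless, because the new path does not contain the old one.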


After retesting the qmail-scanner installation, everything seemed fine from then on:
 

# /downloads/qmail-scanner-2.11st/contrib/test_installation.sh -doit

Sending standard test message – no viruses… 1/4
done!

Sending eicar test virus – should be caught by perlscanner module… 2/4
done!

Sending eicar test virus with altered filename – should only be caught by commercial anti-virus modules (if you have any)… 3/4
done!

Sending bad spam message for anti-spam testing – In case you are using SpamAssassin… 4/4


If you have enabled $sa_quarantine, $sa_delete or $sa_reject the
spam-message wont't arrive to the recipients. But if you have enabled
(good idea!) 'debug' you should check
/var/spool/qscan/qmail-queue.log (or where ever you have the log).


        Done!

Finished test. Now go and check Email sent to postmaster@mail.pc-freak.net and/or the log..

The qmr_inst_check script from Thibs' Qmail install also reported my server's qmail install as being in a good state:
 

# /downloads/scripts/qmr_inst_check
! vpopmail database do not exist!

So, hip hip hooray, my qmail works again! I fixed it again!
If you need help fixing your company's professional QMAIL or Postfix mail server, contact me via the contact form. Enjoy!

 

yum search file in all installable RPM, find out which rpm package provides binary file or missing library dependency on CentOS / RHEL / Fedora

Friday, August 23rd, 2024


Sometimes you have a missing library or a file that you know should be available via an RPM, but you're not sure which RPM you have to install. In that case you have to look up the library or binary file among all available installable RPMs on Redhat Linux / CentOS / Fedora or another RPM-based distro.

It is really annoying, especially if you try to install an RPM and it does not install due to a missing dependency library. A missing dependency package can happen if you use a custom internal repository that mirrors the original RPM repositories, and those RPM repositories are situated behind a DMZ firewall network (such scenarios are common in corporations and IT companies).
 
Finding out which file is provided by which package on Debian / Ubuntu and other deb-based Linux distributions is easy and done via:

# apt-file search filename

So if you're a system administrator coming from the Debian GNU / Linux sysadmin realm into the wonderful world of Red Hat, you will want an alternative to the apt-file tool. You will be happy to find out that this tedious task is easily done on RPM-based Linux and is integrated straight into the yum package manager.

The command to search which rpm package provides a file is:

# yum whatprovides 'nc'

[root@rhel-linux ~]# yum whatprovides nc
Loaded plugins: fastestmirror, versionlock
Loading mirror speeds from cached hostfile
2:nmap-ncat-6.40-19.el7.x86_64 : Nmap's Netcat replacement
Repo        : base
Matched from:
Provides    : nc

 

2:nmap-ncat-6.40-19.el7.x86_64 : Nmap's Netcat replacement
Repo        : @base
Matched from:
Provides    : nc

 

yum whatprovides search_file_name can also be invoked with its shortcut, yum provides 'search_file_name':

[root@rhel-server ~]# yum provides '/bin/ls'
Loaded plugins: fastestmirror, versionlock
Loading mirror speeds from cached hostfile
coreutils-8.22-24.el7.x86_64 : A set of basic GNU tools commonly used in shell scripts
Repo        : base
Matched from:
Filename    : /bin/ls

coreutils-8.22-24.el7_9.2.x86_64 : A set of basic GNU tools commonly used in shell scripts
Repo        : updates
Matched from:
Filename    : /bin/ls

 Here is another example:

[root@rhel-server ~]# yum -q provides '*lesspipe.sh*'
less-458-9.el7.x86_64 : A text file browser similar to more, but better
Repo        : base
Matched from:
Filename    : /usr/bin/lesspipe.sh

source-highlight-3.1.6-6.el7.i686 : Produces a document with syntax highlighting
Repo        : base
Matched from:
Filename    : /usr/bin/src-hilite-lesspipe.sh

source-highlight-3.1.6-6.el7.x86_64 : Produces a document with syntax highlighting
Repo        : base
Matched from:
Filename    : /usr/bin/src-hilite-lesspipe.sh

spirv-tools-2019.1-4.el7.x86_64 : API and commands for processing SPIR-V modules
Repo        : epel
Matched from:
Filename    : /usr/bin/spirv-lesspipe.sh

You can search for any file: if the RPM repository is defined under /etc/yum.repos.d/* and enabled, the yum whatprovides command should be able to find it and tell you which RPM package you have to install to get the file, the Red Hat way.
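When yum provides returns many matches, often with the same package listed once per repository, a small filter can reduce the output to the unique package names. A sketch, assuming the result layout shown in the examples above, where each match starts in column 1 as `<package> : <summary>`; the function name is illustrative.

```shell
#!/bin/sh
# provides_packages -> reads "yum provides" output on stdin and prints each
# matching package (name-version-release.arch, with epoch if shown) once.
provides_packages() {
    # Match header lines start in column 1 as "<package> : <summary>";
    # detail lines such as "Repo        : base" have several spaces before
    # the colon, so they do not match.
    awk '/^[^ ]+ : / { print $1 }' | sort -u
}
```

Usage: `yum -q provides '*/libpcre2-8.so.0' | provides_packages`.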

  • You can list all enabled RPM repositories with cmd:
     

[root@rhel-server ~]# yum repolist enabled
Loaded plugins: fastestmirror, versionlock
Loading mirror speeds from cached hostfile
repo id                                                   repo name                                                                      status
3party                                                    Third party packages – x86_64                                                   2,631
base/7/x86_64                                             CentOS-7 – Base                                                                10,072
cr/7/x86_64                                               CentOS-7 – CR                                                                       0
epel/7/x86_64                                             EPEL packages for RedCent 7 – x86_64                                           13,791
extras/7/x86_64                                           CentOS-7 – Extras                                                                 526
updates/7/x86_64                                          CentOS-7 – Updates                                                              5,802
zabbix-6.0                                                Zabbix 6.0 repo                                                                   429
repolist: 33,251
 

  • To list disabled RPM repositories:
     

# yum repolist disabled


To list all repositories configured via the /etc/yum.repos.d/* files, whether enabled or not:

# yum repolist all

How to Update / Migrate zabbix-agent 5 to zabbix-agent2 6 on Redhat / CentOS / Fedora Linux

Friday, August 9th, 2024


If you have servers still reporting to Zabbix with Zabbix Agent 1 version 5.0.X, but you have already migrated the Zabbix server to Zabbix 6, it is a good idea to update the agents to Zabbix Agent 6 as soon as possible; as you know, lagging behind in versions makes updating a harder and more complicated task.

My experience, and I guess that of most system administrators, is that keeping many applications at the same version level has historically reduced unexpected errors and bugs; nowadays, however, the rule of keeping local and remote applications (programs) at the same version level is regularly broken.

Theoretically, Zabbix Agent (client) and Zabbix (server) are compatible across a certain range of versions (Zabbix agent 2 versions from 4.4 onwards are compatible with Zabbix 7.0, and Zabbix agent 2 must not be newer than 7.0; for more on agent-to-server version compatibility, check here), and a slight version difference should not really be a problem. Often, however, you might have third-party proxies in between, such as haproxy or zabbix-proxy, or other network oddities. Thus my personal opinion is that, for interoperability, it is better to keep the Zabbix clients and Zabbix servers across the DMZ-ed networks running at the same version level.

Some would say I have old-fashioned thinking, as software and technology keep moving forward. But as I see programming and software constantly degrading, just a reflection of the degradation of the human element, I prefer to keep my old know-how and stick to the same versions whenever possible.

Some would then wonder why I would upgrade to zabbix-agent2 at all, if I want to keep the same versioning. The reason is that zabbix-agent2 is written in the Go language and is much faster and supposedly a better piece of software than Zabbix Agent 1, which is written in C.

Moreover, having Zabbix agent 2 instead of 1 also brings benefits: you can do a bit more with Zabbix, and the machines are better prepared for future monitoring needs. To learn more about the benefits of Zabbix Agent 2 compared to Zabbix Agent 1, read the Agent vs Agent2 comparison on the Zabbix website.

 

With this little introduction, let's proceed with the exact steps to upgrade zabbix-agent1 to zabbix-agent2.

1. Check the currently installed Zabbix Agent version

[user@monitored-server ~]$ rpm -qa |grep -i zabb
zabbix-get-5.0.42-1.el8.x86_64
zabbix-sender-5.0.42-1.el8.x86_64
zabbix-agent-5.0.42-1.el8.x86_64

[user@server ~]$ 

 

2. Create a backup copy of the currently working zabbix_agentd.conf
 

Before messing with the working zabbix-agent, as usual, create the necessary backup to prevent later surprises:

[user@monitored-server ~]$ cp -vrpf /etc/zabbix/zabbix_agentd.conf /etc/zabbix/zabbix_agentd.conf.bak-$(date '+%Y-%m-%d_%H-%M-%S')

3. Check the currently configured Zabbix repos

 

[user@monitored-server ~]$ vim /etc/yum.repos.d/zabbix.repo
 

[zabbix-4.0]
name = zabbix-4.0 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-4.0/8/$basearch
enabled = 0
gpgkey = http://zabbix-repo-server.com/external/zabbix-4.0/zabbix-official-repo.key
gpgcheck = 1

[zabbix-4.4]
name = zabbix-4.4 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-4.4/8/$basearch
enabled = 0
gpgkey = http://zabbix-repo-server.com/external/zabbix-4.4/zabbix-official-repo.key
gpgcheck = 1

[zabbix-5.0]
name = zabbix-5.0 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-5.0/8/$basearch
enabled = 1
gpgkey = http://zabbix-repo-server.com/external/zabbix-5.0/zabbix-official-repo.key
gpgcheck = 1

[zabbix-5.4]
name = zabbix-5.4 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-5.4/8/$basearch
enabled = 0
gpgkey = http://zabbix-repo-server.com/external/zabbix-5.4/zabbix-official-repo.key
gpgcheck = 1

[zabbix-6.0]
name = zabbix-6.0 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-6.0/8/$basearch
enabled = 0
gpgkey = http://zabbix-repo-server.com/external/zabbix-6.0/zabbix-official-repo.key
gpgcheck = 1


4. Modify the repositories to include the Zabbix Agent 6 yum repos
 

[user@monitored-server ~]$ cp -rpf /etc/yum.repos.d/zabbix.repo /etc/yum.repos.d/zabbix.repo.5.0.rpmsave

As we want to keep only the 6.0 version, leave only the zabbix-6.0 section and enable that repo:
 

[user@monitored-server ~]$ vim /etc/yum.repos.d/zabbix.repo

[zabbix-6.0]
name = zabbix-6.0 – 8
baseurl = http://zabbix-repo-server.com/external/zabbix-6.0/8/$basearch
enabled = 1
gpgkey = http://zabbix-repo-server.com/external/zabbix-6.0/zabbix-official-repo.key
gpgcheck = 1
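If several hosts need the same change, writing the repo file from a small function can be less error-prone than editing it in vim on each machine. This is a sketch: the internal mirror URL layout (`<base>/zabbix-<version>/8/$basearch`) is simply copied from the example above, the EL major release `8` is hard-coded as there, and the function name is made up.

```shell
#!/bin/sh
# write_zabbix_repo FILE VERSION BASEURL
# Writes a single enabled [zabbix-VERSION] section in the layout of the example above.
write_zabbix_repo() {
    # \$basearch is escaped so that yum, not the shell, expands it
    cat > "$1" <<EOF
[zabbix-$2]
name = zabbix-$2 - 8
baseurl = $3/zabbix-$2/8/\$basearch
enabled = 1
gpgkey = $3/zabbix-$2/zabbix-official-repo.key
gpgcheck = 1
EOF
}
```

Usage: `write_zabbix_repo /etc/yum.repos.d/zabbix.repo 6.0 http://zabbix-repo-server.com/external`, after taking the backup copy shown above.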


5. Update zabbix-agent to zabbix-agent2 and update the zabbix-get and zabbix-sender versions

To avoid disrupting the monitoring reported by zabbix-agent, don't delete zabbix-agent1; instead, install and configure zabbix-agent2 in parallel, and once the configuration is migrated from Agent 1 to Agent 2, stop the old zabbix-agent and bring up the new one.

[user@monitored-server ~]$ yum check-update

[user@monitored-server ~]$ yum install zabbix-agent2 zabbix-get zabbix-sender -y

Note that if you want a precise version of zabbix-agent2, let's say 6.0.31 to correspond to zabbix-server 6.0.31 (even though newer RPM versions are available in the repositories), run:
 

[user@monitored-server ~]$ yum upgrade zabbix-agent2-6.0.31-release1.el8

 

  • Check the newly installed zabbix_agent2 version


# zabbix_agent2 -V
zabbix_agent2 (Zabbix) 6.0.31
Revision b6d93755a1b 17 June 2024, compilation time: {undefined} {undefined}, built with: go1.21.3
Plugin communication protocol version is 6.0.13

Copyright (C) 2024 Zabbix SIA
License GPLv2+: GNU GPL version 2 or later <https://www.gnu.org/licenses/>.
This is free software: you are free to change and redistribute it according to
the license. There is NO WARRANTY, to the extent permitted by law.

This product includes software developed by the OpenSSL Project
for use in the OpenSSL Toolkit (http://www.openssl.org/).

Compiled with OpenSSL 1.1.1k  FIPS 25 Mar 2021
Running with OpenSSL 1.1.1k  FIPS 25 Mar 2021

We use the library Eclipse Paho (eclipse/paho.mqtt.golang), which is
distributed under the terms of the Eclipse Distribution License 1.0 (The 3-Clause BSD License)
available at https://www.eclipse.org/org/documents/edl-v10.php

We use the library go-modbus (goburrow/modbus), which is
distributed under the terms of the 3-Clause BSD License
available at https://github.com/goburrow/modbus/blob/master/LICENSE
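When pinning agents to the server version as described above, it can help to check the installed version from a script. A minimal sketch, assuming the first line of `zabbix_agent2 -V` keeps the format shown above; "same series" here means equal major.minor, which is my assumption about what you want to compare, not a Zabbix rule. The function names are illustrative.

```shell
#!/bin/sh
# parse_agent_version -> reads "zabbix_agent2 -V" output on stdin and prints
# the version from the first line, e.g. "6.0.31".
parse_agent_version() {
    awk 'NR == 1 { print $NF }'
}

# same_series A B -> succeeds if both versions share major.minor (e.g. 6.0.x).
same_series() {
    [ "${1%.*}" = "${2%.*}" ]
}
```

Usage: `v=$(zabbix_agent2 -V | parse_agent_version); same_series "$v" 6.0.31 && echo "agent $v matches the 6.0 server"`.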

 

6. Migrate the old /etc/zabbix/zabbix_agentd.conf to /etc/zabbix/zabbix_agent2.conf

For readability, show the main configured zabbix-agent variables without the tons of comments, so they can later be carried over to agent2:
 

[root@monitored-server ~]# cat /etc/zabbix/zabbix_agentd.conf | grep -v '\#' | sed '/^$/d' 
PidFile=/var/run/zabbix/zabbix_agentd.pid
LogFile=/var/log/zabbix/zabbix_agentd.log
LogFileSize=0
Server=10.50.37.8,127.0.0.1
ServerActive=10.50.37.8,127.0.0.1
Hostname=fqdn-of-monitored-host.domain.com
Timeout=20
Include=/etc/zabbix/zabbix_agentd.d/*.conf

The default installed zabbix-agent2 config would look similar to:

[root@monitored-server ~]# cat /etc/zabbix/zabbix_agent2.conf | grep -v '\#' | sed '/^$/d'
PidFile=/run/zabbix/zabbix_agent2.pid
LogFile=/var/log/zabbix/zabbix_agent2.log
LogFileSize=0
Server=127.0.0.1
# Specify the location of the Zabbix server host.
ServerActive=127.0.0.1
Hostname=Zabbix server
Include=/etc/zabbix/zabbix_agent2.d/*.conf
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=./zabbix_agent2.d/plugins.d/*.conf

The newly migrated one should look like:

[root@monitored-server ~]# vim /etc/zabbix/zabbix_agent2.conf
PidFile=/run/zabbix/zabbix_agent2.pid
LogFile=/var/log/zabbix/zabbix_agent2.log
LogFileSize=10
Server=10.34.89.7,127.0.0.1
ServerActive=10.34.89.7,127.0.0.1
Hostname=lqgblu02f.ffm.de.int.atosorigin.com
Timeout=20
Include=/etc/zabbix/zabbix_agent2.d/*.conf
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf
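Carrying the values over by hand is easy to get wrong. A sketch of a helper (the name `migrate_keys` is made up) that copies selected single-value keys from the old agent config into the new one, replacing a key if it already exists and appending it otherwise. It deliberately ignores multi-value entries such as Include, and it assumes the values contain no backslashes, because `awk -v` interprets escape sequences.

```shell
#!/bin/sh
# migrate_keys OLD_CONF NEW_CONF KEY...
# Copies the last uncommented value of each KEY from OLD_CONF into NEW_CONF.
migrate_keys() {
    old=$1; new=$2; shift 2
    for key in "$@"; do
        # last active "KEY=value" line in the old file (comment lines never match)
        val=$(awk -F= -v k="$key" '$1 == k { sub(/^[^=]*=/, ""); v = $0 } END { print v }' "$old")
        [ -n "$val" ] || continue
        tmp=$(mktemp) || return 1
        awk -F= -v k="$key" -v v="$val" '
            $1 == k && !done { print k "=" v; done = 1; next }  # replace first occurrence
            $1 == k { next }                                      # drop duplicates
            { print }
            END { if (!done) print k "=" v }                      # append if missing
        ' "$new" > "$tmp" && cat "$tmp" > "$new"
        rm -f "$tmp"
    done
}
```

Usage, after backing up the new file: `migrate_keys /etc/zabbix/zabbix_agentd.conf /etc/zabbix/zabbix_agent2.conf Server ServerActive Hostname Timeout`.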


7. Add a few optimization variables for better zabbix-server -> zabbix-proxy -> zabbix-agent interactions

If you sometimes have network delays between the Zabbix server and the Zabbix client, and vice versa (depending on whether the Zabbix agent is configured in active or passive mode), it is often useful to add these 2 variables:

# How often list of active checks is refreshed, in seconds
RefreshActiveChecks=60
# Refresh the active checks on start
ForceActiveChecksOnStart=1


Also, it might be good practice to let the agent itself rotate zabbix_agent2.log once it exceeds a certain size, instead of rotating it via logrotate.
 

# Perform log file rotation at the 1 MB point for the specified filepath
LogFileSize=1

 

[root@monitored-server ~]# vim /etc/zabbix/zabbix_agent2.conf
PidFile=/run/zabbix/zabbix_agent2.pid
LogFile=/var/log/zabbix/zabbix_agent2.log
LogFileSize=10
Server=10.34.89.7,127.0.0.1
ServerActive=10.34.89.7,127.0.0.1
Hostname=lqgblu02f.ffm.de.int.atosorigin.com
RefreshActiveChecks=60
ForceActiveChecksOnStart=1
Timeout=20
Include=/etc/zabbix/zabbix_agent2.d/*.conf
PluginSocket=/run/zabbix/agent.plugin.sock
ControlSocket=/run/zabbix/agent.sock
Include=/etc/zabbix/zabbix_agent2.d/plugins.d/*.conf

 

8. Stop the old zabbix agent process and run the new one

# systemctl status --full zabbix-agent2
# systemctl stop zabbix-agent


Assuming that the configuration of zabbix-agent2 is correct, start zabbix-agent2 via systemctl and check its status:


# systemctl start zabbix-agent2
# systemctl status --full zabbix-agent2
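The switch-over can be wrapped in a function that also handles boot-time enablement and falls back to the old agent if zabbix-agent2 does not come up. The function names and the `DRY_RUN` preview switch are my own; enabling agent2 and disabling agent 1 go slightly beyond the steps above, so that the new agent also survives a reboot. Agent 1 is stopped first because both agents listen on TCP port 10050 by default.

```shell
#!/bin/sh
# Prints the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

switch_to_agent2() {
    # both agents default to ListenPort 10050, so agent 1 must stop first
    run systemctl stop zabbix-agent || return 1
    if run systemctl enable --now zabbix-agent2 &&
       run systemctl is-active --quiet zabbix-agent2; then
        run systemctl disable zabbix-agent        # keep agent 1 from starting at boot
    else
        echo "zabbix-agent2 failed to start, restarting zabbix-agent" >&2
        run systemctl start zabbix-agent
        return 1
    fi
}
```

Usage: `DRY_RUN=1 switch_to_agent2` to preview, then `switch_to_agent2` as root.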


If there are no errors in the configuration, the zabbix_agent2 process should be up and running, and the systemctl status command above should report it as active.
If you need specifics about individual Zabbix checks, errors in the currently configured UserParameter scripts, or any other warnings or errors
from zabbix_agent2 talking to the server, check the logs further:

[root@monitored-server ~]# tail -n 10 /var/log/zabbix/zabbix_agent2.log  
2024/08/06 17:26:52.998749 using plugin 'WebPage' (built-in) providing following interfaces: exporter, configurator
2024/08/06 17:26:52.998760 using plugin 'ZabbixAsync' (built-in) providing following interfaces: exporter
2024/08/06 17:26:52.998794 using plugin 'ZabbixStats' (built-in) providing following interfaces: exporter, configurator
2024/08/06 17:26:52.998804 lowering the plugin ZabbixSync capacity to 1 as the configured capacity 100 exceeds limits
2024/08/06 17:26:52.998820 using plugin 'ZabbixSync' (built-in) providing following interfaces: exporter
2024/08/06 17:26:52.998993 Plugin communication protocol version is 6.0.13
2024/08/06 17:26:52.999018 Zabbix Agent2 hostname: [lqgblu02f.ffm.de.int.atosorigin.com]
2024/08/06 17:26:54.000667 [102] cannot connect to [127.0.0.1:10051]: dial tcp :0->127.0.0.1:10051: connect: connection refused
2024/08/06 17:26:54.000836 [102] active check configuration update from host [lqgblu02f.ffm.de.int.atosorigin.com] started to fail
2024/08/06 17:26:59.344837 Zabbix Agent 2 stopped. (6.0.31)
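In the sample log above, the agent fails to fetch its active checks from 127.0.0.1:10051; 127.0.0.1 is listed in ServerActive, so presumably nothing is listening on that port locally. A sketch that summarizes such connection failures per address, assuming the log line format shown above (the function name is illustrative):

```shell
#!/bin/sh
# failing_endpoints -> reads zabbix_agent2.log text on stdin and counts
# "cannot connect to [address]" errors per address.
failing_endpoints() {
    grep -o 'cannot connect to \[[^]]*]' | sort | uniq -c
}
```

Usage: `failing_endpoints < /var/log/zabbix/zabbix_agent2.log`.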

All Debian Linux package repository apt sources.list file for Debian versions 6, 7, 8, 9, 10, 11 and 12

Friday, May 31st, 2024


You may have to administer legacy Debian servers that keep hanging around, either for historical reasons or just because you didn't have time to upgrade them to the latest versions: machines sitting in the hangar or in the old server room of a mid-sized office building, doing nothing but NAT (Network Address Translation), proxying, or serving traffic via Squid / Haproxy / Apache / Varnish or Nginx. Such an OS may be out of date, End of Life, out of support and perhaps full of security holes, but because it is not visible on the Internet and sits in a demilitarized (DMZ-ed) local network, the machine stayed in service. If you still need to install simple things locally for administration reasons, for example nmap or netcat, or network monitoring tools such as iftop or iptraf, you might unfortunately find out that this is no longer possible, because the configured /etc/apt/sources.list repository mirror is no longer available at its URL. To restore the functioning of the apt and apt-get package management tools on Debian, you need to replace the broken, missing package mirrors (gone due to restructuring of the network) with correct ones originally provided by Debian or, if this doesn't work, with a Debian package archive URL.

In this article, I'll simply provide URLs you can use to fix a package manager that no longer works because its package repository is unavailable; most of them should still be working as of 2024. To resolve the issue, edit /etc/apt/sources.list and put in the lines for the Debian version you're using.

1. Check the version of the Debian Linux

# cat /etc/debian_version


or use the more universal way to check the Linux OS, which should work on almost all Linux distributions:

# cat /etc/issue
Debian GNU/Linux 9 \n \l
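Since the repository lines below are keyed by codename rather than by number, a tiny helper can translate the /etc/debian_version number into the codename used in sources.list. The mapping covers Debian 6 to 12 only; the function name is illustrative.

```shell
#!/bin/sh
# debian_codename VERSION -> prints the codename for a /etc/debian_version string.
debian_codename() {
    case ${1%%.*} in          # keep only the major number, e.g. "9.13" -> "9"
        6) echo squeeze ;;
        7) echo wheezy ;;
        8) echo jessie ;;
        9) echo stretch ;;
        10) echo buster ;;
        11) echo bullseye ;;
        12) echo bookworm ;;
        *) echo unknown; return 1 ;;
    esac
}
```

Usage: `debian_codename "$(cat /etc/debian_version)"`. Testing/unstable systems report values like "bookworm/sid" there, which this sketch reports as unknown.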

2. Modify /etc/apt/sources.list and put in the URLs for your Debian version

# vim /etc/apt/sources.list


3. Original and archived .deb repository URLs by Debian release
Debian 6 (Squeeze)

Original repositories (not available and no longer working as of 2024)

 

Old Archived .deb repository for 6 Squeeze

deb http://archive.debian.org/debian squeeze main
deb http://archive.debian.org/debian squeeze-lts main


Debian 7 (Wheezy)

Original repositories (not available and no longer working as of 2024)

Old archived .deb repository for Wheezy (still working as of 2024):

deb http://archive.debian.org/debian wheezy main contrib non-free
deb http://archive.debian.org/debian-security wheezy/updates main

( Security updates are not provided anymore.)

NOTE: If you get an error about keyrings, just install the archive keyring package:
 

# apt-get install debian-archive-keyring


Debian 8 (Jessie)
Original .deb package repository with non-free included for Debian 8 "Jessie"

deb http://deb.debian.org/debian/ jessie main contrib non-free
deb http://ftp.debian.org/debian/ jessie-updates main contrib
deb http://security.debian.org/ jessie/updates main contrib non-free

Old Archived .deb repository for 8 Jessie (still working as of 2024):

deb http://archive.debian.org/debian/ jessie main non-free contrib
deb-src http://archive.debian.org/debian/ jessie main non-free contrib
deb http://archive.debian.org/debian-security/ jessie/updates main non-free contrib
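deb-src http://archive.debian.org/debian-security/ jessie/updates main non-free contrib

apt may reject the archived Release files as expired, so disable the Valid-Until check, then update and upgrade:
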
# echo "Acquire::Check-Valid-Until false;" | tee -a /etc/apt/apt.conf.d/10-nocheckvalid

# apt-get update

# apt-get update && apt-get upgrade
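The archived entries above all follow the same pattern, so they can be generated instead of typed by hand. This is a hedged sketch. `archive_sources` is a hypothetical helper that only knows the three archived releases whose lines appear in this article (Wheezy, Jessie, Stretch), with the same components as listed above. The usage comments assume you run them as root after backing up your sources.list.

```shell
#!/bin/sh
# Hypothetical helper: print archive.debian.org lines for an end-of-life
# release, mirroring the Wheezy/Jessie/Stretch entries listed above.
archive_sources() {
    case "$1" in
        wheezy)         sec_components="main" ;;
        jessie|stretch) sec_components="main contrib non-free" ;;
        *) echo "no archived entry listed here for: $1" >&2; return 1 ;;
    esac
    echo "deb http://archive.debian.org/debian/ $1 main contrib non-free"
    echo "deb http://archive.debian.org/debian-security/ $1/updates $sec_components"
}

# Example usage (as root):
#   cp /etc/apt/sources.list /etc/apt/sources.list.orig
#   archive_sources jessie > /etc/apt/sources.list
#   echo "Acquire::Check-Valid-Until false;" > /etc/apt/apt.conf.d/10-nocheckvalid
#   apt-get update
```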

 

 If you need backports, first be warned that these are archived and no longer being updated; they may have security bugs or other major issues. They are not supported in any way.

deb http://archive.debian.org/debian/ jessie-backports main


Debian 9 (Stretch)
Original .deb package repository with non-free included for Debian 9 "Stretch":

 

deb http://deb.debian.org/debian/ stretch main contrib non-free
deb http://deb.debian.org/debian/ stretch-updates main contrib non-free
deb http://security.debian.org/ stretch/updates main contrib non-free

Archived .deb repository for Debian 9 Stretch:

deb http://archive.debian.org/debian/ stretch main contrib non-free
deb http://archive.debian.org/debian/ stretch-proposed-updates main contrib non-free
deb http://archive.debian.org/debian-security stretch/updates main contrib non-free


Debian 10 (Buster)
Original repository URLs:

deb http://deb.debian.org/debian/ buster main non-free contrib
deb http://deb.debian.org/debian/ buster-updates main non-free contrib
deb http://security.debian.org/ buster/updates main non-free contrib

 

Fixing broken backports for Debian 10 Buster


Replace the backports URL in /etc/apt/sources.list with this one:

deb http://archive.debian.org/debian buster-backports main contrib non-free


To list only the packages installed from the backports repository (these may need to be replaced with newer versions, if such are available from the repository), use:

# apt list --installed | grep backports
# dpkg --list | grep bpo
# dpkg --list | grep -E '^ii.*bpo.*'

ii  libpopt0:amd64                        1.18-2                         amd64        lib for parsing cmdline parameters
ii  libuutil3linux                        2.0.3-9~bpo10+1                amd64        Solaris userland utility library for Linux
ii  libzfs4linux                          2.0.3-9~bpo10+1                amd64        OpenZFS filesystem library for Linux
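The grep commands above also match any line that merely contains "bpo" somewhere, for example in a package description. Below is a slightly stricter sketch. It assumes the usual dpkg -l column layout (status, name, version, ...) and checks only the version column of installed packages. `list_bpo_pkgs` is a hypothetical name.

```shell
#!/bin/sh
# Hypothetical helper: read `dpkg -l` output on stdin and print
# "package version" for installed packages (status "ii") whose version
# string contains "~bpo", the suffix used by backported packages.
list_bpo_pkgs() {
    awk '$1 == "ii" && $3 ~ /~bpo/ { print $2, $3 }'
}

# Example: dpkg -l | list_bpo_pkgs
```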


Debian 11 (Bullseye)
Original repository URLs:

deb http://deb.debian.org/debian bullseye main contrib non-free
deb http://deb.debian.org/debian bullseye-updates main contrib non-free
deb http://security.debian.org/debian-security bullseye-security main contrib non-free

Debian 12 (Bookworm)
Original repository URLs:

 

deb http://deb.debian.org/debian bookworm main contrib non-free-firmware non-free
deb http://deb.debian.org/debian bookworm-updates main contrib non-free-firmware non-free
deb http://security.debian.org/debian-security bookworm-security main contrib non-free-firmware non-free

Add backports to sources.list:

deb http://deb.debian.org/debian bookworm-backports main
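After a major upgrade, for example Debian 11 to 12, it is easy to leave a line with the old codename behind in sources.list. Here is a rough check under a few assumptions. It only handles one-line deb/deb-src entries (not the newer deb822 *.sources format). It treats the part of each word before the first "-" or "/" as the release name. Third-party repositories with their own suite names will also be listed. `stale_sources` is a hypothetical helper, not part of apt.

```shell
#!/bin/sh
# Hypothetical helper: print active one-line deb/deb-src entries in a sources
# file that do not mention the expected release codename.
# Usage: stale_sources /etc/apt/sources.list bookworm
stale_sources() {
    awk -v c="$2" '
        /^[ \t]*deb(-src)?[ \t]/ {
            stale = 1
            for (i = 2; i <= NF; i++) {
                base = $i
                sub(/[-\/].*/, "", base)   # "bookworm-updates" -> "bookworm"
                if (base == c) stale = 0
            }
            if (stale) print
        }' "$1"
}
```

No output means every active entry references the expected release.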


That's all; hopefully this helps some sysadmin out there. Enjoy!

Fix ruby: /usr/lib/libcrypt.so.1: version `XCRYPT_2.0' not found in apt upgrade on Debian Linux 10

Saturday, August 5th, 2023

I have an old legacy ThinkPad laptop that, for simplicity, runs Window Maker (WMaker). It had been lying on my home desk for almost a year, and since I'm spending a few days at my parents' home in Dobrich, I remembered it would be a good idea to update its software to the latest Debian packages to patch its security issues. Like me, you may have tried to update your Debian 10 Linux to the latest stable release Debian packages and ended up with a critical error that apt could not resolve, even with fix commands like:

# apt-get update --fix-missing

# apt --fix-broken install

As usual, I searched Google for a solution and found a few articles claiming to have scripts that fix it, but in the end nothing worked.
The shitty error occurred during the standard:

# apt-get update && apt-get upgrade

ruby: /usr/lib/libcrypt.so.1: version `XCRYPT_2.0' not found

The cause and the workaround turned out to be quite unexpected.
For some reason, /lib/libcrypt.so.1 had become a symbolic link to a .bak file:

root@noah:/lib# ls -al /lib/libcrypt.so.1
lrwxrwxrwx 1 root root 19 Aug  3 16:53 /lib/libcrypt.so.1 -> libcrypt.so.1.bak

root@noah:/lib# ls -al /lib/libcrypt.so.1.bak
lrwxrwxrwx 1 root root 16 Jun 15  2017 /lib/libcrypt.so.1.bak -> libcrypt-2.24.so

To resolve it and let the .deb package upgrade continue, simply delete the strange symlink and re-run:

# apt-get update && apt-get upgrade

Setting up libc6:i386 (2.31-13+deb11u6) …
/usr/bin/perl: /lib/libcrypt.so.1: version `XCRYPT_2.0' not found (required by /usr/bin/perl)
dpkg: error processing package libc6:i386 (--configure):
 installed libc6:i386 package post-installation script subprocess returned error exit status 1
Errors were encountered while processing:
 libc6:i386

You may have to repeat this a few more times. Each time apt still fails with a critical error, remove the strange symbolic link again and re-run the upgrade:

# rm -f /lib/libcrypt.so.1
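The rm -f above deletes the link unconditionally. Here is a more defensive sketch. It assumes the broken link looks like the one in the ls output above: a symlink whose target ends in .bak. `fix_libcrypt_link` is a hypothetical helper that takes the link path as an optional argument, so it can be tried on a scratch directory first.

```shell
#!/bin/sh
# Hypothetical helper: remove the stray libcrypt.so.1 symlink only when it
# points at a *.bak file (as in the ls output above); otherwise leave it alone.
fix_libcrypt_link() {
    link=${1:-/lib/libcrypt.so.1}
    if [ ! -L "$link" ]; then
        echo "$link is not a symlink, nothing to do" >&2
        return 1
    fi
    target=$(readlink "$link")
    case "$target" in
        *.bak)
            echo "removing $link -> $target"
            rm -f "$link"
            ;;
        *)
            echo "$link points to $target, leaving it alone" >&2
            return 1
            ;;
    esac
}

# Example (as root): fix_libcrypt_link && apt-get update && apt-get upgrade
```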

That's all, folks! After a short while your Debian will be updated to the latest packages. Enjoy! 🙂

List and fix failed systemd services after a Linux OS upgrade, and how to get full info about a systemd service from the journal log

Friday, February 25th, 2022


I have recently upgraded a number of machines from Debian 10 Buster to Debian 11 Bullseye. As always, the update had issues on some machines, such as package dependency problems and a number of external package repositories that had to be changed to match the Bullseye deb packages. On some machines the update was less painful than on others, but the overall result was that most machines ended up with one or more failed systemd services. It could be that some of the machines already had these failed services and I never checked them after the previous update from Debian 9 to Debian 10, or it was just some mess I left behind while hurriedly installing software in the past.

That doesn't matter anyway; the fact was that I had to deal with a number of failed systemctl services. I tracked them down through the Failed service messages at system boot on one of the physical machines, and on the OpenXen VTY console, where the rest of the virtual machines showed Failed messages after the update. I spent a good amount of time, a day or two overall, fixing strange failed services. This is how this small article was born: to help sysadmins or home Linux desktop users who have updated Debian Linux, Ubuntu or any other deb-based distribution, but, due to the chaotic nature of Linux, ended up with strange Failed services and are looking for a way to find the source of the failures and get rid of them.
Systemd is a very complicated system, and in my sysadmin opinion it creates more problems than it solves, but, okay, it matches today's megalomaniac mindset well.


 

1. Check the journal for errors, running service irregularities and so on
 

The first thing to do to track down errors right after the update is to take a few minutes and closely check journalctl for any strange errors. Even on well-maintained Unix machines, the journal can lead you to a problem that is not fatal, but where some process is still malfunctioning in the background and you would like to fix it:
 

root@pcfreak:~# journalctl -x
Jan 10 10:10:01 pcfreak CRON[17887]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17887]: USER_END pid=17887 uid=0 auid=0 ses=340858 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17888]: CRED_DISP pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17888]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17888]: USER_END pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17884]: CRED_DISP pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17884]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17884]: USER_END pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17886]: CRED_DISP pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="www-data" exe="/usr/sbin/c>
Jan 10 10:10:01 pcfreak CRON[17886]: pam_unix(cron:session): session closed for user www-data
Jan 10 10:10:01 pcfreak audit[17886]: USER_END pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permi>
Jan 10 10:10:08 pcfreak NetworkManager[696]:  [1641802208.0899] device (eth1): carrier: link connected
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:19 pcfreak NetworkManager[696]:  [1641802219.7920] device (eth1): carrier: link connected
Jan 10 10:10:19 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:20 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:22 pcfreak NetworkManager[696]:  [1641802222.2772] device (eth1): carrier: link connected
Jan 10 10:10:22 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:23 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:33 pcfreak sshd[18142]: Unable to negotiate with 66.212.17.162 port 19255: no matching key exchange method found. Their offer: diffie-hellman-group14-sha1,diff>
Jan 10 10:10:41 pcfreak NetworkManager[696]:  [1641802241.0186] device (eth1): carrier: link connected
Jan 10 10:10:41 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx

To check only the latest journal messages, use the -x -e options (-x adds explanatory catalog text, -e jumps to the end of the pager):

root@pcfreak:~# journalctl -xe

Feb 25 13:08:29 pcfreak audit[2284920]: USER_LOGIN pid=2284920 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='op=login acct=28696E76616C>
Feb 25 13:08:29 pcfreak sshd[2284920]: Received disconnect from 177.87.57.145 port 40927:11: Bye Bye [preauth]
Feb 25 13:08:29 pcfreak sshd[2284920]: Disconnected from invalid user ubuntuuser 177.87.57.145 port 40927 [preauth]
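Scrolling through the whole journal gets tedious. journalctl can filter by priority and by boot, so here is a minimal sketch that shows only messages of priority err or worse from the current boot, with catalog explanations. The wrapper name `boot_errors` is hypothetical.

```shell
#!/bin/sh
# Hypothetical wrapper: messages of priority "err" and worse (crit, alert,
# emerg) from the current boot, with catalog explanations and no pager.
# Extra arguments are passed on to journalctl, e.g. --unit=named.service
boot_errors() {
    journalctl --boot --priority=err --catalog --no-pager "$@"
}
```

Calling boot_errors --unit=named.service narrows the output to a single service.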

The next thing to do after the update is to get a list of the failed services only.


2. List all failed systemd services that were supposed to be running

root@pcfreak:/root # systemctl list-units | grep -i failed
● certbot.service                                                                                                       loaded failed failed    Certbot
● logrotate.service                                                                                                     loaded failed failed    Rotate log files
● maldet.service                                                                                                        loaded failed failed    LSB: Start/stop maldet in monitor mode
● named.service                                                                                                         loaded failed failed    BIND Domain Name Server


An alternative way is the --failed option:

hipo@jeremiah:~$ systemctl list-units --failed
  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
● haproxy.service             loaded failed failed HAProxy Load Balancer
● libvirt-guests.service      loaded failed failed Suspend/Resume Running libvirt Guests
● libvirtd.service            loaded failed failed Virtualization daemon
● nvidia-persistenced.service loaded failed failed NVIDIA Persistence Daemon
● sqwebmail.service           masked failed failed sqwebmail.service
● tpm2-abrmd.service          loaded failed failed TPM2 Access Broker and Resource Management Daemon
● wd_keepalive.service        loaded failed failed LSB: Start watchdog keepalive daemon

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
7 loaded units listed.
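With many failed units, it is handy to loop over them and dump each unit's status and recent journal lines in one go. This sketch works under a few assumptions. `failed_unit_names` and `show_failed_units` are hypothetical helpers. The parser accepts both the default output shown above (with the leading ●) and --plain output. The list of unit-type suffixes is not exhaustive.

```shell
#!/bin/sh
# Hypothetical helper: read `systemctl list-units --failed` output on stdin
# and print only the unit names. Header and legend lines are skipped because
# their first word does not end in a unit-type suffix.
failed_unit_names() {
    awk '{
        name = ($1 == "●") ? $2 : $1
        if (name ~ /\.(service|socket|timer|mount|automount|path|swap|target|scope|slice|device)$/)
            print name
    }'
}

# Print full status and the last 30 journal lines for every failed unit.
show_failed_units() {
    systemctl list-units --failed --plain --no-legend | failed_unit_names |
    while read -r unit; do
        echo "===== $unit"
        systemctl status --no-pager --full "$unit"
        journalctl --unit="$unit" --catalog --no-pager --lines=30
    done
}
```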

 



To get the full list of states you can pass to systemctl with --state, run:

# systemctl --state=help
It prints the possible load, active and sub states.
Show service properties


Check whether a service has failed (or is in another state), and show the systemd properties set for it:

root@jeremiah:~# systemctl is-failed vboxweb.service
inactive

# systemctl show haproxy
Type=notify
Restart=always
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
SuccessExitStatus=143
MainPID=304858
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success

Full output of the above command is dumped in show_systemctl_properties.txt
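systemctl show prints every property as Key=Value. On reasonably recent systemd versions, systemctl can return a single value by itself (systemctl show --property=Result --value haproxy). For a saved dump like show_systemctl_properties.txt, though, a small filter is handy. `show_prop` is a hypothetical helper. It matches the key exactly, so asking for Result does not return ReloadResult, and it splits only on the first "=".

```shell
#!/bin/sh
# Hypothetical helper: print the value of one property from `systemctl show`
# output read on stdin. Only lines starting with "KEY=" match.
show_prop() {
    awk -v key="$1" 'index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }'
}

# Examples:
#   systemctl show haproxy | show_prop MainPID
#   show_prop Result < show_systemctl_properties.txt
```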


3. List all running systemd services for a better overview of what's going on on the machine
 

To get a list of all properly loaded and running systemd services, use --state running:

hipo@jeremiah:~$ systemctl list-units --state running|head -n 10
  UNIT                              LOAD   ACTIVE SUB     DESCRIPTION
  proc-sys-fs-binfmt_misc.automount loaded active running Arbitrary Executable File Formats File System Automount Point
  cups.path                         loaded active running CUPS Scheduler
  init.scope                        loaded active running System and Service Manager
  session-2.scope                   loaded active running Session 2 of user hipo
  accounts-daemon.service           loaded active running Accounts Service
  anydesk.service                   loaded active running AnyDesk
  apache-htcacheclean.service       loaded active running Disk Cache Cleaning Daemon for Apache HTTP Server
  apache2.service                   loaded active running The Apache HTTP Server
  avahi-daemon.service              loaded active running Avahi mDNS/DNS-SD Stack

 

It is also useful to list all unit files configured in systemd along with their state:

 


root@pcfreak:~# systemctl list-unit-files
UNIT FILE                                                                 STATE           VENDOR PRESET
proc-sys-fs-binfmt_misc.automount                                         static          -
-.mount                                                                   generated       -
backups.mount                                                             generated       -
dev-hugepages.mount                                                       static          -
dev-mqueue.mount                                                          static          -
media-cdrom0.mount                                                        generated       -
mnt-sda1.mount                                                            generated       -
proc-fs-nfsd.mount                                                        static          -
proc-sys-fs-binfmt_misc.mount                                             disabled        disabled
run-rpc_pipefs.mount                                                      static          -
sys-fs-fuse-connections.mount                                             static          -
sys-kernel-config.mount                                                   static          -
sys-kernel-debug.mount                                                    static          -
sys-kernel-tracing.mount                                                  static          -
var-www.mount                                                             generated       -
acpid.path                                                                masked          enabled      
cups.path                                                                 enabled         enabled      

 

 


root@pcfreak:~# systemctl list-units --type service --all
  UNIT                                   LOAD      ACTIVE   SUB     DESCRIPTION
  accounts-daemon.service                loaded    inactive dead    Accounts Service
  acct.service                           loaded    active   exited  Kernel process accounting
● alsa-restore.service                   not-found inactive dead    alsa-restore.service
● alsa-state.service                     not-found inactive dead    alsa-state.service
  apache2.service                        loaded    active   running The Apache HTTP Server
● apparmor.service                       not-found inactive dead    apparmor.service
  apt-daily-upgrade.service              loaded    inactive dead    Daily apt upgrade and clean activities
  apt-daily.service                      loaded    inactive dead    Daily apt download activities
  atd.service                            loaded    active   running Deferred execution scheduler
  auditd.service                         loaded    active   running Security Auditing Service
  auth-rpcgss-module.service             loaded    inactive dead    Kernel Module supporting RPCSEC_GSS
  avahi-daemon.service                   loaded    active   running Avahi mDNS/DNS-SD Stack
  certbot.service                        loaded    inactive dead    Certbot
  clamav-daemon.service                  loaded    active   running Clam AntiVirus userspace daemon
  clamav-freshclam.service               loaded    active   running ClamAV virus database updater
..

 



 

4. Finding out more on why a systemd configured service has failed


Usually, info about a failed systemd service is obtained with systemctl status servicename.service.
However, when a service is unable to start, you can get more info about why it failed with the -l (--full) option, which prints log lines in full instead of ellipsizing them:


root@pcfreak:~# systemctl -l status logrotate.service
● logrotate.service - Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 00:00:06 EET; 13h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
    Process: 2045320 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=1/FAILURE)
   Main PID: 2045320 (code=exited, status=1/FAILURE)
        CPU: 2.479s

Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: For now we will assume you meant to write /32
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| ERROR: '0.0.0.0/0.0.0.0' needs to be replaced by the term 'all'.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| SECURITY NOTICE: Overriding config setting. Using 'all' instead.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: (B) '::/0' is a subnetwork of (A) '::/0'
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: because of this '::/0' is ignored to keep splay tree searching predictable
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: You should probably remove '::/0' from the ACL named 'all'
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Failed with result 'exit-code'.
Feb 25 00:00:06 pcfreak systemd[1]: Failed to start Rotate log files.
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Consumed 2.479s CPU time.


However, systemctl -l only shows the last few log messages (10 lines by default) that the service generated when it was started, stopped or changed state. Sometimes the error message in systemctl status -l servicename.service is still incomplete, because the pager cuts long lines at the console width (note the > at the end of the first certbot line below):

 

root@pcfreak:~# systemctl status -l certbot.service
● certbot.service - Certbot
     Loaded: loaded (/lib/systemd/system/certbot.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 09:28:33 EET; 4h 0min ago
TriggeredBy: ● certbot.timer
       Docs: file:///usr/share/doc/python-certbot-doc/html/index.html
             https://certbot.eff.org/docs
    Process: 290017 ExecStart=/usr/bin/certbot -q renew (code=exited, status=1/FAILURE)
   Main PID: 290017 (code=exited, status=1/FAILURE)
        CPU: 9.771s

Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using th>
Feb 25 09:28:33 pcfrxen certbot[290017]: All renewals failed. The following certificates could not be renewed:
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/mail.pcfreak.org-0003/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/www.eforia.bg-0005/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/zabbix.pc-freak.net/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]: 3 renew failure(s), 5 parse failure(s)
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Failed with result 'exit-code'.
Feb 25 09:28:33 pcfrxen systemd[1]: Failed to start Certbot.
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Consumed 9.771s CPU time.

 

5. Get the complete journal log to make sure everything configured on the server host runs as it should

To get the complete messages, so you can later google them and check whether someone has posted a solution on the internet, use:

root@pcfrxen:~# journalctl --catalog --unit=certbot

-- Journal begins at Sat 2022-01-22 21:14:05 EET, ends at Fri 2022-02-25 13:32:01 EET. --
Jan 23 09:58:18 pcfrxen systemd[1]: Starting Certbot…
░░ Subject: A start job for unit certbot.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A start job for unit certbot.service has begun execution.
░░ 
░░ The job identifier is 5754.
Jan 23 09:58:20 pcfrxen certbot[124996]: Traceback (most recent call last):
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/renewal.py", line 71, in _reconstitute
Jan 23 09:58:20 pcfrxen certbot[124996]:     renewal_candidate = storage.RenewableCert(full_path, config)
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 471, in __init__
Jan 23 09:58:20 pcfrxen certbot[124996]:     self._check_symlinks()
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 537, in _check_symlinks

root@server:~# journalctl --catalog --unit=certbot|grep -i pluginerror|tail -1
Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using the manual plugin non-interactively.')
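When a unit logs the same failure many times, counting distinct messages shows which one to google first. This sketch assumes journalctl's --output=cat mode, which prints only the message text without timestamps, so repeated messages collapse into one line. It also assumes a simple case-insensitive error|fail keyword match. `summarize_errors` is a hypothetical helper, and the keywords are an assumption you may want to adjust.

```shell
#!/bin/sh
# Hypothetical helper: count distinct lines mentioning "error" or "fail"
# (case-insensitive) in journal text read on stdin, most frequent first.
summarize_errors() {
    grep -iE 'error|fail' | sort | uniq -c | sort -rn
}

# Example: journalctl --unit=certbot --output=cat --no-pager | summarize_errors | head
```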


Or, if you want to read only the latest journal messages for a service:

root@server:~# journalctl --catalog --pager-end --unit=certbot


If you don't need a failed service to run on the machine at all, stop and disable it:

root@rhel:~# systemctl stop rngd.service
root@rhel:~# systemctl disable rngd.service

Then, to clear the failed state that systemd still keeps for such services, run:
 

root@rhel:~# systemctl reset-failed
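systemctl can also stop and disable a unit in one step with disable --now. reset-failed accepts a unit name, so only that unit's failed state is cleared instead of all of them. Here is a small sketch with a hypothetical wrapper name, `retire_unit`.

```shell
#!/bin/sh
# Hypothetical wrapper: stop and disable a unit in one step (disable --now),
# then clear the failed state of that unit only.
retire_unit() {
    systemctl disable --now "$1" || return
    systemctl reset-failed "$1"
}

# Example (as root): retire_unit rngd.service
```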

Another useful systemctl command is cat. It prints a unit's file (plus any drop-in overrides). This is a quick way to check what a service runs, and a shortcut that saves you from typing the full path, e.g. cat /lib/systemd/system/certbot.service:

root@server:~# systemctl cat certbot
# /lib/systemd/system/certbot.service
[Unit]
Description=Certbot
Documentation=file:///usr/share/doc/python-certbot-doc/html/index.html
Documentation=https://certbot.eff.org/docs
[Service]
Type=oneshot
ExecStart=/usr/bin/certbot -q renew
PrivateTmp=true


Once the failed systemd services are fixed, it is best to reboot the machine and spend some more time inspecting the complete raw journal log, to make sure no error was left behind.


Closure
 

As you can see, updating a machine from one major version to the next is always more or less a pain in the ass, even if you follow the official documentation and have plenty of experience. It can eat up much of your time banging your head against failed daemons and issues with /etc/rc.local, among other problems. I faced the rc.local issue because of the #!/bin/sh -e line, which makes /etc/rc.local quit immediately if any command returns an exit status ($?) other than 0. The logical questions then come:
1. Is it really worth updating regularly at all, especially if you don't know of a famous major vulnerability 🙂?
2. Or is it worth updating from one major OS release to the next at all?
3. Or should you only patch the services exposed to an externally reachable network or the internet, and stay on the same OS release until the end of its support period? Debian calls this period LTS (Long Term Support), and RPM-based distros call it the End Of Life (EOL) cycle. In other words, stay until your distro stops providing official security patches for that major release.

Anyone can take any approach. For the small network of systems I manage at home, my practice has always been to keep everything up to date every 3 to 6 months at most. This has cost me multiple days of irritation and stress, and perhaps many white hairs and nerves spent on shit.


4. Based on my experience at the company where I'm employed, the better strategy is to keep patching while the release is still supported (until EOL) and follow the First Things First (FTF) rule. Once EOL is reached, just copy all server data and configuration to external data storage, bring up a new physical machine or VM, and migrate the services.
After the migration, test that everything works as expected. If all is as it should be, change the DNS records or the leading infrastructure proxies (whichever applies) to point to the new service, and that's it! Yes, a migration based on a full OS reinstall is more time consuming and requires much more planning. However, the result is usually much more predictable, and it is much less stressful for the person doing the job.
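About the /etc/rc.local trouble mentioned above: a script run with sh -e exits at the first command that returns a non-zero status, so everything after it is silently skipped. Here is a minimal demonstration, including the common || true workaround for commands that are allowed to fail. The function names are hypothetical and only stand in for a real rc.local.

```shell
#!/bin/sh
# Hypothetical demo of an rc.local started as "#!/bin/sh -e":
# with -e the shell exits at the first command returning non-zero,
# so "second" is never printed and the exit status is non-zero.
rc_local_strict() {
    sh -e -c 'echo first; false; echo second'
}

# Appending "|| true" to a command that may fail keeps later lines running.
rc_local_tolerant() {
    sh -e -c 'echo first; false || true; echo second'
}
```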