Posts Tagged ‘systemctl’

List and fix failed systemd services after Linux OS upgrade and how to get full info about a systemd service from the journal log

Friday, February 25th, 2022


I have recently upgraded a number of machines from Debian 10 Buster to Debian 11 Bullseye. As always, the update had some issues on some machines, such as problems with package dependencies and the need to change a number of external package repositories to match the Bullseye deb packages. On some machines the update was less painful than on others, but the bottom line was that most of the machines ended up with one or more failed systemd services after the update. It could be that some of the machines already had these failed services and I never checked them after the previous update from Debian 9 to Debian 10, or it is just some mess I've left behind in a hurry when installing software in the past. It doesn't matter anyway; the fact was that I had to deal with a number of failed systemd services, which I tracked down through the Failed service messages shown at system boot on one of the physical machines and on the OpenXen VTY console for the rest of the virtual machines. Thus I've spent a good amount of time, a day or two overall, fixing strange failed services. This is how this small article was born, in an attempt to help sysadmins or home Linux desktop users who have updated their Debian Linux / Ubuntu or any other deb based distribution, but due to the chaotic nature of Linux have ended up with the same strange Failed services and are looking for a way to find the source of the failures and get rid of the problems.
Systemd is a very complicated system and in my sysadmin opinion it creates more problems than it solves, but okay, it matches well today's people's megalomania mindset.


 

1. Check the journal for errors, running service irregularities and so on
 

The first thing to do to track down errors right after the update is to take some minutes and closely check the output of journalctl for any strange errors. Even on well maintained Unix machines the journal log will often lead you to a problem that is not fatal, but where some process is malfunctioning in the background and that you would like to solve:
 

root@pcfreak:~# journalctl -x
Jan 10 10:10:01 pcfreak CRON[17887]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17887]: USER_END pid=17887 uid=0 auid=0 ses=340858 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17888]: CRED_DISP pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17888]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17888]: USER_END pid=17888 uid=0 auid=0 ses=340860 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17884]: CRED_DISP pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="root" exe="/usr/sbin/cron" >
Jan 10 10:10:01 pcfreak CRON[17884]: pam_unix(cron:session): session closed for user root
Jan 10 10:10:01 pcfreak audit[17884]: USER_END pid=17884 uid=0 auid=0 ses=340855 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permit>
Jan 10 10:10:01 pcfreak audit[17886]: CRED_DISP pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:setcred grantors=pam_permit acct="www-data" exe="/usr/sbin/c>
Jan 10 10:10:01 pcfreak CRON[17886]: pam_unix(cron:session): session closed for user www-data
Jan 10 10:10:01 pcfreak audit[17886]: USER_END pid=17886 uid=0 auid=33 ses=340859 subj==unconfined msg='op=PAM:session_close grantors=pam_loginuid,pam_env,pam_env,pam_permi>
Jan 10 10:10:08 pcfreak NetworkManager[696]:  [1641802208.0899] device (eth1): carrier: link connected
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:08 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:19 pcfreak NetworkManager[696]:  [1641802219.7920] device (eth1): carrier: link connected
Jan 10 10:10:19 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:20 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:22 pcfreak NetworkManager[696]:  [1641802222.2772] device (eth1): carrier: link connected
Jan 10 10:10:22 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx
Jan 10 10:10:23 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Down
Jan 10 10:10:33 pcfreak sshd[18142]: Unable to negotiate with 66.212.17.162 port 19255: no matching key exchange method found. Their offer: diffie-hellman-group14-sha1,diff>
Jan 10 10:10:41 pcfreak NetworkManager[696]:  [1641802241.0186] device (eth1): carrier: link connected
Jan 10 10:10:41 pcfreak kernel: r8169 0000:03:00.0 eth1: Link is Up - 100Mbps/Full - flow control rx/tx

If you want to check only the latest journal log messages, use the -x (catalog) and -e (pager end) options:

root@pcfreak:~# journalctl -xe

Feb 25 13:08:29 pcfreak audit[2284920]: USER_LOGIN pid=2284920 uid=0 auid=4294967295 ses=4294967295 subj==unconfined msg='op=login acct=28696E76616C>
Feb 25 13:08:29 pcfreak sshd[2284920]: Received disconnect from 177.87.57.145 port 40927:11: Bye Bye [preauth]
Feb 25 13:08:29 pcfreak sshd[2284920]: Disconnected from invalid user ubuntuuser 177.87.57.145 port 40927 [preauth]
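
When the full journal is too noisy to read, it can help to narrow it down first. The sketch below is only an illustration: journal_errors wraps journalctl with the -b (current boot), -p err (priority err and worse) and -q (no informational header lines) options, and count_by_program is a hypothetical helper name of mine that counts how many lines each program logged, assuming the default journalctl output format shown above.

```shell
#!/bin/sh
# Show only messages of priority "err" or worse from the current boot.
# -b limits to the current boot, -p err filters by priority, -q drops the
# informational header, --no-pager prints plainly so the output can be piped.
journal_errors() {
    journalctl -b -p err -q --no-pager "$@"
}

# Hypothetical helper: count journal lines per program.
# Assumes the default output format: "Mon DD HH:MM:SS host prog[pid]: message"
count_by_program() {
    awk '{
        prog = $5
        sub(/\[[0-9]+\]:$/, "", prog)   # strip a "[pid]:" suffix
        sub(/:$/, "", prog)             # strip ":" when there is no pid
        n[prog]++
    }
    END { for (p in n) printf "%d %s\n", n[p], p }' | sort -rn
}
```

On a live machine the two would be combined as: journal_errors | count_by_program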

The next thing to do after the update was to get a list of the failed services only.


2. List all failed systemd services which were supposed to be running

root@pcfreak:/root # systemctl list-units | grep -i failed
● certbot.service                                                                                                       loaded failed failed    Certbot
● logrotate.service                                                                                                     loaded failed failed    Rotate log files
● maldet.service                                                                                                        loaded failed failed    LSB: Start/stop maldet in monitor mode
● named.service                                                                                                         loaded failed failed    BIND Domain Name Server


An alternative way is with the --failed option

hipo@jeremiah:~$ systemctl list-units --failed
  UNIT                        LOAD   ACTIVE SUB    DESCRIPTION
● haproxy.service             loaded failed failed HAProxy Load Balancer
● libvirt-guests.service      loaded failed failed Suspend/Resume Running libvirt Guests
● libvirtd.service            loaded failed failed Virtualization daemon
● nvidia-persistenced.service loaded failed failed NVIDIA Persistence Daemon
● sqwebmail.service           masked failed failed sqwebmail.service
● tpm2-abrmd.service          loaded failed failed TPM2 Access Broker and Resource Management Daemon
● wd_keepalive.service        loaded failed failed LSB: Start watchdog keepalive daemon

LOAD   = Reflects whether the unit definition was properly loaded.
ACTIVE = The high-level unit activation state, i.e. generalization of SUB.
SUB    = The low-level unit activation state, values depend on unit type.
7 loaded units listed.
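
To avoid checking every failed unit by hand, the list can be fed into a loop. This is a minimal sketch, assuming output in the format shown above; failed_unit_names and show_failed_units are illustrative helper names, not systemd commands.

```shell
#!/bin/sh
# Illustrative helper: print the unit name from each line of
# "systemctl list-units --failed" output read on stdin.
# Header, legend lines and the leading bullet are skipped, because only
# the first field ending in a known unit suffix is printed.
failed_unit_names() {
    awk '{
        for (i = 1; i <= NF; i++)
            if ($i ~ /\.(service|socket|timer|mount|automount|path|target|swap|scope|slice|device)$/) {
                print $i
                break
            }
    }'
}

# Print the full status of every failed unit, one after another.
show_failed_units() {
    systemctl list-units --failed --no-legend |
        failed_unit_names |
        while read -r unit; do
            systemctl status -l --no-pager "$unit"
            echo
        done
}
```

Calling show_failed_units as root then prints the status block of each failed unit in turn.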

 



To get the full list of states you can pass to the --state option, use:

# systemctl --state=help

Full list of possible load states to pass is here

Show service properties


Check whether a service has failed or is in another state, and check the systemd properties set for it by default.

root@jeremiah:~# systemctl is-failed vboxweb.service
inactive

# systemctl show haproxy
Type=notify
Restart=always
NotifyAccess=main
RestartUSec=100ms
TimeoutStartUSec=1min 30s
TimeoutStopUSec=1min 30s
TimeoutAbortUSec=1min 30s
TimeoutStartFailureMode=terminate
TimeoutStopFailureMode=terminate
RuntimeMaxUSec=infinity
WatchdogUSec=0
WatchdogTimestampMonotonic=0
RootDirectoryStartOnly=no
RemainAfterExit=no
GuessMainPID=yes
SuccessExitStatus=143
MainPID=304858
ControlPID=0
FileDescriptorStoreMax=0
NFileDescriptorStore=0
StatusErrno=0
Result=success
ReloadResult=success
CleanResult=success

Full output of the above command is dumped in show_systemctl_properties.txt
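
Both commands are handy in scripts. systemctl is-failed returns exit code 0 when the unit is in the failed state, and systemctl show accepts -p with property names, so you don't have to dump everything. The sketch below uses unit_prop and explain_unit, helper names I made up, to pick one value out of the KEY=VALUE output.

```shell
#!/bin/sh
# Made-up helper: read "systemctl show" output (KEY=VALUE lines) on stdin
# and print the value of the property named in $1.
unit_prop() {
    awk -v key="$1" '
        index($0, key "=") == 1 { print substr($0, length(key) + 2); exit }
    '
}

# Report whether a unit failed, using the Result property seen in the output above.
explain_unit() {
    if systemctl is-failed --quiet "$1"; then
        echo "$1 failed, Result=$(systemctl show -p Result "$1" | unit_prop Result)"
    else
        echo "$1 is not in failed state"
    fi
}
```

For example: explain_unit haproxy.service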


3. List all running systemd services for a better overview of what's going on on the machine
 

To get a list of all loaded systemd units that are currently running you can use --state running.

hipo@jeremiah:~$ systemctl list-units --state running|head -n 10
  UNIT                              LOAD   ACTIVE SUB     DESCRIPTION
  proc-sys-fs-binfmt_misc.automount loaded active running Arbitrary Executable File Formats File System Automount Point
  cups.path                         loaded active running CUPS Scheduler
  init.scope                        loaded active running System and Service Manager
  session-2.scope                   loaded active running Session 2 of user hipo
  accounts-daemon.service           loaded active running Accounts Service
  anydesk.service                   loaded active running AnyDesk
  apache-htcacheclean.service       loaded active running Disk Cache Cleaning Daemon for Apache HTTP Server
  apache2.service                   loaded active running The Apache HTTP Server
  avahi-daemon.service              loaded active running Avahi mDNS/DNS-SD Stack

 

It is also useful to list all unit files configured in systemd and their state. You can do it with:

 


root@pcfreak:~# systemctl list-unit-files
UNIT FILE                                                                 STATE           VENDOR PRESET
proc-sys-fs-binfmt_misc.automount                                         static          -
-.mount                                                                   generated       -
backups.mount                                                             generated       -
dev-hugepages.mount                                                       static          -
dev-mqueue.mount                                                          static          -
media-cdrom0.mount                                                        generated       -
mnt-sda1.mount                                                            generated       -
proc-fs-nfsd.mount                                                        static          -
proc-sys-fs-binfmt_misc.mount                                             disabled        disabled
run-rpc_pipefs.mount                                                      static          -
sys-fs-fuse-connections.mount                                             static          -
sys-kernel-config.mount                                                   static          -
sys-kernel-debug.mount                                                    static          -
sys-kernel-tracing.mount                                                  static          -
var-www.mount                                                             generated       -
acpid.path                                                                masked          enabled
cups.path                                                                 enabled         enabled

 

 


To see all service units, including the inactive and not-found ones, use:

root@pcfreak:~# systemctl list-units --type service --all
  UNIT                                   LOAD      ACTIVE   SUB     DESCRIPTION
  accounts-daemon.service                loaded    inactive dead    Accounts Service
  acct.service                           loaded    active   exited  Kernel process accounting
● alsa-restore.service                   not-found inactive dead    alsa-restore.service
● alsa-state.service                     not-found inactive dead    alsa-state.service
  apache2.service                        loaded    active   running The Apache HTTP Server
● apparmor.service                       not-found inactive dead    apparmor.service
  apt-daily-upgrade.service              loaded    inactive dead    Daily apt upgrade and clean activities
  apt-daily.service                      loaded    inactive dead    Daily apt download activities
  atd.service                            loaded    active   running Deferred execution scheduler
  auditd.service                         loaded    active   running Security Auditing Service
  auth-rpcgss-module.service             loaded    inactive dead    Kernel Module supporting RPCSEC_GSS
  avahi-daemon.service                   loaded    active   running Avahi mDNS/DNS-SD Stack
  certbot.service                        loaded    inactive dead    Certbot
  clamav-daemon.service                  loaded    active   running Clam AntiVirus userspace daemon
  clamav-freshclam.service               loaded    active   running ClamAV virus database updater
..

 



 

4. Finding out more on why a systemd configured service has failed


Usually getting info about a failed systemd service is done with systemctl status servicename.service
However, when a service is unable to start, you can get more info about why it has failed with the -l (--full) option:


root@pcfreak:~# systemctl -l status logrotate.service
● logrotate.service - Rotate log files
     Loaded: loaded (/lib/systemd/system/logrotate.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 00:00:06 EET; 13h ago
TriggeredBy: ● logrotate.timer
       Docs: man:logrotate(8)
             man:logrotate.conf(5)
    Process: 2045320 ExecStart=/usr/sbin/logrotate /etc/logrotate.conf (code=exited, status=1/FAILURE)
   Main PID: 2045320 (code=exited, status=1/FAILURE)
        CPU: 2.479s

Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: For now we will assume you meant to write /32
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| ERROR: '0.0.0.0/0.0.0.0' needs to be replaced by the term 'all'.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| SECURITY NOTICE: Overriding config setting. Using 'all' instead.
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: (B) '::/0' is a subnetwork of (A) '::/0'
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: because of this '::/0' is ignored to keep splay tree searching predictable
Feb 25 00:00:06 pcfreak logrotate[2045577]: 2022/02/25 00:00:06| WARNING: You should probably remove '::/0' from the ACL named 'all'
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Failed with result 'exit-code'.
Feb 25 00:00:06 pcfreak systemd[1]: Failed to start Rotate log files.
Feb 25 00:00:06 pcfreak systemd[1]: logrotate.service: Consumed 2.479s CPU time.


However, systemctl status -l shows only the last few log lines (ten by default) that the service generated when it was started, stopped or changed status. Sometimes the error message is also shown incomplete, cut off at the width of the console (note the trailing > on the first log line), see below:

 

root@pcfreak:~# systemctl status -l certbot.service
● certbot.service - Certbot
     Loaded: loaded (/lib/systemd/system/certbot.service; static)
     Active: failed (Result: exit-code) since Fri 2022-02-25 09:28:33 EET; 4h 0min ago
TriggeredBy: ● certbot.timer
       Docs: file:///usr/share/doc/python-certbot-doc/html/index.html
             https://certbot.eff.org/docs
    Process: 290017 ExecStart=/usr/bin/certbot -q renew (code=exited, status=1/FAILURE)
   Main PID: 290017 (code=exited, status=1/FAILURE)
        CPU: 9.771s

Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using th>
Feb 25 09:28:33 pcfrxen certbot[290017]: All renewals failed. The following certificates could not be renewed:
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/mail.pcfreak.org-0003/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/www.eforia.bg-0005/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]:   /etc/letsencrypt/live/zabbix.pc-freak.net/fullchain.pem (failure)
Feb 25 09:28:33 pcfrxen certbot[290017]: 3 renew failure(s), 5 parse failure(s)
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Main process exited, code=exited, status=1/FAILURE
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Failed with result 'exit-code'.
Feb 25 09:28:33 pcfrxen systemd[1]: Failed to start Certbot.
Feb 25 09:28:33 pcfrxen systemd[1]: certbot.service: Consumed 9.771s CPU time.

 

5. Get a complete log of journal to make sure everything configured on server host runs as it should

Thus, to get a more complete list of the messages, which you can later google to see whether someone has already come up with a solution on the internet, use:

root@pcfrxen:~#  journalctl --catalog --unit=certbot

-- Journal begins at Sat 2022-01-22 21:14:05 EET, ends at Fri 2022-02-25 13:32:01 EET. --
Jan 23 09:58:18 pcfrxen systemd[1]: Starting Certbot...
░░ Subject: A start job for unit certbot.service has begun execution
░░ Defined-By: systemd
░░ Support: https://www.debian.org/support
░░ 
░░ A start job for unit certbot.service has begun execution.
░░ 
░░ The job identifier is 5754.
Jan 23 09:58:20 pcfrxen certbot[124996]: Traceback (most recent call last):
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/renewal.py", line 71, in _reconstitute
Jan 23 09:58:20 pcfrxen certbot[124996]:     renewal_candidate = storage.RenewableCert(full_path, config)
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 471, in __init__
Jan 23 09:58:20 pcfrxen certbot[124996]:     self._check_symlinks()
Jan 23 09:58:20 pcfrxen certbot[124996]:   File "/usr/lib/python3/dist-packages/certbot/_internal/storage.py", line 537, in _check_symlinks

root@server:~# journalctl --catalog --unit=certbot|grep -i pluginerror|tail -1
Feb 25 09:28:33 pcfrxen certbot[290017]: The error was: PluginError('An authentication script must be provided with --manual-auth-hook when using the manual plugin non-interactively.')


Or, if you want to read only the last messages in the journal log regarding a service:

root@server:~# journalctl --catalog --pager-end --unit=certbot
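
When the same error repeats on every run, it helps to collapse the journal of a unit to its distinct messages before googling them. A small sketch under assumptions: -o cat makes journalctl print only the message text without timestamp and host, and top_messages and unit_messages are illustrative names of mine.

```shell
#!/bin/sh
# Illustrative helper: count identical lines on stdin, most frequent first.
top_messages() {
    sort | uniq -c | sort -rn | sed 's/^ *//'
}

# Distinct messages logged by one unit, with their number of occurrences.
unit_messages() {
    journalctl --unit="$1" -o cat --no-pager | top_messages
}
```

For example: unit_messages certbot | head -n 20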


If you have disabled a failed service because you don't need it to run at all on the machine with:

root@rhel:~# systemctl stop rngd.service
root@rhel:~# systemctl disable rngd.service

And you want to clear the failed state that systemd still remembers for its units, you can do it with:
 

root@rhel:~# systemctl reset-failed
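
Note that reset-failed without arguments resets the failed state of all units; it also accepts unit names. The three steps can be wrapped together, as in this sketch. retire_unit and the SYSTEMCTL variable are my own illustrative names; setting SYSTEMCTL=echo gives a dry run that only prints what would be done.

```shell
#!/bin/sh
# Command to run; override with SYSTEMCTL=echo for a dry run.
: "${SYSTEMCTL:=systemctl}"

# Stop a unit, disable it, and clear its failed state.
# Stops at the first step that fails.
retire_unit() {
    $SYSTEMCTL stop "$1" &&
    $SYSTEMCTL disable "$1" &&
    $SYSTEMCTL reset-failed "$1"
}
```

For example: retire_unit rngd.service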

Another useful systemctl command is cat. It prints the unit file of a service, which is useful to quickly check what a service does. It is a shortcut that saves you from typing the full path to the unit file, e.g. cat /lib/systemd/system/certbot.service

root@server:~# systemctl cat certbot
# /lib/systemd/system/certbot.service
[Unit]
Description=Certbot
Documentation=file:///usr/share/doc/python-certbot-doc/html/index.html
Documentation=https://certbot.eff.org/docs
[Service]
Type=oneshot
ExecStart=/usr/bin/certbot -q renew
PrivateTmp=true


After the failed systemd services are fixed, it is best to reboot the machine and spend some more time inspecting the complete raw journal log, to make sure no error was left behind.


Closure
 

As you can see, updating a machine from one major version to the next, even if you follow the official documentation and have plenty of experience, is always more or less a pain in the ass. It can eat up much of your time, banging your head on problems with failed daemons or issues with /etc/rc.local (which I have faced because of #!/bin/sh -e, which makes /etc/rc.local quit immediately if any command returns an exit status $? different from 0), etc. The logical questions then are:
1. Is it really worth updating regularly at all, especially if you don't know of a famous major vulnerability 🙂 ?
2. Or is it worth updating from OS major release to OS major release at all?
3. Or should you only patch the services that are exposed to an externally reachable computer network or the internet, and stay on the same OS release until the end of its official security patches is reached (the end of LTS = Long Term Support, as it is called in Debian, or the End Of Life (EOL) cycle, as it is called in RPM based distros)?

Anyone could take any approach, but for the small network of systems I manage at home my practice was always to keep everything up to date every 3 or 6 months at most. This has caused me multiple days of irritation and stress, and perhaps many white hairs and nerves spent on shit.


4. Based on my experience in the company where I'm employed, the better strategy is to patch for as long as EOL support is still offered and keep to the rule First Things First (FTF). Once the EOL is reached, just make a copy of all the servers' data and configuration to external data storage, bring up a new physical machine or VM and migrate the services.
After the migration, test that everything works as expected. If all is as it should be, change the DNS records or the leading infrastructure proxies to point to the new service, and that's it! Yes, it is true that a migration based on a full OS reinstall is more time consuming and requires much more planning, but usually the result is much more predictable, plus it is much less stressful for the guy doing the job.
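
A footnote on the /etc/rc.local problem mentioned above: the effect of the -e flag is easy to reproduce. This is only a demonstration with throwaway one-liners, not the actual rc.local content.

```shell
#!/bin/sh
# With -e the shell exits at the first command that returns non-zero,
# so "after" is never printed.
with_e=$(sh -e -c 'echo before; false; echo after') || true

# Without -e the failure of "false" is ignored and the script continues.
without_e=$(sh -c 'echo before; false; echo after')

echo "with -e:"
echo "$with_e"
echo "without -e:"
echo "$without_e"
```

So in an rc.local that starts with #!/bin/sh -e, one failing command silently skips everything after it. Either drop the -e or append || true to the commands that are allowed to fail.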

Linux script to periodically log enabled systemctl services, configured network IPs and routings, server established connections and iptables firewall rules

Tuesday, January 25th, 2022


For those who are running some kind of server, be it virtual or physical, where multiple people or many sysadmins have access, things can sometimes get quite messy: due to miscommunication or whatever, someone could change something on the configured network Ethernet interfaces or the configured routing tables, or simply issue an update which changes the set of systemctl services that start automatically. Such changes on a Linux server operating system can often remain unnoticed and cause quite some harm. Even when the change is noticed, the logical question arises: what was the previous network route on the server, or what kind of network was configured on Ethernet interface ethX, etc.
Problems like the ones described were pretty common in many public / private clouds or VMware / XEN based hypervisors that host multiple virtual machines. For that reason I've developed a small script, which looks pretty dumb at first glimpse but is very useful, as it keeps historical records of such important information.
 

#!/bin/bash
# Script to show configured services on system, configured IPs, netstat state and network routes
# Script to be used during CentOS and Redhat Enterprise Linux RPM package updates with yum
# bash is used because "echo -e" is not portable to every /bin/sh (e.g. dash on Debian)

output_file=network_ip_routes_services_status
ddate=$(date '+%Y-%m-%d_%H-%M-%S')
iptables=$(command -v iptables)
# create the log directory if it does not exist yet
mkdir -p /root/logs/
# one log file per run, named after the host and the start time
logfile=/root/logs/$output_file-$(hostname)-$ddate.log

echo "STARTED: $(date '+%Y-%m-%d_%H-%M-%S'):" | tee -a "$logfile"
echo -e "# systemctl list-unit-files\n" | tee -a "$logfile"
systemctl list-unit-files --type=service | grep enabled | tee -a "$logfile"
echo -e '# systemctl | grep ".service" | grep "running"\n' | tee -a "$logfile"
systemctl | grep ".service" | grep "running" | tee -a "$logfile"
echo -e "# netstat -tulpn\n" | tee -a "$logfile"
netstat -tulpn | tee -a "$logfile"
echo -e "# netstat -r\n" | tee -a "$logfile"
netstat -r | tee -a "$logfile"
echo -e "# ip a s\n" | tee -a "$logfile"
ip a s | tee -a "$logfile"
echo -e "# /sbin/route -n\n" | tee -a "$logfile"
/sbin/route -n | tee -a "$logfile"
# each iptables header is now printed right before its own output
echo -e "# $iptables -L -n\n" | tee -a "$logfile"
$iptables -L -n | tee -a "$logfile"
echo -e "# $iptables -t nat -L\n" | tee -a "$logfile"
$iptables -t nat -L | tee -a "$logfile"
echo "ENDED $(date '+%Y-%m-%d_%H-%M-%S'):" | tee -a "$logfile"

 

The script produces its logs inside /root/logs/network_ip_routes_services_status*hostname*currentdate*.log; put the script inside /root/ or wherever you like.

To keep an eye on how the network routing, IP configuration or firewall changed, or whether there was a peak in the established connections towards daemons running on the host (which might, let's say, require a machine upgrade), I've set the script to run as usual via a cron job at the end of the predefined cron job tasks, like so:

# crontab -u root -e
# periodic dump and log network routing tables, netstat and systemctl list-unit-files
*/1 01 01,25 * * /root/show_running_services_netstat_ips_route1.sh >/dev/null 2>&1

You can download a copy of show_running_services_netstat_ips_route1.sh script here.
The script is written without much efficiency in mind, as you can see from the multiple tee -a calls, and for critical hosts it might be a good idea to rewrite it to use the '>>' operator instead. Anyhow, as most machines today are pretty powerful, it doesn't really matter much.
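
A rough sketch of how such a rewrite could look: the commands are grouped in braces and the whole group is appended to the log with a single >>, so the file is opened once instead of once per tee -a. log_section and write_report are names I picked for illustration, and only a few of the commands from the script are repeated here.

```shell
#!/bin/sh
# Print a "# command" header, then the command's output (stdout and stderr).
log_section() {
    printf '# %s\n\n' "$*"
    "$@" 2>&1 || true      # keep going even if a command fails or is missing
    printf '\n'
}

# Append a whole report to the log file given in $1 with one redirection.
write_report() {
    {
        echo "STARTED: $(date '+%Y-%m-%d_%H-%M-%S'):"
        log_section ip a s
        log_section netstat -tulpn
        log_section /sbin/route -n
        echo "ENDED $(date '+%Y-%m-%d_%H-%M-%S'):"
    } >> "$1"
}
```

It would be called as write_report "/root/logs/network_ip_routes_services_status-$(hostname)-$(date '+%Y-%m-%d_%H-%M-%S').log". Unlike the tee version, nothing is echoed to the terminal, which is fine for a cron job.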

Of course today such a script is quite archaic, as most big corporations are using much more complex monitoring software such as Zabbix or Prometheus, or Kibana if some kind of Elasticsearch is used, etc. But for basic needs, and even for double checking and comparing with other more advanced monitoring tools (in case the monitoring tool's database gets damaged or is temporarily down until restored from backup), I think such an old school simple monitoring script can still be of use.

A good addition to that, if you use a central logging server, is to set up another cron job to periodically synchronize the produced /root/logs/* to it. Here is how to do it with a simple rsync (assuming your host is configured to log in with ssh key authentication, without a password).

# HOSTNAME=$(hostname); rsync -axHv --ignore-existing -e 'ssh -p 22' /root/logs/  -q -i --out-format="%t %f %b" --log-file=/var/log/rsync_sync_jobs.log --info=progress2 root@BACKUP_SERVER_HOST:/${HOSTNAME}-logs/

Once something strange occurs with the machine, for example the machine needs to be rebuilt, you have the historical logs at hand to check how things were configured before.
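
Since each run leaves a timestamped file, two snapshots can be compared directly. The helper below is a sketch with a name of my own choosing (diff_last_two); it relies on the date in the file name sorting chronologically, which holds for the %Y-%m-%d_%H-%M-%S format used by the script as long as the logs come from one host.

```shell
#!/bin/sh
# Sketch: show what changed between the two newest logs in a directory.
# Assumes file names without spaces that sort chronologically by name.
diff_last_two() {
    set -- $(ls -1 "$1"/*.log 2>/dev/null | sort | tail -n 2)
    if [ $# -ne 2 ]; then
        echo "need at least two logs to compare" >&2
        return 2
    fi
    # diff exits with 1 when the files differ; that is not an error here
    diff -u "$1" "$2" || [ $? -eq 1 ]
}
```

For example: diff_last_two /root/logs. The STARTED and ENDED lines will always differ, everything else in the diff is a real change.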

I would be glad to hear if some of my readers use some useful script which I can adopt myself. Cheers 🙂

How to mask rpcbind on CentOS to prevent rpcbind service from auto starting a new local server port listener, triggered by security audit port scanner software

Wednesday, December 1st, 2021


 

Introduction to  THE PROBLEM :
rpcbind TCP/UDP port 111 automatically starting itself out of nothing on CentOS 7 Linux

Server environments that are monitored regularly for CVE security breaches based on open TCP / UDP ports are usually scanned with something like Qualys (a proprietary business software that helps automate the full spectrum of auditing, compliance and protection of your IT systems and web applications). Perhaps its closest ex-open source equivalent was the Nessus Security Scanner, or the more modern security audit Linux tools Intruder (An Effortless Vulnerability Scanner) and OpenVAS (Open Vulnerability Assessment Scanner), or even a simple nmap port scan over TCP / UDP against the SunRPC default predefined port 111.

 

[root@centos~]# cat /etc/redhat-release
CentOS Linux release 7.9.2009 (Core)

 

[root@centos~]# grep -i rpcbind /etc/services
sunrpc          111/tcp         portmapper rpcbind      # RPC 4.0 portmapper TCP
sunrpc          111/udp         portmapper rpcbind      # RPC 4.0 portmapper UDP


Note! For those who don't know it or are new to Linux: the /etc/services file has for years been a file with predefined well known services and their ports in Linux as well as in other UNIXes.

So once such a scan is triggered you might end up in a very strange situation: the number of processes on the CentOS Linux server mysteriously changes by +1, because the rpcbind process appears running again even though rpcbind.service is disabled in systemctl.
 

[root@centos~]# ps -ef|grep -i rpcbind
rpc        100     1  0 Nov11 ?        00:00:02 /sbin/rpcbind -w
root     29099 22060  0 13:07 pts/0    00:00:00 grep --color=auto -i rpcbind
[root@centos ~]#

By the way, it took me and my colleagues a while to identify the mysterious reason why the rpcbind process gets triggered and appears in the process list, even though the machine is in a very secured DMZ LAN and there are no cron jobs or any other software doing any kind of scheduling that might lead rpcbind to start up like it does.

[root@centos ~]# systemctl list-unit-files|grep -i rpcbind
rpcbind.service                               disabled
rpcbind.socket                                disabled
rpcbind.target                                static


There is absolutely no logic in a stopped service coming back to listen on TCP / UDP port 111, on a machine that has no firewall rules such as iptables chains or whatever.

[root@centos~]# systemctl status rpcbind
● rpcbind.service - RPC bind service
   Loaded: loaded (/usr/lib/systemd/system/rpcbind.service; disabled; vendor preset: enabled)
   Active: inactive (dead)


As you can see, the service seems to have been disabled originally, but after some time this output auto-magically turned to rpcbind.socket enabled:

[root@centos ~]# systemctl list-unit-files|grep -i rpcbind
rpcbind.service                               disabled
rpcbind.socket                                enabled
rpcbind.target                                static

Hence, to prevent rpcbind.socket from automatically respawning itself and leading to the resurrection of the dead and disabled /sbin/rpcbind, do the following:


1. Disable the port 111 listeners in the /usr/lib/systemd/system/rpcbind.socket file


Comment out all the Listen* rows for port 111 there

[root@centos ~]# vi /usr/lib/systemd/system/rpcbind.socket

[Unit]
Description=RPCbind Server Activation Socket

[Socket]
ListenStream=/var/run/rpcbind.sock

# RPC netconfig can't handle ipv6/ipv4 dual sockets
BindIPv6Only=ipv6-only
#ListenStream=0.0.0.0:111
#ListenDatagram=0.0.0.0:111
#ListenStream=[::]:111
#ListenDatagram=[::]:111

[Install]
WantedBy=sockets.target
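
One caveat with editing the file under /usr/lib/systemd/system directly: it belongs to the rpcbind package, so a package update may put the original back. A drop-in under /etc/systemd/system survives updates. The sketch below is an alternative under assumptions, not what was done on the server in this article: write_socket_override is a made-up helper, and it relies on the rule from the systemd.socket man page that an empty Listen* assignment resets the list of addresses inherited from the packaged unit. Verify it on your systemd version before relying on it.

```shell
#!/bin/sh
# Made-up helper: write a drop-in that keeps only the local UNIX socket.
# $1 is the drop-in directory, e.g. /etc/systemd/system/rpcbind.socket.d
write_socket_override() {
    mkdir -p "$1" &&
    cat > "$1/override.conf" <<'EOF'
[Socket]
# An empty assignment clears every Listen* address set by the packaged unit
ListenStream=
# Re-add only the local socket, so nothing listens on port 111
ListenStream=/var/run/rpcbind.sock
EOF
}
```

It would be used as write_socket_override /etc/systemd/system/rpcbind.socket.d followed by systemctl daemon-reload.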

2. Mask rpcbind.socket and make sure /etc/systemd/system/rpcbind.socket links to /dev/null

Mute rpcbind.socket completely (masking is the systemd "feature" that links a unit to /dev/null)

[root@centos ~]# systemctl mask rpcbind.socket

 

Hence, the link from /etc/systemd/system/rpcbind.socket must be linked to /dev/null

[root@centos ~]# ls -l /etc/systemd/system/rpcbind.socket
lrwxrwxrwx 1 root root 9 Jan 27  2020 /etc/systemd/system/rpcbind.socket -> /dev/null


Voila! That should be it; rpcbind should no longer hang around among the other processes.
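
The symlink check above can also be scripted. Here is a minimal Python sketch, assuming that a masked unit is nothing more than a symlink to /dev/null, as the ls output shows. The helper name is_masked and the temporary demo directory are made up for illustration and are not part of systemd.

```python
import os
import tempfile


def is_masked(unit_path):
    """Return True if unit_path is a symlink pointing at /dev/null."""
    return os.path.islink(unit_path) and os.readlink(unit_path) == "/dev/null"


# Demo in a throwaway directory, so nothing under /etc is touched.
with tempfile.TemporaryDirectory() as tmp:
    demo_unit = os.path.join(tmp, "rpcbind.socket")
    os.symlink("/dev/null", demo_unit)
    demo_result = is_masked(demo_unit)
    print(demo_result)
```

On a real host the call would be is_masked("/etc/systemd/system/rpcbind.socket").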

Install and enable Sysstat IO / Disk / CPU / Network monitoring console suite on Redhat 8.3, a few useful sar command examples

Tuesday, September 28th, 2021

linux-sysstat-monitoring-logo

 

Why monitor CPU, Memory, Hard Disk, Network usage etc. with sysstat tools?
 

Using system monitoring tools such as Zabbix, Nagios or Monit is a good approach, however sometimes due to Zabbix server interruptions you might not be able to track certain aspects of system performance in time. Thus it is always a good idea to gain more insight into system performance from the command line. Of course there are command line tools such as iostat, top, free and vnstat that provide plenty of useful info on system performance issues or bottlenecks. However, from my experience, to have systematized historical data that is accessible from the console at all times, it is a great thing to have the sysstat package in place. For many years, on almost every server I administer, I've been using sysstat to monitor what is going on over short time frames and I'm quite happy with it. In my current company we're using Redhat and CentOS, and I had to install sysstat on Redhat 8.3. I've done it multiple times earlier on Debian / Ubuntu Linux, and since on some .deb distributions I've faced complications making sysstat collect statistics, I've come up with an article on Howto fix sysstat "Cannot open /var/log/sysstat/sa no such file or directory" on Debian / Ubuntu Linux.
 

Sysstat contains the following tools related to collecting I/O and CPU statistics:

  • iostat: Displays an overview of CPU utilization, along with I/O statistics for one or more disk drives.
  • mpstat: Displays more in-depth CPU statistics.

Sysstat also contains tools that collect system resource utilization data and create daily reports based on that data. These tools are:

  • sadc: Known as the system activity data collector, sadc collects system resource utilization information and writes it to a file.
  • sar: Produces reports from the files created by sadc; sar reports can be generated interactively or written to a file for more intensive analysis.

In my experience with CentOS 7 and Fedora, installing sysstat was pretty straightforward: I just had to install it via yum install sysstat, wait for some time and use the sar (System Activity Reporter) tool to report the system activity stats collected over time.
Unfortunately it seems that on RedHat 8.3, as well as on CentOS 8.XX, installing sysstat does not work out of the box.

To complete a successful installation of it on RHEL 8.3, I had to:

[root@server ~]# yum install -y sysstat


To enable sysstat on the system so that it starts on boot, I've enabled it in systemd:

[root@server ~]# systemctl enable sysstat


Running the sar command right away, I've faced this nasty error:


Cannot open /var/log/sysstat/sa18: No such file or directory. Please check if data collecting is enabled

 

Once it was installed, I waited for about 5 minutes, hoping that sysstat would somehow sort it out automatically, but it didn't.

To solve it, I had to additionally create the file /etc/cron.d/sysstat (weirdly, the RPM's post-install scripts do not create it automatically):

[root@server ~]# vim /etc/cron.d/sysstat

# run the system activity accounting tool once an hour, collecting 59 samples at 60 second intervals
0 * * * * root /usr/lib64/sa/sa1 60 59 &
# generate a daily summary of process accounting at 23:53
53 23 * * * root /usr/lib64/sa/sa2 -A &

 

  • /usr/lib64/sa/sa1 is a shell script that we schedule via cron; it creates the daily binary log file.
  • /usr/lib64/sa/sa2 is a shell script that turns the binary log file into a human-readable daily report.
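
As a quick sanity check of the schedule in the crontab above, here is a tiny Python sketch of the arithmetic. It assumes the two arguments passed to sa1 are the sampling interval in seconds followed by the number of samples; the function name sampling_window is only illustrative.

```python
def sampling_window(interval_seconds, count):
    """Seconds covered by one sa1 run taking `count` samples every `interval_seconds`."""
    return interval_seconds * count


# The cron entry starts sa1 at minute 0 of every hour with "60 59".
covered = sampling_window(60, 59)
print(covered, "seconds covered out of 3600 per hourly cron run")
```

Under that assumption one run finishes just before the next hourly run starts, which is why the sar output further down shows one row per minute.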

 

[root@server ~]# chmod 600 /etc/cron.d/sysstat

[root@server ~]# systemctl restart sysstat


After a while, if sysstat is working correctly, its data history logs should appear inside /var/log/sa:

[root@server ~]# ls -al /var/log/sa 


Note that on Debian and other modern .deb based distros, such as Debian 10 (as of year 2021), the standard sysstat history files are stored under /var/log/sysstat.

Here are a few useful uses of the sysstat commands:


1. Check with sysstat machine history SWAP and RAM Memory use


To check SWAP memory use, for example the first samples of the current day (pipe to tail -n 10 instead to see the most recent ones):

[hipo@server yum.repos.d] $ sar -W | head -n 10
 

Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

12:00:00 AM  pswpin/s pswpout/s
12:00:01 AM      0.00      0.00
12:01:01 AM      0.00      0.00
12:02:01 AM      0.00      0.00
12:03:01 AM      0.00      0.00
12:04:01 AM      0.00      0.00
12:05:01 AM      0.00      0.00
12:06:01 AM      0.00      0.00

[root@ccnrlb01 ~]# sar -r | tail -n 10
14:00:01        93008   1788832     95.06         0   1357700    725740      9.02    795168    683484        32
14:10:01        78756   1803084     95.81         0   1358780    725740      9.02    827660    652248        16
14:20:01        92844   1788996     95.07         0   1344332    725740      9.02    813912    651620        28
14:30:01        92408   1789432     95.09         0   1344612    725740      9.02    816392    649544        24
14:40:01        91740   1790100     95.12         0   1344876    725740      9.02    816948    649436        36
14:50:01        91688   1790152     95.13         0   1345144    725740      9.02    817136    649448        36
15:00:02        91544   1790296     95.14         0   1345448    725740      9.02    817472    649448        36
15:10:01        91108   1790732     95.16         0   1345724    725740      9.02    817732    649340        36
15:20:01        90844   1790996     95.17         0   1346000    725740      9.02    818016    649332        28
Average:        93473   1788367     95.03         0   1369583    725074      9.02    800965    671266        29

 

2. Check system load? Are my processes waiting too long to run on the CPU?

[root@server ~ ]# sar -q |head -n 10
Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

12:00:00 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
12:00:01 AM         0       272      0.00      0.02      0.00         0
12:01:01 AM         1       271      0.00      0.02      0.00         0
12:02:01 AM         0       268      0.00      0.01      0.00         0
12:03:01 AM         0       268      0.00      0.00      0.00         0
12:04:01 AM         1       271      0.00      0.00      0.00         0
12:05:01 AM         1       271      0.00      0.00      0.00         0
12:06:01 AM         1       265      0.00      0.00      0.00         0
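
For scripting around this report, the output can be turned into structured data. Below is a minimal Python sketch that parses the sar -q layout shown above and flags samples where the 1-minute load average exceeds the CPU count. The function names and the threshold rule are assumptions chosen for illustration, and only the column layout from the sample above is handled.

```python
SAMPLE = """\
12:00:00 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
12:00:01 AM         0       272      0.00      0.02      0.00         0
12:01:01 AM         1       271      0.00      0.02      0.00         0
"""


def parse_sar_q(text):
    """Return a list of (timestamp, {column: value}) tuples from sar -q output."""
    header, rows = None, []
    for line in text.splitlines():
        fields = line.split()
        if "runq-sz" in fields:
            # Column names start at runq-sz; everything before is the timestamp.
            header = fields[fields.index("runq-sz"):]
        elif header and len(fields) > len(header) and fields[0] != "Average:":
            values = [float(v) for v in fields[-len(header):]]
            timestamp = " ".join(fields[:-len(header)])
            rows.append((timestamp, dict(zip(header, values))))
    return rows


def busy_samples(rows, cpus):
    """Timestamps whose 1-minute load average exceeds the number of CPUs."""
    return [ts for ts, cols in rows if cols["ldavg-1"] > cpus]


rows = parse_sar_q(SAMPLE)
print(rows[1][0], rows[1][1]["ldavg-5"])
print(busy_samples(rows, cpus=8))
```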


3. Show various CPU statistics per CPU use
 

On a multiprocessor, multi-core server it is sometimes useful, for scripting, to fetch usage data per processor; this can be attained with:

 

[hipo@server ~ ] $ mpstat -P ALL
Linux 4.18.0-240.el8.x86_64 (server)       09/28/2021      _x86_64_        (8 CPU)

06:08:38 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest  %gnice   %idle
06:08:38 PM  all    0.17    0.02    0.25    0.00    0.05    0.02    0.00    0.00    0.00   99.49
06:08:38 PM    0    0.22    0.02    0.28    0.00    0.06    0.03    0.00    0.00    0.00   99.39
06:08:38 PM    1    0.28    0.02    0.36    0.00    0.08    0.02    0.00    0.00    0.00   99.23
06:08:38 PM    2    0.27    0.02    0.31    0.00    0.06    0.01    0.00    0.00    0.00   99.33
06:08:38 PM    3    0.15    0.02    0.22    0.00    0.03    0.01    0.00    0.00    0.00   99.57
06:08:38 PM    4    0.13    0.02    0.20    0.01    0.03    0.01    0.00    0.00    0.00   99.60
06:08:38 PM    5    0.14    0.02    0.27    0.00    0.04    0.06    0.01    0.00    0.00   99.47
06:08:38 PM    6    0.10    0.02    0.17    0.00    0.04    0.02    0.00    0.00    0.00   99.65
06:08:38 PM    7    0.09    0.02    0.15    0.00    0.02    0.01    0.00    0.00    0.00   99.70


 

sar-sysstat-cpu-statistics-screenshot

Monitor processes and threads currently being managed by the Linux kernel.

[hipo@server ~ ] $ pidstat

pidstat-various-random-process-statistics

[hipo@server ~ ] $ pidstat -d 2


pidstat-show-processes-with-most-io-activities-linux-screenshot

This report tells us that there are a few processes with heavy I/O use: the filesystem journalling daemon jbd2, apache, mysqld and supervise; in the 3rd column you see their respective PIDs.

To show the threads used inside a process (like when you press SHIFT + H inside the Linux top command):

[hipo@server ~ ] $ pidstat -t -p 10765 1 3

Linux 4.19.0-14-amd64 (server)     28.09.2021     _x86_64_    (10 CPU)

21:41:22      UID      TGID       TID    %usr %system  %guest   %wait    %CPU   CPU  Command
21:41:23      108     10765         –    1,98    0,99    0,00    0,00    2,97     1  mysqld
21:41:23      108         –     10765    0,00    0,00    0,00    0,00    0,00     1  |__mysqld
21:41:23      108         –     10768    0,00    0,00    0,00    0,00    0,00     0  |__mysqld
21:41:23      108         –     10771    0,00    0,00    0,00    0,00    0,00     5  |__mysqld
21:41:23      108         –     10784    0,00    0,00    0,00    0,00    0,00     7  |__mysqld
21:41:23      108         –     10785    0,00    0,00    0,00    0,00    0,00     6  |__mysqld
21:41:23      108         –     10786    0,00    0,00    0,00    0,00    0,00     2  |__mysqld

10765 – is the Process ID whose threads you would like to list

With pidstat, you can further monitor processes for memory leaks with:

[hipo@server ~ ] $ pidstat -r 2

 

4. Report paging statistics for some old period

 

[root@server ~ ]# sar -B -f /var/log/sa/sa27 |head -n 10
Linux 4.18.0-240.el8.x86_64 (server)       09/27/2021      _x86_64_        (8 CPU)

15:42:26     LINUX RESTART      (8 CPU)

15:55:30     LINUX RESTART      (8 CPU)

04:00:01 PM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
04:01:01 PM      0.00     14.47    629.17      0.00    502.53      0.00      0.00      0.00      0.00
04:02:01 PM      0.00     13.07    553.75      0.00    419.98      0.00      0.00      0.00      0.00
04:03:01 PM      0.00     11.67    548.13      0.00    411.80      0.00      0.00      0.00      0.00

 

5. Monitor Received (RX) and Transmitted (TX) network traffic per network interface in real time
 

To print out the received and sent traffic per network interface 4 times in a row:

sar-sysstats-network-traffic-statistics-screenshot
 

[hipo@server ~ ] $ sar -n DEV 1 4


To continuously monitor the I/O traffic of all network interfaces:

[hipo@server ~ ] $ sar -n DEV 1


To monitor only a certain network interface, let's say the loopback interface (127.0.0.1), for received / transmitted bytes:

[hipo@server yum.repos.d] $  sar -n DEV 1 2|grep -i lo
06:29:53 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
06:29:54 PM        lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00
Average:           lo      0.00      0.00      0.00      0.00      0.00      0.00      0.00      0.00


6. Monitor block devices use
 

To check block devices use 3 times in a row:
 

[hipo@server yum.repos.d] $ sar -d 1 3


sar-sysstats-blockdevice-statistics-screenshot
 

7. Output server monitoring data in CSV database structured format


To prepare nice graphs with Excel from a CSV structured file, you can dump the collected data like so:

[root@server yum.repos.d]# sadf -d /var/log/sa/sa27 -- -n DEV | grep -v lo | head -n 10
server-name-fqdn;-1;2021-09-27 13:42:26 UTC;LINUX-RESTART    (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;-1;2021-09-27 13:55:30 UTC;LINUX-RESTART    (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth1;19.42;16.12;1.94;1.68;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth0;7.18;9.65;0.55;0.78;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth2;5.65;5.13;0.42;0.39;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth1;18.90;15.55;1.89;1.60;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth0;7.15;9.63;0.55;0.74;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:02:01 UTC;eth2;5.67;5.15;0.42;0.39;0.00;0.00;0.00;0.00

To graph the output data you can use Excel or its LibreOffice equivalent Calc, or, if you need to dump the sar output as CSV and generate graphs on the fly from a script, use gnuplot.
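
If the dump is processed from a script rather than a spreadsheet, the semicolon-separated format shown above is easy to load. The following Python sketch uses only the standard csv module and assumes the layout of the sample: header lines starting with "#" and LINUX-RESTART marker lines that should be skipped. The function name parse_sadf is made up for the example.

```python
import csv
import io

SAMPLE = """\
server-name-fqdn;-1;2021-09-27 13:42:26 UTC;LINUX-RESTART    (8 CPU)
# hostname;interval;timestamp;IFACE;rxpck/s;txpck/s;rxkB/s;txkB/s;rxcmp/s;txcmp/s;rxmcst/s;%ifutil
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth1;19.42;16.12;1.94;1.68;0.00;0.00;0.00;0.00
server-name-fqdn;60;2021-09-27 14:01:01 UTC;eth0;7.18;9.65;0.55;0.78;0.00;0.00;0.00;0.00
"""


def parse_sadf(text):
    """Return one dict per data row, keyed by the column names of the '#' header."""
    header, rows = None, []
    for fields in csv.reader(io.StringIO(text), delimiter=";"):
        if not fields:
            continue
        if fields[0].startswith("#"):
            header = [fields[0].lstrip("# ")] + fields[1:]
        elif header and len(fields) == len(header):
            rows.append(dict(zip(header, fields)))
        # Anything else (for example LINUX-RESTART markers) is ignored.
    return rows


records = parse_sadf(SAMPLE)
for rec in records:
    print(rec["timestamp"], rec["IFACE"], float(rec["rxkB/s"]), float(rec["txkB/s"]))
```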


What have we learned?


How to install sysstat and enable it via cron on Redhat and CentOS 8 Linux.
How to continuously monitor CPU, disk, network, block devices and paging use, as well as the processes and the threads used per process.
And also how to export previously collected data to CSV, to import it into a database or to use it later in order to generate a graphic presentation of the data.
Cheers ! 🙂

 

How to configure bond0 bonding and network bridging for KVM Virtual machines on Redhat / CentOS / Fedora Linux

Tuesday, February 16th, 2021

configure-bond0-bonding-channel-with-bridges-on-hypervisor-host-for-guest-KVM-virtual-machines-howto-sample-Hypervisor-Virtual-machines-pic
1. Intro to Redhat RPM based distro /etc/sysconfig/network-scripts/* config vars shortly explained

On RPM based Linux distributions network configuration has a very specific structure. As a sysadmin I recently had the task to configure networking on 2 machines to be used as Hypervisors, so the servers could communicate normally with other networks via several different intelligent switches connected to each of the server's interfaces. The idea is for the 2 Redhat 8.3 machines to be used as Hypervisors (HV), with each of the 2 HVs hosting 2 virtual guest machines that have another set of Redhat 8.3 Ootpa preinstalled. I've recently blogged on how to automate a bit the installation of the KVM virtual machines using a predefined kickstart.cfg file.

The next step after the install was setting up the network. Redhat has a very specific network configuration, well known to live under /etc/sysconfig/network-scripts/ifcfg-eno*# or, if you have configured the Redhats to change the LAN card naming from ens, eno, em1 to the legacy eth0, eth1, eth2 on CentOS Linux, under /etc/sysconfig/network-scripts/{ifcfg-eth0,1,2,3}.

The first step to configure the network from that point is to come up with a network infrastructure on the HV nodes server-node1 and server-node2 that will be ready to be used by the Virtual Machines server-vm1 and server-vm2.

Thus, for the sake of myself and some others, I decided to give here the most important recognized variables that can be placed inside each of the ifcfg-eth0, ifcfg-eth1, ifcfg-eth2 …

A standard ifcfg-eth0 config would look something like this:
 

[root@redhat1 :~ ]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
TYPE=Ethernet
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV4_FAILURE_FATAL=no
NAME=eth0
UUID=…
ONBOOT=yes
HWADDR=0e:a4:1a:b6:fc:86
IPADDR0=10.31.24.10
PREFIX0=23
GATEWAY0=10.31.24.1
DNS1=192.168.50.3
DNS2=10.215.105.3
DOMAIN=example.com
IPV6_PEERDNS=yes
IPV6_PEERROUTES=yes


Let's say a few words about each of the variables, to make things clearer for people who have never configured networking on Redhat without the help of console ncurses tools such as nmtui, or who want to stop NetworkManager from managing the network completely and thus cannot take advantage of nmcli (a command-line tool for controlling NetworkManager).

Here is a short description of each of above configuration parameters:

TYPE=device_type: The type of network interface device
BOOTPROTO=protocol: Where protocol is one of the following:

  • none: No boot-time protocol is used.
  • bootp: Use BOOTP (bootstrap protocol).
  • dhcp: Use DHCP (Dynamic Host Configuration Protocol).
  • static: if configuring static IP

DEFROUTE|IPV6_DEFROUTE=answer: Where answer is one of the following:

  • yes: This interface is set as the default route for IPv4|IPv6 traffic.
  • no: This interface is not set as the default route.

Most people still don't use IPv6, so it is usually better to disable it.

IPV6INIT=answer: Where answer is one of the following:

  • yes: Enable IPv6 on this interface. If IPV6INIT=yes, the following parameters could also be set in this file:

IPV6ADDR=IPv6 address

IPV6_DEFAULTGW=The default route through the specified gateway

  • no: Disable IPv6 on this interface.

IPV4_FAILURE_FATAL|IPV6_FAILURE_FATAL=answer: Where answer is one of the following:

  • yes: This interface is disabled if IPv4 or IPv6 configuration fails.
  • no: This interface is not disabled if configuration fails.

ONBOOT=answer: Where answer is one of the following:

  • yes: This interface is activated at boot time.
  • no: This interface is not activated at boot time.

HWADDR=MAC-address: The hardware address of the Ethernet device
IPADDRN=address: The IPv4 address assigned to the interface
PREFIXN=N: Length of the IPv4 netmask value
GATEWAYN=address: The IPv4 gateway address assigned to the interface. Because an interface can be associated with several combinations of IP address, network mask prefix length, and gateway address, these are numbered starting from 0.
DNSN=address: The address of the Domain Name Servers (DNS)
DOMAIN=DNS_search_domain: The DNS search domain (this is the search Domain-name.com you usually find in /etc/resolv.conf)
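
To see how IPADDRN, PREFIXN and GATEWAYN relate to each other, here is a small Python sketch using the standard ipaddress module with the values from the sample ifcfg-eth0 above. It only illustrates the arithmetic behind the prefix length and is not something the network scripts themselves run.

```python
import ipaddress

# IPADDR0=10.31.24.10 with PREFIX0=23, and GATEWAY0=10.31.24.1
iface = ipaddress.ip_interface("10.31.24.10/23")
gateway = ipaddress.ip_address("10.31.24.1")

print("netmask:", iface.netmask)
print("network:", iface.network)
print("gateway inside the interface network:", gateway in iface.network)
```

A gateway that falls outside the interface network is a common reason for a default route that fails to come up, so this check is worth doing before restarting the network.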

Another interesting file that affects how routing is handled on Redhat Linux is

/etc/sysconfig/network

[root@redhat1 :~ ]# cat /etc/sysconfig/network
# Created by anaconda
GATEWAY=10.215.105.

Having this gateway defined adds a default gateway.

This file specifies global network settings. For example, you can specify the default gateway here. If you want to apply network settings such as routes, alias IPs etc. that should be valid for every configured and active interface read by the systemctl start network scripts (or by NetworkManager, if it is used), just place them in that file.

Other files worth checking, which control how name resolving is handled on the server, are

/etc/nsswitch.conf

and

/etc/hosts

If you want /etc/hosts to be read before DNS resolving (/etc/resolv.conf), for example, the hosts line inside /etc/nsswitch.conf has to list files before dns. Below is its default behavior.
 

[root@redhat1 :~ ]# grep -i hosts /etc/nsswitch.conf
#     hosts: files dns
#     hosts: files dns  # from user file
# Valid databases are: aliases, ethers, group, gshadow, hosts,
hosts:      files dns myhostname

As you can see, the default order is to read files first (meaning /etc/hosts) and then dns (/etc/resolv.conf):
hosts: files dns
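
The lookup order can also be verified from a script. This is a minimal Python sketch that extracts the active hosts line from nsswitch.conf style text; the sample is taken from the grep output above and the function name hosts_lookup_order is only illustrative.

```python
SAMPLE = """\
#     hosts: files dns
# Valid databases are: aliases, ethers, group, gshadow, hosts,
hosts:      files dns myhostname
"""


def hosts_lookup_order(text):
    """Return the sources of the first uncommented 'hosts:' line, in order."""
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments
        if line.startswith("hosts:"):
            return line.split(":", 1)[1].split()
    return []


order = hosts_lookup_order(SAMPLE)
print(order)
print("files before dns:", order.index("files") < order.index("dns"))
```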

With this short intro to the basic values accepted by Redhat's /etc/sysconfig/network-scripts/ifcfg* configurations done, I will give a practical example of configuring a bond0 interface with 2 members, prepared based on Redhat's official documentation found at the URLs below:

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/8/html/configuring_and_managing_networking/configuring-network-bonding_configuring-and-managing-networking
 

# Bonding on RHEL 7 documentation
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-network_bonding_using_the_command_line_interface

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/7/html/networking_guide/sec-verifying_network_configuration_bonding_for_redundancy

https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/s2-networkscripts-interfaces_network-bridge

# Network Bridge with Bond documentation
https://access.redhat.com/documentation/en-us/red_hat_enterprise_linux/6/html/deployment_guide/sec-Configuring_a_VLAN_over_a_Bond

https://docs.fedoraproject.org/en-US/Fedora/24/html/Networking_Guide/sec-Network_Bridge_with_Bond.html


2. Configuring a single bond connection on eth0 / eth2 and setting 3 bridge interfaces bond0 -> br0, br1 -> eth1, br2 -> eth3

The task on my machines was to set up, out of 4 LAN cards, one bonded interface of the active-backup type with the bonded lines on eth0 and eth2, plus the other 2 interfaces, eth1 and eth3, which are used for a private communication network connected via dedicated switches and separate VLANs 50 and 51 over tagged dedicated gigabit ports.

As said, each of the 2 servers had 4 Broadcom network interfaces, paired 2 per card: one Broadcom NetXtreme Dual Port 10GbE SFP+ card and one Dell Broadcom 5720 Dual Port 1 Gigabit network card.

2-ports-broadcom-netxtreme-dual-port-10GBe-spf-plus

On each of server-node1 and server-node2 the 4 Ethernet adapters were properly detected by Redhat:

[root@redhat1 :~ ]# lspci |grep -i net
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries NetXtreme BCM5720 2-port Gigabit Ethernet PCIe
19:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)
19:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57412 NetXtreme-E 10Gb RDMA Ethernet Controller (rev 01)


As a prerequisite I've already added net.ifnames=0 to /etc/grub2/boot.cfg and disabled the NetworkManager service on the host. That is why you'll see NM_CONTROLLED="no" in the configurations below: it tells the Redhat servers not to try using NetworkManager. For more on that, check my previous article Disable NetworkManager automatic Ethernet Interface Management on Redhat Linux, CentOS 6 / 7 / 8.

3. Types of Network Bonding

mode=0 (balance-rr)

This mode is based on a round-robin policy and it is the default mode. It offers fault tolerance and load balancing. It transmits packets in round-robin fashion, from the first available slave through the last.

mode=1 (active-backup)

This mode is based on an active-backup policy. Only one slave is active in this bond, and another one takes over only when the active one fails. The MAC address of the bond is visible on only one network adapter at a time, to avoid confusing the switch. This mode provides fault tolerance.

mode=2 (balance-xor)

This mode sets an XOR (exclusive or) policy: the source MAC address is XOR'd with the destination MAC address, providing load balancing and fault tolerance. For each destination MAC address the same slave is selected.

mode=3 (broadcast)

This mode is based on a broadcast policy: it transmits everything on all slave interfaces. It provides fault tolerance. This can be used only for specific purposes.

mode=4 (802.3ad)

This mode is known as Dynamic Link Aggregation: it creates aggregation groups that share the same speed. It requires a switch that supports IEEE 802.3ad dynamic link aggregation. The slave selection for outgoing traffic is done based on a transmit hashing method, which may be changed from the XOR method via the xmit_hash_policy option.

mode=5 (balance-tlb)

This mode is called adaptive transmit load balancing. The outgoing traffic is distributed based on the current load on each slave, and the incoming traffic is received by the current slave. If the receiving slave fails, another slave takes over the MAC address of the failed receiving slave. This mode does not require any special switch support.

mode=6 (balance-alb)

This mode is called adaptive load balancing. This mode does not require any special switch support.
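
To make the balance-xor idea from mode=2 more tangible, here is a deliberately simplified Python sketch: the source and destination MAC addresses are XOR'd and the result is taken modulo the number of slaves. This is an illustration of the principle only; the hashing done by the real bonding driver has more detail, and the function names are made up. The MAC addresses are borrowed from examples elsewhere in this article.

```python
def mac_to_int(mac):
    """Convert 'aa:bb:cc:dd:ee:ff' to an integer."""
    return int(mac.replace(":", ""), 16)


def xor_slave_index(src_mac, dst_mac, slave_count):
    """Simplified XOR policy: (source MAC XOR destination MAC) modulo slave count."""
    return (mac_to_int(src_mac) ^ mac_to_int(dst_mac)) % slave_count


slaves = ["eth0", "eth2"]
idx = xor_slave_index("0e:a4:1a:b6:fc:86", "52:54:00:cb:25:82", len(slaves))
print("traffic to this destination leaves via", slaves[idx])
```

Because the result depends only on the two addresses, the same destination always maps to the same slave, which is the property described for mode=2.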

Let's create the necessary configuration for the bond and bridges:

[root@redhat1 :~ ]# cat ifcfg-bond0
DEVICE=bond0
NAME=bond0
TYPE=Bond
BONDING_MASTER=yes
#IPADDR=10.50.21.16
#PREFIX=26
#GATEWAY=10.50.0.1
#DNS1=172.20.88.2
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100 primary=eth0"
NM_CONTROLLED="no"
BRIDGE=br0


[root@redhat1 :~ ]# cat ifcfg-bond0.10
DEVICE=bond0.10
BOOTPROTO=none
ONPARENT=yes
#IPADDR=10.50.21.17
#NETMASK=255.255.255.0
VLAN=yes

[root@redhat1 :~ ]# cat ifcfg-br0
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br0
UUID=4451286d-e40c-4d8c-915f-7fc12a16d595
DEVICE=br0
ONBOOT=yes
IPADDR=10.50.50.16
PREFIX=26
GATEWAY=10.50.0.1
DNS1=172.20.0.2
NM_CONTROLLED=no

[root@redhat1 :~ ]# cat ifcfg-br1
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br1
UUID=40360c3c-47f5-44ac-bbeb-77f203390d29
DEVICE=br1
ONBOOT=yes
##IPADDR=10.50.51.241
PREFIX=28
##GATEWAY=10.50.0.1
##DNS1=172.20.0.2
NM_CONTROLLED=no

[root@redhat1 :~ ]# cat ifcfg-br2
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br2
UUID=fbd5c257-2f66-4f2b-9372-881b783276e0
DEVICE=br2
ONBOOT=yes
##IPADDR=10.50.51.243
PREFIX=28
##GATEWAY=10.50.0.1
##DNS1=172.20.10.1
NM_CONTROLLED=no

[root@redhat1 :~ ]# cat ifcfg-eth0
TYPE=Ethernet
NAME=bond0-slaveeth0
BOOTPROTO=none
#UUID=61065574-2a9d-4f16-b16e-00f495e2ee2b
DEVICE=eth0
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

[root@redhat1 :~ ]# cat ifcfg-eth1
TYPE=Ethernet
NAME=eth1
UUID=b4c359ae-7a13-436b-a904-beafb4edee94
DEVICE=eth1
ONBOOT=yes
BRIDGE=br1
NM_CONTROLLED=no

[root@redhat1 :~ ]#  cat ifcfg-eth2
TYPE=Ethernet
NAME=bond0-slaveeth2
BOOTPROTO=none
#UUID=821d711d-47b9-490a-afe7-190811578ef7
DEVICE=eth2
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

[root@redhat1 :~ ]#  cat ifcfg-eth3
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
#BOOTPROTO=dhcp
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
BRIDGE=br2
NAME=eth3
UUID=61065574-2a9d-4f16-b16e-00f495e2ee2b
DEVICE=eth3
ONBOOT=yes
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-bond0
DEVICE=bond0
NAME=bond0
TYPE=Bond
BONDING_MASTER=yes
#IPADDR=10.50.21.16
#PREFIX=26
#GATEWAY=10.50.21.1
#DNS1=172.20.88.2
ONBOOT=yes
BOOTPROTO=none
BONDING_OPTS="mode=1 miimon=100 primary=eth0"
NM_CONTROLLED="no"
BRIDGE=br0

[root@redhat2 :~ ]# cat ifcfg-bond0.10
DEVICE=bond0.10
BOOTPROTO=none
ONPARENT=yes
#IPADDR=10.50.21.17
#NETMASK=255.255.255.0
VLAN=yes
NM_CONTROLLED=no
BRIDGE=br0

[root@redhat2 :~ ]# cat ifcfg-br0
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
#IPV6INIT=yes
#IPV6_AUTOCONF=yes
#IPV6_DEFROUTE=yes
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br0
#UUID=f87e55a8-0fb4-4197-8ccc-0d8a671f30d0
UUID=4451286d-e40c-4d8c-915f-7fc12a16d595
DEVICE=br0
ONBOOT=yes
IPADDR=10.50.21.17
PREFIX=26
GATEWAY=10.50.21.1
DNS1=172.20.88.2
NM_CONTROLLED=no

[root@redhat2 :~ ]#  cat ifcfg-br1
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=no
#IPV6_AUTOCONF=no
#IPV6_DEFROUTE=no
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br1
UUID=40360c3c-47f5-44ac-bbeb-77f203390d29
DEVICE=br1
ONBOOT=yes
##IPADDR=10.50.21.242
PREFIX=28
##GATEWAY=10.50.21.1
##DNS1=172.20.88.2
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-br2
STP=yes
BRIDGING_OPTS=priority=32768
TYPE=Bridge
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=none
DEFROUTE=no
IPV4_FAILURE_FATAL=no
#IPV6INIT=no
#IPV6_AUTOCONF=no
#IPV6_DEFROUTE=no
#IPV6_FAILURE_FATAL=no
#IPV6_ADDR_GEN_MODE=stable-privacy
IPV6INIT=no
IPV6_AUTOCONF=no
IPV6_DEFROUTE=no
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=br2
UUID=fbd5c257-2f66-4f2b-9372-881b783276e0
DEVICE=br2
ONBOOT=yes
##IPADDR=10.50.21.244
PREFIX=28
##GATEWAY=10.50.21.1
##DNS1=172.20.88.2
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-eth0
TYPE=Ethernet
NAME=bond0-slaveeth0
BOOTPROTO=none
#UUID=ee950c07-7eb2-463b-be6e-f97e7ad9d476
DEVICE=eth0
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-eth1
TYPE=Ethernet
NAME=eth1
UUID=ffec8039-58f0-494a-b335-7a423207c7e6
DEVICE=eth1
ONBOOT=yes
BRIDGE=br1
NM_CONTROLLED=no

[root@redhat2 :~ ]# cat ifcfg-eth2
TYPE=Ethernet
NAME=bond0-slaveeth2
BOOTPROTO=none
#UUID=2c097475-4bef-47c3-b241-f5e7f02b3395
DEVICE=eth2
ONBOOT=yes
MASTER=bond0
SLAVE=yes
NM_CONTROLLED=no


Notice that the bond0 configuration does not have an IP assigned. This is done on purpose, as we're using the interface channel bonding together with an attached bridge for the VMs, so the IP address is set on br0 instead. Setting the IP directly on the bond is perhaps a better choice on normal physical hosts where no virtualization use is planned. If, however, you try to set up an IP address on the bond in the specific configuration shown here and reboot the machine, you will end up with a machine that is inaccessible over the network, like I did, and you will need to fix the configuration via some kind of ILO / IDRAC interface.

4. Generating UUIDs for ethernet devices, bridges and bonds

One thing to note is the uuidgen command; you might need it to generate the UUID identifiers to put in the new network config files.

Example:
 

[root@redhat2 :~ ]# uuidgen br2
e7995e15-7f23-4ea2-80d6-411add78d703
[root@redhat2 :~ ]# uuidgen br1
05e0c339-5998-414b-b720-7adf91a90103
[root@redhat2 :~ ]# uuidgen br0
e6d7ff74-4c15-4d93-a150-ff01b7ced5fb
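
If the config files are generated from a script, random UUIDs in the same format can be produced without calling uuidgen. This is a small Python sketch using the standard uuid module; it assumes random (version 4) UUIDs are acceptable for the UUID= lines, and the device names are just the bridge names from this article.

```python
import uuid

# One random UUID per bridge config file.
for name in ("br0", "br1", "br2"):
    print(name, uuid.uuid4())

new_uuid = uuid.uuid4()
print("UUID=" + str(new_uuid))
```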


5. How to make KVM Virtual Machines see configured Network bridges (modify VM XML)

To make the installed Virtual Machines see the bridges I had to edit the XML definition of each VM:

[root@redhat1 :~ ]# virsh edit VM_name1
[root@redhat1 :~ ]# virsh edit VM_name2

[root@redhat2 :~ ]# virsh edit VM_name1
[root@redhat2 :~ ]# virsh edit VM_name2

Find the interface network configuration and change it to something like:

    <interface type='bridge'>
      <mac address='22:53:00:56:5d:ac'/>
      <source bridge='br0'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x00' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='22:53:00:2a:5f:01'/>
      <source bridge='br1'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x07' slot='0x00' function='0x0'/>
    </interface>
    <interface type='bridge'>
      <mac address='22:34:00:4a:1b:6c'/>
      <source bridge='br2'/>
      <model type='virtio'/>
      <address type='pci' domain='0x0000' bus='0x08' slot='0x00' function='0x0'/>
    </interface>
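
After editing, it can be handy to double check which bridge each virtual NIC ended up on. Below is a minimal Python sketch that reads interface definitions like the ones above with the standard xml.etree module. The wrapping devices element and the function name bridge_interfaces are assumptions made so the sample is a well-formed document.

```python
import xml.etree.ElementTree as ET

DEVICES_XML = """\
<devices>
  <interface type='bridge'>
    <mac address='22:53:00:56:5d:ac'/>
    <source bridge='br0'/>
    <model type='virtio'/>
  </interface>
  <interface type='bridge'>
    <mac address='22:53:00:2a:5f:01'/>
    <source bridge='br1'/>
    <model type='virtio'/>
  </interface>
</devices>
"""


def bridge_interfaces(xml_text):
    """Return (mac, bridge) pairs for every bridge type interface in the XML."""
    root = ET.fromstring(xml_text)
    pairs = []
    for iface in root.iter("interface"):
        if iface.get("type") == "bridge":
            mac = iface.find("mac").get("address")
            bridge = iface.find("source").get("bridge")
            pairs.append((mac, bridge))
    return pairs


for mac, bridge in bridge_interfaces(DEVICES_XML):
    print(mac, "->", bridge)
```

The same function should work on the full domain XML printed by virsh dumpxml, since it searches for interface elements at any depth.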


6. Testing that the bond is up and works fine

# ip addr show bond0
The result is the following:

 

4: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
    link/ether 52:54:00:cb:25:82 brd ff:ff:ff:ff:ff:ff


The bond should be visible among the normal network interfaces with ip address show or /sbin/ifconfig.

 

# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: eth0
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: eth2
MII Status: up
Speed: 10000 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:0c:29:ab:2a:fa
Slave queue ID: 0

 

According to the output eth0 is the active slave.
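Reading the active slave by eye is fine on one host, but on several hypervisors it is handier to extract it with a one-liner. Below is a minimal sketch that parses the bonding status file with awk; the function names active_slave and slave_status are hypothetical, and the file path argument is there only so the functions can be tried on a saved copy of the output.

```shell
# Print the currently active slave recorded in a bonding status file.
# $1: status file, defaults to /proc/net/bonding/bond0
active_slave() {
    awk -F': ' '/^Currently Active Slave:/ { print $2 }' "${1:-/proc/net/bonding/bond0}"
}

# Print every enslaved interface together with its MII status, one per line.
# The bond-level "MII Status" line is skipped because no slave name was seen yet.
slave_status() {
    awk -F': ' '
        /^Slave Interface:/ { name = $2 }
        /^MII Status:/ && name != "" { print name, $2; name = "" }
    ' "${1:-/proc/net/bonding/bond0}"
}
```

Running active_slave before and after a failover test gives a quick confirmation that the bond really switched to another interface.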

The active slave's device files (eth0 in this case) are found in the virtual file system /sys/

# find /sys -name '*eth0'
/sys/devices/pci0000:00/0000:00:15.0/0000:03:00.0/net/eth0
/sys/devices/virtual/net/bond0/lower_eth0
/sys/class/net/eth0


You can remove a bond member, say eth0, by removing its PCI device.

 

First cd to the pci* directory of the device.
Example: /sys/devices/pci0000:00/0000:00:15.0

 

# echo 1 > remove


At this point the eth0 device directory structure that was previously located under /sys/devices/pci0000:00/0000:00:15.0 is no longer there. It was removed and the device no longer exists as seen by the OS.

You can verify this is the case with a simple ifconfig, which will no longer list the eth0 device.
You can also repeat the cat /proc/net/bonding/bond0 command shown earlier to see that eth0 is no longer listed as active or available.
You can also see the change in the messages file. It might look something like this:

2021-02-12T14:13:23.363414-06:00 redhat1  device eth0: device has been deleted
2021-02-12T14:13:23.368745-06:00 redhat1 kernel: [81594.846099] bonding: bond0: releasing active interface eth0
2021-02-12T14:13:23.368763-06:00 redhat1 kernel: [81594.846105] bonding: bond0: Warning: the permanent HWaddr of eth0 – 00:0c:29:ab:2a:f0 – is still in use by bond0. Set the HWaddr of eth0 to a different address to avoid conflicts.
2021-02-12T14:13:23.368765-06:00 redhat1 kernel: [81594.846132] bonding: bond0: making interface eth1 the new active one.
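Instead of searching for the pci* directory by hand before the removal described above, the directory can be resolved from the interface name, because /sys/class/net/IFACE/device is a symlink to the underlying device. This is only a sketch: the function name iface_device_dir and the optional second argument (an alternative sysfs root, handy for trying it out on a copy) are my own additions.

```shell
# Print the device directory behind a network interface.
# $1: interface name
# $2: sysfs root, defaults to /sys
iface_device_dir() {
    # The "device" symlink points at the directory of the underlying
    # device; for a PCI card that directory holds the "remove" file.
    readlink -f "${2:-/sys}/class/net/$1/device"
}
```

With it the removal becomes something like cd "$(iface_device_dir eth0)" followed by echo 1 > remove. Virtual interfaces such as bond0 or br0 have no device link, so the function is only meaningful for physical cards.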

 

Another way to test that the bonding correctly switches between LAN cards in case of Ethernet hardware failure is to bring down one of the 2 or more bonded interfaces. Let's say eth0 is the currently active slave and you want the active-backup bond to fail over to another slave, do:
 

# ip link set dev eth0 down


That concludes the test for fail over on active slave failure.

7. Bringing the bond up / down (rescan) with no need for a server reboot

Bonding is tedious stuff that sometimes breaks badly, and then the only way to fix the broken bond seems to be an init 6 (reboot) command. Actually, that is not so.

You can also get the deleted device back with a simple pci rescan command:

# echo 1 > /sys/bus/pci/rescan


The eth0 interface should now be back.
You can see that it is back with an ifconfig command, and you can verify that the bond sees it with this command:

# cat /proc/net/bonding/bond0


That concludes the test of the bond code seeing the device when it comes back again.

The same steps can be repeated only this time using the eth1 device and file structure to fail the active slave in the bond back over to eth0.

8. Testing the bond with ifenslave command (ifenslave command examples)

Below is a set of useful commands to test that the bonding works as expected with the ifenslave command, which comes from the "iputils-20071127" package

– To show information of all the interfaces

                  # ifenslave -a
                  # ifenslave --all-interfaces

 

– To change the active slave

                  # ifenslave -c bond0 eth1
                  # ifenslave --change-active bond0 eth1

 

– To remove the slave interface from the bonding device

                  # ifenslave -d bond0 eth1
                  # ifenslave --detach bond0 eth1

 

– To show master interface info

                  # ifenslave bond0 

 

– To set the bond device down and automatically release all the slaves

                  # ifconfig bond1 down

– To get the usage info

                  # ifenslave -u
                  # ifenslave --usage

– To set to verbose mode

                  # ifenslave -v
                  # ifenslave --verbose

9. Testing the bridge works fine

Historically, over the years, all kinds of bridges have been handled with brctl, part of the bridge-utils .deb / .rpm installable package.

The classical way to check a bridge is working is to do

# brctl show
# brctl show br0; brctl show br1; brctl show br2

# brctl showmacs br0
 

etc.

Unfortunately with Redhat 8 this command is no longer available, so to get information about the configured bridges you need to use the bridge command instead:

 

# bridge link show
3:eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state forwarding priority 32 cost 100
4:eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 master bridge0 state listening priority 32 cost 100
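With many ports attached, the interesting part of bridge link show is usually just which port belongs to which bridge and whether it is forwarding. Here is a small awk sketch that reduces the output to those three columns; the function name bridge_port_states is hypothetical, and it accepts both the "3:eth0:" form shown above and the "3: eth0:" form with a space.

```shell
# Read `bridge link show` output on stdin and print "port bridge state"
# for every line that names a master bridge.
bridge_port_states() {
    awk '
    {
        master = ""; state = ""
        for (i = 1; i <= NF; i++) {
            if ($i == "master") master = $(i + 1)
            if ($i == "state")  state  = $(i + 1)
        }
        # The port name sits between the leading index and the flags.
        port = $0
        sub(/^[0-9]+:[ ]*/, "", port)
        sub(/:.*/, "", port)
        if (master != "") print port, master, state
    }'
}
```

Typical use would be bridge link show | bridge_port_states, and adding | grep -v forwarding leaves only the ports that are still listening, learning or blocked.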


10. Troubleshooting network connectivity issues on bond bridges and LAN cards

Testing that the bond connection and the bridges route traffic properly is sometimes a real hassle, so here the good old tcpdump comes to help.

Suppose some of the Ethernet interfaces between HV1 and HV2 are unable to talk to each other, and you suspect that some colleague from the network team has messed up a copper (UTP) cable or that there is a fiber optics connectivity issue. To check the VLAN tagged traffic headers coming from the switch, you can listen on each of the bond0, br0, br1, br2, eth0, eth1, eth2, eth3 interfaces configured on the server, like so:

# tcpdump -i bond0 -nn -e vlan


Some further investigation of where normal ICMP traffic flows once everything is set up is a normal thing to do, hence just try to send a normal ping via the different server interfaces (note the capital -I, which selects the interface; lowercase -i sets the interval between packets):

# ping -I bond0 DSTADDR

# ping -I eth0 DSTADDR

# ping -I eth1 DSTADDR

# ping -I eth2 DSTADDR


After the plain ping, do the usual network test with big ICMP packets (64k) to make sure there is no packet loss etc., e.g.:

# ping -I eth3 -s 64536  DSTADDR


If for 10 to 20 seconds the ping does not report any packet loss then you should be good.
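To compare several interfaces without reading every ping summary by eye, the loss percentage can be pulled out of the summary line. This is a rough sketch assuming the usual iputils summary format ("..., 0% packet loss, time ..."); the function name ping_loss_percent is made up.

```shell
# Read ping output on stdin and print the packet loss percentage
# taken from the summary line, without the % sign.
ping_loss_percent() {
    sed -n 's/.* \([0-9.]*\)% packet loss.*/\1/p'
}
```

For example ping -c 20 -I eth3 -s 64536 DSTADDR | ping_loss_percent prints a single number per run, which is easy to loop over for eth0 to eth3 and bond0.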

Update reverse sshd config with cronjob to revert if sshd reload issues

Friday, February 12th, 2021


Say you're doing SSH hardening by modifying /etc/ssh/sshd_config for better system security, or just changing sshd options due to some requirement. You follow the wrong guide and add an ssh option that works normally on newer versions such as OpenSSH_8.0p1 or 7, but the server runs an older SSH release. Restarting sshd via /etc/init.d/… or systemctl restart sshd then cuts your access to a remote server located in a DC. If that server is not attached to an Admin LAN port and has no working ILO or IDRAC configured, you have to wait a couple of hours for some Support person to go to the server Room / Rack / line location, get on a physical Linux tty console and fix it by reverting your last sshd changes and restarting.

Thus the logical question comes: what can you do to make sure you will not cut your network access to the remote machine after modifying the OpenSSH config and doing a normal sshd restart?

There is an old trick I've been using for years now, but if you're just starting with Linux as a novice system administrator or a server support guy you might not know it. It is as simple as setting a cron job that every few minutes overwrites the sshd configuration with a copy of the old working version made before the modification.

Here is this nice nifty trick, which saved me the headache of calling the technical support line of ValueWeb when I was administering some old Linux servers back in the 2000s.

# create /etc/ssh/sshd_config backup file
root@server:~# cp -rpf /etc/ssh/sshd_config /etc/ssh/sshd_config_$(date +%d-%m-%y)

root@server:~# crontab -u root -e

# cron job executed every 15 minutes to overwrite sshd_config with the working version and restart sshd, just in case
# (inside a crontab a literal % has to be written as \%)
*/15 * * * * /bin/cp -rpf /etc/ssh/sshd_config_$(date +\%d-\%m-\%y) /etc/ssh/sshd_config && /bin/systemctl restart sshd

# the same revert can be run by hand on the shell
root@server:~# cp -rpf /etc/ssh/sshd_config_$(date +%d-%m-%y) /etc/ssh/sshd_config && /bin/systemctl restart sshd


Copy and paste the above cron definition and leave it on for some time. Do the /etc/ssh/sshd_config modifications and, once you're done, restart sshd by, let's say:

root@server:~#  killall -HUP sshd 


If the ssh connectivity continues to work, edit the cron job again, delete the lines you added and save.
If you're not feeling comfortable with vim as a text editor (in case you're a complete newbie and don't know how to get out of vim), before doing all these little steps you can run export EDITOR=nano or export EDITOR=mcedit on the shell; this will change the default text editor used on the shell.
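A variation of the same trick uses a backup with a fixed name, so the cron line does not depend on the date (which changes at midnight) and needs no % escaping. The sketch below is an assumption of mine, not the original procedure: the .known-good suffix and the function names backup_config and restore_config are hypothetical.

```shell
# Copy a config file to a fixed-name backup next to it and print the
# backup path. -p keeps owner, mode and timestamps.
backup_config() {
    cp -p "$1" "$1.known-good" && printf '%s\n' "$1.known-good"
}

# Put the known-good copy back over the live file.
# Returns non-zero if there is no backup to restore from.
restore_config() {
    [ -f "$1.known-good" ] || return 1
    cp -p "$1.known-good" "$1"
}
```

The matching cron entry would then be along the lines of */15 * * * * /bin/cp -p /etc/ssh/sshd_config.known-good /etc/ssh/sshd_config && /bin/systemctl restart sshd. It is also worth running sshd -t before any restart; it only checks the validity of the configuration file and reports errors without touching the running daemon.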

Hope this helps someone… Enjoy 🙂

Disable NetworkManager automatic Ethernet Interface Management on Redhat Linux , CentOS 6 / 7 / 8

Friday, February 5th, 2021


Most Linux distributions have introduced the NetworkManager service and are slowly trying to push out the old ways and rely entirely on it to manage network configs. Though at times this is very helpful, especially if you have Linux running on a laptop, on servers it is a guarantee for trouble.

Say you are a system administrator like me who needs to configure a new server with, let's say, 8 Ethernet interfaces (LAN cards), each to be configured with a different IP, and you have a mixed configuration where some of them (eth1, eth2 etc., 4 of the interfaces) have to get static IPs and the others have to take theirs from a DHCP lease. NetworkManager is not something you will want there, as usually you don't expect a network IP topology change soon. Below is an example from a live Hypervisor server machine that has 8 Network Interfaces configured, together with a few Virtual Interfaces used by the running KVM Virtual Machines.
 

[root@redhat :~ ]# ip address show |grep ": <"
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
2: ens1f0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
3: eno1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master team0 state UP group default qlen 1000
4: ens1f1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br2 state UP group default qlen 1000
5: ens1f2: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
6: eno2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq master br1 state UP group default qlen 1000
7: ens1f3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
8: eno3: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
9: eno4: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
10: venet0: <BROADCAST,POINTOPOINT,NOARP,UP,LOWER_UP> mtu 1500 qdisc noqueue state UNKNOWN group default
11: br1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
12: br2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
13: team0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue master br0 state UP group default qlen 1000
14: br0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default qlen 1000
15: host-routed: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN group default qlen 1000
16: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
17: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN group default qlen 1000
18: virbr1: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
19: virbr1-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr1 state DOWN group default qlen 1000
26: vme52540019e701: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br0 state UNKNOWN group default qlen 1000
27: vme52540081868b: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br1 state UNKNOWN group default qlen 1000
28: vme525400a13f03: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast master br2 state UNKNOWN group default qlen 1000


Having NM manage so many LAN connected Ethernets can create A LOT of surprises, even if your servers are in a Highly Secured data center where the chance of a sudden IP change or network misbehaviour is minimal. Even then, someone in Housing might do something wrong on the Rack, mixing things up with another server or switch, and your server might easily end up with unexplainable Network problems because of this NM service, which tries 'to balance' any network issues according to some algorithms …

Thus, to save yourself the trouble, completely disable NetworkManager (NM) handling of the Ethernets.

As a hint, you might get some of these troubles especially if the System Hardware has issues with the Integrated Motherboard LAN Controllers, such as those of the Dell PowerEdge R640 Rack Server.
I've recently observed one such Dell Rack mounted machine that I had to configure from scratch. It had NM preinstalled out of the box by a colleague,
and NM was doing strange stuff with the routings, causing the machine to become remotely inaccessible after reboot.
Even though I had configured the IPs, double and triple checked the configuration, and the machine had a proper set of /etc/sysconfig/network-scripts/ifcfg-* files, it still failed to boot with the network properly brought up and became unreachable via remote SSH immediately after sending the machine to init 6 with /usr/sbin/init 6 (alias for shutdown -r now or reboot -f now :)

To permanently disable NM on Redhat 8 / CentOS 8 you have to disable the NM systemd services permanently and add NM_CONTROLLED=no to each of the Ethernet configurations in network-scripts (ifcfg-eno3, ifcfg-eno4, ifcfg-eno1np0 etc.).

1. Disable completely Network Manager service and mask it

[root@redhat :~ ]# systemctl stop NetworkManager.service
[root@redhat :~ ]# systemctl disable NetworkManager.service
[root@redhat :~ ]# systemctl mask NetworkManager.service

2. Check if all systemd networkmanager components scripts are really disabled

# systemctl list-unit-files | grep NetworkManager

NetworkManager-dispatcher.service disabled
NetworkManager-wait-online.service enabled
NetworkManager.service disabled


NetworkManager-wait-online.service is still enabled, so we have to disable it as well.

[root@redhat :~ ]#  systemctl disable NetworkManager-wait-online.service
[root@redhat :~ ]#  systemctl mask NetworkManager-wait-online.service

Double check NM services

[root@redhat :~ ]#  systemctl list-unit-files | grep NetworkManager
  …
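The double check can be turned into a filter that prints only the NetworkManager units which are still enabled, so empty output means nothing was missed. A minimal sketch; the function name nm_units_still_enabled is hypothetical.

```shell
# Read `systemctl list-unit-files` output on stdin and print the
# NetworkManager units whose state column is still "enabled".
nm_units_still_enabled() {
    awk '$1 ~ /^NetworkManager/ && $2 == "enabled" { print $1 }'
}
```

Use it as systemctl list-unit-files | nm_units_still_enabled. Units shown as disabled or masked are not printed.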

3. Install / Enable old (legacy) network-scripts 


network-scripts is not enabled by default because it doesn't play well with NM.
Install the rpm package to get it back
 

[root@redhat :~ ]#  yum install -y network-scripts 

4. Test if network-scripts is really enabled


Use the nmcli command for controlling NetworkManager; if it reports that NM is not running, then you're fine

[root@redhat :~ ]#  nmcli device
Error: NetworkManager is not running.

5. Disable the legacy network-scripts deprecation warnings


Bring down some interface with ifdown (the Redhat script frontend to ifconfig) and bring it up again with ifup iface-name
 

# ifup eno4
WARN      : [ifup] You are using 'ifup' script provided by 'network-scripts', which are now deprecated.
WARN      : [ifup] 'network-scripts' will be removed in one of the next major releases of RHEL.
WARN      : [ifup] It is advised to switch to 'NetworkManager' instead – it provides 'ifup/ifdown' scripts as well.


Notice the warnings; they're harmless and safe to ignore. However it is pretty annoying to see them, so to disable them:

[root@redhat :~ ]#  touch /etc/sysconfig/disable-deprecation-warnings

6. Use network.service old-fashioned systemd service


From now on you can start using the good old well known and properly working network.service

[root@redhat :~ ]#  systemctl status network


To enable the network service to start after boot:

[root@redhat :~ ]#  systemctl enable network

7. Disable NetworkManager use from Network configuration scripts ifcfg-* for all server available configured ethernet cards


Open with text editor every network script and append NM_CONTROLLED="no" to the end of the file.
 

[root@redhat :~ ]#  vi /etc/sysconfig/network-scripts/ifcfg-ethernetX
NM_CONTROLLED="no"

To save yourself the time if you want to disable NetworkManager use for all /etc/sysconfig/network-scripts/ifcfg-* use a simple shell loop:
 

[root@redhat :~ ]# cd /etc/sysconfig/network-scripts/
[root@redhat :/etc/sysconfig/network-scripts ]# for i in ifcfg-*; do echo 'NM_CONTROLLED="no"' >> "$i"; done
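One weakness of a plain append loop is that running it twice leaves duplicate NM_CONTROLLED lines, and a file that already says NM_CONTROLLED=yes ends up with two conflicting settings. Below is a sketch of a variant that can be re-run safely; the function name set_nm_controlled_no is made up, and sed -i assumes GNU sed, which is what Redhat / CentOS ship.

```shell
# Set NM_CONTROLLED="no" in every ifcfg-* file of a directory.
# $1: directory, defaults to /etc/sysconfig/network-scripts
# The body runs in a subshell so the loop variables stay local.
set_nm_controlled_no() (
    dir=${1:-/etc/sysconfig/network-scripts}
    for f in "$dir"/ifcfg-*; do
        [ -f "$f" ] || continue     # the glob matched nothing
        if grep -q '^NM_CONTROLLED=' "$f"; then
            # Rewrite an existing setting instead of appending a second one
            sed -i 's/^NM_CONTROLLED=.*/NM_CONTROLLED="no"/' "$f"
        else
            printf 'NM_CONTROLLED="no"\n' >> "$f"
        fi
    done
)
```

Passing a directory argument makes it possible to try the function on a copy of network-scripts before touching the real files.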


To load the new network settings do another network reload / restart
 

[root@redhat :~ ]# systemctl restart network


To disable NetworkManager on older CentOS 6 / Redhat 6 / SuSE / Fedora Linux, where the OS is still not systemd enabled, instead of using systemctl you can do it straight with the old and well known chkconfig Redhat script.
 

[root@centos6 :~ ]# service NetworkManager stop
[root@centos6 :~ ]# chkconfig NetworkManager off

Howto Upgrade IBM Spectrum Protect Backup Client TSM 7.X to 8.1.8, Update Tivoli 8.1.8 to 8.1.11 on CentOS and Redhat Linux

Thursday, December 3rd, 2020

 


Having another day of system administrator boredom, we had a task to upgrade the Tivoli TSM Backup clients running on 20+ machines powered by CentOS and RHEL Linux, to bring the systems to the latest patched IBM Spectrum Backup client version available from IBM. For the patching I used a central server, where I initially downloaded the provided TSM client binary archives. From this machine we copied TivSM*.tar to each and every system that needed to be patched, and then patched it. The task is not too complex, as the TSM clients on the machines are all at the same version and all run on a recent patched version of Linux. To make sure all works as expected, we first tested the TSM upgrade from 7.X.X to 8.X.X on one machine and then the 8.1.8 to 8.1.11 upgrade on another one. Once we had confirmed that backups work as expected after the upgrade, we proceeded to do it massively on each of the remaining 20+ hosts.
The goal of the article below is to help some lazy sysadmin who has to prepare a TSM Backup upgrade procedure to standardize the TSM upgrade. Like much of IBM's software it is very specific, and its upgrade requires a bit of manual work and extra caution, as there seems to be no easy way (or at least I don't know one) to do the upgrade by simply adding an RPM repository and running something like yum install tivsm*.


0. Check if there is at least 2G free of space

According to the documentation, the minimum you need for a functional install, without ending up half installed or filling up your filesystem, is 2 Gigabytes of free space on the filesystem where the .tar and rpms will be living.

Thus check the situation on the filesystem where you will store the .tar archive and extract / install the .RPM files.

# df -h

1. Download the correct tarball with 8.1 Client

On one central machine you need to download the Tivoli client; you can do that via wget / curl / lynx, whatever is at hand on the Linux server.

As of the time of writing this article, TSM 8.1.11 is located at
URL:

http://public.dhe.ibm.com/storage/tivoli-storage-management/maintenance/client/v8r1/Linux/LinuxX86/BA/v8111/

I've made a local download mirror of Tivoli TSM 8.1.11 here.
In case you need to install the IBM Spectrum Backup Client in a PCI secured environment on a DMZ-ed LAN network, you can download it first to your local work PC and upload it via the Citrix client upload program or WinSCP to a central replication host, from where you will later copy it to each of the other server nodes that need to be upgraded.

Let's copy the archive to all Server hosts where you want it installed later, using a small hack.

I assume you already have an Excel document or a plain text document with the IPs of all affected hosts where TSM needs to be upgraded. Extract this data and create from it a plain text file /home/user/hosts.txt containing all the machine IPs, one per line (separated by newlines, \n), so you can loop over them and use scp to send the files.

– Replicate the Tivoli tar to all machine hosts where you want IBM Spectrum installed or upgraded.
Do it with a loop like this:

# for i in $(cat hosts.txt); do scp 8.1.11.0-TIV-TSMBAC-LinuxX86.tar user@$i:/home/user/; done

 Copy your server password to the copy buffer (assuming the password is identical on every machine) and paste it at the login prompt for each host to start the transfer.
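A hosts.txt exported from Excel often carries Windows line endings, empty lines or notes, and any of these breaks the scp loop for that entry. The sketch below cleans the list first and prints the scp commands as a dry run, so the result can be reviewed before anything is copied. The function names list_hosts and print_scp_commands and the "#" comment convention are my own assumptions.

```shell
# Print the usable entries of a hosts file, one host per line:
# carriage returns are stripped, blank lines and lines starting
# with '#' are skipped.
list_hosts() {
    tr -d '\r' < "$1" | awk 'NF && $1 !~ /^#/ { print $1 }'
}

# Print the scp command for each host instead of running it (dry run).
# $1: hosts file, $2: file to copy, $3: remote user
print_scp_commands() {
    list_hosts "$1" | while read -r host; do
        printf 'scp %s %s@%s:/home/%s/\n' "$2" "$3" "$host" "$3"
    done
}
```

Once the printed commands look right, piping them to sh runs them: print_scp_commands hosts.txt 8.1.11.0-TIV-TSMBAC-LinuxX86.tar user | sh. Distributing an SSH key with ssh-copy-id beforehand removes the need to paste the password for every host.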
 

2. SSH to each of the Machine hosts IPs

Once you login to the host you want to upgrade
Go to your user $HOME /home/user and create the directories where we'll temporarily store the Tivoli archive files and extract the RPMs

[root@linux-server user]# mkdir -p ~/tsm/TSM_BCK/
[root@linux-server user]# mv 8.1.11.0-TIV-TSMBAC-LinuxX86.tar ~/tsm
[root@linux-server user]# cd tsm
[root@linux-server user]# tar -xvvf 8.1.11.0-TIV-TSMBAC-LinuxX86.tar
gskcrypt64-8.0.55.17.linux.x86_64.rpm
GSKit.pub.pgp
gskssl64-8.0.55.17.linux.x86_64.rpm
README_api.htm
README.htm
RPM-GPG-KEY-ibmpkg
TIVsm-API64.x86_64.rpm
TIVsm-APIcit.x86_64.rpm
TIVsm-BAcit.x86_64.rpm
TIVsm-BAhdw.x86_64.rpm
TIVsm-BA.x86_64.rpm
TIVsm-filepath-source.tar.gz
TIVsm-JBB.x86_64.rpm
TIVsm-WEBGUI.x86_64.rpm
update.txt

3. Create backup of old backup files

It is always a good idea to keep old backup files

[root@linux-server tsm]# cp -av /opt/tivoli/tsm/client/ba/bin/dsm.opt ~/tsm/TSM_BCK/dsm.opt_bak_$(date +'%Y_%m_%d')
[root@linux-server tsm]# cp -av /opt/tivoli/tsm/client/ba/bin/dsm.sys ~/tsm/TSM_BCK/dsm.sys_bak_$(date +'%Y_%m_%d')

[root@linux-server tsm]# [[ -f /etc/adsm/TSM.PWD ]] && cp -av /etc/adsm/TSM.PWD ~/tsm/TSM_BCK/ || echo 'file doesnt exist'

The /etc/adsm/TSM.PWD file is only there as legacy; in TSM ver. 7 it contained the encrypted passwords. In TSM v.8 this file is not there, as a new mechanism for sensitive data was introduced.
Be aware that on Tivoli 8.X the command above will therefore just print 'file doesnt exist'.

!! Note – if dsm.opt , dsm.sys files are on different locations – please use correct full path locations !!

4. Stop dsmcad – TSM Service daemon

[root@linux-server tsm]# systemctl stop dsmcad

5. Locate and deinstall all old Clients

Depending on the version to upgrade if you're upgrading from TSM version 7 to 8, you will get output like.

[root@linux-server tsm]# rpm -qa | grep 'TIVsm-'
TIVsm-BA-7.1.6-2.x86_64
TIVsm-API64-7.1.6-2.x86_64

If you're one of those paranoid admins you can remove the TIVsm packages one by one.

[root@linux-server tsm]# rpm -e TIVsm-BA-7.1.6-2.x86_64
[root@linux-server tsm]# rpm -e TIVsm-API64-7.1.6-2.x86_64

If instead you're upgrading from version 8.1.8 to 8.1.11, due to the Security CVE advisories recently published by IBM (e.g. IBM Runtime Vulnerability affects IBM Spectrum Backup archive Client, and the vulnerability in Apache Commons Log4J affecting IBM Spectrum Protect Backup Archive Client), you will get output like:

[root@linux-server tsm]# rpm -qa | grep 'TIVsm-'
TIVsm-API64-8.1.8-0.x86_64
TIVsm-BA-8.1.8-0.x86_64

Assuming you're not scared of a bit automation you can straight do it with below one liner too 🙂

# rpm -e $(rpm -qa | grep TIVsm)

[root@linux-server tsm]# rpm -qa | grep gsk
[root@linux-server tsm]# rpm -e gskcrypt64 gskssl64

6. Check uninstallation success:

[root@linux-server tsm]# rpm -qa | grep TIVsm
[root@linux-server tsm]# rpm -qa | grep gsk

Here you should get empty output if the packages are no longer on the system, i.e. empty output is good output ! 🙂
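When the upgrade is scripted over many hosts, "empty output" is easier to act on as an exit code. The following is just a sketch with a hypothetical function name, no_tsm_packages_left; it reads a package list on stdin so it can be fed from rpm -qa.

```shell
# Read a package list (for example `rpm -qa` output) on stdin.
# Succeeds if no TIVsm-* or gsk* package is left; otherwise prints the
# leftovers and returns 1. The body runs in a subshell to keep the
# variable local.
no_tsm_packages_left() (
    leftovers=$(grep -E '^(TIVsm-|gsk)' || true)
    [ -z "$leftovers" ] && return 0
    printf '%s\n' "$leftovers"
    return 1
)
```

Something like rpm -qa | no_tsm_packages_left || echo 'old client still installed' can then guard the install step.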

7. Install new client IBM Spectrum Client (Tivoli Storage Manager) and lib dependencies

[root@linux-server tsm]# rpm -ivh gskcrypt64-8.0.55.17.linux.x86_64.rpm
[root@linux-server tsm]# rpm -ivh gskssl64-8.0.55.17.linux.x86_64.rpm

 If you're lazy to type you can do as well

[root@linux-server tsm]# rpm -Uvh gsk*

Next step is to install the main Tivoli SM components: the API files and the BA (the Backup Archive Client)

[root@linux-server tsm]# rpm -ivh TIVsm-API64.x86_64.rpm
[root@linux-server tsm]# rpm -ivh TIVsm-BA.x86_64.rpm

If you have to do it on multiple servers and you do it manually following a guide like this, you might instead want to install them with one liner.

[root@linux-server tsm]# rpm -ivh TIVsm-API64.x86_64.rpm TIVsm-BA.x86_64.rpm

There are also some non-mandatory "Common Inventory Technology" components (needed in some cases if you're using the API; we did not need them). If you need them on your servers due to your backup architecture, install also the rpm files commented out below.

## rpm -ivh TIVsm-APIcit.x86_64.rpm

## rpm -ivh TIVsm-BAcit.x86_64.rpm

The following packages are not needed for normal operation; they are only required for the WebGUI (TSM GUI management), JBB (Journal Based Backup) and BAhdw (the ONTAP library):


– TIVsm-WEBGUI.x86_64.rpm
– TIVsm-JBB.x86_64.rpm
– TIVsm-BAhdw.x86_64.rpm

8. Start and enable dsmcad service

[root@linux-server tsm]# systemctl stop dsmcad

You will get

##Warning: dsmcad.service changed on disk. Run 'systemctl daemon-reload' to reload units.

[root@linux-server tsm]# systemctl daemon-reload

[root@linux-server tsm]# systemctl start dsmcad


## enable dsmcad – it is disabled by default after install

[root@linux-server ~]# systemctl enable dsmcad

[root@linux-server tsm]# systemctl status dsmcad

9. Check dsmcad service is really running

Once enabled, IBM TSM will spawn the dsmcad process in the background; if it started properly you should see the process running.

[root@linux-server tsm]# ps -ef|grep -i dsm|grep -v grep
root      2881     1  0 18:05 ?        00:00:01 /usr/bin/dsmcad

If the process is not there, some library or something else might not be in place, preventing the process from starting …

10. Check DSMCAD /var/tsm logs for errors

With the dsmcad process enabled and running in the background, check the version reported in the scheduler log and look through the error log:

[root@linux-server tsm]# grep -i Version /var/tsm/sched.log|tail -1
12/03/2020 18:06:29   Server Version 8, Release 1, Level 10.000

 

[root@linux-server tsm]# cat /var/tsm/dsmerror.log

To see the current TSM configuration without the comments (lines starting with *), we can grep them out:

[root@linux-server tsm]# grep -v '*' /opt/tivoli/tsm/client/ba/bin/dsm.sys

Example Configuration of the agent:
----------------------------------------------------
   *TSM SERVER NODE Location
   Servername           tsm_server
   COMMmethod           TCPip
   TCPPort              1400
   TCPServeraddress     tsmserver2.backuphost.com
   NodeName             NODE.SERVER-TO-BACKUP-HOSTNAME.COM
   Passwordaccess       generate
   SCHEDLOGNAME         /var/tsm/sched.log
   SCHEDLOGRETENTION    21 D
   SCHEDMODE            POLLING
   MANAGEDServices      schedule
   ERRORLOGNAME         /var/tsm/dsmerror.log
   ERRORLOGRETENTION    30 D
   INCLEXCL             /opt/tivoli/tsm/client/ba/bin/inclexcl.tsm
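Note that grep -v '*' drops every line that contains an asterisk anywhere, so an option line whose value includes a wildcard pattern would disappear from the listing too. A slightly stricter filter, sketched here under the assumption that comments are the lines whose first non-blank character is *, could look like this; the function name strip_dsm_comments is hypothetical.

```shell
# Print a TSM options file without comment lines (first non-blank
# character is '*') and without empty lines.
strip_dsm_comments() {
    grep -Ev '^[[:space:]]*(\*|$)' "$1"
}
```

For example strip_dsm_comments /opt/tivoli/tsm/client/ba/bin/dsm.sys prints only the active option lines, including any whose value contains a wildcard.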

11. Remove tsm install directory tar ball and rpms to save space on system

The archive of the current version of the Tivoli Storage Manager client is 586 Megabytes.

[root@linux-server tsm]# du -hsc 8.1.11.0-TIV-TSMBAC-LinuxX86.tar
586M    8.1.11.0-TIV-TSMBAC-LinuxX86.tar

Some systems are configured on purpose to have little space under their /home directory,
hence it is a good idea to clear up unnecessary files after completion.

Let's get rid of all the IBM Spectrum archive source files and the rest of the RPMs used for installation.

[root@linux-server tsm]# rm -rf ~/tsm/{*.tar,*.rpm,*.pgp,*.htm,*.txt}

12. Check backups are really created on the configured remote Central backup server

To make sure that after the upgrade the backups are still continuously created and properly stored on the remote central IBM Tivoli backup server, either initiate a backup manually or wait, let's say, a day and run the dsmc client to show all backups created since the previous day. To make sure you'll not get empty output, you can modify some file on purpose by simply opening it and writing it back without changing anything, e.g. modify your ~/.bashrc or ~/.bash_profile

## List all backups for the '/' root directory with -fromdate='MM/DD/YYYY'

[root@linux-server tsm]# dsmc
Protect>
IBM Spectrum Protect
Command Line Backup-Archive Client Interface
  Client Version 8, Release 1, Level 11.0
  Client date/time: 12/03/2020 18:14:03
(c) Copyright by IBM Corporation and other(s) 1990, 2020. All Rights Reserved.

Node Name: NODE.SERVER-TO-BACKUP-HOSTNAME.COM
Session established with server TSM2_SERVER: AIX
  Server Version 8, Release 1, Level 10.000
  Server date/time: 12/03/2020 18:14:04  Last access: 12/03/2020 18:06:29
 
Protect> query backup -subdir=yes "/" -fromdate=12/3/2020
           Size        Backup Date                Mgmt Class           A/I File
           ----        -----------                ----------           --- ----
         6,776  B  12/03/2020 01:26:53             DEFAULT              A  /etc/freshclam.conf
         6,685  B  12/03/2020 01:26:53             DEFAULT              A  /etc/freshclam.conf-2020-12-02
         5,602  B  12/03/2020 01:26:53             DEFAULT              A  /etc/hosts
         5,506  B  12/03/2020 01:26:53             DEFAULT              A  /etc/hosts-2020-12-02
           398  B  12/03/2020 01:26:53             DEFAULT              A  /opt/tivoli/tsm/client/ba/bin/tsmstats.ini
       114,328  B  12/03/2020 01:26:53             DEFAULT              A  /root/.bash_history
           403  B  12/03/2020 01:26:53             DEFAULT              A  /root/.lesshst